Background: Intrinsically disordered proteins lack a well-defined three-dimensional
structure under physiological conditions yet still carry out essential biological functions. They
take part in various physiological processes, such as signal transduction, transcription, and
post-translational modification. The disordered regions are the main functional sites of
intrinsically disordered proteins. Research on disordered regions has therefore become a hot topic.
Objective: This paper aims to analyze the features of disordered regions with different
molecular functions and to predict the different disordered regions using effective features.
Methods: We first divided the intrinsically disordered proteins in the DisProt database into
six classes according to their molecular functions. We then extracted four features using
bioinformatics methods, namely amino acid index (AAIndex), codon frequency (Codon), the
composition of three kinds of protein secondary structure (3PSS), and chemical shifts (CSs),
and used these features to predict the disordered regions of different functions with a
Support Vector Machine (SVM).
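As an illustration of the feature step, the following is a minimal sketch of how a codon frequency vector could be computed and fused with other feature vectors. It assumes that codon frequency means the relative frequency of each of the 64 codons in an in-frame coding sequence, and that feature fusion means simple concatenation. Neither definition is stated here, and the names `codon_frequencies` and `fuse` are hypothetical.

```python
from itertools import product

# All 64 codons in a fixed order, so every feature vector has the same layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Assumption: the sequence starts in frame; a trailing partial codon
    and codons containing ambiguous bases (e.g. N) are ignored.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    # Step through the sequence in non-overlapping triplets.
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
            total += 1
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def fuse(*feature_vectors):
    """Feature fusion by concatenation (assumed, e.g. CSs+AAIndex)."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused
```

For example, `codon_frequencies("ATGGCCATG")` assigns 2/3 to ATG and 1/3 to GCC, and `fuse(codon_vector, other_vector)` yields a single vector that could be passed to an SVM classifier.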
Results: The best overall accuracy was 99.29%, obtained using chemical shifts (CSs) as the
feature. With feature fusion, the overall accuracy reached 88.70% using CSs+AAIndex as features
and 86.09% using CSs+AAIndex+Codon+3PSS as features.
Conclusion: We predicted and analyzed the disordered regions based on their molecular functions.
The results showed that prediction performance can be improved by adding chemical shifts and
AAIndex as features, and that chemical shifts were the most effective feature in the prediction.
We hope that our results will be useful for the study of intrinsically disordered proteins.