A Sequence-based Approach for Predicting Protein Disordered Regions

Author(s): Tao Huang, Zhi-Song He, Wei-Ren Cui, Yu-Dong Cai, Xiao-He Shi, Le-Le Hu, Kuo-Chen Chou

Journal Name: Protein & Peptide Letters

Volume 20, Issue 3, 2013


Protein disordered regions are associated with critical cellular functions such as transcriptional regulation, translation, and cellular signal transduction, and they are implicated in various diseases. Although experimental methods have been developed to determine these regions, they are time-consuming and expensive. It is therefore highly desirable to develop computational methods that can provide this kind of information rapidly and inexpensively. Here we propose a sequence-based computational approach for predicting protein disordered regions by means of the Nearest Neighbor algorithm, in which the conservation, amino acid factors, and secondary structure status of each amino acid in a fixed-length sliding window are taken as the encoding features. Feature selection based on mRMR (Maximum Relevance Minimum Redundancy) is then applied to obtain an optimal 51-feature set comprising 39 conservation features and 12 secondary structure features. With these 51 features, our predictor yielded a Matthews correlation coefficient (MCC) of 0.371 on a rigorous benchmark dataset tested by 5-fold cross-validation and 0.219 on an independent test dataset. Our results suggest that conservation and secondary structure play important roles in intrinsically disordered proteins.
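The pipeline described in the abstract (sliding-window encoding, nearest-neighbor classification, MCC evaluation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the zero-padding at sequence ends, the Euclidean distance, and the toy two-value residue features are all assumptions made for illustration; the abstract does not specify the window length, the distance measure, or how sequence termini are handled, and the mRMR selection step is omitted here.

```python
import math

def window_features(per_residue, center, half_width, pad):
    """Concatenate per-residue feature vectors in a window around `center`.

    Positions that fall outside the sequence are filled with `pad`
    (an assumed convention, not taken from the paper).
    """
    feats = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(per_residue):
            feats.extend(per_residue[i])
        else:
            feats.extend(pad)
    return feats

def nearest_neighbor_predict(train_X, train_y, query):
    """Return the label of the training vector closest to `query`.

    Euclidean distance is an assumption; other distance measures
    could be substituted.
    """
    best = min(range(len(train_X)), key=lambda j: math.dist(train_X[j], query))
    return train_y[best]

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = disordered)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally report 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data: each residue has a hypothetical [conservation, in_helix] pair.
train_seq = [[0.9, 1], [0.8, 1], [0.2, 0], [0.1, 0], [0.7, 1]]
train_lab = [0, 0, 1, 1, 0]  # 1 = disordered, 0 = ordered (made-up labels)
pad = [0.0, 0]

train_X = [window_features(train_seq, i, 1, pad) for i in range(len(train_seq))]

test_seq = [[0.85, 1], [0.15, 0], [0.2, 0]]
test_X = [window_features(test_seq, i, 1, pad) for i in range(len(test_seq))]
predictions = [nearest_neighbor_predict(train_X, train_lab, x) for x in test_X]
```

Each residue is classified from the concatenated features of its window, so a window of half-width 1 over two-value residues gives a six-value vector. In the approach described above, feature selection would reduce such a vector to the most informative entries before classification.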

Keywords: Feature space, mRMR, sliding window, intrinsically disordered proteins, nearest neighbor algorithm, amino acid factor, conservation score, signal transduction, translation, regulation

open access plus

Article Details

Year: 2013
Published on: 17 January, 2013
Page: 243-248
Pages: 6
DOI: 10.2174/0929866511320030002
