Predicting Protein Structural Class by Incorporating Patterns of Over-Represented k-mers into the General form of Chou’s PseAAC

Yu-Fang Qin; Chun-Hua Wang; Xiao-Qing Yu; Jie Zhu; Tai-Gang Liu; Xiao-Qi Zheng

Abstract

Computational prediction of protein structural class from sequence data remains a challenging problem in protein science. In this paper, a new feature extraction approach based on relative polypeptide composition is introduced. The approach takes into account the background distribution of a given k-mer under a Markov model of order k-2, and it avoids the curse of dimensionality as k increases by using a T-statistic feature selection strategy. The selected features are then fed to a support vector machine to perform the prediction. To verify the performance of the method, jackknife cross-validation tests are performed on four widely used benchmark datasets. Comparison with existing methods shows that the method provides satisfactory performance for structural class prediction.

Keywords: Markov model, protein structural class, relative polypeptide composition, support vector machine, T-statistic
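
As an illustration of the background model mentioned in the abstract, the following minimal sketch estimates the expected count of a k-mer under a Markov model of order k-2 and compares it with the observed count. The abstract does not give the exact formula, so this is an assumption: the sketch uses the standard maximal-order estimator E(w1..wk) = N(w1..w(k-1)) * N(w2..wk) / N(w2..w(k-1)), and the (observed - expected) / expected score is a hypothetical stand-in for the paper's relative composition. The function names are illustrative only, and the T-statistic selection and support vector machine steps are not shown.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(word, seq):
    """Expected count of `word` under a Markov model of order k-2 (k >= 3).

    Assumption: the standard maximal-order estimator
        E(w1..wk) = N(w1..w(k-1)) * N(w2..wk) / N(w2..w(k-1)),
    which may differ from the estimator used in the paper.
    """
    k = len(word)
    if k < 3:
        raise ValueError("this sketch only handles k >= 3")
    shorter = kmer_counts(seq, k - 1)
    core = kmer_counts(seq, k - 2)[word[1:-1]]
    if core == 0:
        return 0.0
    return shorter[word[:-1]] * shorter[word[1:]] / core


def relative_composition(word, seq):
    """Hypothetical over-representation score: (observed - expected) / expected."""
    observed = kmer_counts(seq, len(word))[word]
    expected = expected_count(word, seq)
    if expected == 0:
        return 0.0
    return (observed - expected) / expected
```

A positive score suggests that the k-mer occurs more often than its two overlapping (k-1)-mers would predict, and a negative score suggests the opposite. In a full pipeline, such scores would be computed for every k-mer in each sequence before feature selection.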
