Title:Identify Protein 8-Class Secondary Structure with Quadratic Discriminant Algorithm based on the Feature Combination
VOLUME: 14 ISSUE: 9
Author(s):Zhao Wei and Feng Yonge*
Affiliation:College of Science, Inner Mongolia Agriculture University, Hohhot 010018
Keywords:Chemical shifts, quadratic discriminant analysis, protein 8-class secondary structure, measure of diversity, hydrophobic,
hydrophilic.
Abstract:Background: Protein structure research is one of the most important subjects of the
21st century, and the prediction of protein secondary structure is a key step in the prediction of
protein three-dimensional structure. Most earlier work has addressed three-class secondary structure (SS)
prediction, whereas eight-class SS prediction has received less attention.
Method: We introduce a model for predicting protein eight-class secondary structure using the
quadratic discriminant algorithm (QDA) with a combination of features: chemical shifts and the
measure of diversity. The measure of diversity is computed from the hydrophilic-hydrophobic
residues and from their dipeptides, respectively. First, we extracted the chemical shifts of the proteins
and used them alone as features to predict the eight-class secondary structures. Then, to improve
the accuracy, we constructed the measure of diversity based on the hydrophilic-hydrophobic
residues. Finally, we combined the chemical shifts with the measure of diversity to predict the
protein eight-class secondary structures.
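The measure-of-diversity features described in the Method can be illustrated with a minimal sketch. It assumes the commonly used definition D = N ln N - sum(n_i ln n_i), where n_i are category counts and N is their sum; the abstract does not state the formula, so this form is an assumption. The hydrophobic residue set and the function names (`to_hp`, `measure_of_diversity`, `diversity_features`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical two-class grouping, for illustration only: the abstract does not
# list which residues are treated as hydrophobic or hydrophilic.
HYDROPHOBIC = set("AVLIMFWCPG")


def to_hp(sequence):
    """Map each residue to 'H' (hydrophobic) or 'P' (hydrophilic)."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence)


def measure_of_diversity(counts):
    """Assumed form: D = N*ln(N) - sum(n_i*ln(n_i)); zero counts are skipped."""
    counts = [n for n in counts if n > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts)


def diversity_features(sequence):
    """Return (residue-level D, dipeptide-level D) for one sequence segment."""
    hp = to_hp(sequence)
    residue_counts = Counter(hp)
    # Overlapping dipeptides over the two-letter H/P alphabet.
    dipeptide_counts = Counter(hp[i:i + 2] for i in range(len(hp) - 1))
    return (
        measure_of_diversity(residue_counts.values()),
        measure_of_diversity(dipeptide_counts.values()),
    )
```

In a pipeline like the one outlined above, such values would be appended to the chemical-shift features of each residue before classification; how the segments are chosen is not specified in the abstract.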
Results: Combining chemical shifts with the measure of diversity gave the best eight-class
accuracy (Q8), 80.7%, in seven-fold cross-validation. On the same data set, we also performed
the prediction with the C8-Scorpion server, a support vector machine (SVM), and a random forest (RF);
the results showed that our prediction model is superior to these methods in terms of accuracy.
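For readers unfamiliar with the evaluation terms, the following sketch shows how a Q8 score and a seven-fold split could be computed. Q8 is taken here as the percentage of residues whose predicted eight-class state matches the reference state. The function names and the interleaved fold assignment are assumptions, and the abstract does not say whether the folds were formed per residue or per protein.

```python
def q8_accuracy(true_states, predicted_states):
    """Percentage of positions where the predicted state equals the true state."""
    if len(true_states) != len(predicted_states):
        raise ValueError("sequences of states must have the same length")
    if len(true_states) == 0:
        raise ValueError("at least one residue is required")
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return 100.0 * correct / len(true_states)


def k_fold_indices(n_samples, k=7):
    """Yield (train, test) index lists; fold sizes differ by at most one."""
    # Interleaved assignment: sample i goes to fold i % k (an arbitrary choice).
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out, test in enumerate(folds):
        train = [
            idx
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for idx in fold
        ]
        yield train, test
```

Each of the seven folds is held out once for testing while the model is trained on the remaining six, and the per-fold Q8 values are then combined into a single accuracy figure.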
Conclusion: These findings suggest that our model is effective for the prediction of protein
eight-class secondary structures.