Protein structural class prediction is beneficial to protein structure and function analysis.
Exploring good feature representation is a key step for this prediction task. Prior works have demonstrated
the effectiveness of secondary-structure-based feature extraction methods, especially for low-similarity
protein sequences. However, prediction accuracy remains limited. To explore the
potential of secondary structure information, a novel feature extraction method based on a generalized
chaos game representation of predicted secondary structure is proposed. Each protein sequence is converted
into a 20-dimensional distance-related statistical feature vector to characterize the distribution of
secondary structure elements and segments. The feature vectors are then fed into a support vector machine
classifier to predict the protein structural class. Our experiments on three widely used low-similarity
benchmark datasets (25PDB, 1189 and 640) show that the proposed method outperforms
state-of-the-art methods. It is anticipated that our method could be extended to
other graphical representations of protein sequence and be helpful in future protein research.
Keywords: Protein structural class, sequence similarity, secondary protein structure, chaos game representation, support vector machine
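
To make the feature extraction idea in the abstract more concrete, the following is a minimal sketch of a classic chaos game representation applied to a predicted secondary structure string. It is an illustration under stated assumptions, not the generalized representation or the 20-dimensional feature set proposed here: the three-state alphabet (H, E, C), the triangle vertex assignment, the contraction ratio of 0.5, and the function names `cgr_points` and `distance_features` are all hypothetical choices made for this example.

```python
import math

# Hypothetical vertex assignment for the three secondary structure states
# (H = helix, E = strand, C = coil). The generalized representation in the
# paper may place or weight the vertices differently.
VERTICES = {
    "H": (0.0, 0.0),
    "E": (1.0, 0.0),
    "C": (0.5, math.sqrt(3) / 2),
}


def cgr_points(ss, ratio=0.5):
    """Classic chaos game: starting at the triangle centroid, move a
    fraction `ratio` of the way toward the vertex of each symbol in turn."""
    x = sum(v[0] for v in VERTICES.values()) / len(VERTICES)
    y = sum(v[1] for v in VERTICES.values()) / len(VERTICES)
    points = []
    for symbol in ss:
        vx, vy = VERTICES[symbol]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


def distance_features(ss):
    """Illustrative distance-related statistics (assumed, not the paper's):
    mean and standard deviation of the distance from every CGR point to
    each of the three vertices, giving a 6-dimensional vector."""
    if not ss:
        raise ValueError("secondary structure string must be non-empty")
    points = cgr_points(ss)
    features = []
    for vx, vy in VERTICES.values():
        dists = [math.hypot(px - vx, py - vy) for px, py in points]
        mean = sum(dists) / len(dists)
        var = sum((d - mean) ** 2 for d in dists) / len(dists)
        features.extend([mean, math.sqrt(var)])
    return features
```

A vector produced this way could then be passed to any support vector machine implementation. A helix-rich string pulls the points toward the H vertex and a strand-rich string toward the E vertex, which is the kind of distributional signal that distance-based statistics are meant to capture.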