Background: Knowledge of a protein's structural class plays an important role in understanding
its tertiary structure. Globular protein domains, whose fold types are surprisingly similar
despite being complex and irregular in their natural state, can be divided mainly into four
classes according to secondary structural content: all-α, all-β, α/β, and α+β. Significant
efforts have been made to predict protein structural classes. However, the protein sequence
representations used in these approaches may contain redundant information.
Method: A ReliefF-SVM classification model is proposed to predict protein structural class. First,
pseudo amino acid composition (PseAA) features were extracted from each protein in the dataset;
these features contain redundancy. Then, the ReliefF feature selection method was used to reduce
this redundancy. Next, the samples, represented by the selected features, were given as input to
the SVM. Because the SVM parameters are difficult to determine, the Simulated Annealing Particle
Swarm Optimization (SAPSO) algorithm was embedded into the SVM to optimize them.
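The feature selection step described above can be illustrated with a minimal sketch. It is a simplified ReliefF variant written for this illustration, not the authors' implementation: the function names `relieff_weights` and `select_top`, the Manhattan distance on range-normalized features, the neighbour count `k`, and the toy data are all assumptions. The SVM and the SAPSO parameter search are not shown.

```python
def relieff_weights(X, y, k=2):
    """Simplified ReliefF (assumed variant): reward features that differ on
    nearest misses and penalize features that differ on nearest hits."""
    n, d = len(X), len(X[0])
    # Range of each feature, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)
    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        # Manhattan distance on normalized features (an assumption).
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        for c in classes:
            # k nearest neighbours of sample i inside class c, excluding i.
            idx = [m for m in range(n) if m != i and y[m] == c]
            idx.sort(key=lambda m: dist(X[i], X[m]))
            near = idx[:k]
            if not near:
                continue
            if c == y[i]:
                # Nearest hits lower the weight.
                scale = -1.0 / (n * len(near))
            else:
                # Nearest misses raise the weight, scaled by class prior.
                scale = prior[c] / (1.0 - prior[y[i]]) / (n * len(near))
            for m in near:
                for j in range(d):
                    weights[j] += scale * diff(j, X[i], X[m])
    return weights


def select_top(weights, n_keep):
    """Return the indices of the n_keep highest-weighted features."""
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return sorted(order[:n_keep])


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.0], [0.1, 1.0], [0.0, 0.5], [0.1, 0.2],
     [0.9, 0.0], [1.0, 1.0], [0.9, 0.5], [1.0, 0.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w = relieff_weights(X, y, k=2)
kept = select_top(w, n_keep=1)
print(kept)  # feature 0 is retained
```

In a setting like the one described, the weights would be computed over the 420 PseAA features and only the retained columns would be passed to the classifier.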
Results: After feature selection with the ReliefF algorithm, the feature dimension was reduced
from 420 to 292. The experiment time fell from 372.32 s to 195.58 s, a reduction of nearly half.
To evaluate our method objectively, we compared it with existing methods.
For the C204 dataset, our method obtained an overall classification accuracy of 95.4%,
which was 14.5% higher than the covariant matrix algorithm and 10.1% higher than the
previous SVM. With the same feature data, the proposed method showed a 4.6% improvement
over IDQD. For the Z277 dataset, the overall accuracy of the proposed method reached 96.5%,
higher than those of the other methods.
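The reported reductions in runtime and feature dimension can be checked directly from the figures stated in the Results. The short sketch below uses only those numbers; the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(before, after):
    """Fraction by which a quantity decreased relative to its starting value."""
    return (before - after) / before


time_cut = relative_reduction(372.32, 195.58)   # runtime in seconds
feature_cut = relative_reduction(420, 292)      # number of features

print(f"runtime reduced by {time_cut:.1%}")      # 47.5%
print(f"features reduced by {feature_cut:.1%}")  # 30.5%
```

The runtime figure agrees with the reduction of nearly half stated in the Results.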
Conclusion: The results of this study further support the results on protein sequence description
reported by Lin, and our method reduces the time consumption by 47%. The prediction accuracy
is also greatly improved, which demonstrates the effectiveness of our method.