Discrimination of Thermostable and Thermophilic Lipases Using Support Vector Machines
Wei Zhao, Xunzhang Wang, Riqiang Deng, Jinwen Wang and Hongbo Zhou
Affiliation: Key Laboratory of Biometallurgy of Ministry of Education, School of Minerals Processing and Bioengineering, Central South University, Changsha 410083, People's Republic of China.
Keywords: Amino acid composition, dipeptide composition, tripeptide composition, non-adjacent di-residue coupling patterns, protein stability, support vector machines
Discriminating thermophilic lipases from their closely related thermostable counterparts is a challenging task, and solving it would help in the design of stable proteins. In this study, the distributions of N (N = 2, 3) neighboring amino acids and the nonadjacent di-residue coupling patterns in the sequences of 65 thermostable and 77 thermophilic lipases were systematically analyzed. The hydrophobic residues Leu, Pro, Met, Phe and Trp, as well as the polar residue Tyr, occurred more frequently in thermophilic lipases than in thermostable ones. The occurrence frequencies of KC, EE, KE, RE, VE, YI, EK, VK, EV, YV, EY, KY, VY and YY in thermophilic proteins were significantly higher, while those of QC, QH, QN, HQ, MQ, NQ, QQ, TQ, QS and QT were significantly lower. The patterns CXP and CPX were significantly positively associated with lipase thermostability, while XXQ and QXX were significantly negatively associated with it. The nonadjacent di-residue coupling patterns PR14, RY32, YR47, LE53, LE64, PP64, RP70 and PP101 differed significantly between thermophilic lipases and their thermostable counterparts. The dipeptide, tripeptide and nonadjacent di-residue pattern compositions contained more information than the amino acid composition. A statistical method based on support vector machines (SVMs) was developed for discriminating thermophilic and thermostable lipases. The accuracy of this method on the training dataset was 97.17%, and its highest accuracy on the testing datasets was 98.41%. The influence of some specific patterns on lipase thermostability is also discussed.
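The sequence features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and it assumes that a coupling pattern such as PR14 means a Pro followed by an Arg 14 positions later, which the abstract does not state explicitly. The resulting fixed-length vector is the kind of input an SVM classifier could be trained on.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(seq, k):
    """Relative frequency of each contiguous k-residue pattern.

    k=1 gives amino acid composition, k=2 dipeptide, k=3 tripeptide.
    """
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def coupling_composition(seq, gap):
    """Relative frequency of residue pairs (a, b) with b `gap` positions after a.

    ASSUMPTION: this is one reading of "nonadjacent di-residue coupling
    pattern"; under it, "PR14" would be the key "PR" with gap=14.
    """
    seq = seq.upper()
    total = len(seq) - gap
    if total <= 0:
        return {}
    counts = Counter(seq[i] + seq[i + gap] for i in range(total))
    return {pair: n / total for pair, n in counts.items()}


def feature_vector(seq, gaps=(2, 3)):
    """Concatenate amino acid, dipeptide and coupling compositions.

    Patterns containing non-standard residues are ignored, and absent
    patterns contribute 0.0, so every sequence maps to the same length:
    20 + 400 + 400 * len(gaps).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    aa = kmer_composition(seq, 1)
    dp = kmer_composition(seq, 2)
    vec = [aa.get(a, 0.0) for a in AMINO_ACIDS]
    vec += [dp.get(p, 0.0) for p in pairs]
    for gap in gaps:
        cp = coupling_composition(seq, gap)
        vec += [cp.get(p, 0.0) for p in pairs]
    return vec
```

Tripeptide composition follows from `kmer_composition(seq, 3)` in the same way, at the cost of 8000 additional dimensions per sequence.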