Current Proteomics

ISSN (Print): 1570-1646
ISSN (Online): 1875-6247

Research Article

Discrimination of Thermophilic and Mesophilic Proteins Using Support Vector Machine and Decision Tree

Author(s): Haixin Ai, Li Zhang, Jikuan Zhang, Tong Cui, Alan K. Chang and Hongsheng Liu*

Volume 15, Issue 5, 2018

Page: [374 - 383] Pages: 10

DOI: 10.2174/1570164615666180718143606

Abstract

Background: Enhancing protein stability is vital to protein engineering and design. Manipulating protein stability is also important for understanding the principles that govern protein thermostability, in both basic research and industrial applications.

Objective: To build models that can discriminate between thermophilic and mesophilic proteins, and to understand the factors influencing protein thermostability, using machine learning methods.

Method: A total of 613 protein features were calculated, and various feature selection algorithms were used to build feature subsets. Support vector machine and decision tree methods were applied to predict the thermostability of the proteins. The problems caused by unbalanced data were addressed by using a grid search to find the best error-cost weights for the different classes.
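
The class-weight grid search described in the Method can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name a software package, kernel, weight grid, or scoring metric, so everything below is an assumption made for illustration. It uses a plain linear SVM trained by stochastic subgradient descent, a hypothetical grid of minority-class weights, balanced accuracy on a held-out split as the selection criterion, and synthetic two-dimensional data standing in for the protein features.

```python
import random


def train_weighted_linear_svm(X, y, class_weight, lam=0.01, lr=0.01, epochs=30, seed=0):
    """Linear SVM trained by SGD; the hinge-loss step is scaled by class_weight[label]."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # L2 regularisation shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if y[i] * score < 1.0:
                # Margin violation: errors on a heavily weighted class cost more.
                step = lr * class_weight[y[i]] * y[i]
                w = [wj + step * xj for wj, xj in zip(w, X[i])]
                b += step
    return w, b


def predict(model, X):
    w, b = model
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; assumes both classes occur in y_true."""
    recalls = []
    for label in (-1, 1):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def grid_search_class_weight(X_tr, y_tr, X_va, y_va, grid):
    """Try each minority-class weight and keep the one with the best validation score."""
    best = None
    for w_pos in grid:
        model = train_weighted_linear_svm(X_tr, y_tr, {-1: 1.0, 1: w_pos})
        score = balanced_accuracy(y_va, predict(model, X_va))
        if best is None or score > best[1]:
            best = (w_pos, score)
    return best


def make_toy_data(n_neg=200, n_pos=50, seed=1):
    """Synthetic unbalanced data (hypothetical): -1 is the majority class, +1 the minority."""
    rng = random.Random(seed)
    data = [([rng.gauss(-1, 1), rng.gauss(-1, 1)], -1) for _ in range(n_neg)]
    data += [([rng.gauss(1, 1), rng.gauss(1, 1)], 1) for _ in range(n_pos)]
    rng.shuffle(data)
    return [x for x, _ in data], [t for _, t in data]


X, y = make_toy_data()
split = int(0.7 * len(X))
best_weight, best_score = grid_search_class_weight(
    X[:split], y[:split], X[split:], y[split:], [1.0, 2.0, 4.0, 8.0]
)
print(best_weight, best_score)
```

A single held-out split keeps the sketch short; cross-validation would give a more stable estimate when choosing the weights.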

Results: The results indicated that primary structure had a greater influence on the thermostability of a protein than secondary structure. The best classification model was obtained when the support vector machine was run on the feature subset consisting of amino acid composition plus amino acid class composition, which yielded a prediction accuracy of 84.07%. At the primary structure level, Gln, Glu, and Ser were the features that contributed most to protein thermostability. At the secondary structure level, Q_coil and Helix_E were the most important features affecting protein thermostability.
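
As a rough illustration of the best-performing feature subset, the sketch below computes amino acid composition and amino acid class composition for a single sequence. The abstract does not define the amino acid classes, so the four-group scheme in `AA_CLASSES` is an assumption chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed grouping for illustration only; the article may use different classes.
AA_CLASSES = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "positive": "KRH",
    "negative": "DE",
}


def amino_acid_composition(seq):
    """Return the fraction of each of the 20 standard residues in seq."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def class_composition(seq):
    """Return the fraction of residues falling into each assumed class."""
    comp = amino_acid_composition(seq)
    return {name: sum(comp[aa] for aa in members) for name, members in AA_CLASSES.items()}


def feature_vector(seq):
    """Concatenate the 20 residue fractions with the class fractions."""
    comp = amino_acid_composition(seq)
    cls = class_composition(seq)
    return [comp[aa] for aa in AMINO_ACIDS] + [cls[name] for name in AA_CLASSES]


example = "AAEEKQ"  # made-up sequence, not taken from the study
print(amino_acid_composition(example)["E"])
print(class_composition(example))
```

With this grouping each sequence yields a 24-element vector (20 residue fractions plus 4 class fractions) that could be passed to a classifier.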

Conclusion: These results suggest that the thermostability of a protein is mainly associated with its primary structural features.

Keywords: Protein thermostability, support vector machine, decision tree, unbalanced data, dipeptide, amino acid.

© 2024 Bentham Science Publishers