
Current Biotechnology


ISSN (Print): 2211-5501
ISSN (Online): 2211-551X

Research Article

Thermostability of Proteins Revisited Through Machine Learning Methodologies: From Nucleotide Sequence to Structure

Author(s): Debamitra Chakravorty, Mohd Faheem Khan and Sanjukta Patra

Volume 6, Issue 1, 2017

Page: [39 - 49] Pages: 11

DOI: 10.2174/2211550105666151222183232



Background: Predicting the thermostability of a protein from its sequence or structure is challenging. Predicting which mutations can render mesophilic proteins thermostable is more challenging still, and a guided approach for doing so remains elusive. Prediction can be attempted at three levels of protein organization: the nucleotide sequence of the gene, the primary amino acid sequence, and the three-dimensional structure of the protein. However, it is still unclear which level yields the better predictor of protein thermostability and which combination of parameters at each level is responsible for thermostability.

Methods: This paper addresses the question by testing numerous unsupervised and supervised machine learning methodologies with a multitude of prediction parameters, both to discriminate thermostable from mesostable proteins and to elucidate the parameters at each level that lead to protein thermostability. The method used 33 nucleotide sequence features, 19 amino acid composition features and 17 structural features. Numerous machine learning approaches were tested with different parameters and applied to the three datasets: 11 weighting algorithms, 5 unsupervised learners, 3 supervised learners and 2 lazy modeling algorithms.
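As an illustration of what an amino acid composition feature can look like, the following minimal Python sketch computes the fraction of each residue in a protein sequence. It is an assumption-laden example, not the authors' pipeline: the abstract does not say which 19 composition features were used, so the sketch simply reports all 20 standard residues, and the function name `amino_acid_composition` and the sample sequence are hypothetical.

```python
from collections import Counter

# The 20 standard residues in one-letter code (assumption: the abstract's
# 19 composition features are not listed, so all 20 are reported here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in a protein sequence."""
    seq = sequence.upper()
    # Count only standard residues; ambiguous codes such as X are ignored.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical toy sequence, used only to show the call.
composition = amino_acid_composition("MKQQAL")
print(round(composition["Q"], 3))  # fraction of Gln in the toy sequence
```

Each protein then becomes a fixed-length numeric vector, which is the form that weighting algorithms and classifiers of the kind listed above take as input.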

Results: The amino acid dataset outperformed the nucleotide and structural datasets in generating the best prediction model for protein thermostability. LibSVM with a polynomial kernel, applied to the amino acid dataset with 10-fold cross-validation, gave the best model for classifying thermostable proteins, with a prediction accuracy of 90.97%. The results also show that CG codons are not determinants of thermostability. A decrease in Gln content, an increase in hydrophobic residues, and an increase in main chain to main chain hydrogen bonds, followed by gamma turns and aromatic-aromatic interactions, are responsible for protein thermostability.
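The 10-fold cross-validation protocol behind an accuracy figure of this kind can be sketched in plain Python. This is a hedged illustration only: the classifier is a simple nearest-centroid stand-in, not LibSVM with a polynomial kernel, and the helper names (`k_fold_indices`, `cross_val_accuracy`, `fit_centroids`, `predict_centroid`) and the toy two-feature data are made up for the example and are not taken from the paper.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Hold out each fold once, train on the rest, and pool the correct predictions."""
    correct = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in held_out)
    return correct / len(labels)


def fit_centroids(X, y):
    """Stand-in classifier (NOT LibSVM): store the mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(model, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


# Made-up, well-separated toy data: two features per "protein".
toy_features = [[0.10 + 0.01 * i, 0.9] for i in range(10)] + \
               [[0.90 - 0.01 * i, 0.1] for i in range(10)]
toy_labels = ["mesostable"] * 10 + ["thermostable"] * 10

print(cross_val_accuracy(toy_features, toy_labels, fit_centroids, predict_centroid))
```

Because every sample is held out exactly once, the pooled accuracy is computed only on proteins the model did not see during training, which is what makes a cross-validated figure a fairer estimate than training-set accuracy.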

Conclusion: The present study shows that studying different datasets can bring out the importance of the features at each level of protein organization that are responsible for protein thermostability. The results support the long-standing conclusion that protein amino acid composition is highly correlated with protein thermostability. Further insight into all these features can thus lead to the development of a better predictor and classifier of thermostable proteins. The study also shows that testing a multitude of criteria is important before reaching a definitive conclusion.

Keywords: Thermostability, codon, amino acid, tertiary structure, predictor, machine learning.


© 2023 Bentham Science Publishers