Recent Advances in Machine Learning Methods for Predicting Heat Shock Proteins

Wei Chen; Pengmian Feng; Tao Liu; Dianchuan Jin

Abstract

Background: As molecular chaperones, heat shock proteins (HSPs) not only play key roles in protein folding and the maintenance of protein stability but are also linked to multiple diseases. HSPs have therefore become a focus of drug design. Because HSPs from different families perform distinct functions, accurately classifying HSP families is a key step toward understanding their biological functions. Compared with labor-intensive and costly experimental methods, computational classification of HSP families has emerged as an alternative approach.

Methods: We reviewed the papers describing the existing HSP datasets and the representative computational approaches developed for identifying and classifying HSPs.

Results: Two benchmark datasets of HSPs, HSPIR and sHSPdb, were introduced; they provide invaluable resources for computationally identifying HSPs. The gold-standard dataset and the sequence encoding schemes used to build computational methods for classifying HSPs were also introduced, and three representative web servers for identifying HSPs and their families were described.
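The abstract does not spell out the encoding schemes, but the keywords name two of them: n-peptide composition and reduced amino acid composition. The sketch below shows, in plain Python, what these encodings compute for a single sequence. The five-group reduced alphabet is a hypothetical grouping chosen for illustration only; it is not the clustering used by any of the reviewed predictors, and the function names are likewise illustrative.

```python
from itertools import product

# Standard 20-letter amino acid alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical five-group reduced alphabet, chosen only for illustration;
# it is NOT the clustering used by any specific published HSP predictor.
REDUCED_GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQGP",  # small / polar
    "c": "KR",      # positively charged
    "n": "DE",      # negatively charged
}


def npeptide_composition(seq, n, alphabet=AMINO_ACIDS):
    """Normalized frequency of every length-n peptide over `alphabet`.

    Returns len(alphabet) ** n values, ordered as itertools.product
    enumerates the alphabet. Windows containing symbols outside
    `alphabet` (e.g. 'X') are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=n)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - n + 1):
        kmer = seq[i:i + n]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in kmers]


def reduce_sequence(seq, groups=REDUCED_GROUPS):
    """Map each residue to its group label; unknown residues become 'X'."""
    lookup = {aa: label for label, members in groups.items() for aa in members}
    return "".join(lookup.get(aa, "X") for aa in seq.upper())


def reduced_composition(seq, n, groups=REDUCED_GROUPS):
    """n-peptide composition computed on the reduced-alphabet sequence."""
    return npeptide_composition(reduce_sequence(seq, groups), n, "".join(groups))


if __name__ == "__main__":
    fragment = "MSKGEELFTGVVPILVELDG"  # arbitrary example sequence
    print(len(npeptide_composition(fragment, 1)))  # 20 features
    print(len(npeptide_composition(fragment, 2)))  # 400 features
    print(len(reduced_composition(fragment, 2)))   # 25 features
```

With the standard alphabet the feature vector has 20^n entries, while the five-group alphabet gives only 5^n; a more compact, less sparse representation is one common reason reduced alphabets are used when training sets are small. Trying a different grouping only requires changing `REDUCED_GROUPS`.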

Conclusion: Existing machine learning methods for identifying the different families of HSPs have yielded encouraging results and have helped advance research on HSPs. However, the number of HSPs with known structures is still very limited. Determining HSP structures is therefore also an urgent task and will help reveal their functions.

Keywords: Heat shock protein, n-peptide composition, reduced amino acid composition, machine learning, drug target, web server.

Journal: Current Drug Metabolism
Publisher: Bentham Science Publisher
DOI: https://dx.doi.org/10.2174/1389200219666181031105916
Print ISSN: 1389-2002
Online ISSN: 1875-5453
