Recent Advancement in Predicting Subcellular Localization of Mycobacterial Protein with Machine Learning Methods

Author(s): Shi-Hao Li, Zheng-Xing Guan, Dan Zhang, Zi-Mei Zhang, Jian Huang, Wuritu Yang*, Hao Lin*

Journal Name: Medicinal Chemistry

Volume 16, Issue 5, 2020





Abstract:

Mycobacterium tuberculosis (MTB) causes tuberculosis (TB), which is reported as one of the most dreadful epidemics. Although many biochemical molecular drugs have been developed to cope with this disease, drug resistance, especially multidrug-resistant (MDR) and extensively drug-resistant (XDR) TB, poses a huge threat to treatment. Moreover, traditional biochemical experimental methods for tackling TB are time-consuming and costly. Benefiting from the enormous amount of available genomic and proteomic sequence data, TB can now be tackled with sequence-based computational approaches, that is, bioinformatics. Predicting the subcellular localization of mycobacterial proteins (MBPs) with high precision and efficiency may help to determine the biological function of these proteins and provide useful insights for protein function annotation and drug design. In this review, we report the progress that has been made in the computational prediction of the subcellular localization of MBPs, covering the following aspects: 1) construction of benchmark datasets; 2) methods of feature extraction; 3) techniques of feature selection; 4) application of several published prediction algorithms; 5) the published results; and 6) further study on the prediction of the subcellular localization of MBPs.

Keywords: Subcellular localization, mycobacterial protein, support vector machine, feature selection, Mycobacterium tuberculosis (MTB), tuberculosis (TB).
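
The workflow named in the abstract (benchmark dataset, feature extraction, feature selection, prediction) can be illustrated with a minimal sketch. It is not the method of any reviewed study, and everything in it is an assumption made for illustration: the sequences and the labels `class_a` and `class_b` are invented toy data, plain amino acid composition stands in for richer descriptors, a two-class F-score ranks the features, and a nearest-centroid rule is swapped in for the support vector machine so that the example needs only the Python standard library.

```python
from collections import Counter
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(seq):
    """Feature extraction: fraction of each of the 20 standard residues."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)  # assumes at least one standard residue
    return [counts[a] / total for a in AMINO_ACIDS]

def f_score(pos, neg):
    """Two-class F-score: between-class spread over within-class variance."""
    m, mp, mn = mean(pos + neg), mean(pos), mean(neg)
    between = (mp - m) ** 2 + (mn - m) ** 2
    within = (sum((v - mp) ** 2 for v in pos) / (len(pos) - 1)
              + sum((v - mn) ** 2 for v in neg) / (len(neg) - 1))
    if within == 0:
        # Feature is constant within both classes: perfect separator or useless.
        return float("inf") if between > 0 else 0.0
    return between / within

def select_features(X, y, positive, k):
    """Feature selection: indices of the k features with the highest F-score."""
    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, label in zip(X, y) if label == positive]
        neg = [x[j] for x, label in zip(X, y) if label != positive]
        scores.append(f_score(pos, neg))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def fit_centroids(X, y, features):
    """Training: mean vector of each class over the selected features."""
    centroids = {}
    for label in set(y):
        rows = [[x[j] for j in features] for x, l in zip(X, y) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(seq, centroids, features):
    """Prediction: assign the class whose centroid is closest to the query."""
    comp = amino_acid_composition(seq)
    v = [comp[j] for j in features]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))

# Invented toy benchmark (hypothetical sequences and labels, not real data).
train = [
    ("LLVVAILLGAVL", "class_a"), ("AVLLIVGALLVA", "class_a"), ("ILVLAAGLLVIV", "class_a"),
    ("KDEEKRKDEEKR", "class_b"), ("EEKKDDRRKEDE", "class_b"), ("RKDEKEDRKEKD", "class_b"),
]
X = [amino_acid_composition(s) for s, _ in train]
y = [label for _, label in train]

selected = select_features(X, y, positive="class_a", k=5)
centroids = fit_centroids(X, y, selected)

print([AMINO_ACIDS[j] for j in selected])
print(predict("LLAVVILGLLAV", centroids, selected))
print(predict("KEDRKKEEDDKR", centroids, selected))
```

On this toy set each query is assigned to the class it was modeled on. For real use, the benchmark would need experimentally annotated, redundancy-reduced sequences, and the classifier would be evaluated by cross-validation rather than on hand-picked queries.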

[1]
Organization, G.W.H. Global tuberculosis report 2018; , 2018, pp. 6-39.
[2]
Chavadi, S.S.; Edupuganti, U.R.; Vergnolle, O.; Fatima, I.; Singh, S.M.; Soll, C.E.; Quadri, L.E. Inactivation of tesA reduces cell wall lipid production and increases drug susceptibility in mycobacteria. J. Biol. Chem., 2011, 286,(28), 24616-24625.
[http://dx.doi.org/10.1074/jbc.M111.247601 ] [PMID: 21592957]
[3]
Rodrigues, L.; Aínsa, J.A.; Amaral, L.; Viveiros, M. Inhibition of drug efflux in mycobacteria with phenothiazines and other putative efflux inhibitors. Recent Pat AntiInfect. Drug. Discov, 2011, 6,, 118-127.
[4]
Adams, K.N.; Takaki, K.; Connolly, L.E.; Wiedenhoft, H.; Winglee, K.; Humbert, O.; Edelstein, P.H.; Cosma, C.L.; Ramakrishnan, L. Drug tolerance in replicating mycobacteria mediated by a macrophage-induced efflux mechanism. Cell, 2011, 145,(1), 39-53.
[http://dx.doi.org/10.1016/j.cell.2011.02.022 ] [PMID: 21376383]
[5]
Rémillard-Labrosse, G.; Mihai, C.; Duron, J.; Guay, G.; Lippé, R. Protein kinase D-dependent trafficking of the large Herpes simplex virus type 1 capsids from the TGN to plasma membrane. Traffic, 2009, 10,(8), 1074-1083.
[http://dx.doi.org/10.1111/j.1600-0854.2009.00939.x ] [PMID: 19548982]
[6]
Wang, Y.Y. Tuberculosis and HIV Coinfection-the Challenge in the Prevention, Detection and Treatment of Tuberculosis. Curr. Bioinform., 2019, 14,(2), 91-99.
[http://dx.doi.org/10.2174/1574893613666180621153734]
[7]
Sankar, M.M.; Gopinath, K.; Singla, R.; Singh, S. In-vitro antimycobacterial drug susceptibility testing of non-tubercular mycobacteria by tetrazolium microplate assay. Ann. Clin. Microbiol. Antimicrob., 2008, 7,, 15.
[http://dx.doi.org/10.1186/1476-0711-7-15]
[8]
Ingham, C.J.; Ayad, A.B.; Nolsen, K.; Mulder, B. Rapid drug susceptibility testing of mycobacteria by culture on a highly porous ceramic support. Int. J. Tuberc. Lung Dis., 2008, 12,(6), 645-650.
[PMID: 18492331]
[9]
Aturaliya, R.N.; Fink, J.L.; Davis, M.J.; Teasdale, M.S.; Hanson, K.A.; Miranda, K.C.; Forrest, A.R.; Grimmond, S.M.; Suzuki, H.; Kanamori, M.; Kai, C.; Kawai, J.; Carninci, P.; Hayashizaki, Y.; Teasdale, R.D. Subcellular localization of mammalian type II membrane proteins. Traffic, 2006, 7,(5), 613-625.
[http://dx.doi.org/10.1111/j.1600-0854.2006.00407.x ] [PMID: 16643283]
[10]
Alahari, A.; Trivelli, X.; Guérardel, Y.; Dover, L.G.; Besra, G.S.; Sacchettini, J.C.; Reynolds, R.C.; Coxon, G.D.; Kremer, L. Thiacetazone, an antitubercular drug that inhibits cyclopropanation of cell wall mycolic acids in mycobacteria. PLoS One, 2007, 2,(12) e1343
[http://dx.doi.org/10.1371/journal.pone.0001343 ] [PMID: 18094751]
[11]
Schramm, B.; de Haan, C.A.; Young, J.; Doglio, L.; Schleich, S.; Reese, C.; Popov, A.V.; Steffen, W.; Schroer, T.; Locker, J.K. Vaccinia-virus-induced cellular contractility facilitates the subcellular localization of the viral replication sites. Traffic, 2006, 7,(10), 1352-1367.
[http://dx.doi.org/10.1111/j.1600-0854.2006.00470.x ] [PMID: 16899087]
[12]
Wei, L. Prediction of human protein subcellular localization using deep learning. J. Parallel Distrib. Comput., 2018, 117,, 212-217.
[http://dx.doi.org/10.1016/j.jpdc.2017.08.009]
[13]
Cheng, L.; Yang, H.; Zhao, H.; Pei, X.; Shi, H.; Sun, J.; Zhang, Y.; Wang, Z.; Zhou, M. MetSigDis: a manually curated resource for the metabolic signatures of diseases. Brief. Bioinform., 2019, 20,(1), 203-209.
[http://dx.doi.org/10.1093/bib/bbx103 ] [PMID: 28968812]
[14]
Cheng, L.; Wang, P.; Tian, R.; Wang, S.; Guo, Q.; Luo, M.; Zhou, W.; Liu, G.; Jiang, H.; Jiang, Q. LncRNA2Target v2.0: a comprehensive database for target genes of lncRNAs in human and mouse. Nucleic Acids Res., 2019, 47,(D1), D140-D144.
[http://dx.doi.org/10.1093/nar/gky1051 ] [PMID: 30380072]
[15]
Cheng, L.; Hu, Y. Human Disease System Biology. Curr. Gene Ther., 2018, 18,(5), 255-256.
[http://dx.doi.org/10.2174/1566523218666181010101114 ] [PMID: 30306867]
[16]
Rashid, M.; Saha, S.; Raghava, G.P. Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs. BMC Bioinformatics, 2007, 8,, 337.
[http://dx.doi.org/10.1186/1471-2105-8-337 ] [PMID: 17854501]
[17]
Lin, H.; Ding, H.; Guo, F.B.; Zhang, A.Y.; Huang, J. Predicting subcellular localization of mycobacterial proteins by using Chou’s pseudo amino acid composition. Protein Pept. Lett., 2008, 15,(7), 739-744.
[http://dx.doi.org/10.2174/092986608785133681 ] [PMID: 18782071]
[18]
Lin, H.; Ding, H.; Guo, F.B.; Huang, J. Prediction of subcellular location of mycobacterial protein using feature selection techniques. Mol. Divers., 2010, 14,(4), 667-671.
[http://dx.doi.org/10.1007/s11030-009-9205-1 ] [PMID: 19908156]
[19]
Tang, S.N.; Sun, J.M.; Xiong, W.W.; Cong, P.S.; Li, T.H. Identification of the subcellular localization of mycobacterial proteins using localization motifs. Biochimie, 2012, 94,(3), 847-853.
[http://dx.doi.org/10.1016/j.biochi.2011.12.003 ] [PMID: 22182488]
[20]
Fan, G.L.; Li, Q.Z. Predict mycobacterial proteins subcellular locations by incorporating pseudo-average chemical shift into the general form of Chou’s pseudo amino acid composition. J. Theor. Biol., 2012, 304,, 88-95.
[http://dx.doi.org/10.1016/j.jtbi.2012.03.017 ] [PMID: 22459701]
[21]
Zhu, P.P.; Li, W.C.; Zhong, Z.J.; Deng, E.Z.; Ding, H.; Chen, W.; Lin, H. Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition. Mol. Biosyst., 2015, 11,(2), 558-563.
[http://dx.doi.org/10.1039/C4MB00645C ] [PMID: 25437899]
[22]
Khan, M.; Hayat, M.; Khan, S.A.; Iqbal, N. Unb-DPC: Identify mycobacterial membrane protein types by incorporating un-biased dipeptide composition into Chou’s general PseAAC. J. Theor. Biol., 2017, 415,, 13-19.
[http://dx.doi.org/10.1016/j.jtbi.2016.12.004 ] [PMID: 27939596]
[23]
Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell., 2005, 27,(8), 1226-1238.
[http://dx.doi.org/10.1109/TPAMI.2005.159 ] [PMID: 16119262]
[24]
Khan, M.; Hayat, M.; Khan, S.A.; Ahmad, S.; Iqbal, N. Bi-PSSM: Position specific scoring matrix based intelligent computational model for identification of mycobacterial membrane proteins. J. Theor. Biol., 2017, 435,, 116-124.
[http://dx.doi.org/10.1016/j.jtbi.2017.09.013 ] [PMID: 28927812]
[25]
Cui, T.; Zhang, L.; Huang, Y.; Yi, Y.; Tan, P.; Zhao, Y.; Hu, Y.; Xu, L.; Li, E.; Wang, D. MNDR v2.0: an updated resource of ncRNA-disease associations in mammals. Nucleic Acids Res., 2018, 46,(D1), D371-D374.
[PMID: 29106639]
[26]
Zhang, T.; Tan, P.; Wang, L.; Jin, N.; Li, Y.; Zhang, L.; Yang, H.; Hu, Z.; Zhang, L.; Hu, C.; Li, C.; Qian, K.; Zhang, C.; Huang, Y.; Li, K.; Lin, H.; Wang, D. RNALocate: a resource for RNA subcellular localizations. Nucleic Acids Res., 2017, 45,(D1), D135-D138.
[PMID: 27543076]
[27]
Yang, J.; Chen, X.; McDermaid, A.; Ma, Q. DMINDA 2.0: integrated and systematic views of regulatory DNA motif identification and analyses. Bioinformatics, 2017, 33,(16), 2586-2588.
[http://dx.doi.org/10.1093/bioinformatics/btx223 ] [PMID: 28419194]
[28]
Liang, Z.Y.; Lai, H.Y.; Yang, H.; Zhang, C.J.; Yang, H.; Wei, H.H.; Chen, X.X.; Zhao, Y.W.; Su, Z.D.; Li, W.C.; Deng, E.Z.; Tang, H.; Chen, W.; Lin, H. Pro54DB: a database for experimentally verified sigma-54 promoters. Bioinformatics, 2017, 33,(3), 467-469.
[PMID: 28171531]
[29]
UniProt: the universal protein knowledgebase. Nucleic Acids Res., 2017, 45,(D1), D158-D169.
[http://dx.doi.org/10.1093/nar/gkw1099 ] [PMID: 27899622]
[30]
Burley, S.K.; Berman, H.M.; Kleywegt, G.J.; Markley, J.L.; Nakamura, H.; Velankar, S. Protein Data Bank (PDB): The Single Global Macromolecular Structure Archive. Methods Mol. Biol., 2017, 1607,, 627-641.
[http://dx.doi.org/10.1007/978-1-4939-7000-1_26 ] [PMID: 28573592]
[31]
Coordinators, N.R. Nucleic Acids Res., 2017, 45,(D1), D12-D17.
[http://dx.doi.org/10.1093/nar/gkw1071 ] [PMID: 27899561]
[32]
Li, W.; Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 2006, 22,(13), 1658-1659.
[http://dx.doi.org/10.1093/bioinformatics/btl158 ] [PMID: 16731699]
[33]
Wang, G.; Dunbrack, R.L., Jr PISCES: a protein sequence culling server. Bioinformatics, 2003, 19,(12), 1589-1591.
[http://dx.doi.org/10.1093/bioinformatics/btg224 ] [PMID: 12912846]
[34]
Zou, Q.; Lin, G.; Jiang, X.; Liu, X.; Zeng, X. Sequence clustering in bioinformatics: an empirical study. Brief. Bioinform.,, 2018,. Online ahead of print
[http://dx.doi.org/10.1093/bib/bby090]
[35]
Wu, C.H.; Apweiler, R.; Bairoch, A.; Natale, D.A.; Barker, W.C.; Boeckmann, B.; Ferro, S.; Gasteiger, E.; Huang, H.; Lopez, R.; Magrane, M.; Martin, M.J.; Mazumder, R.; O’Donovan, C.; Redaschi, N.; Suzek, B. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res., 2006, 34,(Database issue), D187-D191.
[http://dx.doi.org/10.1093/nar/gkj161 ] [PMID: 16381842]
[36]
Nair, R.; Rost, B. Sequence conserved for subcellular localization. Protein Sci., 2002, 11,(12), 2836-2847.
[http://dx.doi.org/10.1110/ps.0207402 ] [PMID: 12441382]
[37]
Yu, C.S.; Chen, Y.C.; Lu, C.H.; Hwang, J.K. Prediction of protein subcellular localization. Proteins, 2006, 64,(3), 643-651.
[http://dx.doi.org/10.1002/prot.21018 ] [PMID: 16752418]
[38]
Gupta, M.K.; Subramanian, V.; Yadav, J.S. Immunoproteomic identification of secretory and subcellular protein antigens and functional evaluation of the secretome fraction of Mycobacterium immunogenum, a newly recognized species of the Mycobacterium chelonae-Mycobacterium abscessus group. J. Proteome Res., 2009, 8,(5), 2319-30.
[39]
Liu, B.; Liu, F.; Fang, L.; Wang, X.; Chou, K.C. repRNA: a web server for generating various feature vectors of RNA sequences. Mol. Genet. Genomics, 2016, 291,(1), 473-481.
[http://dx.doi.org/10.1007/s00438-015-1078-7 ] [PMID: 26085220]
[40]
Yang, H.; Qiu, W.R.; Liu, G.; Guo, F.B.; Chen, W.; Chou, K.C.; Lin, H. iRSpot-Pse6NC: Identifying recombination spots in Saccharomyces cerevisiae by incorporating hexamer composition into general PseKNC. Int. J. Biol. Sci., 2018, 14,(8), 883-891.
[http://dx.doi.org/10.7150/ijbs.24616 ] [PMID: 29989083]
[41]
Tang, H. A two-step discriminated method to identify thermophilic proteins. Int. J. Biomath., 2017, 10,(4) 1750050
[http://dx.doi.org/10.1142/S1793524517500504]
[42]
Zhang, J.; Liu, B. A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods. Curr. Bioinform., 2019, 14,(3), 190-199.
[http://dx.doi.org/10.2174/1574893614666181212102749]
[43]
Chou, K.C. Some remarks on protein attribute prediction and pseudo amino acid composition. J. Theor. Biol., 2011, 273,(1), 236-247.
[http://dx.doi.org/10.1016/j.jtbi.2010.12.024 ] [PMID: 21168420]
[44]
Yang, W. A brief survey of machine learning methods in protein sub-Golgi localization. Curr. Bioinform., 2019, 14,, 234-240.
[http://dx.doi.org/10.2174/1574893613666181113131415]
[45]
Andrade, M.A.; O’Donoghue, S.I.; Rost, B. Adaptation of protein surfaces to subcellular location. J. Mol. Biol., 1998, 276,(2), 517-525.
[http://dx.doi.org/10.1006/jmbi.1997.1498] [PMID: 9512720]
[46]
Cao, R.; Cheng, J. Protein single-model quality assessment by feature-based probability density functions. Sci. Rep., 2016, 6,, 23990.
[http://dx.doi.org/10.1038/srep23990 ] [PMID: 27041353]
[47]
Cao, R.; Freitas, C.; Chan, L.; Sun, M.; Jiang, H.; Chen, Z. ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network. Molecules, 2017, 22,(10) E1732
[http://dx.doi.org/10.3390/molecules22101732 ] [PMID: 29039790]
[48]
Ding, H.; Deng, E.Z.; Yuan, L.F.; Liu, L.; Lin, H.; Chen, W.; Chou, K.C. iCTX-type: a sequence-based predictor for identifying the types of conotoxins in targeting ion channels. BioMed Res. Int., 2014, 2014, 286419
[http://dx.doi.org/10.1155/2014/286419 ] [PMID: 24991545]
[49]
Feng, P.M.; Lin, H.; Chen, W. Identification of antioxidants from sequence information using naïve Bayes. Comput. Math. Methods Med., 2013, 2013, 567529
[http://dx.doi.org/10.1155/2013/567529 ] [PMID: 24062796]
[50]
Feng, P.M.; Ding, H.; Chen, W.; Lin, H. Naïve Bayes classifier with feature selection to identify phage virion proteins. Comput. Math. Methods Med., 2013, 2013, 530696
[http://dx.doi.org/10.1155/2013/530696 ] [PMID: 23762187]
[51]
Anishetty, S.; Pennathur, G.; Anishetty, R. Tripeptide analysis of protein structures. BMC Struct. Biol., 2002, 2,, 9.
[http://dx.doi.org/10.1186/1472-6807-2-9 ] [PMID: 12495440]
[52]
Ung, P.; Winkler, D.A. Tripeptide motifs in biology: targets for peptidomimetic design. J. Med. Chem., 2011, 54,(5), 1111-1125.
[http://dx.doi.org/10.1021/jm1012984 ] [PMID: 21275407]
[53]
Chou, K.C. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins, 2001, 43,(3), 246-255.
[http://dx.doi.org/10.1002/prot.1035 ] [PMID: 11288174]
[54]
Liu, B.; Liu, F.; Wang, X.; Chen, J.; Fang, L.; Chou, K.C. Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Res., 2015, 43,(W1) W65-71
[http://dx.doi.org/10.1093/nar/gkv458 ] [PMID: 25958395]
[55]
Tang, H.; Chen, W.; Lin, H. Identification of immunoglobulins using Chou’s pseudo amino acid composition with feature selection technique. Mol. Biosyst., 2016, 12,(4), 1269-1275.
[http://dx.doi.org/10.1039/C5MB00883B ] [PMID: 26883492]
[56]
Rahman, M.S.; Shatabda, S.; Saha, S.; Kaykobad, M.; Rahman, M.S. DPP-PseAAC: A DNA-binding protein prediction model using Chou’s general PseAAC. J. Theor. Biol., 2018, 452,, 22-34.
[http://dx.doi.org/10.1016/j.jtbi.2018.05.006 ] [PMID: 29753757]
[57]
Feng, P. iDNA6mA-PseKNC: Identifying DNA N(6)-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC. Genomics, 2019, 111,(1), 96-102.
[58]
Guo, S.H.; Deng, E.Z.; Xu, L.Q.; Ding, H.; Lin, H.; Chen, W.; Chou, K.C. iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics, 2014, 30,(11), 1522-1529.
[http://dx.doi.org/10.1093/bioinformatics/btu083 ] [PMID: 24504871]
[59]
Xiong, W.; Li, T.; Chen, K.; Tang, K. Local combinational variables: an approach used in DNA-binding helix-turn-helix motif prediction with sequence information. Nucleic Acids Res., 2009, 37,(17), 5632-5640.
[http://dx.doi.org/10.1093/nar/gkp628 ] [PMID: 19651875]
[60]
Schwartz, D.; Gygi, S.P. An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat. Biotechnol., 2005, 23,(11), 1391-1398.
[http://dx.doi.org/10.1038/nbt1146 ] [PMID: 16273072]
[61]
Russell, R.B.; Saqi, M.A.; Sayle, R.A.; Bates, P.A.; Sternberg, M.J. Recognition of analogous and homologous protein folds: analysis of sequence and structure conservation. J. Mol. Biol., 1997, 269,(3), 423-439.
[http://dx.doi.org/10.1006/jmbi.1997.1019] [PMID: 9199410]
[62]
Pánek, J.; Eidhammer, I.; Aasland, R. A new method for identification of protein (sub)families in a set of proteins based on hydropathy distribution in proteins. Proteins, 2005, 58,(4), 923-934.
[http://dx.doi.org/10.1002/prot.20356 ] [PMID: 15645428]
[63]
González-Díaz, H.; González-Díaz, Y.; Santana, L.; Ubeira, F.M.; Uriarte, E. Proteomics, networks and connectivity indices. Proteomics, 2008, 8,(4), 750-778.
[http://dx.doi.org/10.1002/pmic.200700638 ] [PMID: 18297652]
[64]
Agüero-Chapin, G.; González-Díaz, H.; Molina, R.; Varona-Santos, J.; Uriarte, E.; González-Díaz, Y. Novel 2D maps and coupling numbers for protein sequences. The first QSAR study of polygalacturonases; isolation and prediction of a novel sequence from Psidium guajava L. FEBS Lett., 2006, 580,(3), 723-730.
[http://dx.doi.org/10.1016/j.febslet.2005.12.072 ] [PMID: 16413021]
[65]
Chen, Y.L.; Li, Q.Z. Prediction of the subcellular location of apoptosis proteins. J. Theor. Biol., 2007, 245,(4), 775-783.
[http://dx.doi.org/10.1016/j.jtbi.2006.11.010 ] [PMID: 17189644]
[66]
Chen, Y.L.; Li, Q.Z. Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition. J. Theor. Biol., 2007, 248,(2), 377-381.
[http://dx.doi.org/10.1016/j.jtbi.2007.05.019 ] [PMID: 17572445]
[67]
Schäffer, A.A.; Aravind, L.; Madden, T.L.; Shavirin, S.; Spouge, J.L.; Wolf, Y.I.; Koonin, E.V.; Altschul, S.F. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res., 2001, 29,(14), 2994-3005.
[http://dx.doi.org/10.1093/nar/29.14.2994 ] [PMID: 11452024]
[68]
Hou, J.; Wu, T.; Cao, R.; Cheng, J. Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13. Proteins, 2009, 87,(12), 1165-1178.
[http://dx.doi.org/10.1002/prot.25697 ] [PMID: 30985027]
[69]
Jones, D.T. Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics, 2007, 23,(5), 538-544.
[http://dx.doi.org/10.1093/bioinformatics/btl677 ] [PMID: 17237066]
[70]
Biswas, A.K.; Noman, N.; Sikder, A.R. Machine learning approach to predict protein phosphorylation sites by incorporating evolutionary information. BMC Bioinformatics, 2010, 11,, 273.
[http://dx.doi.org/10.1186/1471-2105-11-273 ] [PMID: 20492656]
[71]
Verma, R.; Varshney, G.C.; Raghava, G.P. Prediction of mitochondrial proteins of malaria parasite using split amino acid composition and PSSM profile. Amino Acids, 2010, 39,(1), 101-110.
[http://dx.doi.org/10.1007/s00726-009-0381-1 ] [PMID: 19908123]
[72]
Wei, L.; Tang, J.; Zou, Q. Local-DPP: An improved DNA-binding protein prediction method by exploring local evolutionary information. Inf. Sci., 2017, 384,, 135-144.
[http://dx.doi.org/10.1016/j.ins.2016.06.026]
[73]
Sibley, A.B.; Cosman, M.; Krishnan, V.V. An empirical correlation between secondary structure content and averaged chemical shifts in proteins. Biophys. J., 2003, 84,(2 Pt 1), 1223-1227.
[http://dx.doi.org/10.1016/S0006-3495(03)74937-6 ] [PMID: 12547802]
[74]
Zhao, Y.; Alipanahi, B.; Li, S.C.; Li, M. Protein secondary structure prediction using NMR chemical shift data. J. Bioinform. Comput. Biol., 2010, 8,(5), 867-884.
[http://dx.doi.org/10.1142/S0219720010004987 ] [PMID: 20981892]
[75]
Fan, G.L.; Li, Q.Z. Predicting protein submitochondria locations by combining different descriptors into the general form of Chou’s pseudo amino acid composition. Amino Acids, 2012, 43,(2), 545-555.
[http://dx.doi.org/10.1007/s00726-011-1143-4 ] [PMID: 22102053]
[76]
Zhu, X.J. Predicting protein structural classes for low-similarity sequences by evaluating different features. Knowl. Base. Syst., 2019, 163,, 787-793.
[http://dx.doi.org/10.1016/j.knosys.2018.10.007]
[77]
Seavey, B.R.; Farr, E.A.; Westler, W.M.; Markley, J.L. A relational database for sequence-specific protein NMR data. J. Biomol. NMR, 1991, 1,(3), 217-236.
[http://dx.doi.org/10.1007/BF01875516] [PMID: 1841696]
[78]
Pollastri, G.; McLysaght, A. Porter: a new, accurate server for protein secondary structure prediction. Bioinformatics, 2005, 21,(8), 1719-1720.
[http://dx.doi.org/10.1093/bioinformatics/bti203 ] [PMID: 15585524]
[79]
Pollastri, G.; Martin, A.J.; Mooney, C.; Vullo, A. Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information. BMC Bioinformatics, 2007, 8,, 201.
[http://dx.doi.org/10.1186/1471-2105-8-201 ] [PMID: 17570843]
[80]
Liu, B.; Chen, J.; Wang, X. Protein remote homology detection by combining Chou’s distance-pair pseudo amino acid composition and principal component analysis. Mol. Genet. Genomics, 2015, 290,(5), 1919-1931.
[http://dx.doi.org/10.1007/s00438-015-1044-4 ] [PMID: 25896721]
[81]
Feng, P.; Lin, H.; Chen, W.; Zuo, Y. Predicting the types of J-proteins using clustered amino acids. BioMed Res. Int., 2014, 2014, 935719
[http://dx.doi.org/10.1155/2014/935719 ] [PMID: 24804260]
[82]
Zou, Q.; Wan, S.; Ju, Y.; Tang, J.; Zeng, X. Pretata: predicting TATA binding proteins with novel features and dimensionality reduction strategy. BMC Syst. Biol., 2016, 10,(4)(Suppl. 4), 114.
[http://dx.doi.org/10.1186/s12918-016-0353-5 ] [PMID: 28155714]
[83]
Zou, Q. A novel features ranking metric with application to scalable visual and bioinformatics data classification. Neurocomputing, 2016, 173,, 346-354.
[http://dx.doi.org/10.1016/j.neucom.2014.12.123]
[84]
Rocchi, L.; Chiari, L.; Cappello, A. Feature selection of stabilometric parameters based on principal component analysis. Med. Biol. Eng. Comput., 2004, 42,(1), 71-79.
[http://dx.doi.org/10.1007/BF02351013 ] [PMID: 14977225]
[85]
Lin, H.; Ding, H. Predicting ion channels and their types by the dipeptide mode of pseudo amino acid composition. J. Theor. Biol., 2011, 269,(1), 64-69.
[http://dx.doi.org/10.1016/j.jtbi.2010.10.019 ] [PMID: 20969879]
[86]
Tan, J.X.; Li, S.H.; Zhang, Z.M.; Chen, C.X.; Chen, W.; Tang, H.; Lin, H. Identification of hormone binding proteins based on machine learning methods. Math. Biosci. Eng., 2019, 16,(4), 2466-2480.
[http://dx.doi.org/10.3934/mbe.2019123 ] [PMID: 31137222]
[87]
Yang, H.; Tang, H.; Chen, X.X.; Zhang, C.J.; Zhu, P.P.; Ding, H.; Chen, W.; Lin, H. Identification of Secretory Proteins in Mycobacterium tuberculosis Using Pseudo Amino Acid Composition. BioMed Res. Int., 2016, 2016, 5413903
[http://dx.doi.org/10.1155/2016/5413903 ] [PMID: 27597968]
[88]
Zhao, Y.W.; Lai, H.Y.; Tang, H.; Chen, W.; Lin, H. Prediction of phosphothreonine sites in human proteins by fusing different features. Sci. Rep., 2016, 6,, 34817.
[http://dx.doi.org/10.1038/srep34817 ] [PMID: 27698459]
[89]
Chen, X.X.; Tang, H.; Li, W.C.; Wu, H.; Chen, W.; Ding, H.; Lin, H. Identification of Bacterial Cell Wall Lyases via Pseudo Amino Acid Composition. BioMed Res. Int., 2016, 2016, 1654623
[http://dx.doi.org/10.1155/2016/1654623 ] [PMID: 27437396]
[90]
Chen, W.; Lv, H.; Nie, F.; Lin, H. i6mA-Pred: identifying DNA N6-methyladenine sites in the rice genome. Bioinformatics, 2019, 35,(16), 2796-2800.
[http://dx.doi.org/10.1093/bioinformatics/btz015 ] [PMID: 30624619]
[91]
Qu, K.Y.; Wei, L.Y.; Zou, Q. A Review of DNA-binding Proteins Prediction Methods. Curr. Bioinform., 2019, 14,(3), 246-254.
[http://dx.doi.org/10.2174/1574893614666181212102030]
[92]
Dao, F.Y. Identify origin of replication in Saccharomyces cerevisiae using two-step feature selection technique. Bioinformatics, 2019, 35,(12), 2075-2083.
[PMID: 30428009]
[93]
Feng, C.Q. iTerm-PseKNC: a sequence-based tool for predicting bacterial transcriptional terminators. Bioinformatics, 2019, 35,(9), 1469-1477.
[PMID: 30247625]
[94]
Tang, H.; Zhao, Y.W.; Zou, P.; Zhang, C.M.; Chen, R.; Huang, P.; Lin, H. HBPred: a tool to identify growth hormone-binding proteins. Int. J. Biol. Sci., 2018, 14,(8), 957-964.
[http://dx.doi.org/10.7150/ijbs.24174 ] [PMID: 29989085]
[95]
Li, N.; Kang, J.; Jiang, L.; He, B.; Lin, H.; Huang, J. PSBinder: A Web Service for Predicting Polystyrene Surface-Binding Peptides. BioMed Res. Int., 2017, 2017, 5761517
[http://dx.doi.org/10.1155/2017/5761517 ] [PMID: 29445741]
[96]
Feng, P-M.; Chen, W.; Lin, H.; Chou, K.C. iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal. Biochem., 2013, 442,(1), 118-125.
[http://dx.doi.org/10.1016/j.ab.2013.05.024 ] [PMID: 23756733]
[97]
Chen, W.; Yang, H.; Feng, P.; Ding, H.; Lin, H. iDNA4mC: identifying DNA N4-methylcytosine sites based on nucleotide chemical properties. Bioinformatics, 2017, 33,(22), 3518-3523.
[http://dx.doi.org/10.1093/bioinformatics/btx479 ] [PMID: 28961687]
[98]
Li, D.; Ju, Y.; Zou, Q. Protein Folds Prediction with Hierarchical Structured SVM. Curr. Proteomics, 2016, 13,(2), 79-85.
[http://dx.doi.org/10.2174/157016461302160514000940]
[99]
Bu, H.D. Predicting Enhancers from Multiple Cell Lines and Tissues across Different Developmental Stages Based On SVM Method. Curr. Bioinform., 2018, 13,(6), 655-660.
[http://dx.doi.org/10.2174/1574893613666180726163429]
[100]
Zhang, N. Discriminating Ramos and Jurkat Cells with Image Textures from Diffraction Imaging Flow Cytometry Based on a Support Vector Machine. Curr. Bioinform., 2018, 13,(1), 50-56.
[http://dx.doi.org/10.2174/1574893611666160608102537]
[101]
Stephenson, N.; Shane, E.; Chase, J.; Rowland, J.; Ries, D.; Justice, N.; Zhang, J.; Chan, L.; Cao, R. Survey of machine learning techniques in drug discovery. Curr. Drug Metab., 2019, 20,(3), 185-193.
[102]
Cao, R.; Wang, Z.; Wang, Y.; Cheng, J. SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines. BMC Bioinformatics, 2014, 15,, 120.
[http://dx.doi.org/10.1186/1471-2105-15-120 ] [PMID: 24776231]
[103]
Chang, C.C.; Hsu, C.W.; Lin, C.J. The analysis of decomposition methods for support vector machines. IEEE Trans. Neural Netw., 2000, 11,(4), 1003-1008.
[http://dx.doi.org/10.1109/72.857780 ] [PMID: 18249827]
[104]
Pedrycz, W. Advances in Kernel Methods.Support Vector Learning. Scholkopf, B.; Burges, C.J.C.; Smola, A.J., Eds.; MIT Press:Cambridge, 1999, , pp 376+vii.. Neurocomputing,, 2002,, 47, 303-304.
[105]
Chen, W.; Feng, P.M.; Lin, H.; Chou, K.C. iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition. BioMed Res. Int., 2014, 2014, 623149
[http://dx.doi.org/10.1155/2014/623149 ] [PMID: 24967386]
[106]
Chen, W.; Feng, P.M.; Deng, E.Z.; Lin, H.; Chou, K.C. iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition. Anal. Biochem., 2014, 462,, 76-83.
[http://dx.doi.org/10.1016/j.ab.2014.06.022 ] [PMID: 25016190]
[107]
Bailey, T.L.; Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol., 1994, 2,, 28-36.
[PMID: 7584402]
[108]
Bailey, T.L.; Gribskov, M. Combining evidence using p-values: application to sequence homology searches. Bioinformatics, 1998, 14,(1), 48-54.
[http://dx.doi.org/10.1093/bioinformatics/14.1.48] [PMID: 9520501]
[109]
Eddy, S.R. Profile hidden Markov models. Bioinformatics, 1998, 14,(9), 755-763.
[http://dx.doi.org/10.1093/bioinformatics/14.9.755] [PMID: 9918945]
[110]
Wheeler, T.J.; Eddy, S.R. nhmmer: DNA homology search with profile HMMs. Bioinformatics, 2013, 29,(19), 2487-2489.
[http://dx.doi.org/10.1093/bioinformatics/btt403 ] [PMID: 23842809]
[111]
Chai, G. HMMCAS: a web tool for the identification and domain annotations of CAS protein. IEEE/ACM Trans. Comput. Biol. Bioinform, 2019, 16,(4), 1313-1315.
[112]
Krogh, A.; Brown, M.; Mian, I.S.; Sjölander, K.; Haussler, D. Hidden Markov models in computational biology. Applications to protein modeling. J. Mol. Biol., 1994, 235,(5), 1501-1531.
[http://dx.doi.org/10.1006/jmbi.1994.1104] [PMID: 8107089]
[113]
Lin, H. The modified Mahalanobis Discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition. J. Theor. Biol., 2008, 252,(2), 350-356.
[http://dx.doi.org/10.1016/j.jtbi.2008.02.004 ] [PMID: 18355838]
[114]
Lin, H.; Li, Q.Z. Using pseudo amino acid composition to predict protein structural class: approached by incorporating 400 dipeptide components. J. Comput. Chem., 2007, 28,(9), 1463-1466.
[http://dx.doi.org/10.1002/jcc.20554 ] [PMID: 17330882]
[115]
Manavalan, B.; Subramaniyam, S.; Shin, T.H.; Kim, M.O.; Lee, G. Machine-Learning-Based Prediction of Cell-Penetrating Peptides and Their Uptake Efficiency with Improved Accuracy. J. Proteome Res., 2018, 17,(8), 2715-2726.
[http://dx.doi.org/10.1021/acs.jproteome.8b00148 ] [PMID: 29893128]
[116]
Chou, K.C.; Zhang, C.T. Prediction of protein structural classes. Crit. Rev. Biochem. Mol. Biol., 1995, 30,(4), 275-349.
[http://dx.doi.org/10.3109/10409239509083488] [PMID: 7587280]
[117]
Lai, H.Y.; Chen, X.X.; Chen, W.; Tang, H.; Lin, H. Sequence based predictive modeling to identify cancerlectins. Oncotarget, 2017, 8,(17), 28169-28175.
[http://dx.doi.org/10.18632/oncotarget.15963 ] [PMID: 28423655]
[118]
Chen, W.; Feng, P.; Liu, T.; Jin, D. Recent advances in machine learning methods for predicting heat shock proteins. Curr. Drug Metab., 2019, 20,(3), 224-228.
[PMID: 30378494]
[119]
Lv, H.; Zhang, Z.M.; Li, S.H.; Tan, J.X.; Chen, W.; Lin, H. Evaluation of different computational methods on 5-methylcytosine sites identification. Brief. Bioinform., 2020, 21,(3), 982-995.
[PMID: 31157855]
[120]
Chou, K.C.; Shen, H.B. Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms. Nat. Protoc., 2008, 3,(2), 153-162.
[http://dx.doi.org/10.1038/nprot.2007.494 ] [PMID: 18274516]
[121]
Chou, K.C.; Shen, H.B. Recent progress in protein subcellular location prediction. Anal. Biochem., 2007, 370,(1), 1-16.
[http://dx.doi.org/10.1016/j.ab.2007.07.006 ] [PMID: 17698024]
[122]
Xu, Z.C.; Feng, P.M.; Yang, H.; Qiu, W.R.; Chen, W.; Lin, H. iRNAD: a computational tool for identifying D modification sites in RNA sequence. Bioinformatics, 2019, 35,(23), 4922-4929.
[http://dx.doi.org/10.1093/bioinformatics/btz358 ] [PMID: 31077296]
[123]
Gao, H.T.; Li, T.H.; Chen, K.; Li, W.G.; Bi, X. Overlapping spectra resolution using non-negative matrix factorization. Talanta, 2005, 66,(1), 65-73.
[http://dx.doi.org/10.1016/j.talanta.2004.09.017 ] [PMID: 18969963]
[124]
Liu, Z.; Xiao, X.; Qiu, W.R.; Chou, K.C. iDNA-Methyl: identifying DNA methylation sites via pseudo trinucleotide composition. Anal. Biochem., 2015, 474,, 69-77.
[http://dx.doi.org/10.1016/j.ab.2014.12.009 ] [PMID: 25596338]
[125]
Jia, J.; Liu, Z.; Xiao, X.; Liu, B.; Chou, K.C. iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets. Molecules, 2016, 21,(1) E95
[http://dx.doi.org/10.3390/molecules21010095 ] [PMID: 26797600]
[126]
Wan, S.; Duan, Y.; Zou, Q. HPSLPred: An Ensemble Multi-Label Classifier for Human Protein Subcellular Location Prediction with Imbalanced Source. Proteomics, 2017, 17,(17-18) 1700262
[http://dx.doi.org/10.1002/pmic.201700262 ] [PMID: 28776938]
[127]
Jia, J.; Liu, Z.; Xiao, X.; Liu, B.; Chou, K.C. iSuc-PseOpt: Identifying lysine succinylation sites in proteins by incorporating sequence-coupling effects into pseudo components and optimizing imbalanced training dataset. Anal. Biochem., 2016, 497, 48-56.
[http://dx.doi.org/10.1016/j.ab.2015.12.009 ] [PMID: 26723495]
[128]
Chen, W.; Ding, H.; Zhou, X.; Lin, H.; Chou, K.C. iRNA(m6A)-PseDNC: Identifying N6-methyladenosine sites using pseudo dinucleotide composition. Anal. Biochem., 2018, 561-562, 59-65.
[http://dx.doi.org/10.1016/j.ab.2018.09.002 ] [PMID: 30201554]
[129]
Xiao, X.; Min, J.L.; Lin, W.Z.; Liu, Z.; Cheng, X.; Chou, K.C. iDrug-Target: predicting the interactions between drug compounds and target proteins in cellular networking via benchmark dataset optimization approach. J. Biomol. Struct. Dyn., 2015, 33,(10), 2221-2233.
[http://dx.doi.org/10.1080/07391102.2014.998710 ] [PMID: 25513722]
[130]
Wu, Y.; Zheng, Y.; Tang, H. Identifying the Types of Ion Channel-Targeted Conotoxins by Incorporating New Properties of Residues into Pseudo Amino Acid Composition. BioMed Res. Int., 2016, 2016,(4-5), 3981478.
[http://dx.doi.org/10.1155/2016/3981478 ] [PMID: 27631006]
[131]
Xu, Y. et al. F-score feature selection method may improve texture-based liver seg. 2008 IEEE International Symposium on IT in Medicine and Education (ITME), Xiamen, China, December 12-14, 2008.
[132]
Ding, H.; Li, D. Identification of mitochondrial proteins of malaria parasite using analysis of variance. Amino Acids, 2015, 47,(2), 329-333.
[http://dx.doi.org/10.1007/s00726-014-1862-4 ] [PMID: 25385313]
[133]
Zhang, Z.; Zhao, Y.; Liao, X.; Shi, W.; Li, K.; Zou, Q.; Peng, S. Deep learning in omics: a survey and guideline. Brief. Funct. Genomics, 2019, 18,(1), 41-57.
[http://dx.doi.org/10.1093/bfgp/ely030]
[134]
Long, H.X.; Wang, M.; Fu, H.Y. Deep Convolutional Neural Networks for Predicting Hydroxyproline in Proteins. Curr. Bioinform., 2017, 12,(3), 233-238.
[http://dx.doi.org/10.2174/1574893612666170221152848]
[135]
Wei, L. Integration of deep feature representations and handcrafted features to improve the prediction of N6-methyladenosine sites. Neurocomputing, 2019, 324, 3-9.
[http://dx.doi.org/10.1016/j.neucom.2018.04.082]
[136]
Li, Y.; Niu, M.; Zou, Q. ELM-MHC: An Improved MHC Identification Method with Extreme Learning Machine Algorithm. J. Proteome Res., 2019, 18,(3), 1392-1401.
[http://dx.doi.org/10.1021/acs.jproteome.9b00012 ] [PMID: 30698979]
[137]
Su, R.; Liu, X.; Wei, L.; Zou, Q. Deep-Resp-Forest: A deep forest model to predict anti-cancer drug response. Methods, 2019, 166, 91-102.
[http://dx.doi.org/10.1016/j.ymeth.2019.02.009]
[138]
Cheng, L.; Jiang, Y.; Ju, H.; Sun, J.; Peng, J.; Zhou, M.; Hu, Y. InfAcrOnt: calculating cross-ontology term similarities using information flow by a random walk. BMC Genomics, 2018, 19,(Suppl. 1), 919.
[http://dx.doi.org/10.1186/s12864-017-4338-6 ] [PMID: 29363423]
[139]
Cheng, L.; Hu, Y.; Sun, J.; Zhou, M.; Jiang, Q. DincRNA: a comprehensive web-based bioinformatics toolkit for exploring disease associations and ncRNA function. Bioinformatics, 2018, 34,(11), 1953-1956.
[http://dx.doi.org/10.1093/bioinformatics/bty002 ] [PMID: 29365045]
[140]
Hu, Y.; Zhou, M.; Shi, H.; Ju, H.; Jiang, Q.; Cheng, L. Measuring disease similarity and predicting disease-related ncRNAs by a novel method. BMC Med. Genomics, 2017, 10,(Suppl. 5), 71.
[http://dx.doi.org/10.1186/s12920-017-0315-9 ] [PMID: 29297338]
[141]
Zou, Q.; Guo, J.; Ju, Y.; Wu, M.; Zeng, X.; Hong, Z. Improving tRNAscan-SE Annotation Results via Ensemble Classifiers. Mol. Inform., 2015, 34,(11-12), 761-770.
[http://dx.doi.org/10.1002/minf.201500031 ] [PMID: 27491037]
[142]
Lin, C. LibD3C: Ensemble classifiers with a clustering and dynamic selection strategy. Neurocomputing, 2014, 123, 424-435.
[http://dx.doi.org/10.1016/j.neucom.2013.08.004]



Article Details

VOLUME: 16
ISSUE: 5
Year: 2020
Published on: 07 August, 2020
Page: [605 - 619]
Pages: 15
DOI: 10.2174/1573406415666191004101913
