A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite

Author(s): Ting Liu, Hua Tang*

Journal Name: Current Pharmaceutical Design

Volume 26, Issue 26, 2020


The number of human deaths caused by malaria continues to rise. Mitochondrial proteins of the malaria parasite play vital roles in the organism. Accurately identifying these proteins is therefore necessary for developing effective drugs and vaccines against infection. Biochemical experiments can provide precise details about mitochondrial proteins, but they are expensive and time-consuming. In this review, we summarize the machine learning-based methods for identifying mitochondrial proteins in the malaria parasite and compare the construction strategies of these computational methods. Finally, we discuss future directions for algorithm-based recognition of mitochondrial proteins.

Keywords: Mitochondria proteins, malaria parasite, machine learning, database, feature, infection.


Article Details

Year: 2020
Published on: 11 August, 2020
Page: [3049 - 3058]
Pages: 10
DOI: 10.2174/1381612826666200310122324