A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods

Author(s): Zheng-Xing Guan, Shi-Hao Li, Zi-Mei Zhang, Dan Zhang, Hui Yang, Hui Ding*

Journal Name: Current Genomics

Volume 21, Issue 1, 2020

Abstract:
MicroRNAs (miRNAs), a group of short non-coding RNA molecules, regulate gene expression, and many diseases are associated with their abnormal expression. Accurate identification of miRNA precursors (pre-miRNAs) is therefore necessary. Over the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental and comparative genomics methods have disadvantages; for example, they are time-consuming. In contrast, machine learning-based methods are a better choice. This review therefore summarizes the current advances in computational pre-miRNA recognition, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the performance of the resulting models. We also provide useful information about the predictors currently available. Finally, we give our perspectives on the future of pre-miRNA identification. The review offers researchers a comprehensive background on pre-miRNA identification using machine learning methods and a clear understanding of progress in this field.
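
The workflow the review surveys (benchmark dataset, feature extraction, prediction algorithm, performance evaluation) can be illustrated with a minimal, hypothetical Python sketch. It is not the method of any surveyed predictor: it encodes sequences by normalized dinucleotide (k-mer) composition, classifies them with a simple nearest-centroid rule that stands in for the support vector machine or random forest classifiers common in this literature, and reports sensitivity (Sn), specificity (Sp), accuracy (Acc), and the Matthews correlation coefficient (MCC). The function names and the toy sequences are made up for illustration.

```python
import math
from itertools import product

# Hypothetical sketch: k-mer composition features, a nearest-centroid classifier
# (a stand-in for SVM/random forest), and standard evaluation metrics.


def kmer_composition(seq, k=2):
    """Return normalized k-mer frequencies over A, C, G, U (4**k features)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def predict(x, pos_centroid, neg_centroid):
    """Return 1 (pre-miRNA) if x is closer to the positive centroid, else 0."""
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_centroid))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_centroid))
    return 1 if d_pos < d_neg else 0


def evaluate(y_true, y_pred):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sn": tp / (tp + fn) if tp + fn else 0.0,
        "Sp": tn / (tn + fp) if tn + fp else 0.0,
        "Acc": (tp + tn) / len(pairs) if pairs else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Toy, made-up sequences. Benchmarks in this area commonly take positives
    # from miRBase hairpins and negatives from pseudo-hairpins.
    positives = ["GCGCGC", "CGCGCG"]
    negatives = ["AUAUAU", "UAUAUA"]
    X = [kmer_composition(s) for s in positives + negatives]
    y = [1] * len(positives) + [0] * len(negatives)
    pos_c = centroid(X[:len(positives)])
    neg_c = centroid(X[len(positives):])
    # Scoring on the training data is for illustration only; real studies use
    # cross-validation or an independent test set.
    y_hat = [predict(x, pos_c, neg_c) for x in X]
    print(evaluate(y, y_hat))
```

Swapping the nearest-centroid rule for an SVM or random forest, and adding secondary-structure features such as the minimum free energy of the predicted hairpin, would bring this sketch closer to the kinds of predictors the review discusses.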

Keywords: microRNA, precursor, identification, machine learning methods, benchmark dataset, feature extraction, prediction algorithm.

Article Details

Year: 2020
Published on: 14 February 2020
Pages: 11-25 (15 pages)
DOI: 10.2174/1389202921666200214125102