A Novel Integrative Approach for Non-coding RNA Classification Based on Deep Learning

Author(s): Abdelbasset Boukelia*, Anouar Boucheham, Meriem Belguidoum, Mohamed Batouche, Farida Zehraoui, Fariza Tahi

Journal Name: Current Bioinformatics

Volume 15, Issue 4, 2020




Abstract:

Background: Molecular biomarkers offer new ways to understand many disease processes. Non-coding RNAs (ncRNAs) play a crucial role in several cellular activities that are highly correlated with many human diseases, especially cancer. Classifying and identifying ncRNAs has therefore become a critical issue, given their application as biomarkers in many human diseases.

Objective: Most existing computational tools for ncRNA classification handle only one type of ncRNA. They rely on structural information or on specific known features, and they suffer from a lack of significant, validated features. As a result, their performance is not always satisfactory.

Methods: We propose imCnC, a novel approach for ncRNA classification based on multisource deep learning, which integrates several data sources, such as genomic and epigenomic data, to identify several ncRNA types. We also propose an optimization technique that visualizes the feature patterns extracted by the multisource CNN model in order to measure the epigenomic features of each ncRNA type.
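The abstract does not give the imCnC architecture, so the following is only a minimal sketch of the general multisource idea: one convolutional branch per data source, with the pooled features concatenated before a shared softmax classifier. Everything in it is an assumption made for illustration: the function names, the filter counts, the two epigenomic channels, the random (untrained) weights, and the toy sequence.

```python
import math
import random

def one_hot(seq):
    """Encode a nucleotide sequence as 4-channel one-hot rows (A, C, G, U/T)."""
    index = {"A": 0, "C": 1, "G": 2, "U": 3, "T": 3}
    rows = []
    for base in seq.upper():
        vec = [0.0] * 4
        if base in index:  # unknown bases (e.g. N) stay all-zero
            vec[index[base]] = 1.0
        rows.append(vec)
    return rows

def conv1d_relu_maxpool(signal, filters):
    """Valid 1D convolution along positions, then ReLU and global max pooling.

    signal:  list of positions, each a list of channel values
    filters: list of filters, each a list of `width` rows of channel weights
    Returns one pooled activation per filter.
    """
    pooled = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(signal) - width + 1):
            act = sum(
                filt[k][c] * signal[start + k][c]
                for k in range(width)
                for c in range(len(filt[k]))
            )
            best = max(best, act)
        pooled.append(best)
    return pooled

def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_filters(rng, n_filters, width, channels):
    """Untrained filters drawn uniformly from [-1, 1] (illustration only)."""
    return [
        [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(width)]
        for _ in range(n_filters)
    ]

def multisource_forward(seq, epi_track, params):
    """One branch per source; concatenate branch features; classify."""
    seq_feats = conv1d_relu_maxpool(one_hot(seq), params["seq_filters"])
    epi_feats = conv1d_relu_maxpool(epi_track, params["epi_filters"])
    fused = seq_feats + epi_feats  # list concatenation = feature fusion
    scores = [
        sum(w * x for w, x in zip(row, fused)) + bias
        for row, bias in zip(params["W"], params["b"])
    ]
    return softmax(scores)

rng = random.Random(0)
N_CLASSES = 16  # the abstract reports 16 human ncRNA classes
N_SEQ_FILTERS, N_EPI_FILTERS = 8, 4  # hypothetical sizes
params = {
    "seq_filters": random_filters(rng, N_SEQ_FILTERS, width=5, channels=4),
    "epi_filters": random_filters(rng, N_EPI_FILTERS, width=3, channels=2),
    "W": [
        [rng.uniform(-1, 1) for _ in range(N_SEQ_FILTERS + N_EPI_FILTERS)]
        for _ in range(N_CLASSES)
    ],
    "b": [0.0] * N_CLASSES,
}

seq = "GGCUAGCUAGGAUCCGAUAGC"  # toy sequence, not from the paper
# Two hypothetical per-position epigenomic signals (random placeholders)
epi = [[rng.random(), rng.random()] for _ in seq]
probs = multisource_forward(seq, epi, params)
```

With random weights the resulting class probabilities carry no biological meaning; the point is only the data flow, in which each source is processed by its own branch and the classifier sees the fused feature vector.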

Results: Computational results on a dataset of 16 human ncRNA classes downloaded from Rfam show that imCnC outperforms existing tools, achieving an accuracy of 94.18%. In addition, our method enables the discovery of new ncRNA features by using an optimization technique to measure and visualize the feature patterns of the imCnC classifier.
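The abstract does not say which optimization technique is used to visualize the feature patterns. One common family of methods is activation maximization, where gradient ascent is run on the input (not the weights) to find the input pattern that a trained classifier scores most strongly for a given class. The sketch below applies that idea to a toy linear softmax classifier; the weights, the four input features, the step size, and the L2 penalty are all hypothetical and stand in for a trained model.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs(x, W, b):
    """Class probabilities of a linear softmax classifier for input x."""
    scores = [
        sum(w * xi for w, xi in zip(row, x)) + bias
        for row, bias in zip(W, b)
    ]
    return softmax(scores)

def maximize_class(W, b, target, steps=200, lr=0.1, l2=0.01):
    """Gradient ascent on the input to maximize log p(target | x) - l2 * ||x||^2.

    The weights stay fixed; only the input pattern x is updated.
    """
    n_features = len(W[0])
    x = [0.0] * n_features  # start from a neutral input
    for _ in range(steps):
        p = class_probs(x, W, b)
        for j in range(n_features):
            # d log p_target / d x_j = W[target][j] - sum_k p_k * W[k][j]
            grad = W[target][j] - sum(p[k] * W[k][j] for k in range(len(W)))
            x[j] += lr * (grad - 2.0 * l2 * x[j])
    return x

# Hypothetical "trained" weights: 3 classes x 4 input features
W = [
    [1.0, -0.5, 0.0, 0.2],
    [-0.3, 0.8, 0.1, -0.4],
    [0.0, 0.1, -0.6, 0.9],
]
b = [0.0, 0.0, 0.0]

target = 1
pattern = maximize_class(W, b, target)
before = class_probs([0.0] * 4, W, b)[target]
after = class_probs(pattern, W, b)[target]
```

The entries of `pattern` indicate which input features push the classifier toward the chosen class and in which direction, which is the kind of per-class feature profile the abstract describes measuring. For a deep network the same loop would use backpropagated gradients instead of the closed-form expression used here.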

Keywords: Multisource deep-learning, ncRNA classification, epigenetics, biomarkers, features pattern extraction, optimization.




Article Details

VOLUME: 15
ISSUE: 4
Year: 2020
Published on: 11 June, 2020
Page: [338 - 348]
Pages: 11
DOI: 10.2174/1574893614666191105160633