Classification of Chromosomal DNA Sequences Using Hybrid Deep Learning Architectures

(E-pub Ahead of Print)

Author(s): Zhihua Du, Xiangdong Xiao, Vladimir N. Uversky*

Journal Name: Current Bioinformatics

Abstract:

Chromosomal DNA contains most of the genetic information of eukaryotes and plays an important role in the growth, development, and reproduction of living organisms. Most chromosomal DNA sequences are known to wrap around histones, and distinguishing these DNA sequences from ordinary DNA sequences is important for understanding the genetic code of life. The main difficulty in this problem is feature selection: DNA sequences have no explicit features, and common representation methods, such as one-hot coding, suffer from high dimensionality. Recently, deep learning models have been shown to extract useful features from input patterns automatically. In this paper, we present four deep learning architectures based on convolutional neural networks and long short-term memory networks for chromosomal DNA sequence classification. A natural language model (Word2vec) was used to generate word embeddings of the sequences, from which the deep learning models learn features. The four architectures are compared on 10 chromosomal DNA datasets. The results show that the architecture combining convolutional neural networks with long short-term memory networks is superior to the other methods in the accuracy of chromosomal DNA prediction.
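To make the representation issue concrete, the following is a minimal sketch and not the authors' pipeline. The abstract does not say how sequences are turned into "words" for Word2vec, so the overlapping k-mer tokenization, the choice of k = 3, and the helper names `kmer_tokens`, `one_hot`, and `build_vocab` are all assumptions made for illustration. The sketch contrasts a per-base one-hot matrix with the integer token ids that an embedding layer would map to dense vectors.

```python
from itertools import product


def kmer_tokens(seq, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mer 'words' (assumed tokenization)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def one_hot(seq):
    """One-hot encode each base as a 4-element vector in the order A, C, G, T."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        row[index[base]] = 1
        rows.append(row)
    return rows


def build_vocab(k=3):
    """Map every possible k-mer over A, C, G, T to an integer id (4**k entries)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


seq = "ACGTAC"                      # toy sequence, not from the paper's datasets
tokens = kmer_tokens(seq, k=3)      # ['ACG', 'CGT', 'GTA', 'TAC']
vocab = build_vocab(k=3)
ids = [vocab[t] for t in tokens]    # integer ids an embedding layer would look up

print(len(one_hot(seq)), "x 4 one-hot matrix for the raw bases")
print(len(vocab), "possible 3-mers; a one-hot vector per 3-mer would need this many dimensions")
print(tokens, ids)
```

A one-hot vector over k-mers grows as 4^k per token, whereas a learned embedding keeps a fixed, typically much smaller, dimension per token. In a Word2vec-style setup, token lists such as `tokens` would serve as the training "sentences"; the embedding size and training settings are not given in the abstract.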

Keywords: Convolutional neural network (CNN), Long short-term memory network (LSTM), DNA sequence classification

Article Details

DOI: 10.2174/1574893615666200224095531