Background: With the advent of the post-genomic era, research into the genetic
mechanisms of disease has come to depend increasingly on studies of genes,
gene networks, and gene-protein interaction networks. To explore gene expression and regulation,
researchers have carried out many studies on transcription factors and their binding sites (TFBSs).
Based on the large number of predicted values for transcription factor binding sites produced by deep
learning models, further computation and analysis have been carried out to reveal the relationship between
gene mutation and the occurrence of disease. Deep learning methods have been shown to outperform
conventional methods in predicting the functions of noncoding variants. Research on predicting the
functions of single nucleotide polymorphisms (SNPs) is expected to uncover the mechanisms by which
gene mutations affect human traits and diseases.
Results: We reviewed the conventional TFBS identification methods from different perspectives. For
deep learning methods that predict TFBSs, we discussed related issues such as raw data
preprocessing, the structural design of the deep convolutional neural network (CNN), and the
measurement of model performance. We then summarized the techniques commonly used to identify
functional noncoding variants from de novo sequence.
Conclusion: With the rapid development of high-throughput assays, more sample data and
chromatin features should help improve the prediction accuracy of deep convolutional
neural networks for TFBS identification. Meanwhile, gaining more insight into the deep
CNN framework itself has proved useful both for improving model performance and for
developing designs better suited to the sample data. Based on the feature values predicted by the
deep CNN model, a prioritization model for functional noncoding variants would help reveal
the effect of gene mutation on disease.