Background: DNase I hypersensitive sites (DHSs) are important markers of DNA regulatory
regions. Identifying them in DNA sequences is significant for both biomedical research and the
discovery of new drugs. The existing experimental methods for doing so, however, are time-consuming
and laborious, so new computational approaches are called for.
Method: To this end, a novel predictive model, called iDHSs-PseTNC, was constructed by integrating
the sequence-order information and the physicochemical properties of trinucleotides into the
pseudo trinucleotide composition (PseTNC). In the model, a deep sparse auto-encoder was used to
reconstruct the input and thereby learn a good representation of the input features, and a softmax
classifier was added on top of the auto-encoder's coding layer. The deep sparse auto-encoder model
obtained the best classification result, with every member of the training set correctly classified.
Five-fold cross-validation test results indicated that the new predictor remarkably outperformed
the existing prediction methods for the same purpose.
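
As an illustration of the feature encoding described above, the following is a minimal sketch of a pseudo trinucleotide composition in the commonly used pseudo k-tuple form: 64 normalized trinucleotide frequencies followed by lam sequence-order correlation factors. It is not the authors' implementation. The function names, the parameters lam and weight, and their default values are assumptions, and the single property used here (the G/C fraction of each trinucleotide) is only a stand-in for the physicochemical properties used in the paper, which are not listed in this abstract.

```python
from itertools import product

# All 64 trinucleotides in a fixed (lexicographic) order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def gc_fraction(tri):
    """Stand-in property (assumption): fraction of G/C bases in a trinucleotide.

    This is NOT the physicochemical property set used by the paper.
    """
    return sum(base in "GC" for base in tri) / 3.0


def pse_tnc(seq, lam=2, weight=0.5, properties=(gc_fraction,)):
    """Return a (64 + lam)-dimensional pseudo trinucleotide composition vector."""
    seq = seq.upper()
    # Overlapping trinucleotides along the sequence.
    tris = [seq[i:i + 3] for i in range(len(seq) - 2)]
    if len(tris) <= lam:
        raise ValueError("sequence too short for the chosen lam")

    # Normalized occurrence frequency of each of the 64 trinucleotides.
    freqs = [tris.count(t) / len(tris) for t in TRINUCLEOTIDES]

    # Tier-j correlation factor: mean squared property difference between
    # trinucleotides that are j positions apart (sequence-order information).
    thetas = []
    for j in range(1, lam + 1):
        pairs = list(zip(tris, tris[j:]))
        total = sum(
            sum((p(a) - p(b)) ** 2 for p in properties) / len(properties)
            for a, b in pairs
        )
        thetas.append(total / len(pairs))

    # Joint normalization so that the whole vector sums to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

A vector of this kind would then serve as the input to the auto-encoder; a real property table can be supplied through the properties argument in place of the stand-in.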
Results: The accuracy (ACC) of iDHSs-PseTNC is slightly (0.3%) lower than that of iDHS-EL,
constructed by Liu et al., but its Matthews correlation coefficient (MCC) is 3.45% higher. The
predictor iDHSs-PseTNC also achieves the highest success rates in both Pt and Py among the existing
predictors. To help experimental researchers obtain the results they need directly, an easy-to-use
web-server for identifying DHSs has been established for free access at:
http://www.jcibioinfo.cn/iDHSs-PseTNC, which allows for fast and accurate computation.
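
For readers unfamiliar with the two metrics compared above, the sketch below computes ACC and MCC from the counts of a binary confusion matrix using their standard definitions. The function name is hypothetical and the counts in the usage line are made-up numbers for illustration only; they are not results from this study.

```python
import math


def acc_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation coefficient) from confusion counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Illustrative counts only (not taken from the paper).
acc, mcc = acc_mcc(tp=40, tn=30, fp=20, fn=10)
```

Because MCC uses all four counts, it can rank two predictors differently from ACC, which is why one method can have a slightly lower ACC and still a higher MCC.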
Conclusion: The timely identification of DHSs in DNA sequences is significant for the in-depth
study of DNA function and the development of new drugs. In this article, we proposed a novel method
for predicting the DHSs of DNA by incorporating the physicochemical properties of trinucleotides into
the pseudo trinucleotide composition via a deep sparse auto-encoder. The results are promising enough
to suggest that the predictor could be applied to other genomic analysis problems.