Background: Enhancers are cis-regulatory DNA elements that increase gene expression and are usually located far from transcription start sites. Like other regulatory elements, the regions around enhancers carry a variety of sequence, transcriptional, and epigenetic features.
Objective: Existing algorithms widely use these features to predict the positions of enhancers, and the accuracy of these methods is significantly affected by which features are selected. Identifying the most informative features is therefore important for improving enhancer recognition.
Method: To evaluate the classification power of these features for enhancer recognition, we divided all of the features into three categories: sequence features, transcriptional features, and epigenetic features. We then applied two evaluation measures: information gain and single-feature prediction accuracy. Information gain reflects how much the entropy of the enhancer classification is reduced when a given feature is used. Single-feature prediction accuracy directly reflects the contribution of each feature to enhancer recognition.
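The two measures can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how features were discretized or which classifier produced the single-feature predictions, so the sketch assumes a fixed threshold for binning and a rank-based (Mann-Whitney) AUC computed directly on the raw feature value. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_bins, labels):
    """H(labels) - H(labels | feature) for an already discretized feature."""
    n = len(labels)
    groups = {}
    for b, y in zip(feature_bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def single_feature_auc(scores, labels):
    """AUC obtained by ranking samples on one feature alone.

    Uses the Mann-Whitney formulation: the fraction of
    (positive, negative) pairs ranked correctly, with ties counted as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = enhancer, 0 = non-enhancer.
labels = [1, 1, 1, 0, 0, 0]
informative = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]   # separates the classes
noisy = [0.9, 0.2, 0.5, 0.8, 0.1, 0.5]         # weakly related to the classes

# Assumed discretization: a single threshold at 0.5.
informative_bins = [1 if x > 0.5 else 0 for x in informative]
noisy_bins = [1 if x > 0.5 else 0 for x in noisy]

ig_informative = information_gain(informative_bins, labels)
ig_noisy = information_gain(noisy_bins, labels)
auc_informative = single_feature_auc(informative, labels)
auc_noisy = single_feature_auc(noisy, labels)
```

Averaging such per-feature scores within each category would give category-level summaries comparable in form to those reported in the Results.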
Results: The average information gain of the sequence, transcriptional, and epigenetic features is 0.068, 0.213, and 0.299, respectively. The corresponding average AUC values are 0.534, 0.605, and 0.647, respectively.
Conclusion: In comparison with sequence features, epigenetic features are more effective for recognizing enhancers.