Background: In sequencing human and model organism DNA, the development of efficient computational techniques for the rapid prediction of short exons in eukaryotes is a major challenge.
Objective: This paper presents a multiscale products-based method in the B-spline wavelet domain for short exon detection. Our analysis shows that wavelet coefficients associated with introns are less correlated across consecutive scales than coefficients associated with exons. We explain this observation, obtained on the HMR195 dataset, by computing the histogram distributions of the exon and intron coefficients. We use these inter-scale correlation features to enhance exon structures and weaken background noise.
Method: Our method is developed in two stages: (i) A new B-spline wavelet transform is designed to extract exon features in the multiscale domain. This avoids having to set a window-length parameter, which affects the results, and the wavelet offers more degrees of freedom for curve design. (ii) Based on the significant difference in inter-scale correlation between exon and intron coefficients, we present a multiscale products-based method to discriminate significant exon features from introns.
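The idea behind stage (ii) is that features which persist across scales reinforce each other when coefficients at adjacent scales are multiplied, while weakly correlated coefficients are suppressed. The following is a minimal sketch of that idea only, not the method of this paper: it assumes a generic one-dimensional numeric signal, and it substitutes the standard undecimated (à trous) transform with the cubic B-spline kernel for the new B-spline wavelet, whose definition is not given in this abstract. The function names `atrous_b3_details` and `adjacent_scale_products` are hypothetical.

```python
import numpy as np


def atrous_b3_details(signal, n_scales=3):
    """Undecimated (a trous) transform with the cubic B-spline kernel.

    Stand-in for the paper's B-spline wavelet (an assumption).
    Returns an array of shape (n_scales, len(signal)) holding the
    detail coefficients at each scale.
    """
    kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    approx = np.asarray(signal, dtype=float)
    details = []
    for j in range(n_scales):
        step = 2 ** j
        # Dilate the kernel by inserting 2**j - 1 zeros between taps.
        dilated = np.zeros(4 * step + 1)
        dilated[::step] = kernel
        # Reflect-pad so the smoothed signal keeps the input length.
        padded = np.pad(approx, 2 * step, mode="reflect")
        smoothed = np.convolve(padded, dilated, mode="valid")
        # Detail = what the smoothing removed at this scale.
        details.append(approx - smoothed)
        approx = smoothed
    return np.array(details)


def adjacent_scale_products(details):
    """Pointwise product of detail coefficients at consecutive scales."""
    return details[:-1] * details[1:]


if __name__ == "__main__":
    # Toy signal: a single step, standing in for a region boundary.
    x = np.zeros(200)
    x[100:] = 1.0
    products = adjacent_scale_products(atrous_b3_details(x))
    print(int(np.argmax(np.abs(products[0]))))
```

In this sketch the product is large only where coefficients at both scales are large and share the same sign, which is the behaviour the inter-scale correlation argument relies on. How the products are thresholded or combined into an exon decision is not specified in the abstract and is left out here.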
Results: The BG570 and HMR195 datasets were used to evaluate the methods considered. Compared with eight existing techniques, the proposed method shows improvements of at least 26.8%, 9.5%, 8.2%, 3.5%, 10.2%, 4.5%, 7.8% and 6.4% for exon lengths of 0-24, 25-49, 50-74, 100-124, 125-149, 150-174, 175-199 and 200-299, respectively.
Conclusion: Experimental results demonstrate that our approach leads to better performance for short exon detection.