Current Bioinformatics

Yi-Ping Phoebe Chen
Department of Computer Science and Information Technology
La Trobe University
Melbourne
Australia


Hybrid Approach Using SVM and MM2 in Splice Site Junction Identification

Author(s): Srabanti Maji, Deepak Garg.

Abstract:

Predicting coding regions in genomic DNA sequence is the first step in gene identification. In eukaryotic organisms, gene structure includes the promoter, start codon, exons, introns, and stop codon, among other elements. For splice sites, the boundaries between exons and introns, prediction accuracy remains below 90% even though the sequences adjacent to the splice sites are highly conserved. Algorithms for splice site identification must therefore be improved to raise prediction accuracy. This article proposes an efficient method, MM2F-SVM, which consists of three stages: feature extraction, in which a second-order Markov model (MM2) is used; feature selection, in which principal feature analysis (PFA) is applied; and final classification, in which a support vector machine (SVM) with a Gaussian kernel is used. In comparison with other existing splice site prediction programs, the proposed MM2F-SVM model showed superior performance.
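As a rough illustration of the first stage only, the sketch below estimates position-specific second-order Markov transition probabilities from aligned candidate sequences and encodes a sequence as a vector of those probabilities. It is not the authors' implementation: the function names `train_mm2` and `mm2_features`, the pseudocount smoothing, the restriction to the alphabet A/C/G/T, and the use of a single model trained on true sites are all assumptions made for this example. The PFA and SVM stages are not shown.

```python
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"  # assumption: sequences contain only these four symbols


def train_mm2(sequences, pseudocount=1.0):
    """Estimate P(x_i | x_{i-2}, x_{i-1}) at each position i >= 2.

    `sequences` are aligned, equal-length strings (hypothetical input format).
    Returns a list with one dict per position, keyed by (context, symbol).
    """
    length = len(sequences[0])
    counts = [defaultdict(float) for _ in range(length)]
    for seq in sequences:
        for i in range(2, length):
            counts[i][(seq[i - 2:i], seq[i])] += 1

    contexts = ["".join(p) for p in product(ALPHABET, repeat=2)]
    model = []
    for i in range(length):
        probs = {}
        if i >= 2:
            for ctx in contexts:
                # Additive smoothing so unseen transitions keep a nonzero probability.
                total = sum(counts[i][(ctx, b)] for b in ALPHABET)
                total += pseudocount * len(ALPHABET)
                for b in ALPHABET:
                    probs[(ctx, b)] = (counts[i][(ctx, b)] + pseudocount) / total
        model.append(probs)
    return model


def mm2_features(seq, model):
    """Encode a sequence as its per-position second-order transition probabilities."""
    return [model[i][(seq[i - 2:i], seq[i])] for i in range(2, len(seq))]
```

In a pipeline of the kind the abstract describes, vectors like these would then be reduced by a feature selection step and passed to a classifier with a Gaussian kernel; those steps are left out here because the abstract does not give their details.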

Keywords: Gene identification, Markov models, principal feature analysis, splicing site, support vector machine.


Article Details

VOLUME: 9
ISSUE: 1
Year: 2014
Page: [76 - 85]
Pages: 10
DOI: 10.2174/1574893608999140109121721