Background: Sphingomonas is a bacterial genus used as a microbial resource for the biodegradation of
aromatic compounds. In computational biology, identifying protein coding domains in the Sphingomonas
genome is known to be a challenging problem.
Objective: To address this challenge, we propose a novel method that predicts protein coding
regions in the Sphingomonas genome using 3-base periodicity.
Method: DNA sequences are first transformed into a wavelet by a so-called 3-base characteristics
strategy. Sliding windows of fixed length are then used to identify protein coding regions; the initial
window size and the threshold values are set using experimentally verified protein data from the
NCBI library.
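
As an illustration of the general idea only, the sketch below scans a sequence with a fixed-length sliding window and scores each window by its spectral power at period 3. It assumes the common binary indicator (Voss) mapping of the four bases and a discrete Fourier term at frequency 1/3, which is a stand-in for, not a reproduction of, the 3-base characteristics strategy. The function names period3_power and candidate_regions, and the default window_size, step, and threshold values, are hypothetical and are not the values calibrated from the NCBI data.

```python
import cmath


def period3_power(window):
    """Normalised spectral power at frequency 1/3 for one window.

    Assumption: each base is mapped to a binary indicator sequence and
    the squared magnitudes of the four Fourier terms are summed.
    """
    n = len(window)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # Fourier term at frequency 1/3 for this base's indicator sequence.
        term = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, ch in enumerate(window)
            if ch == base
        )
        total += abs(term) ** 2
    return total / n


def candidate_regions(seq, window_size=90, step=3, threshold=4.0):
    """Return merged (start, end) spans whose windows reach the threshold.

    window_size, step, and threshold are illustrative defaults only.
    """
    seq = seq.upper()
    regions = []
    for start in range(0, len(seq) - window_size + 1, step):
        end = start + window_size
        if period3_power(seq[start:end]) >= threshold:
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous span: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Toy sequence: a strictly period-3 stretch flanked by homopolymers.
    toy = "A" * 90 + "ATG" * 60 + "C" * 90
    print(candidate_regions(toy))
```

On real genomic data the score is far noisier than on this toy input, so the window length and threshold would need to be tuned against annotated coding sequences, as the method describes.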
Results: A protein coding domain that has been experimentally verified in congeneric families of
Sphingomonas is identified in the Sphingomonas genome.
Conclusion: The identified domain is highly likely to encode proteins with similar functions. In addition,
some potential protein coding regions are marked by narrowing the forecast areas, and an extensible
sliding window strategy is then used to improve predictive accuracy.