The promoter region of a eukaryotic gene is important because it helps explain the mechanism of transcription
regulation. Identifying this region is a complex problem because its signature turns out to be fuzzy. Several in silico
methods exist for identifying promoter regions, but there is still scope for new ones. Any new method therefore faces the
challenge of reasonably predicting promoter sequences (in a way that can be tested against wet-lab data) from a mixed
database of promoters and non-promoters. In this communication we propose a composite method that first groups known
promoter and non-promoter sequences into their respective clusters based on their relative distances, and then classifies
the maximum similarity scores obtained between a group of new sequences and these clusters, in order to predict the true
promoters among the new sequences. The in silico experiments are carried out on several databases that we constructed
from the available primary sequence databanks, to demonstrate the advantage of the proposed approach.
Keywords: Distance matrix, fuzzy C-means clustering, human promoter region, multilayer perceptron.
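
The two-stage idea summarised in the abstract (cluster the known sequences, then score new sequences against the clusters) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the dinucleotide-frequency representation, the Euclidean distance, the similarity score 1/(1 + d), the toy sequences, and the helper names `kmer_profile`, `fuzzy_c_means` and `similarity_features`. The final classifier (a multilayer perceptron, according to the keywords) is omitted; the sketch stops at the two-element feature vector such a classifier would receive.

```python
import math
import random
from itertools import product


def kmer_profile(seq, k=2):
    """Normalised k-mer frequency vector (an assumed stand-in for the paper's distance basis)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in counts:
            counts[km] += 1
    total = sum(counts.values()) or 1
    return [counts[km] / total for km in kmers]


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means: returns (cluster centres, membership matrix)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dim)]
            )
        # Membership update: proportional to distance ** (-2 / (m - 1)).
        for i in range(n):
            dists = [max(euclid(points[i], cj), 1e-12) for cj in centers]
            inv = [dj ** (-2.0 / (m - 1.0)) for dj in dists]
            s = sum(inv)
            u[i] = [v / s for v in inv]
    return centers, u


def similarity_features(seq, prom_centers, nonprom_centers):
    """Maximum similarity of seq to the promoter clusters and to the non-promoter clusters."""
    p = kmer_profile(seq)

    def max_sim(centers):
        return max(1.0 / (1.0 + euclid(p, cj)) for cj in centers)

    return [max_sim(prom_centers), max_sim(nonprom_centers)]


# Toy sequences, invented purely for illustration (not real promoter data).
toy_promoters = ["GCGCGCGCGC", "GCGCGGCGCG", "CGCGCGCGCC"]
toy_nonpromoters = ["ATATATATAT", "TATATATTAT", "AATATATATA"]

prom_centers, _ = fuzzy_c_means([kmer_profile(s) for s in toy_promoters], c=2)
nonprom_centers, _ = fuzzy_c_means([kmer_profile(s) for s in toy_nonpromoters], c=2)

features = similarity_features("GCGCGCGCGG", prom_centers, nonprom_centers)
print(features)
```

Clustering the two classes separately keeps each set of cluster centres specific to one class, so the pair of maximum similarities gives a compact input for a downstream classifier. In a real setting, the distance measure and the number of clusters would have to be chosen and validated on actual promoter data.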