Background: The rapid growth of protein and annotation databases calls for the development
of efficient tools to process this valuable information. Biologists frequently need to
find proteins similar to a given protein, a task for which BLAST tools are commonly used. With the development
of biomedical ontologies such as the Gene Ontology, methods have been designed to measure the
functional (semantic) similarity between two proteins. These methods work well on individual protein pairs,
but are not suitable for query processing, i.e. searching a database for proteins similar to a query protein.
Objective: Our aim is to enable searching for functionally similar proteins within an acceptable time.
Methods: A novel method, SimExact, for high-speed searching of functionally similar proteins has been developed.
Results: Our experiments show that SimExact produces the correct results required for protein
searching. We provide a fully functional prototype of an online tool (www.datafurnish.com/protsem.php)
that generates a ranked list of the proteins similar to a query protein, with a response
time of less than 20 seconds in our setup. SimExact was also used to search for protein pairs
with a high disparity between function similarity and sequence similarity.
Conclusion: SimExact makes such searches practical, which would not be possible in a reasonable