SimExact – An Efficient Method to Compute Function Similarity Between Proteins Using Gene Ontology

Author(s): Najmul Ikram*, Muhammad Abdul Qadir, Muhammad Tanvir Afzal

Journal Name: Current Bioinformatics

Volume 15, Issue 4, 2020

Abstract:

Background: Rapidly growing protein and annotation databases necessitate efficient tools to process this valuable information. Biologists frequently need to find proteins similar to a given protein, for which BLAST tools are commonly used. With the development of biomedical ontologies such as the Gene Ontology, methods were designed to measure function (semantic) similarity between two proteins. These methods work well on protein pairs but are not suitable for protein query processing.

Objective: Our aim is to make searching for similar proteins feasible in an acceptable time.

Methods: We propose SimExact, a novel method for high-speed searching of functionally similar proteins.

Results: Our experiments show that SimExact gives the correct results required for protein searching. We provide a fully functional prototype of an online tool (www.datafurnish.com/protsem.php) that generates a ranked list of proteins similar to a query protein, with a response time of less than 20 seconds in our setup. We also used SimExact to search for protein pairs with a high disparity between function similarity and sequence similarity.

Conclusion: SimExact makes such searches practical; without it, they would not be possible in a reasonable time.

Keywords: SimExact, protein function similarity, protein query, protein similarity measures, gene ontology, protein.
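
Illustrative sketch: The abstract does not describe how SimExact works internally, so the following Python sketch is not the authors' method. It only illustrates the general task of protein query processing, namely producing a ranked list of proteins for a query protein from annotation sets. It assumes a plain Jaccard overlap of GO term sets as the similarity score and uses an inverted index so that only proteins sharing at least one term with the query are scored. The protein IDs, the annotation table, and the function names `build_term_index` and `rank_similar` are hypothetical.

```python
from collections import defaultdict

# Toy annotation table: protein ID -> set of GO term IDs.
# Both the protein IDs and the term assignments are placeholders
# chosen for illustration, not curated annotation data.
ANNOTATIONS = {
    "P1": {"GO:0000001", "GO:0000002", "GO:0000003"},
    "P2": {"GO:0000001", "GO:0000002"},
    "P3": {"GO:0000003", "GO:0000004"},
    "P4": {"GO:0000005"},
}


def build_term_index(annotations):
    """Build an inverted index: GO term -> set of proteins annotated with it."""
    index = defaultdict(set)
    for protein, terms in annotations.items():
        for term in terms:
            index[term].add(protein)
    return index


def rank_similar(query, annotations, index, top_k=10):
    """Rank proteins by Jaccard overlap of GO term sets with the query.

    Assumption: Jaccard overlap stands in for a function similarity
    measure; it is NOT the SimExact score.
    """
    query_terms = annotations[query]

    # Count shared terms, visiting only proteins that share a term.
    shared = defaultdict(int)
    for term in query_terms:
        for protein in index[term]:
            if protein != query:
                shared[protein] += 1

    scored = []
    for protein, n_shared in shared.items():
        union = len(query_terms) + len(annotations[protein]) - n_shared
        scored.append((protein, n_shared / union))

    # Highest score first; ties broken by protein ID for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    term_index = build_term_index(ANNOTATIONS)
    for protein, score in rank_similar("P1", ANNOTATIONS, term_index):
        print(protein, round(score, 3))
```

The inverted index is the design choice worth noting: a query touches only candidates that share at least one annotation, rather than comparing the query against every protein in the database. Whether SimExact uses a comparable strategy is not stated in the abstract.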

Article Details

VOLUME: 15
ISSUE: 4
Year: 2020
Page: [318 - 327]
Pages: 10
DOI: 10.2174/1574893614666191017092842