Predicting False Positives of Protein-Protein Interaction Data by Semantic Similarity Measures
Recent technical advances in identifying protein-protein interactions (PPIs) have generated genome-wide
interaction data, collectively referred to as the interactome. These interaction data give insight into the
underlying mechanisms of biological processes. However, the PPI data determined by experimental and computational
methods include an extremely large number of false positives, that is, reported interactions that are not confirmed to occur in vivo. Filtering PPI data
is thus a critical preprocessing step for improving analysis accuracy. In this article, we propose integrating Gene Ontology (GO) data
to assess the reliability of PPIs. We evaluate the performance of various semantic similarity measures in terms of
functional consistency. Protein pairs with high semantic similarity are considered highly likely to share common
functions and are therefore more likely to interact. We also propose a combined semantic similarity method for
predicting false positive PPIs. The experimental results show that the combined method performs better
than the individual semantic similarity classifiers. The proposed classifier predicted that 58.6% of the S. cerevisiae PPIs
from the BioGRID database are false positives.
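
The general idea of flagging low-similarity pairs can be illustrated with a minimal Python sketch. It is not the classifier proposed in this article: the toy ontology, the term names, the annotation sets, the threshold, and the overlap-based similarity (shared ancestor terms divided by all ancestor terms) are assumptions chosen only for illustration.

```python
# Hypothetical toy ontology: each term maps to its parent terms.
# These are illustrative names, not real GO identifiers.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "transport": {"root"},
    "glycolysis": {"metabolism"},
    "lipid_metabolism": {"metabolism"},
    "ion_transport": {"transport"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the toy DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def induced_terms(annotations):
    """Return every term implied by a protein's annotations (terms plus ancestors)."""
    result = set()
    for term in annotations:
        result |= ancestors(term)
    return result


def similarity(annotations_a, annotations_b):
    """Overlap-based similarity: shared induced terms over all induced terms."""
    induced_a = induced_terms(annotations_a)
    induced_b = induced_terms(annotations_b)
    union = induced_a | induced_b
    return len(induced_a & induced_b) / len(union) if union else 0.0


def flag_false_positives(ppis, annotations, threshold):
    """Return the interactions whose similarity falls below the assumed threshold."""
    return [
        (p, q)
        for p, q in ppis
        if similarity(annotations[p], annotations[q]) < threshold
    ]


# Hypothetical proteins and annotations.
annotations = {
    "A": {"glycolysis"},
    "B": {"lipid_metabolism"},
    "C": {"ion_transport"},
}
ppis = [("A", "B"), ("A", "C")]
flagged = flag_false_positives(ppis, annotations, threshold=0.4)
```

In this toy setting, proteins A and B share the "metabolism" branch and are kept, while A and C share only the root term and are flagged. A real analysis would use the actual GO graph and annotations, and one of the semantic similarity measures evaluated in the article, in place of the overlap ratio used here.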
Keywords: Gene ontology, protein-protein interactions, semantic similarity.