Mining Protein-Protein Interaction Data

Ryan J. Haasl; Jianwen Fang

Abstract

The development of high-throughput technologies that expedite the discovery of interactions between proteins has made it possible to screen entire genomes and produce large protein-protein interaction (PPI) datasets. These datasets now enable researchers to mine PPI data for results of theoretical and practical importance. Examples include predicting novel PPIs, protein function, and the subcellular localization of proteins, and constructing reasonably realistic, proteome-wide PPI networks. Most recent in silico PPI prediction methods rely on conserved sequence signatures identified by analyzing a large PPI dataset. Some methods attempt to improve predictive accuracy by incorporating additional biological information, multiple datasets, or both. The PPI networks constructed to date do not provide a truly realistic picture of biological network mechanisms. They are nonetheless functional, because they have enabled researchers to test the reliability of high-throughput data, predict protein function, and localize proteins within the cell. All PPI data mining is constrained by the quantity and quality of the PPI data currently available. Consequently, the reliability of predictions based on PPI data is expected to increase as PPI databases grow in size and taxonomic range.

Keywords: Protein-protein interaction, data mining, proteomics, in silico prediction, protein network
