Abstract
Protein-protein interactions (PPIs) play a key role in many cellular processes. Uncovering PPIs and their functions within the cell is a challenge of post-genomic biology; meeting it will improve our understanding of disease and help in the development of novel methods for disease diagnosis and forensics. The experimental methods currently used to identify PPIs are both time-consuming and expensive, and high-throughput experimental results have shown high rates of both false positives and false negatives. These obstacles could be overcome by developing computational approaches to predict PPIs and to validate experimental results. In this work, we describe recent advances in predicting protein-protein interactions from the following aspects: i) benchmark dataset construction, ii) sequence representation approaches, iii) common machine learning algorithms, and iv) cross-validation test methods and assessment metrics.
Keywords: Protein-protein interaction, prediction, dataset construction, sequence representation, machine learning, cross-validation test.