An Integrated Prediction Method for Identifying Protein-Protein Interactions

(E-pub Ahead of Print)

Author(s): Chang Xu, Limin Jiang, Zehua Zhang, Xuyao Yu, Renhai Chen*, Junhai Xu*.

Journal Name: Current Proteomics

Abstract:

Background: Protein-protein interactions (PPIs) play a key role in various biological processes. Many methods have been developed to predict protein-protein interactions and protein interaction networks. However, many existing applications are limited because they rely on a large number of homologous proteins and interaction marks.

Method: In this paper, we propose a novel integrated learning approach (RF-Ada-DF) with a sequence-based feature representation for identifying protein-protein interactions. Our method first constructs a sequence-based feature vector to represent each pair of proteins, using Multivariate Mutual Information (MMI) and Normalized Moreau-Broto Autocorrelation (NMBAC). We then feed the 638-dimensional features into an integrated learning model that classifies each pair as interacting or non-interacting.
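
To make the NMBAC step concrete, the following is a minimal sketch of a normalized Moreau-Broto autocorrelation descriptor for a single property scale. It uses the commonly cited form AC(lag) = 1/(n - lag) * sum of x_i * x_(i+lag) over z-scored property values. The function names `normalize_scale` and `nmbac`, the toy scale, and the chosen lag are all assumptions for illustration; the abstract does not state which physicochemical properties or lags the authors use, so this is not their implementation.

```python
from statistics import mean, pstdev


def normalize_scale(scale):
    """Z-score a residue -> value mapping (zero mean, unit variance)."""
    values = list(scale.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("scale must contain at least two distinct values")
    return {res: (val - mu) / sigma for res, val in scale.items()}


def nmbac(sequence, scale, max_lag):
    """Return [AC(1), ..., AC(max_lag)] for one property scale.

    AC(lag) = 1 / (n - lag) * sum_{i=1}^{n-lag} x_i * x_{i+lag},
    where x_i is the normalized property value of residue i.
    """
    norm = normalize_scale(scale)
    x = [norm[res] for res in sequence]  # KeyError if a residue is not in the scale
    n = len(x)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(sequence)")
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


# Toy two-letter scale, purely illustrative (not real physicochemical data).
toy_scale = {"A": 1.0, "C": -1.0}
features = nmbac("ACAC", toy_scale, max_lag=2)
```

In a pipeline of this kind, one such vector would typically be computed per property scale and per protein, and the vectors for the two proteins in a pair concatenated with the MMI features. How the 638 dimensions are divided between MMI and NMBAC is not given in the abstract.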

Result: The integrated model embeds Random Forest in the AdaBoost framework and turns weak classifiers into a single strong classifier. We also employ double fault detection to suppress over-adaptation during the training process. To evaluate the performance of our method, we conduct several comprehensive tests for PPI prediction. On the H. pylori dataset, our method achieves 88.16% accuracy and 87.68% sensitivity, an accuracy increase of 0.57%. On the S. cerevisiae dataset, our method achieves 95.77% accuracy and 93.36% sensitivity, an accuracy increase of 0.76%. On the Human dataset, our method achieves 98.16% accuracy and 96.80% sensitivity, an accuracy increase of 0.6%.
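
The abstract does not describe how double fault detection is applied inside the ensemble. As a point of reference, the sketch below computes the standard pairwise double-fault measure from the ensemble-diversity literature: the fraction of samples that two classifiers both misclassify. The function name `double_fault` and the toy labels are assumptions for illustration only, and the authors' procedure may differ.

```python
def double_fault(y_true, pred_a, pred_b):
    """Fraction of samples misclassified by BOTH classifiers.

    Lower values mean the two classifiers rarely fail on the same
    samples, i.e. they are more diverse.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must be non-empty")
    both_wrong = sum(
        1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b != t
    )
    return both_wrong / len(y_true)


# Toy labels: 1 = interacting pair, 0 = non-interacting pair.
y_true = [1, 0, 1, 1]
pred_a = [1, 1, 0, 1]  # wrong on samples 1 and 2
pred_b = [0, 1, 0, 1]  # wrong on samples 0, 1 and 2
df = double_fault(y_true, pred_a, pred_b)
```

A measure like this could be used, for example, to drop or down-weight base learners that repeatedly fail on the same samples as learners already in the ensemble. That usage is a guess at the intent, not something the abstract confirms.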

Conclusion: Experiments show that our method achieves better results than other outstanding methods for sequence-based PPI prediction. The datasets and code are available at https://github.com/guofei-tju/RF-Ada-DF.git

Keywords: Protein-protein interaction, Multivariate mutual information, Random Forest, AdaBoost framework, Double fault detection, Computational Intelligence

Article Details

DOI: 10.2174/1570164616666190306152318