ConvsPPIS: Identifying Protein-protein Interaction Sites by an Ensemble Convolutional Neural Network with Feature Graph

Author(s): Huaixu Zhu, Xiuquan Du*, Yu Yao

Journal Name: Current Bioinformatics

Volume 15, Issue 4, 2020




Abstract:

Background/Objective: Protein-protein interactions are essential for most cellular processes, so understanding how proteins interact is a crucial question, one that can be addressed by recognizing which residues participate in the interaction. Although many computational approaches have been proposed to predict interface residues, their feature perspectives and model learning abilities are not sufficient to achieve ideal results. Our objective is therefore to improve predictive performance by considering both the feature perspective and a new learning algorithm.

Method: In this study, we propose an ensemble deep convolutional neural network that explores the context and positional context of consecutive residues within a protein sub-sequence. Specifically, unlike the feature view of previous methods, ConvsPPIS uses evolutionary, physicochemical, and structural protein characteristics to construct a separate feature graph for each. Three independent deep convolutional neural networks are then trained, one on each type of feature graph, to learn the underlying patterns in the sub-sequence. Finally, we integrate these three deep networks into an ensemble predictor that leverages the complementary information of those features to predict potential interface residues.
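The pipeline described in the Method paragraph can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the window size, the padding scheme, or how the three networks are combined, so the zero padding, the unweighted averaging, the 0.5 threshold, and the names `feature_graph` and `ensemble_probability` are all hypothetical. The per-view probabilities stand in for the outputs of three trained networks.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def feature_graph(per_residue: Sequence[Sequence[float]],
                  center: int, half_window: int) -> Matrix:
    """Stack the feature vectors of the residues around `center` into a 2D matrix.

    Positions outside the sequence are zero-padded (an assumption; the
    abstract does not say how sequence ends are handled).
    """
    width = len(per_residue[0])
    rows: Matrix = []
    for pos in range(center - half_window, center + half_window + 1):
        if 0 <= pos < len(per_residue):
            rows.append(list(per_residue[pos]))
        else:
            rows.append([0.0] * width)
    return rows


def ensemble_probability(view_probs: Dict[str, float]) -> float:
    """Combine per-view interface probabilities by unweighted averaging.

    Averaging is an assumed combination rule chosen for illustration.
    """
    return sum(view_probs.values()) / len(view_probs)


# Toy per-residue features for a 5-residue protein, 2 values per residue.
toy_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]

# Window of 3 residues centered on the first residue: one padded row,
# followed by the rows for residues 0 and 1.
graph = feature_graph(toy_features, center=0, half_window=1)

# Hypothetical outputs of the three per-view networks for one residue.
probs = {"evolutionary": 0.9, "physicochemical": 0.6, "structural": 0.6}
is_interface = ensemble_probability(probs) >= 0.5
```

In a full system, one such matrix would be built per feature type for every residue and passed to the network trained on that feature type; only the final combination step is shown here.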

Results: Comparative experiments were conducted using 10-fold cross-validation. The results indicate that ConvsPPIS achieved superior performance on the DBv5-Sel dataset, with an accuracy of 88%. Additional experiments on the CAPRI-Alone dataset demonstrated that ConvsPPIS also has better prediction performance.
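For readers unfamiliar with the evaluation protocol, the following is a minimal sketch of 10-fold cross-validation splitting and the accuracy metric. It rests on assumptions: the abstract does not say whether folds are formed over proteins or residues, nor whether samples are shuffled, so the contiguous, unshuffled folds and the helper names `k_fold_indices` and `accuracy` are hypothetical.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists for k contiguous, unshuffled folds.

    The first `n_samples % k` folds receive one extra sample so that every
    sample appears in exactly one test fold.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds: List[Tuple[List[int], List[int]]] = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# 25 toy samples split into 10 folds: five folds of 3 and five folds of 2.
splits = k_fold_indices(25, k=10)

# Toy labels: three of four predictions match.
toy_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Each fold serves once as the held-out test set while the model is trained on the remaining nine; the reported score is typically the mean over the ten folds.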

Conclusion: The ConvsPPIS method provides a new perspective on capturing protein feature expression for identifying protein-protein interaction sites. The results demonstrate the superiority of this method.

Keywords: Feature graph, positional context, protein complex, interface prediction, convolutional neural network, ensemble learning.




Article Details

VOLUME: 15
ISSUE: 4
Year: 2020
Published on: 05 November, 2019
Page: [368 - 378]
Pages: 11
DOI: 10.2174/1574893614666191105155713
