Improving Self-interacting Proteins Prediction Accuracy Using Protein Evolutionary Information and Weighed-Extreme Learning Machine

Author(s): Ji-Yong An*, Yong Zhou, Lei Zhang, Qiang Niu, Da-Fu Wang

Journal Name: Current Bioinformatics

Volume 14 , Issue 2 , 2019

Background: Self-interacting proteins (SIPs) play an essential role in various aspects of the structural and functional organization of the cell.

Objective: In this study, we present a novel sequence-based computational approach for predicting self-interacting proteins that combines a Weighed-Extreme Learning Machine (WELM) model with an Autocorrelation (AC) descriptor for protein feature representation.
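For orientation, the weighted extreme learning machine is usually written as a regularized least-squares problem over the output weights of a single-hidden-layer network with fixed random hidden weights. The block below follows the standard formulation of Zong, Huang and Chen (2013). The symbols H, T, W, C and the class-count weighting are assumptions taken from that formulation, since the abstract does not state which weighting scheme or regularization value was used here.

```latex
% H: N x L hidden-layer output matrix, T: N x m target matrix,
% W: N x N diagonal sample-weight matrix, C: regularization constant.
\begin{aligned}
\min_{\beta}\quad & \tfrac{1}{2}\lVert \beta \rVert^{2}
  + \tfrac{C}{2}\sum_{i=1}^{N} W_{ii}\,\lVert \xi_i \rVert^{2} \\
\text{s.t.}\quad & h(x_i)\,\beta = t_i^{\top} - \xi_i^{\top},
  \qquad i = 1,\dots,N \\[4pt]
\hat{\beta} &= \Bigl(\tfrac{I}{C} + H^{\top} W H\Bigr)^{-1} H^{\top} W\, T \\[4pt]
W_{ii} &= \frac{1}{\#(t_i)}
  \quad\text{(one common choice: inverse size of the class of sample } i\text{)}
\end{aligned}
```

With this weighting, errors on the minority class (here, the self-interacting proteins) are penalized more heavily than errors on the majority class, which is the motivation for using a weighted model on imbalanced data.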

Method: The major advantage of the proposed method lies in two steps: first, an effective feature extraction method represents candidate self-interacting proteins using the evolutionary information embedded in the Position-Specific Scoring Matrix (PSSM) constructed by PSI-BLAST; second, a reliable and effective WELM classifier performs the classification.
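The feature extraction step can be illustrated with a minimal sketch of an autocorrelation descriptor computed over a PSSM. It assumes the commonly used auto-covariance form, in which each PSSM column is centered on its mean and correlated with itself at lags 1 to max_lag. The function name `ac_descriptor`, the parameter `max_lag`, and the toy matrix are hypothetical; the abstract does not give the exact formula or lag value used by the authors.

```python
from typing import List


def ac_descriptor(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Auto-covariance features from a PSSM with one row per residue.

    Returns a flat vector of length max_lag * n_columns, ordered by lag
    first and column second (hypothetical layout, chosen for illustration).
    """
    length = len(pssm)
    if length == 0:
        raise ValueError("PSSM must contain at least one row")
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")

    n_cols = len(pssm[0])
    # Mean score of each column over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            # Average product of centered scores that are `lag` residues apart.
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Toy 4-residue, 2-column matrix; a real PSSM has 20 columns.
toy_pssm = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
toy_features = ac_descriptor(toy_pssm, max_lag=2)
print(len(toy_features))
```

Because the output length depends only on max_lag and the number of columns, proteins of different lengths map to vectors of the same size, which is what a fixed-input classifier requires.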

Result: To evaluate its performance, the proposed approach is applied to yeast and human SIP datasets. The experimental results show that our method achieved prediction accuracies of 93.43% and 98.15% on the yeast and human datasets, respectively. Extensive experiments are carried out to compare our approach with the SVM classifier and existing sequence-based methods on the yeast and human datasets. The experimental results show that our method performs better than several other state-of-the-art methods.

Conclusion: The results demonstrate that the proposed method is suitable for SIP detection and performs well in identifying SIPs. To facilitate extensive studies in future proteomics research, we developed a freely available web server called WELM-AC-SIPs in Hypertext Preprocessor (PHP) for predicting SIPs. The web server, including source code and the datasets, is available at

Keywords: SIPs, weighed-extreme learning machine, PSSM, Autocorrelation (AC) descriptor, PCA, protein sequence.

Article Details

Year: 2019
Page: [115 - 122]
Pages: 8
DOI: 10.2174/1574893613666180209161152
