Computational Prediction of Ubiquitination Proteins Using Evolutionary Profiles and Functional Domain Annotation

Author(s): Wangren Qiu, Chunhui Xu, Xuan Xiao, Dong Xu*.

Journal Name: Current Genomics

Volume 20, Issue 5, 2019

Background: Ubiquitination, as a post-translational modification, is a crucial biological process in cell signaling, apoptosis, and localization. Identification of ubiquitination proteins is of fundamental importance for understanding the molecular mechanisms in biological systems and diseases. Although high-throughput experimental studies using mass spectrometry have identified many ubiquitination proteins and ubiquitination sites, the vast majority of ubiquitination proteins remain undiscovered, even in well-studied model organisms.

Objective: To reduce experimental costs, computational methods have been introduced to predict ubiquitination sites, but their accuracy remains unsatisfactory. Predicting whether a protein can be ubiquitinated at all would help in predicting its ubiquitination sites. However, all computational methods to date predict only ubiquitination sites.

Methods: In this study, we developed the first computational method for predicting ubiquitination proteins without relying on ubiquitination site prediction. The method extracts features from sequence conservation information through a grey system model, as well as from functional domain annotation and subcellular localization.
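
As an illustration of how a grey system model can turn a conservation profile into fixed-length features, the sketch below fits a GM(1,1) model to each column of a position-specific scoring matrix and uses the fitted coefficients as features. The abstract does not state which grey model or which normalization the authors used, so the choice of GM(1,1), the assumption that the profile is already scaled to non-negative values, and the function names `gm11_coefficients` and `grey_pssm_features` are all hypothetical.

```python
def gm11_coefficients(series):
    """Least-squares fit of the GM(1,1) grey equation x0[k] + a*z1[k] = b.

    series: list of numbers (assumed non-negative), length >= 3.
    Returns the pair (a, b): development coefficient and grey input.
    """
    n = len(series)
    if n < 3:
        raise ValueError("GM(1,1) needs at least three points")

    # First-order accumulated series x1 and its adjacent-mean series z1.
    x1, running = [], 0.0
    for value in series:
        running += value
        x1.append(running)
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = list(series[1:])

    # Ordinary least squares for y = slope * z1 + b, where slope = -a.
    m = len(z1)
    sz, sy = sum(z1), sum(y)
    szz = sum(z * z for z in z1)
    szy = sum(z * v for z, v in zip(z1, y))
    det = m * szz - sz * sz
    if det == 0:
        raise ValueError("degenerate series; cannot fit GM(1,1)")
    slope = (m * szy - sz * sy) / det
    b = (sy - slope * sz) / m
    return -slope, b


def grey_pssm_features(pssm):
    """Hypothetical feature extractor: (a, b) for every profile column.

    pssm: list of rows, one per residue, each row holding one score per
    amino acid type (values assumed already normalized to be non-negative).
    """
    n_cols = len(pssm[0])
    features = []
    for col in range(n_cols):
        column = [row[col] for row in pssm]
        features.extend(gm11_coefficients(column))
    return features


# Toy profile with 4 residues and 2 columns (illustrative values only).
toy_pssm = [[0.5, 0.2], [0.5, 0.4], [0.5, 0.6], [0.5, 0.8]]
toy_features = grey_pssm_features(toy_pssm)
```

Because every column yields the same number of coefficients, proteins of different lengths map to vectors of equal size, which is what a downstream classifier needs.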

Results: With feature analysis and the Relief feature selection algorithm, 5-fold cross-validation on three datasets achieved an accuracy of 90.13% and a Matthews correlation coefficient of 80.34%. On an independent test set, the method achieved an accuracy of 87.71% and a Matthews correlation coefficient of 75.43%, better than the best available ubiquitination site prediction tool.
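
For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and the Matthews correlation coefficient from the four cells of a binary confusion matrix. The function name `accuracy_and_mcc` and the counts in the example are hypothetical and are not taken from the paper. The coefficient ranges from -1 to 1; the abstract reports it as a percentage.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Illustrative counts only: 100 positives and 100 negatives, 10 errors each.
example_accuracy, example_mcc = accuracy_and_mcc(tp=90, tn=90, fp=10, fn=10)
```

Unlike accuracy, the coefficient uses all four cells, so it stays informative when the positive and negative classes differ in size.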

Conclusion: Our study may guide experimental design and provide useful insights for studying the mechanisms and modulation of ubiquitination pathways. The code is available at:

Keywords: Ubiquitination, machine learning, random forest, protein annotation, subcellular localization, functional domain.



Article Details

Year: 2019
Page: [389 - 399]
Pages: 11
DOI: 10.2174/1389202919666191014091250
