Computational Prediction of Ubiquitination Proteins Using Evolutionary Profiles and Functional Domain Annotation

Wangren Qiu; Chunhui Xu; Xuan Xiao; Dong Xu

Abstract

Background: Ubiquitination, a post-translational modification, plays a crucial role in cell signaling, apoptosis, and protein localization. Identifying ubiquitination proteins is of fundamental importance for understanding the molecular mechanisms of biological systems and diseases. Although high-throughput experimental studies using mass spectrometry have identified many ubiquitination proteins and ubiquitination sites, the vast majority of ubiquitination proteins remain undiscovered, even in well-studied model organisms.

Objective: To reduce experimental costs, computational methods have been introduced to predict ubiquitination sites, but their accuracy is unsatisfactory. Predicting whether a protein can be ubiquitinated at all would help in predicting ubiquitination sites. However, all computational methods to date predict only ubiquitination sites.

Methods: This study presents the first computational method for predicting ubiquitination proteins that does not rely on ubiquitination site prediction. The method extracts features from sequence conservation information through a grey system model, and also from functional domain annotation and subcellular localization.
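As a rough illustration of how a grey system model can turn a conservation profile into fixed-length features, the sketch below fits a GM(1,1) model to one column of scores by least squares and returns its two coefficients. The abstract does not state which grey model or normalization the method uses, so the choice of GM(1,1), the logistic squashing of raw scores, and the names `gm11_coefficients` and `profile_features` are assumptions made for this example only.

```python
import math


def gm11_coefficients(series):
    """Fit a GM(1,1) grey model x0[k] + a * z[k] = b by least squares."""
    if len(series) < 3:
        raise ValueError("need at least three values to fit GM(1,1)")
    # Accumulated generating operation: running sum of the original series.
    acc, total = [], 0.0
    for value in series:
        total += value
        acc.append(total)
    # Background values: mean of consecutive accumulated terms.
    z = [0.5 * (acc[k] + acc[k - 1]) for k in range(1, len(acc))]
    y = list(series[1:])
    n = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    den = n * szz - sz * sz
    if den == 0:
        raise ValueError("degenerate series: background values are all equal")
    slope = (n * szy - sz * sy) / den  # regression of y on z; slope equals -a
    a = -slope
    b = (sy - slope * sz) / n
    return a, b


def profile_features(profile):
    """Map an L x C score matrix to 2 * C features (a and b per column)."""
    features = []
    for col in range(len(profile[0])):
        # Assumed normalization: logistic squashing of raw scores into (0, 1).
        column = [1.0 / (1.0 + math.exp(-row[col])) for row in profile]
        features.extend(gm11_coefficients(column))
    return features
```

For a 20-column profile this yields 40 values regardless of sequence length, which is what makes such coefficients usable as input to a standard classifier.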

Results: With feature analysis and the Relief feature selection algorithm, 5-fold cross-validation on three datasets achieved an accuracy of 90.13% and a Matthews correlation coefficient of 80.34%. On an independent test dataset, the method achieved an accuracy of 87.71% and a Matthews correlation coefficient of 75.43%, better than the predictions of the best available ubiquitination site prediction tool.
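For readers less familiar with the two reported metrics, the following sketch computes accuracy and the Matthews correlation coefficient (MCC) from confusion-matrix counts. The function names and the counts in the usage note are invented for illustration and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

With 90 true positives, 90 true negatives, 10 false positives, and 10 false negatives, `accuracy` gives 0.90 and `mcc` gives 0.80. When the two classes are balanced and the errors are symmetric, the MCC equals 2 × accuracy − 1.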

Conclusion: Our study may guide experimental design and provide useful insights for studying the mechanisms and modulation of ubiquitination pathways. The code is available at: https://github.com/Chunhuixu/UBIPredic_QWRCHX.

Keywords: Ubiquitination, machine learning, random forest, protein annotation, subcellular localization, functional domain.


Journal: Current Genomics
Publisher Name: Bentham Science Publisher
DOI: https://dx.doi.org/10.2174/1389202919666191014091250
Print ISSN: 1389-2029
Online ISSN: 1875-5488
