Prediction of Protein Ubiquitination Sites in Arabidopsis thaliana

Author(s): Jiajing Chen, Jianan Zhao, Shiping Yang, Zhen Chen*, Ziding Zhang*.

Journal Name: Current Bioinformatics

Volume 14, Issue 7, 2019

Abstract:

Background: As one of the most important reversible protein post-translational modifications, ubiquitination plays a significant role in regulating many biological processes, such as cell division, signal transduction, apoptosis and immune response. Protein ubiquitination usually occurs when a ubiquitin molecule is attached to a lysine residue on a target protein, which is also known as “lysine ubiquitination”.

Objective: Identifying ubiquitination sites is the crucial first step in investigating the molecular mechanisms of ubiquitination-related biological processes. However, conventional experimental methods for detecting ubiquitination sites are often time-consuming, and a large number of ubiquitination sites remain unidentified. In this study, a ubiquitination site prediction method for Arabidopsis thaliana was developed using a Support Vector Machine (SVM).

Methods: We collected 3009 experimentally validated ubiquitination sites on 1607 proteins in A. thaliana to construct the training set. Three feature encoding schemes were used to characterize the sequence patterns around ubiquitination sites: amino acid composition (AAC), Binary, and the composition of k-spaced amino acid pairs (CKSAAP). The Maximum Relevance and Minimum Redundancy (mRMR) feature selection method was employed to reduce the dimensionality of the input features. Five-fold cross-validation and independent tests were used to evaluate the performance of the established models.
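
As an illustration of the AAC and CKSAAP encodings named above, the following is a minimal sketch and not the authors' implementation. The window length (21 residues), the maximum gap `k_max`, the example sequence and all function names are assumptions made for this example; the abstract does not state the parameters actually used.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(window):
    """Amino acid composition: fraction of each standard residue in the window."""
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero on an empty window
    return [residues.count(a) / total for a in AMINO_ACIDS]


def cksaap(window, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each gap k, count residue pairs separated by k other residues and
    normalise by the number of pair positions in the window.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        n_positions = len(window) - k - 1
        for i in range(n_positions):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard symbols
                counts[pair] += 1
        denom = max(n_positions, 1)
        features.extend(counts[p] / denom for p in pairs)
    return features


def encode(window, k_max=2):
    """Concatenate AAC and CKSAAP features into one vector."""
    return aac(window) + cksaap(window, k_max)


# Hypothetical 21-residue window with the candidate lysine (K) at the centre.
window = "MSTAYIAEQRKISFVDSHFSR"
vector = encode(window)
```

With these assumed settings each window yields 20 AAC features plus 400 pair features per gap value. A feature selection step such as mRMR would then operate on vectors of this kind before they are passed to a classifier.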

Results: The combination of the AAC and CKSAAP encoding schemes yielded the best performance, with an accuracy of 81.35% and an AUC of 0.868 in the independent test. We also built an online predictor named AraUbiSite, which is freely accessible at: http://systbio.cau.edu.cn/araubisite.
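
For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute accuracy and AUC from classifier scores. The labels, scores and the 0.5 decision threshold are invented toy values for illustration only; they are not data from this study, and the function names are hypothetical.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive is scored higher
    than a randomly chosen negative, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = ubiquitination site, 0 = non-site (made-up values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

acc_value = accuracy(labels, scores)
auc_value = auc(labels, scores)
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is presumably why both are reported.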

Conclusion: We developed a well-performing prediction tool for large-scale ubiquitination site identification in A. thaliana. We hope that this work will speed up the identification of ubiquitination sites in A. thaliana and help to further elucidate the molecular mechanisms of ubiquitination in plants.

Keywords: Ubiquitination sites, prediction, Arabidopsis thaliana, machine learning, lysine, feature selection.

Article Details

VOLUME: 14
ISSUE: 7
Year: 2019
Page: [614 - 620]
Pages: 7
DOI: 10.2174/1574893614666190311141647