Lysine Malonylation Identification in E. coli with Multiple Features

Yan Xu; Yingxi Yang; Hui Wang; Yuanhai Shao

Abstract

Motivation: Lysine malonylation in eukaryotic proteins was discovered in 2011 through high-throughput proteomic analysis, but it has remained poorly understood in prokaryotes. Recent research has shown that malonylation in E. coli is significantly enriched in protein translation, energy metabolism pathways and fatty acid biosynthesis.

Results: In this work, we propose a predictor that identifies lysine malonylation sites in E. coli from physicochemical properties, binary code and sequence frequency features using a support vector machine (SVM). The experimentally determined lysine malonylation sites were retrieved from the first and, to date, largest malonylome dataset in prokaryotes. Combining physicochemical properties with position-specific amino acid sequence propensity features gave the best results, with an AUC (area under the Receiver Operating Characteristic curve) of 0.7994 and an MCC (Matthews correlation coefficient) of 0.4335 in 10-fold cross-validation. The AUC values were 0.7800, 0.7851 and 0.8050 in 6-fold, 8-fold and LOO (leave-one-out) cross-validation, respectively. The ROC curves were close to one another, which illustrates the robustness of the proposed predictor. We also analyzed the sequence propensities with TwoSampleLogo and found differences between peptides that were significant at p<0.01 (t-test). The predictor performed better than K-Nearest Neighbors, C4.5 decision tree, Naïve Bayes and Random Forest. Functional analysis showed that malonylated proteins are involved in many transcription activities and diverse biological processes. We also developed an online package that can be freely downloaded from https://github.com/Sunmile/ Malonylation E.coli.

Keywords: Malonylation, support vector machine, post-translational modification, E. coli, Receiver Operating Characteristic (ROC), prokaryotes.
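As an illustration of two ideas named in the abstract, the sketch below shows one common way to build a binary code feature vector for a peptide window and how the MCC is computed from confusion-matrix counts. It is a minimal sketch, not the authors' implementation: the window length, the padding symbol "X", the example peptide and the function names are all assumptions, since the abstract does not specify them.

```python
import math

# 20 standard amino acids plus "X" as a padding symbol for windows that
# extend past a protein terminus (the padding symbol is an assumption).
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def binary_encode(peptide):
    """One-hot encode a peptide: each residue becomes a 21-dimensional 0/1 vector."""
    features = []
    for residue in peptide:
        vec = [0] * len(ALPHABET)
        vec[ALPHABET.index(residue)] = 1
        features.extend(vec)
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical 7-residue window centred on a lysine (K).
window = "AGLKVEX"
encoded = binary_encode(window)
```

A vector built this way could be concatenated with physicochemical and propensity features before being passed to an SVM. The MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction.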




Journal: Current Proteomics
DOI: https://dx.doi.org/10.2174/1570164615666181005104614
Print ISSN: 1570-1646
Online ISSN: 1875-6247
Publisher: Bentham Science Publishers
