Lysine Malonylation Identification in E. coli with Multiple Features

Author(s): Yan Xu, Yingxi Yang, Hui Wang, Yuanhai Shao*.

Journal Name: Current Proteomics

Volume 16, Issue 3, 2019


Abstract:

Motivation: Lysine malonylation in eukaryotic proteins was discovered in 2011 through high-throughput proteomic analysis, but it remains poorly understood in prokaryotes. Recent studies have shown that malonylation in E. coli is significantly enriched in protein translation, energy metabolism pathways and fatty acid biosynthesis.

Results: In this work we propose a predictor that identifies lysine malonylation sites in E. coli from physicochemical properties, binary encoding and sequence frequency, using a support vector machine. The experimentally determined lysine malonylation sites were retrieved from the first and, to date, largest malonylome dataset in prokaryotes. Combining physicochemical properties with position-specific amino acid sequence propensity features gave the best results, with an AUC (area under the Receiver Operating Characteristic curve) of 0.7994 and an MCC (Matthews correlation coefficient) of 0.4335 in 10-fold cross-validation. The AUC values were 0.7800, 0.7851 and 0.8050 in 6-fold, 8-fold and LOO (leave-one-out) cross-validation, respectively. The ROC curves were all close to one another, which illustrates the robustness of the proposed predictor. We also analyzed sequence propensities with TwoSampleLogo and found significant differences between peptides (t-test, p<0.01). The predictor gave better results than the other methods tested: K-Nearest Neighbors, C4.5 decision tree, Naïve Bayes and Random Forest. Functional analysis showed that malonylated proteins are involved in many transcription activities and diverse biological processes. We also developed an online package that can be freely downloaded from https://github.com/Sunmile/ Malonylation E.coli.
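
As a rough illustration of the binary encoding and the two reported metrics, the following Python sketch one-hot encodes a peptide window and computes MCC and AUC from first principles. It is not the authors' implementation: the abstract does not state the window length, the exact feature definitions or the SVM settings, so the function names (`binary_encode`, `mcc`, `auc`) and the 20-letter alphabet ordering are assumptions made only for this example.

```python
import math

# Assumed residue alphabet; the ordering is arbitrary and chosen for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encode(peptide):
    """One-hot encode a peptide: 20 values per residue.

    Characters outside the alphabet (e.g. a padding symbol) become all zeros.
    """
    vector = []
    for residue in peptide:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(labels, scores):
    """Area under the ROC curve as a rank statistic.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a full pipeline, vectors such as `binary_encode(window)` would be concatenated with the other feature groups and passed to an SVM, and `mcc` and `auc` would be evaluated on the held-out fold of each cross-validation split.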

Keywords: Malonylation, support vector machine, post-translational modification, E. coli, Receiver Operating Characteristic (ROC), prokaryotes.




Article Details

VOLUME: 16
ISSUE: 3
Year: 2019
Page: [166 - 174]
Pages: 9
DOI: 10.2174/1570164615666181005104614
