
Combinatorial Chemistry & High Throughput Screening


ISSN (Print): 1386-2073
ISSN (Online): 1875-5402

Research Article

Computational Method for Identifying Malonylation Sites by Using Random Forest Algorithm

Author(s): ShaoPeng Wang, JiaRui Li, Xijun Sun, Yu-Hang Zhang, Tao Huang* and Yudong Cai*

Volume 23, Issue 4, 2020

Page: [304 - 312] Pages: 9

DOI: 10.2174/1386207322666181227144318



Background: As a newly uncovered post-translational modification of the ε-amino group of lysine residues, protein malonylation has been found to be involved in metabolic pathways and certain diseases. In addition to experimental approaches, several computational methods based on machine learning algorithms have recently been proposed to predict malonylation sites. However, previous methods failed to address the imbalance in size between the positive and negative sample sets.

Objective: In this study, we developed a novel computational method that applies machine learning algorithms to identify the significant features of malonylation sites, and that balances the sizes of the positive and negative sample sets with the synthetic minority over-sampling technique (SMOTE).
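To make the balancing step concrete, the following is a minimal sketch of the idea behind the synthetic minority over-sampling technique: each synthetic sample is an interpolation between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and base samples are drawn at random here rather than processed one by one.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for m in minority if m is not x]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

# Toy minority class (hypothetical feature vectors, not data from the study).
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(positives, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region already occupied by the minority class.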

Method: Four types of features, namely amino acid (AA) composition, position-specific scoring matrix (PSSM), AA factor, and disorder, were used to encode residues in protein segments. Then, a two-step feature selection procedure, comprising maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS), was performed together with the random forest algorithm on the constructed hybrid feature vector.
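The two-step selection procedure can be sketched as follows, under stated assumptions. Features are first ranked by a greedy maximum relevance minimum redundancy criterion (relevance to the label minus mean redundancy with already selected features, both measured here by mutual information on discrete values). Incremental feature selection then scores every prefix of the ranking and keeps the best one. All function names, the toy data, and the `toy_score` evaluator are hypothetical. In the study the evaluator would be a random forest classifier, which is replaced here by a simple majority-vote rule to keep the sketch self-contained.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def mrmr_rank(columns, labels):
    """Greedy mRMR: repeatedly pick the feature with the largest
    relevance to the labels minus mean redundancy with selected features."""
    remaining = list(range(len(columns)))
    relevance = {j: mutual_information(columns[j], labels) for j in remaining}
    ranked = []

    def score(j):
        if not ranked:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in ranked
        ) / len(ranked)
        return relevance[j] - redundancy

    while remaining:
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

def incremental_feature_selection(ranked, evaluate):
    """Evaluate each prefix of the ranked list; return the best prefix."""
    best_score, best_subset = float("-inf"), []
    for size in range(1, len(ranked) + 1):
        subset = ranked[:size]
        current = evaluate(subset)
        if current > best_score:
            best_score, best_subset = current, subset
    return best_subset, best_score

# Toy binary data (hypothetical): feature 1 duplicates feature 0.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
]

def toy_score(subset):
    """Stand-in for classifier performance: accuracy of a strict
    majority vote over the selected binary features."""
    hits = 0
    for i, label in enumerate(labels):
        votes = sum(columns[j][i] for j in subset)
        hits += int((1 if 2 * votes > len(subset) else 0) == label)
    return hits / len(labels)

ranking = mrmr_rank(columns, labels)
best_subset, best_score = incremental_feature_selection(ranking, toy_score)
```

On this toy data the duplicated feature is pushed to the end of the ranking, which is the behaviour the redundancy term is meant to produce.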

Results: An optimal classifier was built from the optimal feature subset and achieved an F1-measure of 0.356. Feature analysis was then performed on several of the important selected features.
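For readers unfamiliar with the F1-measure, it is the harmonic mean of precision and recall on the positive class. The sketch below uses hypothetical labels, not data from the study, to show why it is more informative than accuracy when negatives far outnumber positives.

```python
def f1_measure(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Imbalanced toy labels: 2 positives, 8 negatives (hypothetical).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_all_negative = [0] * 10
y_mixed = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Predicting "negative" everywhere looks accurate but finds no positives.
accuracy_all_negative = sum(t == p for t, p in zip(y_true, y_all_negative)) / len(y_true)
f1_all_negative = f1_measure(y_true, y_all_negative)
f1_mixed = f1_measure(y_true, y_mixed)
```

The all-negative predictor has high accuracy yet an F1-measure of zero, whereas the second predictor, which recovers one of the two positives at the cost of one false positive, scores higher on F1.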

Conclusion: The results show that certain types of PSSM and disorder features may be closely associated with malonylation of lysine residues. Our study contributes to the development of computational approaches for predicting malonyllysine and provides insights into the molecular mechanism of malonylation.

Keywords: Post-translational modification, malonylation site, synthetic minority over-sampling technique, maximum relevance minimum redundancy, random forest.


© 2022 Bentham Science Publishers