An Improved Computational Prediction Model for Lysine Succinylation Sites Mapping on Homo sapiens by Fusing Three Sequence Encoding Schemes with the Random Forest Classifier

Author(s): Samme Amena Tasmia, Fee Faysal Ahmed, Parvez Mosharaf, Mehedi Hasan, Nurul Haque Mollah*

Journal Name: Current Genomics

Volume 22, Issue 2, 2021



Abstract:

Background: Lysine succinylation is a reversible protein post-translational modification (PTM) that regulates the structure and function of proteins. It plays a significant role in cellular physiology and in some diseases of humans and many other organisms. Accurate identification of succinylation sites is essential for understanding their biological functions and for drug development.

Methods: In this study, we developed an improved method to predict lysine succinylation sites in Homo sapiens proteins by fusing three encoding schemes, namely binary encoding, the composition of k-spaced amino acid pairs (CKSAAP), and amino acid composition (AAC), with the random forest (RF) classifier. The prediction performance of the proposed RF-based fusion model was compared with that of other candidate predictors using 20-fold cross-validation (CV) and two independent test datasets collected from two different sources.
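The three encoding schemes named in the Methods can be illustrated with a minimal Python sketch. It assumes a fixed-length sequence window centered on the candidate lysine and fuses the schemes by simple concatenation of the feature vectors; the abstract does not specify the window length, the maximum gap `k_max`, or the level at which the fusion is performed, so those choices, along with all function names, are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def binary_encoding(window):
    """One-hot encode each residue; gap or unknown characters become all zeros."""
    vec = []
    for residue in window:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def aac_encoding(window):
    """Fraction of each of the 20 standard amino acids in the window."""
    n = sum(1 for r in window if r in AMINO_ACIDS) or 1
    return [window.count(aa) / n for aa in AMINO_ACIDS]


def cksaap_encoding(window, k_max=2):
    """Normalized counts of residue pairs separated by k = 0..k_max residues."""
    vec = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(window) - k - 1  # number of pair positions for this gap
        for i in range(total):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard characters
                counts[pair] += 1
        vec.extend(counts[p] / max(total, 1) for p in PAIRS)
    return vec


def fused_features(window, k_max=2):
    """Assumed fusion: concatenate binary, CKSAAP, and AAC feature vectors."""
    return binary_encoding(window) + cksaap_encoding(window, k_max) + aac_encoding(window)


# Hypothetical 11-residue window with the candidate lysine (K) at the center.
features = fused_features("ACDEFKGHILM")
print(len(features))  # 11*20 + 3*400 + 20 features
```

A vector built this way could then be passed to any random forest implementation as one row of the training matrix. Fusing the outputs of separate per-encoding models would be an equally plausible reading of "fusion" and is not shown here.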

Results: In CV, the proposed predictor achieved the highest scores: sensitivity (SN) of 0.800, specificity (SP) of 0.902, accuracy (ACC) of 0.919, Matthews correlation coefficient (MCC) of 0.766, partial AUC (pAUC) of 0.163 at a false-positive rate (FPR) of 0.10, and area under the ROC curve (AUC) of 0.958. On independent test protein set-1 it achieved the highest scores of SN 0.811, SP 0.902, ACC 0.891, MCC 0.629, pAUC 0.139, and AUC 0.921; on independent test protein set-2 it achieved SN 0.772, SP 0.901, ACC 0.836, MCC 0.677, pAUC 0.141 at FPR = 0.10, and AUC 0.923. It also outperformed all the other existing prediction models.
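For readers unfamiliar with these measures, the following Python sketch shows how SN, SP, ACC, and MCC follow from a confusion matrix, and how a partial AUC up to a given FPR can be obtained from ranked prediction scores. The function names and example counts are hypothetical. The abstract does not state how pAUC was scaled; this sketch returns the raw trapezoidal area, which cannot exceed `max_fpr`, so its values are not directly comparable to the reported pAUC figures.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """SN, SP, ACC, and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


def roc_points(labels, scores):
    """ROC curve as (FPR, TPR) points, one per distinct score threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def partial_auc(labels, scores, max_fpr=0.10):
    """Raw (unnormalized) area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    pts = roc_points(labels, scores)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR at the FPR cutoff.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical counts and scores, for illustration only.
print(confusion_metrics(tp=80, tn=90, fp=10, fn=20))
print(partial_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1], max_fpr=0.10))
```

Calling `partial_auc` with `max_fpr=1.0` gives the full AUC under the same trapezoidal rule.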

Conclusion: These prediction performances suggest that the proposed method may be a useful computational resource for predicting lysine succinylation sites in human proteins.

Keywords: Protein sequences, lysine succinylation site, prediction, encoding schemes, feature selection, random forest, fusion model.




Article Details

Published on: 18 February, 2021
Page: [122 - 136]
Pages: 15
DOI: 10.2174/1389202922666210219114211