Current Bioinformatics


ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

Identification of Lysine Carboxylation Sites in Proteins by Integrating Statistical Moments and Position Relative Features via General PseAAC

Author(s): Saba Amanat, Adeel Ashraf, Waqar Hussain, Nouman Rasool and Yaser D. Khan*

Volume 15, Issue 5, 2020

Page: [396 - 407] Pages: 12

DOI: 10.2174/1574893614666190723114923


Background: Carboxylation is one of the most biologically important post-translational modifications and occurs on lysine, arginine, and glutamine residues of a protein. Among these three, the covalent attachment of a carboxyl group to the lysine side chain is the most frequent and biologically important type of carboxylation. To study its biological functions, it is essential to correctly determine the lysine sites that are susceptible to carboxylation.

Objective: Herein, we present a machine-learning-based computational model for the prediction of carboxylysine sites.

Methods: Various position- and composition-relative features have been incorporated into PseAAC to construct the feature vectors, and a neural network is employed as the classifier. The model is validated by jackknife, cross-validation, self-consistency, and independent testing.
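
The two resampling schemes named above can be illustrated with a minimal sketch. It only shows how sample indices are partitioned for k-fold cross-validation and for the jackknife (leave-one-out) test. The function names `k_fold_splits` and `jackknife_splits` are hypothetical and are not taken from the article, and the feature extraction and neural network are omitted.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    Hypothetical helper for illustration; samples are assumed to have been
    shuffled beforehand so that contiguous folds are not biased.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)  # the first `extra` folds get one more sample
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def jackknife_splits(n_samples: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Jackknife (leave-one-out) test: each sample is held out exactly once."""
    return k_fold_splits(n_samples, n_samples)
```

In this sketch the jackknife is the special case of k-fold cross-validation in which k equals the number of samples, while self-consistency testing would train and test on the same full set.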

Results: The self-consistency test showed that the model has 99.76% Acc, 99.76% Sp, 99.76% Sp, and 0.99 MCC. Using the jackknife method, model validation gave 97.07% Acc, while 10-fold cross-validation gave 95.16% Acc.
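
For readers unfamiliar with the abbreviations, the following sketch computes accuracy (Acc), sensitivity (Sn), specificity (Sp), and the Matthews correlation coefficient (MCC) from the four confusion-matrix counts using their standard definitions. The function name `binary_metrics` and the counts in the usage lines are hypothetical and are not values from the article.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positive (carboxylated) sites predicted correctly/incorrectly.
    tn/fp: negative sites predicted correctly/incorrectly.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    acc = (tp + tn) / total
    # Sensitivity and specificity are undefined without positives/negatives;
    # 0.0 is returned in that case as a convention of this sketch.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Acc, Sn, and Sp lie in the range 0 to 1 and are usually reported as percentages, whereas MCC ranges from -1 to 1, which is why it appears above as 0.99 rather than as a percentage.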

Conclusion: The result of independent dataset testing was 94.3%, which illustrates that the proposed model performs better than the existing model PreLysCar; however, the accuracy can be improved further in the future as the number of known carboxylysine sites in proteins increases.

Keywords: Carboxylation, carboxylysine, statistical moments, PseAAC, 5-step rule, lysine.


© 2022 Bentham Science Publishers