iHyd-PseAAC (EPSV): Identifying Hydroxylation Sites in Proteins by Extracting Enhanced Position and Sequence Variant Feature via Chou's 5-Step Rule and General Pseudo Amino Acid Composition

Author(s): Asma Ehsan*, Muhammad K. Mahmood, Yaser D. Khan, Omar M. Barukab, Sher A. Khan, Kuo-Chen Chou

Journal Name: Current Genomics

Volume 20, Issue 2, 2019

Abstract:

Background: Post Translational Modifications (PTMs) are critical to many biological processes and cell functions. Hydroxylation of proline residues is one such PTM, occurring after protein synthesis. Determining hydroxyproline sites experimentally in an uncharacterized protein sequence requires extensive, time-consuming and expensive tests.

Methods: With the avalanche of protein sequences generated in the post-genomic age, effective computational methods are needed to address this problem. Taking into account both the composition and the sequence-order effects within polypeptide chains, a new in-silico predictor based on a mathematical model is proposed.

Results: The predictor was rigorously verified using self-consistency, cross-validation and jackknife tests on benchmark datasets. The jackknife test established that the new predictor's results are superior to those of previous methodologies.
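
For readers unfamiliar with the jackknife test named in the Results, the following is a minimal sketch of the general procedure: each sample is held out once, the model is fitted on the remaining samples, and the held-out sample is scored. The nearest-centroid classifier, the function names and the feature vectors are all illustrative assumptions; none of them comes from the paper, whose actual model and datasets are not described in this abstract.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def nearest_centroid_predict(train_x: List[Vector], train_y: List[int], query: Vector) -> int:
    """Toy stand-in classifier (an assumption, not the paper's model):
    return the label whose class mean is closest to the query."""
    best_label, best_dist = -1, float("inf")
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((q - c) ** 2 for q, c in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(
    xs: List[Vector],
    ys: List[int],
    predict: Callable[[List[Vector], List[int], Vector], int] = nearest_centroid_predict,
) -> float:
    """Jackknife (leave-one-out) test: hold out each sample once,
    train on the rest, and count correct held-out predictions."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Made-up feature vectors: label 1 = hydroxylation site, 0 = non-site.
features = [[0.9, 0.8], [0.8, 0.9], [1.0, 1.0], [0.1, 0.2], [0.2, 0.1], [0.0, 0.0]]
labels = [1, 1, 1, 0, 0, 0]
accuracy = jackknife_accuracy(features, labels)
print(f"jackknife accuracy: {accuracy:.2f}")
```

Because every sample is used for testing exactly once and the split is fully determined by the dataset, the jackknife test gives the same result on every run for a given benchmark, unlike a randomly partitioned cross-validation.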

Conclusion: Compared with existing models, this new mathematical technique is more appropriate and more promising.

Keywords: PseAAC, Hydroxylation of proline, Post Translational Modifications (PTMs), Sequence-coupling model, Mammalian proteins, Hydroxyproline.




Article Details

VOLUME: 20
ISSUE: 2
Year: 2019
Page: [124 - 133]
Pages: 10
DOI: 10.2174/1389202920666190325162307
