iHyd-LysSite (EPSV): Identifying Hydroxylysine Sites in Protein Using Statistical Formulation by Extracting Enhanced Position and Sequence Variant Feature Technique

Author(s): Muhammad Khalid Mahmood, Asma Ehsan*, Yaser Daanial Khan, Kuo-Chen Chou

Journal Name: Current Genomics

Volume 21, Issue 7, 2020

Introduction: Hydroxylation is one of the most important post-translational modifications (PTMs) in cellular function and is linked to various diseases. Hydroxylysine is produced when a hydroxyl group (OH) is added to a lysine residue through chemical modification.

Methods: The method used in this study identifies hydroxylysine sites with a mathematical and statistical methodology that incorporates the sequence-order effect and the composition of residues within protein sequences. The predictor is called "iHyd-LysSite (EPSV)" (identifying hydroxylysine sites by extracting enhanced position and sequence variant technique). Identifying hydroxylysine sites by experimental methods is difficult, laborious, and highly expensive; in silico techniques offer an alternative approach to identifying hydroxylysine sites in proteins.
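To make the idea of combining composition with sequence-order information concrete, the following is a minimal sketch, not the EPSV formulation itself, which the abstract does not specify. It assumes a fixed-length peptide window centred on a candidate lysine and derives two illustrative feature groups: residue frequencies (composition) and the normalised mean position of each residue type (a simple position-aware statistic). The function names, the window size, and the padding symbol "X" are all hypothetical choices made for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def extract_window(sequence, site_index, flank=5, pad="X"):
    """Return the peptide of length 2*flank+1 centred on a lysine (assumed layout)."""
    if sequence[site_index] != "K":
        raise ValueError("site_index does not point at a lysine (K)")
    padded = pad * flank + sequence + pad * flank
    centre = site_index + flank  # index of the lysine in the padded string
    return padded[centre - flank:centre + flank + 1]


def composition_features(window):
    """Frequency of each standard residue in the window (padding counts toward length)."""
    counts = Counter(window)
    return [counts[aa] / len(window) for aa in AMINO_ACIDS]


def position_features(window):
    """Mean 1-based position of each residue type, scaled by window length; 0.0 if absent."""
    counts = Counter(window)
    position_sums = {aa: 0 for aa in AMINO_ACIDS}
    for pos, aa in enumerate(window, start=1):
        if aa in position_sums:
            position_sums[aa] += pos
    return [
        position_sums[aa] / (counts[aa] * len(window)) if counts[aa] else 0.0
        for aa in AMINO_ACIDS
    ]


def feature_vector(sequence, site_index, flank=5):
    """Concatenate composition and position features into one 40-dimensional vector."""
    window = extract_window(sequence, site_index, flank)
    return composition_features(window) + position_features(window)
```

A vector like this could then be passed to any standard classifier; the features actually used by iHyd-LysSite (EPSV) are described in the full paper and may differ substantially from this sketch.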

Results: A useful predictive model should have high sensitivity and specificity and should be more accurate than existing predictors. Self-consistency, independent, 10-fold cross-validation, and jackknife tests are performed for validation. These tests are carried out with three well-known classifiers, Neural Networks (NN), Random Forest (RF), and Support Vector Machine (SVM), with the required prediction rate. The overall predictive outcomes are superior to the results obtained by previous predictors, and the proposed model achieves an excellent prediction rate with the NN, RF, and SVM classifiers. For the jackknife test, the sensitivity values for NN, RF, and SVM are 96.08%, 94.99%, and 98.16%, and the specificity values are 97.52%, 98.52%, and 80.95%, respectively.
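The jackknife test and the two reported metrics can be illustrated with a short sketch. Sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP); in a jackknife (leave-one-out) test, each sample is predicted by a model trained on all the other samples. The nearest-centroid classifier below is only a dependency-free stand-in chosen for illustration; it is not one of the NN, RF, or SVM models used in the study, and all function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: assign x to the class whose mean feature vector is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], x))


def jackknife_predictions(X, y, predict=nearest_centroid_predict):
    """Leave-one-out: predict each sample from a model trained on all the others."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(predict(train_X, train_y, X[i]))
    return predictions
```

In use, `jackknife_predictions(X, y)` yields one held-out prediction per sample, and `sensitivity_specificity(y, predictions)` summarises them. Any classifier with the same `predict(train_X, train_y, x)` shape can be substituted for the stand-in.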

Conclusion: The results obtained with the proposed tool suggest that this method may meet future demand for identifying hydroxylysine sites, with a better prediction rate than existing methods.

Keywords: Hydroxylysine, PTMs, ANN, cross-validation, predictive model, post-translational modifications.



Article Details

Year: 2020
Published on: 21 October, 2020
Page: [536 - 545]
Pages: 10
DOI: 10.2174/1389202921999200831142629
