Computational Identification of Lysine Glutarylation Sites Using Positive-Unlabeled Learning

Author(s): Zhe Ju*, Shi-Yun Wang

Journal Name: Current Genomics

Volume 21, Issue 3, 2020

Abstract:

Background: As a new type of protein acylation modification, lysine glutarylation has been found to play a crucial role in metabolic processes and mitochondrial functions. To further explore the biological mechanisms and functions of glutarylation, it is important to predict potential glutarylation sites. Existing glutarylation site predictors are trained by treating experimentally verified glutarylation sites as positive samples and non-verified lysine sites as negative samples. However, the non-verified lysine sites may include glutarylation sites that have not yet been experimentally identified.

Methods: In this study, experimentally verified glutarylation sites are treated as the positive samples, whereas the remaining non-verified lysine sites are treated as unlabeled samples. A bioinformatics tool named PUL-GLU was developed to identify glutarylation sites using a positive-unlabeled learning algorithm.
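The abstract does not spell out the learning algorithm, so the following is only a minimal sketch of a generic two-step positive-unlabeled scheme, not the PUL-GLU implementation. It assumes each lysine site has already been encoded as a numeric feature vector, and it swaps in a simple nearest-centroid rule where a real predictor would more likely use a support vector machine (as the keywords suggest). All names (`pu_two_step`, `predict`, `seed_fraction`) and the toy vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pu_two_step(positives, unlabeled, seed_fraction=0.3, max_rounds=10):
    """Generic two-step PU sketch (illustrative assumption, not PUL-GLU).

    Step 1: seed a 'reliable negative' set with the unlabeled samples
            farthest from the positive centroid.
    Step 2: repeatedly move unlabeled samples that lie closer to the
            negative centroid than to the positive centroid into the
            negative set, until no sample moves.
    Returns (negative_indices, positive_centroid, negative_centroid).
    """
    pos_c = centroid(positives)
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: distance(unlabeled[i], pos_c),
        reverse=True,
    )
    n_seed = max(1, int(seed_fraction * len(unlabeled)))
    negatives = set(ranked[:n_seed])

    for _ in range(max_rounds):
        neg_c = centroid([unlabeled[i] for i in negatives])
        newly = {
            i
            for i in range(len(unlabeled))
            if i not in negatives
            and distance(unlabeled[i], neg_c) < distance(unlabeled[i], pos_c)
        }
        if not newly:
            break
        negatives |= newly

    neg_c = centroid([unlabeled[i] for i in negatives])
    return negatives, pos_c, neg_c


def predict(x, pos_c, neg_c):
    """True if x is at least as close to the positive centroid."""
    return distance(x, pos_c) <= distance(x, neg_c)


# Toy 2-D feature vectors (made up for illustration only).
positives = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
unlabeled = [
    [1.1, 1.0],  # plausibly an unverified positive
    [5.0, 5.0],
    [5.2, 4.8],
    [4.9, 5.1],
    [3.5, 3.5],
]

negatives, pos_c, neg_c = pu_two_step(positives, unlabeled)
hidden_positive_kept = 0 not in negatives
```

In this toy run the unlabeled sample that sits among the positives is never pulled into the negative set, which is the behavior a positive-unlabeled scheme is meant to provide: unverified sites are not assumed to be negatives by default.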

Results: Experimental results show that PUL-GLU significantly outperforms the current glutarylation site predictors. Therefore, PUL-GLU can be a powerful tool for accurate identification of protein glutarylation sites.

Conclusion: A user-friendly web server for PUL-GLU is available at http://bioinform.cn/pul_glu/.

Keywords: Post-translational modification, glutarylation, support vector machine, positive-unlabeled learning, protein acylation, site predictors.



Article Details

VOLUME: 21
ISSUE: 3
Year: 2020
Page: [204 - 211]
Pages: 8
DOI: 10.2174/1389202921666200511072327