Identification of Cancer Biomarkers in Human Body Fluids by Using Enhanced Physicochemical-incorporated Evolutionary Conservation Scheme

Author(s): Jian Zhang*, Yu Zhang, Yanlin Li, Song Guo, Guifu Yang

Journal Name: Current Topics in Medicinal Chemistry

Volume 20, Issue 21, 2020





Abstract:

Objective: Cancer is one of the most serious diseases affecting human health. Among current approaches to managing cancer, early diagnosis and control significantly increase the chances of a cure. Detecting cancer biomarkers in body fluids is therefore attracting growing attention among oncologists. In-silico prediction of body fluid-related proteins, which can serve as cancer biomarkers, offers a starting point for labor-intensive and time-consuming biochemical experiments.

Methods: In this work, we propose a novel method for high-throughput identification of cancer biomarkers in human body fluids. We incorporate physicochemical properties into the weighted observed percentages (WOP) and position-specific scoring matrix (PSSM) profiles so that they better reflect the evolutionary conservation of body fluid-related proteins. The least absolute shrinkage and selection operator (LASSO) feature selection strategy is then used to generate the optimal feature subset.
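
As an illustration of the LASSO feature selection step mentioned above, the following is a minimal sketch rather than the authors' implementation. It fits an L1-penalized linear model by cyclic coordinate descent and keeps the features whose weights stay non-zero. The data is synthetic, and the function names, the penalty value `lam=0.5`, and the iteration count are assumptions chosen for the example only.

```python
import random


def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X is a list of rows (features assumed to be on comparable scales),
    y is a list of targets, and no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    # Mean squared value of each column, reused in every update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def select_features(w, tol=1e-8):
    """Return the indices of features with a non-zero LASSO weight."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]


# Synthetic data (an assumption for illustration): only columns 0 and 3 carry signal.
rng = random.Random(0)
n_samples, n_features = 200, 10
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [2.0 * row[0] - 3.0 * row[3] + 0.1 * rng.gauss(0, 1) for row in X]

weights = lasso_coordinate_descent(X, y, lam=0.5)
selected = select_features(weights)
print("selected feature indices:", selected)
```

The L1 penalty drives the weights of uninformative columns to exactly zero, which is what makes LASSO usable as a feature selector. In practice the penalty strength is usually tuned, for example by cross-validation, rather than fixed in advance.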

Results: The ten-fold cross-validation results on training datasets demonstrate the accuracy of the proposed model. We also test our proposed method on independent testing datasets and apply it to the identification of potential cancer biomarkers in human body fluids.
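
The ten-fold cross-validation protocol referred to above can be sketched as follows. This is a generic illustration, not the authors' pipeline: the nearest-centroid classifier is a placeholder standing in for the actual model, the two-class data is synthetic, and all function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Placeholder classifier (an assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validate(X, y, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        model = nearest_centroid_fit([X[i] for i in train_idx],
                                     [y[i] for i in train_idx])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i]
                      for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Synthetic, well-separated two-class data (an assumption for illustration).
rng = random.Random(1)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
     + [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(100)])
y = [0] * 100 + [1] * 100

fold_accuracies = cross_validate(X, y, k=10)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print("mean ten-fold accuracy:", round(mean_accuracy, 3))
```

Each sample is held out exactly once, so the mean over folds estimates performance on data not seen during fitting. An independent testing dataset, as used in the abstract, remains a separate and stricter check.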

Conclusion: The testing results suggest that our approach generalizes well.

Keywords: Cancer biomarkers, Body fluid, Evolutionary conservation, Physicochemical properties, LASSO, PSSM.




Article Details

VOLUME: 20
ISSUE: 21
Year: 2020
Page: [1888 - 1897]
Pages: 10
DOI: 10.2174/1568026620666200710100743
