Prediction and Analysis of Hub Genes in Renal Cell Carcinoma based on CFS Gene Selection Method Combined with Adaboost Algorithm

Author(s): Yina Wang, Benrong Zheng, Manbin Xu, Shaoping Cai, Jeong Younseo, Chi Zhang*, Boxiong Jiang*

Journal Name: Medicinal Chemistry

Volume 16, Issue 5, 2020




Abstract:

Background: Renal cell carcinoma (RCC) is the most common malignant tumor of the adult kidney.

Objective: The aim of this study was to identify key gene signatures in RCC and uncover their potential mechanisms.

Methods: First, the gene expression profiles of GSE53757, which contains 144 samples (72 kidney cancer samples and 72 controls), were downloaded from the GEO database. Differentially expressed genes (DEGs) between the kidney cancer samples and the controls were then identified, and GO and KEGG enrichment analyses of the DEGs were performed with DAVID. Next, the correlation-based feature subset (CFS) method was applied to select key genes from the DEGs. Finally, a classification model distinguishing the kidney cancer samples from the controls was built with Adaboost based on the selected key genes.
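
As an illustration of the CFS step, the following is a minimal sketch and not the authors' implementation. It assumes the standard CFS merit from Hall's formulation, merit = k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), where r_cf is the feature-class correlation and r_ff the feature-feature correlation, combined with a greedy forward search. Absolute Pearson correlation is used here as the correlation measure, which is an assumption; the gene names and expression values are synthetic.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cfs_merit(subset, r_cf, r_ff):
    """CFS merit: rewards class correlation, penalises redundancy within the subset."""
    k = len(subset)
    mean_cf = sum(r_cf[g] for g in subset) / k
    if k == 1:
        return mean_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    mean_ff = sum(r_ff[a][b] for a, b in pairs) / len(pairs)
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)

def cfs_forward_select(expr, labels):
    """Greedy forward search: add the gene that most improves the merit, stop when none does."""
    genes = list(expr)
    r_cf = {g: abs(pearson(expr[g], labels)) for g in genes}
    r_ff = {a: {b: abs(pearson(expr[a], expr[b])) for b in genes} for a in genes}
    selected, best = [], 0.0
    while True:
        scored = [(cfs_merit(selected + [g], r_cf, r_ff), g)
                  for g in genes if g not in selected]
        if not scored:
            break
        merit, gene = max(scored)
        if merit <= best:
            break
        selected.append(gene)
        best = merit
    return selected, best

# Synthetic toy data (hypothetical): 4 controls (label 0) and 4 tumours (label 1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expr = {
    "A": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],  # strongly class-related
    "B": [1.0, 2.0, 1.5, 2.5, 2.0, 3.0, 2.5, 3.5],  # weakly class-related
    "N": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0],  # unrelated to the class
}
selected, merit = cfs_forward_select(expr, labels)
print(selected, round(merit, 3))
```

In this toy example the gene with zero class correlation can never raise the merit, so it is never selected; this is the property that lets CFS discard uninformative DEGs.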

Results: Using the CFS method, 213 DEGs, including 80 up-regulated and 133 down-regulated genes, were selected as the feature genes for building the classification model between the kidney cancer samples and the controls. The accuracy of the classification model was 84.4% in the 5-fold cross-validation test and 83.3% in the independent set test. In addition, TYROBP, CD4163, CAV1, CXCL9, CXCL11 and CXCL13 were also found among the top 20 hub genes screened by the protein-protein interaction (PPI) network.
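
To make the evaluation procedure concrete, the following is a minimal sketch of AdaBoost with decision stumps scored by 5-fold cross-validation. It is not the authors' code: the weak learner, the number of boosting rounds, the interleaved fold assignment, and the synthetic two-feature data are all assumptions chosen for illustration. The toy data is trivially separable, so its accuracy says nothing about the figures reported above.

```python
import math

def stump_predict(row, j, thr, pol):
    """Decision stump: predict pol if feature j exceeds thr, otherwise -pol."""
    return pol if row[j] > thr else -pol

def fit_stump(X, y, w):
    """Find the stump with the lowest weighted error; thresholds are midpoints of observed values."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for thr in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for pol in (1, -1):
                err = sum(wi for wi, row, t in zip(w, X, y)
                          if stump_predict(row, j, thr, pol) != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    if best is None:
        raise ValueError("no feature has more than one distinct value")
    return best

def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}; returns a list of weighted stumps."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Up-weight misclassified samples, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * t * stump_predict(row, j, thr, pol))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1

def cross_val_accuracy(X, y, k=5, rounds=10):
    """k-fold cross-validation; sample i goes to fold i % k (interleaved assignment)."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        X_train = [X[i] for i in range(n) if i not in test_idx]
        y_train = [y[i] for i in range(n) if i not in test_idx]
        model = adaboost_fit(X_train, y_train, rounds)
        correct += sum(adaboost_predict(model, X[i]) == y[i] for i in test_idx)
    return correct / n

# Synthetic toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
controls = [[1.0 + 0.1 * i, float((7 * i) % 5)] for i in range(10)]
tumors = [[3.0 + 0.1 * i, float((3 * i) % 5)] for i in range(10)]
X = controls + tumors
y = [-1] * 10 + [1] * 10
accuracy = cross_val_accuracy(X, y, k=5, rounds=5)
print(accuracy)
```

Each sample is held out exactly once, so the returned value is the fraction of all samples classified correctly by a model that never saw them during training. An independent set test, by contrast, would score a single model on samples reserved before any training.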

Conclusion: These results indicate that CFS is a useful tool for identifying key genes in kidney cancer. In addition, we predicted that genes such as TYROBP, CD4163, CAV1, CXCL9, CXCL11 and CXCL13 might serve as target genes for diagnosing kidney cancer.

Keywords: Gene expression profiles, gene selection, renal cell carcinoma, correlation-based feature subset (CFS), Adaboost, gene ontology.




Article Details

VOLUME: 16
ISSUE: 5
Year: 2020
Published on: 07 August, 2020
Page: [654 - 663]
Pages: 10
DOI: 10.2174/1573406415666191004100744
