Aim: To identify genes related to the mechanisms underlying glioma occurrence and to build a
prediction model for glioblastomas.
Background: Glioblastomas have very high morbidity and mortality and seriously endanger human
health. At present, many investigations on gliomas mainly aim to understand the cause and mechanism
of these tumors at the molecular level and to explore clinical diagnosis and treatment methods.
However, there is still no effective method for the early diagnosis of this disease, nor effective
measures for its prevention or treatment.
Methods: First, gene expression profiles were downloaded from the Gene Expression Omnibus (GEO).
Then, differentially expressed genes (DEGs) between the disease samples and the control samples were
identified. Next, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment
analyses of the DEGs were performed with DAVID. The correlation-based feature subset (CFS) method was
then applied to select key DEGs. Finally, a Support Vector Machine (SVM) classification model
distinguishing the glioblastoma samples from the controls was built on the selected key genes.
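The abstract does not spell out how CFS scores a gene subset. In Hall's correlation-based feature selection, a subset S of k features is scored by Merit_S = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature inter-correlation, so genes that track the class label but not each other are favoured. The sketch below is a minimal illustration under stated assumptions: it uses absolute Pearson correlation (labels coded 0 = control, 1 = glioblastoma) in place of the symmetrical uncertainty of Hall's original formulation, and a greedy forward search that stops at the first non-improving step instead of a best-first search. The function names and toy data are hypothetical, and this is not necessarily the configuration the authors used.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset: List[int], features: List[List[float]], labels: List[int]) -> float:
    """Hall-style CFS merit of a gene subset, using |Pearson r| as the correlation (assumption)."""
    k = len(subset)
    # Mean gene-label correlation: how well the genes track the class.
    r_cf = sum(abs(pearson(features[i], labels)) for i in subset) / k
    # Mean gene-gene inter-correlation: how redundant the genes are with each other.
    pairs = [(i, j) for pos, i in enumerate(subset) for j in subset[pos + 1:]]
    r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(features: List[List[float]], labels: List[int]) -> List[int]:
    """Greedy forward search (a simplification of best-first search): add the gene that
    most increases the merit and stop as soon as no remaining gene improves it."""
    selected: List[int] = []
    best_merit = 0.0
    remaining = set(range(len(features)))
    while remaining:
        # Evaluate every remaining gene added to the current subset; sorted() keeps ties deterministic.
        candidate, merit = max(
            ((i, cfs_merit(selected + [i], features, labels)) for i in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if merit <= best_merit:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_merit = merit
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: one list of expression values per gene, one value per sample.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = control, 1 = glioblastoma
    gene_a = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]  # tracks the class label
    gene_b = [1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 1.0]  # unrelated to the class label
    print(cfs_forward_select([gene_a, gene_b], labels))  # prints [0]
```

The returned indices would identify the key genes passed on to the SVM. In the toy data only gene 0 is kept, because adding the uninformative gene 1 lowers the merit.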
Results and Discussion: Using the CFS method, 36 DEGs (17 upregulated and 19 downregulated) were
selected as feature genes to build the classification model between the glioma samples and the control
samples. The classification model achieved an accuracy of 76.25% in 10-fold cross-validation and 70.3%
on the independent test set. In addition, PPP2R2B and CYBB were among the top 5 hub genes screened
from the protein-protein interaction (PPI) network.
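The abstract does not state which centrality measure was used to rank hub genes. Degree (the number of distinct interaction partners of a protein) is one common choice, and the sketch below ranks genes by degree in an undirected PPI edge list. The edge list is an assumption: in practice it might be exported from a PPI database such as STRING. The function name and the gene names G1 to G5 are hypothetical placeholders, not the study's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def degree_hubs(edges: Iterable[Tuple[str, str]], top_n: int = 5) -> List[Tuple[str, int]]:
    """Rank genes by degree in an undirected PPI network; return the top_n as (gene, degree)."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        # Undirected: record the partner on both ends; sets collapse duplicate edges.
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Highest degree first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(neighbours.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(gene, len(partners)) for gene, partners in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical toy interactions between placeholder genes G1..G5.
    toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"),
                 ("G5", "G1"), ("G2", "G1"), ("G2", "G2")]
    print(degree_hubs(toy_edges, top_n=3))  # prints [('G1', 4), ('G2', 2), ('G3', 2)]
```

Other centrality measures, such as betweenness, can produce a different top 5, so this degree ranking need not reproduce the study's hub list.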
Conclusion: This study indicates that the CFS method is a useful tool for identifying key genes in
glioblastomas. We also predict that genes such as PPP2R2B and CYBB might be potential biomarkers
for the diagnosis of glioblastomas.