Title:An Integrated Feature Selection Algorithm for Cancer Classification using Gene Expression Data
VOLUME: 21 ISSUE: 9
Author(s):Saeed Ahmed*, Muhammad Kabir, Zakir Ali, Muhammad Arif, Farman Ali and Dong-Jun Yu
Affiliation:School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094
Keywords:Cancer classification, gene expression data, correlation-based feature selection, multi-objective evolutionary algorithm, radial basis function neural network.
Abstract:Aim and Objective: Cancer is a dangerous disease worldwide, caused by somatic
mutations in the genome. Diagnosing this deadly disease at an early stage is an exceptionally new
clinical application of microarray data. Gene expression data from DNA microarrays are
high-dimensional but have small sample sizes. It is therefore indispensable to develop efficient and
robust feature selection methods that identify a small set of genes and achieve better classification
performance.
Materials and Methods: In this study, we developed a hybrid feature selection method that
integrates correlation-based feature selection (CFS) with a Multi-Objective Evolutionary Algorithm
(MOEA) to select highly informative genes. The hybrid model, paired with a radial basis
function neural network (RBFNN) classifier, was evaluated on 11 benchmark gene expression
datasets using 10-fold cross-validation.
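As an illustration of the CFS component only, the following minimal sketch computes the standard CFS merit heuristic for a candidate gene subset: the score rises with the mean feature-class correlation and falls with the mean feature-feature correlation. It is not the authors' implementation. The function name `cfs_merit`, the use of absolute Pearson correlation as the correlation measure, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff).

    Assumption: absolute Pearson correlation is used for both the
    feature-class (r_cf) and feature-feature (r_ff) terms.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    Xs = X[:, subset]
    # Mean absolute correlation between each selected feature and the class label.
    r_cf = np.mean([abs(np.corrcoef(Xs[:, j], y)[0, 1]) for j in range(k)])
    if k == 1:
        return float(r_cf)
    # Mean absolute correlation over distinct feature pairs (diagonal excluded).
    corr = np.abs(np.corrcoef(Xs, rowvar=False))
    r_ff = (corr.sum() - k) / (k * (k - 1))
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))

# Synthetic toy data (hypothetical, not one of the benchmark datasets):
# column 0 is informative, column 1 is a near copy of column 0, column 2 is noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=40).astype(float)
informative = y + 0.1 * rng.normal(size=40)
redundant = informative + 0.01 * rng.normal(size=40)
noise = rng.normal(size=40)
X = np.column_stack([informative, redundant, noise])

merit_informative = cfs_merit(X, y, [0])
merit_noise = cfs_merit(X, y, [2])
merit_mixed = cfs_merit(X, y, [0, 2])
```

In a hybrid scheme of the kind described above, a score like this could serve as one objective of the evolutionary search, with subset size as another; that pairing is an assumption here, not a detail stated in the abstract.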
Results: The experimental results were compared with seven conventional feature selection methods
and with other methods in the literature. This extensive comparison shows that our approach has
clear advantages in both classification accuracy and the number of genes selected.
Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy on
six of the eleven datasets with a minimally sized predictive gene subset.