Background: Microarray gene expression datasets contain a huge volume of gene data to be
used for cancer analysis but often suffer from the “curse of dimensionality” and “missing values”. These
problems prevent analysts from extracting the right knowledge and often lead to unstable results.
Objective: To address both of these issues, the paper proposes a novel algorithm based on the Genetic Algorithm (GA).
Method: GA is commonly used for feature selection and for treating missing values in microarray datasets.
However, it often converges prematurely because of insufficient exploration and exploitation. In the
proposed Adaptive Genetic Algorithm (AGA), genetic parameters are dynamically determined based on
the values in the current generation in order to improve the optimality of the solution. The population is divided
into two sub-populations, and crossover and mutation are performed in parallel on these sub-populations
to speed up execution and to keep the population modular for these
operations. In this paper, the missing values are first imputed using AGA, and AGA is then used again to
select significant features.
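The adaptive scheme described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state how the genetic parameters are derived, so the rule in `adaptive_rates` (raise the crossover and mutation probabilities as the fitness values of the current generation converge), the probability ranges, the bit-string encoding (1 means a feature is kept), and all function names are assumptions made for illustration. Fitness values are assumed to be non-negative, and the selection step is omitted.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def adaptive_rates(fitnesses, pc_range=(0.6, 0.9), pm_range=(0.01, 0.1)):
    """Hypothetical rule: derive crossover/mutation rates from the
    fitness values of the current generation (assumed non-negative)."""
    f_max = max(fitnesses)
    f_avg = sum(fitnesses) / len(fitnesses)
    # spread is 0 when every individual is equally fit, larger when diverse
    spread = (f_max - f_avg) / f_max if f_max > 0 else 0.0
    convergence = 1.0 - spread
    # A converged population gets higher rates to encourage exploration
    pc = pc_range[0] + convergence * (pc_range[1] - pc_range[0])
    pm = pm_range[0] + convergence * (pm_range[1] - pm_range[0])
    return pc, pm


def evolve_subpopulation(subpop, pc, pm, seed):
    """Single-point crossover and bit-flip mutation on one sub-population."""
    rng = random.Random(seed)
    children = []
    for a, b in zip(subpop[0::2], subpop[1::2]):
        a, b = a[:], b[:]  # copy so the parents are left unchanged
        if rng.random() < pc and len(a) > 1:
            cut = rng.randrange(1, len(a))
            a[cut:], b[cut:] = b[cut:], a[cut:]
        children += [a, b]
    if len(subpop) % 2:  # an unpaired individual is carried over
        children.append(subpop[-1][:])
    for child in children:
        for i in range(len(child)):
            if rng.random() < pm:
                child[i] = 1 - child[i]
    return children


def next_generation(population, fitness_fn, seed=0):
    """Split the population in two and process both halves concurrently."""
    fitnesses = [fitness_fn(ind) for ind in population]
    pc, pm = adaptive_rates(fitnesses)
    half = len(population) // 2
    halves = [population[:half], population[half:]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = pool.map(evolve_subpopulation, halves,
                           [pc, pc], [pm, pm], [seed, seed + 1])
        return [child for sub in results for child in sub]
```

Calling `next_generation(population, fitness_fn)` once per generation, with a fitness function of the reader's choosing, yields a new population of the same size. Threads are used here only to show the two-way split; in CPython a process pool would be needed for a real speed-up on CPU-bound fitness evaluation.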
Results: The proposed methodology is applied to different real microarray datasets to impute
values at different missing proportions and to select prominent features. The datasets
processed with AGA provide better results than those processed with the standard methods.
Conclusion: AGA can be applied successfully to all datasets where the number of features is large
and missing values are present. AGA preprocesses the datasets and prepares them for better