Background: Microarray gene expression datasets usually contain a large number of genes,
which complicates downstream analyses such as classification and clustering. For
classification, identifying the salient genes is a challenging task that requires careful selection.
Methods: Classifying multi-class datasets is more difficult than binary classification.
With multiple class labels, the datasets are more likely to be imbalanced.
The number of samples can vary widely across classes, so the classifier
may become biased when unsuitable samples are chosen for training. Little research
addresses these three issues (gene selection, multiple classes, and class imbalance) together in microarray datasets.
Results and Discussion: This paper addresses this gap with the following contributions: i) it selects salient
genes for classification using the multiSURF algorithm; ii) it identifies suitable training instances in imbalanced datasets
using the Retained Tomek Link algorithm; and iii) it performs gene selection for multi-class classification
using Dynamic Length Particle Swarm Optimization (DPSO).
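
To illustrate the instance-selection idea in contribution ii), the following is a minimal sketch of standard Tomek link detection: two samples form a Tomek link when they are each other's nearest neighbour and carry different class labels. It is not the Retained Tomek Link algorithm itself, whose retention rule is not described in this abstract. The function name `tomek_links`, the Euclidean distance, and the toy data are assumptions made only for illustration.

```python
import math


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links.

    X is a sequence of numeric feature vectors and y the matching class
    labels. Euclidean distance is an assumption of this sketch.
    """
    def nearest(i):
        # Brute-force nearest neighbour of sample i (excluding itself).
        best, best_d = None, math.inf
        for j in range(len(X)):
            if j == i:
                continue
            d = math.dist(X[i], X[j])
            if d < best_d:
                best, best_d = j, d
        return best

    nn = [nearest(i) for i in range(len(X))]
    # Keep mutual nearest neighbours whose labels differ.
    return [
        (i, j)
        for i, j in enumerate(nn)
        if j is not None and i < j and nn[j] == i and y[i] != y[j]
    ]


# Hypothetical toy data: two expression features per sample, two classes.
X_toy = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (3.0, 0.0)]
y_toy = ["A", "A", "A", "B", "B"]
links = tomek_links(X_toy, y_toy)
print(links)  # [(2, 3)]
```

Samples 0 and 1 are also mutual nearest neighbours, but they share a label, so only the boundary pair (2, 3) is reported. In conventional under-sampling the majority-class member of each link is often removed; which instances a retained variant keeps is left open here.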
Conclusion: The proposed method is applied to multi-class imbalanced microarray datasets, and
its final classification performance is encouraging and better than that of the compared methods.