Background: Tumor classification is one of the most important applications of
gene expression data. Because microarray data are high-dimensional, dimensionality
reduction plays a crucial role in tumor classification based on gene expression profiles.
Objective: The primary objective of this study is to increase the accuracy of tumor
classification by reducing the dimensionality of gene expression data with feature
extraction.
Method: In this paper, we propose a novel supervised feature extraction method for tumor
classification called discriminant hybrid structure preserving projections. The proposed
method utilizes hybrid representation to efficiently characterize the structure of gene
expression data, where both neighbor representation and sparse representation are taken
into account. Specifically, our algorithm enhances data separability after dimensionality reduction by
simultaneously minimizing the within-class distance and maximizing the between-class distance. Moreover, it
applies an imbalance adjustment factor during extraction to counter the class imbalance
problem in tumor datasets.
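
As a rough illustration of the separability idea described in the Method, the sketch below scores a candidate projection direction by the ratio of between-class to within-class scatter of the projected samples. It is not the proposed algorithm: the hybrid (neighbor plus sparse) representation is omitted entirely, and the function name `separability`, the toy data, and the inverse-class-frequency weight standing in for the imbalance adjustment factor are all assumptions made for this example.

```python
def separability(X, y, w):
    """Score direction w by between-class / within-class scatter of the
    1-D projections of X. Illustrative criterion only."""
    # Project every sample onto the direction w.
    z = [sum(xi * wi for xi, wi in zip(x, w)) for x in X]
    n = len(z)
    classes = sorted(set(y))
    k = len(classes)

    # Group the projected values by class label.
    groups = {c: [zi for zi, yi in zip(z, y) if yi == c] for c in classes}
    means = {c: sum(v) / len(v) for c, v in groups.items()}

    # Assumed imbalance weight: inverse class frequency, so that each
    # class contributes equally regardless of its size.
    weight = {c: n / (k * len(groups[c])) for c in classes}

    # Balanced center: the mean of the class means.
    center = sum(means.values()) / k

    within = sum(
        weight[c] * sum((zi - means[c]) ** 2 for zi in groups[c])
        for c in classes
    )
    between = sum(
        weight[c] * len(groups[c]) * (means[c] - center) ** 2
        for c in classes
    )
    return float("inf") if within == 0 else between / within


# Toy, made-up data: four samples of class 0 and two of class 1.
X = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.0), (0.5, 1.0), (2.5, 1.1)]
y = [0, 0, 0, 0, 1, 1]

score_second_axis = separability(X, y, (0.0, 1.0))
score_first_axis = separability(X, y, (1.0, 0.0))
```

On this toy data the two classes differ only along the second feature, so projecting onto the second axis scores higher than projecting onto the first. A feature extraction method of this family would search for the direction (or set of directions) that maximizes such a criterion rather than comparing two fixed candidates.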
Results: Experiments on five publicly available tumor datasets demonstrate the effectiveness of the proposed
method in comparison with a number of state-of-the-art feature extraction and feature selection methods.
Conclusion: The proposed algorithm enhances the separability of data after projection and thus improves
the accuracy of tumor classification from gene expression data.