Background: The accurate classification of microarray data has been a great challenge in
machine learning due to its high dimensionality and small number of samples. Feature selection is an
effective way to deal with such data.
Objective: To select a feature subset that maximizes both feature-feature diversity and feature-class
relevance, thereby improving predictive efficiency and reducing the cost of feature acquisition. In addition,
the selection of features with high entropy but low classification performance is restricted.
Method: We first present a feature selection criterion based on an information distance measure, obtained
by introducing a self-redundancy factor into the maximum relevance and minimum redundancy criterion,
where the self-redundancy factor acts as a penalty on features with high entropy. We then propose MFFID,
an incremental-search feature selection method that uses this criterion to maximize
the information distance between features.
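The incremental search described above can be illustrated with a minimal sketch. It is not the MFFID criterion itself, whose exact form is not given here. As assumptions for illustration only, it uses the variation of information d(X, Y) = 2H(X, Y) - H(X) - H(Y) as the information distance, mutual information with the class as relevance, and a hypothetical `penalty` weight on feature entropy as a stand-in for the self-redundancy factor. Features are assumed to be already discretized.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def info_distance(xs, ys):
    """Variation of information, used here as an assumed distance measure."""
    return 2 * entropy(list(zip(xs, ys))) - entropy(xs) - entropy(ys)


def greedy_select(features, labels, k, penalty=1.0):
    """Greedy forward selection over a dict {name: discrete values}.

    Illustrative score (an assumption, not the published criterion):
    relevance + mean distance to selected features - penalty * entropy.
    """
    selected = []
    remaining = list(features)

    def score(name):
        x = features[name]
        relevance = mutual_information(x, labels)
        if selected:
            diversity = sum(
                info_distance(x, features[s]) for s in selected
            ) / len(selected)
        else:
            diversity = 0.0
        return relevance + diversity - penalty * entropy(x)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "b" duplicates "a"; "noise" has high entropy (unique values).
labels = [0, 0, 1, 1, 0, 0, 1, 1]
toy = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
    "noise": [0, 1, 2, 3, 4, 5, 6, 7],
}
print(greedy_select(toy, labels, k=2))
```

In this toy example the duplicate feature contributes zero distance to the feature already chosen, and the unique-valued feature is held back by the entropy penalty even though its mutual information with the class is high, which is the behavior the self-redundancy factor is meant to encourage.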
Results: Compared with four representative feature selection methods on twelve high-dimensional microarray
datasets, the proposed method, MFFID, achieves better performance than the other methods in
terms of classification accuracy.
Conclusion: In this study, a novel feature selection method, MFFID, is proposed, which is expressed in
the form of an information distance measure by introducing the self-redundancy factor into CMRMR. The
experimental results demonstrate that MFFID is an effective and stable feature selection
method for the classification of tumor datasets.