Aims: Microarray data are widely used in disease analysis and diagnosis. However, such data can contain thousands of genes but only a small number of samples, and some existing models cannot accurately capture the patterns in these datasets without a feature selection method.
Background: Feature selection is an important stage in data preprocessing. Given the limitations of using filter or wrapper approaches individually, it is promising to combine them into a hybrid algorithm that exploits their respective advantages when searching for optimal feature subsets.
Objective: The primary objective of this study is to design an effective feature selection strategy for high-dimensional biomedical datasets.
Method: A novel hybrid filter-wrapper approach (CS-IFOA) is proposed for high-dimensional datasets. First, the Chi-square test is used to filter out most of the irrelevant or redundant features. Next, an improved binary Fruit Fly Optimization Algorithm is used to search for an optimal feature subset without degrading classification accuracy. A KNN classifier with 10-fold cross-validation (10-fold CV) is used to evaluate classification accuracy.
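The two-stage idea can be illustrated with a minimal, standard-library-only sketch. It is not the authors' CS-IFOA implementation: the wrapper below is a simplified bit-flip search loosely in the spirit of fruit fly optimization, and the synthetic data, the function names, and every parameter value (top 10 features, k = 3, 20 iterations, 5 flies) are assumptions chosen for illustration.

```python
import random
from collections import Counter

def chi2_scores(X, y):
    """Chi-square statistic of each non-negative feature against the class labels."""
    n, classes = len(X), sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * prior[c]
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(row, x)), label)
                   for row, label in zip(train_X, train_y))
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

def cv_accuracy(X, y, features, folds=10, k=3, seed=0):
    """KNN accuracy over one shuffled k-fold split, using only `features`."""
    if not features:
        return 0.0
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    Xs = [[row[j] for j in features] for row in X]
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        train_X, train_y = [Xs[i] for i in train], [y[i] for i in train]
        correct += sum(knn_predict(train_X, train_y, Xs[i], k) == y[i] for i in test)
    return correct / len(X)

def wrapper_search(X, y, candidates, iters=20, flies=5, seed=0):
    """Simplified stand-in for the wrapper stage: each fly flips one feature bit
    around the current best subset; the swarm moves only if the best fly improves
    (higher accuracy first, then fewer features)."""
    rng = random.Random(seed)
    best = sorted(candidates)
    best_fit = (cv_accuracy(X, y, best), -len(best))
    for _ in range(iters):
        swarm = []
        for _ in range(flies):
            trial = sorted(set(best) ^ {rng.choice(candidates)})
            swarm.append(((cv_accuracy(X, y, trial), -len(trial)), trial))
        fit, trial = max(swarm)
        if fit > best_fit:
            best, best_fit = trial, fit
    return best, best_fit[0]

# Synthetic data (an assumption): 40 samples, 30 features, only features 0 and 1 informative.
rng = random.Random(42)
y = [i % 2 for i in range(40)]
X = [[rng.random() + (5.0 * label if j < 2 else 0.0) for j in range(30)] for label in y]

scores = chi2_scores(X, y)                                   # filter stage
ranked = sorted(range(30), key=lambda j: scores[j], reverse=True)
candidates = sorted(ranked[:10])                             # keep the 10 highest-scoring features
selected, acc = wrapper_search(X, y, candidates)             # wrapper stage
print("kept by filter:", candidates)
print("selected by wrapper:", selected, "CV accuracy:", acc)
```

The fitness tuple compares accuracy before subset size, so a smaller subset is accepted only when it does not lower the cross-validated accuracy, which mirrors the stated goal of reducing features without degrading classification accuracy.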
Result: Experimental results show that CS-IFOA can use a smaller number of features while achieving higher classification accuracy. Furthermore, the standard deviation of the results is relatively small, indicating that the repeated 10-fold CV is reliable and that the proposed algorithm is relatively robust.
Conclusion: The proposed strategy can be used as an effective pre-processing tool to help optimize feature selection for high-dimensional biomedical datasets. The results further indicate that integrating a filter method into a wrapper model can enhance the performance of feature subset selection.
Other: In future work, the proposed strategy will be applied to other biological datasets, and other classifiers can be combined with it to verify and extend the approach. The findings of this study could provide a basis for further research on hybrid feature selection approaches.