Background: Data mining algorithms are widely used for data classification, where predicting
disease with minimal computation time plays a vital role.
Objectives: This paper aims to develop a classification model from reduced features and
Methods: This paper proposes four search algorithms for feature selection. The first is the
Random Global Optimal (RGO) search algorithm, which searches for the continuous, global optimal
subset of features from the random population. The second is the Global and Local Optimal (GLO)
search algorithm, which searches for the global and local optimal subset of features from the
population. The third is the Random Local Optimal (RLO) search algorithm, which generates a
random, local optimal subset of features from the random population. The fourth is the Random
Global and Local Optimal (RGLO) search algorithm, which searches for the continuous, global and
local optimal subset of features from the random population; it combines the properties of the
first three algorithms.
The feature subsets generated by the four proposed search algorithms are evaluated using the
consistency-based subset evaluation measure. An instance-based learning algorithm is applied to the
resulting feature dataset to remove instances that are redundant or irrelevant for classification.
The model built with a naïve Bayesian classifier from the reduced features and instances is
validated with tenfold cross-validation.
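As an illustration of consistency-based subset evaluation, the following is a minimal sketch and
not the authors' implementation. It assumes discretized feature values and uses the common
definition of consistency as one minus the inconsistency rate, where each distinct pattern of
selected feature values contributes its instance count minus the count of its majority class. The
function names `consistency` and `random_subset_search`, the toy data, and the plain random
candidate sampling are hypothetical stand-ins; they do not reproduce the RGO, GLO, RLO or RGLO
algorithms.

```python
import random
from collections import Counter, defaultdict


def consistency(rows, labels, subset):
    """Return 1 - inconsistency rate for the feature indices in `subset`.

    Assumes discrete feature values (an assumption of this sketch).
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in subset)
        groups[pattern][label] += 1
    # Instances that do not belong to the majority class of their pattern.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return 1.0 - inconsistent / len(rows)


def random_subset_search(rows, labels, n_features, n_candidates=50, seed=0):
    """Hypothetical stand-in search: sample random subsets, keep the most
    consistent one, and prefer the smaller subset on ties."""
    rng = random.Random(seed)
    best = tuple(range(n_features))
    best_score = consistency(rows, labels, best)
    for _ in range(n_candidates):
        k = rng.randint(1, n_features)
        cand = tuple(sorted(rng.sample(range(n_features), k)))
        score = consistency(rows, labels, cand)
        if score > best_score or (score == best_score and len(cand) < len(best)):
            best, best_score = cand, score
    return best, best_score


# Toy data: feature 0 and feature 2 each determine the label; feature 1 is noise.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = ["a", "a", "b", "b"]

full_score = consistency(rows, labels, (0,))   # informative feature
noise_score = consistency(rows, labels, (1,))  # uninformative feature
best_subset, best_score = random_subset_search(rows, labels, n_features=3)
```

In this toy example the informative feature alone is fully consistent with the labels, while the
noise feature is not, so a search guided by this measure favors small subsets that keep the
informative features.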
Results: With the RGLO search algorithm and the naïve Bayesian classifier, classification accuracy
is 94.82% for the Breast, 97.4% for the DLBCL, 98.83% for the SRBCT and 98.89% for the Leukemia
datasets.
Conclusion: The reduced features obtained with RGLO search yield a high prediction rate with less
computational time when compared with the complete dataset and other proposed subset generation