Controlling Feature Selection in Random Forests of Decision Trees Using a Genetic Algorithm: Classification of Class I MHC Peptides

Author(s): Loren Hansen, Ernestine A. Lee, Kevin Hestir, Lewis T. Williams, David Farrelly.

Journal Name: Combinatorial Chemistry & High Throughput Screening
Accelerated Technologies for Biotechnology, Bioassays, Medicinal Chemistry and Natural Products Research

Volume 12, Issue 5, 2009


Feature selection is an important challenge in many classification problems, especially when the number of features greatly exceeds the number of available examples. We have developed a procedure, GenForest, which controls feature selection in random forests of decision trees by using a genetic algorithm. This approach was tested through our entry into the Comparative Evaluation of Prediction Algorithms 2006 (CoEPrA) competition (accessible online). CoEPrA was a modeling competition organized to provide objective testing of various classification and regression algorithms via blind prediction. In the competition, GenForest ranked 10/23, 5/16 and 9/16 on CoEPrA classification problems 1, 3 and 4, respectively. Each of these problems involved the classification of a different set of type I MHC nonapeptides, i.e., peptides containing nine amino acids. Associated with each amino acid was a set of 643 features, for a total of 5787 features per peptide. The method, its application to the CoEPrA datasets, and its performance in the competition are described.
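
To make the general idea concrete, the following is a minimal sketch of genetic-algorithm feature selection over a boolean feature mask. It is not the GenForest implementation: the abstract does not specify the encoding, the genetic operators, or the fitness function, so everything here (the names `evolve_feature_mask` and `make_toy_fitness`, truncation selection, one-point crossover, bit-flip mutation, and all parameter values) is an assumption chosen for illustration. In particular, the fitness below is a toy stand-in; in a setting like the one described, it would presumably be replaced by some measure of how well a random forest performs when restricted to the selected features. Only the feature counts (9 positions, 643 features each) come from the abstract.

```python
import random

N_POSITIONS = 9             # amino acids per nonapeptide (from the abstract)
FEATURES_PER_RESIDUE = 643  # features per amino acid (from the abstract)
N_FEATURES = N_POSITIONS * FEATURES_PER_RESIDUE  # 9 * 643 = 5787


def make_toy_fitness(informative, penalty=0.01):
    """Hypothetical fitness: reward selecting 'informative' features,
    with a small penalty for every selected feature. A real fitness
    would score a classifier trained on the masked features."""
    informative = set(informative)

    def fitness(mask):
        hits = sum(1 for i in informative if mask[i])
        return hits - penalty * sum(mask)

    return fitness


def evolve_feature_mask(fitness, n_features, pop_size=20, generations=30,
                        mutation_rate=0.01, seed=0):
    """Evolve a boolean mask (True = feature selected) to maximize fitness."""
    rng = random.Random(seed)
    # Start from sparse random masks (about 5% of features switched on).
    population = [[rng.random() < 0.05 for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]        # truncation selection
        children = [scored[0][:]]               # elitism: carry over the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: toggle each bit with a small probability.
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best


# Toy run: pretend the first feature of each position is informative.
toy_fitness = make_toy_fitness(informative=range(0, N_FEATURES, FEATURES_PER_RESIDUE))
best_mask = evolve_feature_mask(toy_fitness, N_FEATURES, generations=5)
print(sum(best_mask), "features selected; toy fitness",
      round(toy_fitness(best_mask), 2))
```

Because the fitness function is passed in as an argument, the search loop stays independent of the classifier; swapping the toy fitness for a cross-validated forest score would not require changes to `evolve_feature_mask`.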

Keywords: Decision trees, random forests, feature selection, genetic algorithms, evolutionary computation

Article Details

Year: 2009
Page: [514 - 519]
Pages: 6
DOI: 10.2174/138620709788488984