Controlling Feature Selection in Random Forests of Decision Trees Using a Genetic Algorithm: Classification of Class I MHC Peptides

Author(s): Loren Hansen, Ernestine A. Lee, Kevin Hestir, Lewis T. Williams, David Farrelly.

Journal Name: Combinatorial Chemistry & High Throughput Screening
Accelerated Technologies for Biotechnology, Bioassays, Medicinal Chemistry and Natural Products Research

Volume 12, Issue 5, 2009


Abstract:

Feature selection is an important challenge in many classification problems, especially when the number of features greatly exceeds the number of available examples. We have developed a procedure, GenForest, that controls feature selection in random forests of decision trees by means of a genetic algorithm. The approach was tested through our entry in the Comparative Evaluation of Prediction Algorithms 2006 (CoEPrA) competition (accessible online at: http://www.coepra.org). CoEPrA was a modeling competition organized to provide objective testing of classification and regression algorithms through blind prediction. In the competition, GenForest ranked 10/23, 5/16 and 9/16 on CoEPrA classification problems 1, 3 and 4, respectively, which involved the classification of class I MHC nonapeptides, i.e., peptides containing nine amino acids. Each problem involved a different set of nonapeptides. Each amino acid was described by a set of 643 features, giving a total of 5787 (9 × 643) features per peptide. The method, its application to the CoEPrA datasets, and its performance in the competition are described.
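The abstract does not spell out GenForest's chromosome encoding, genetic operators, or fitness function. As a rough illustration only, the sketch below shows one generic way a genetic algorithm can drive feature selection over the 5787-dimensional peptide descriptor (9 residues × 643 features): each chromosome is a bitmask marking which features are made available to the forest. Everything beyond the 9 × 643 layout is an assumption: the position-major feature order, the operators (tournament selection, uniform crossover, bit-flip mutation, elitism), the parameter values, and the helper names `feature_location`, `evolve` and `toy_fitness`, which are hypothetical. In a real run the fitness would presumably be a random-forest performance estimate (for instance out-of-bag or cross-validated accuracy on the selected columns); a toy function stands in here so the sketch runs with the standard library alone.

```python
import random
from typing import Callable, List, Tuple

# Dimensions stated in the abstract: 9 residues x 643 descriptors per residue.
N_POSITIONS = 9
N_DESCRIPTORS = 643
N_FEATURES = N_POSITIONS * N_DESCRIPTORS  # 5787


def feature_location(index: int) -> Tuple[int, int]:
    """Map a flat feature index to (residue position, descriptor index).

    ASSUMPTION: position-major layout (all 643 descriptors of residue 0,
    then residue 1, ...); the actual CoEPrA column order may differ.
    """
    if not 0 <= index < N_FEATURES:
        raise IndexError(index)
    return divmod(index, N_DESCRIPTORS)


Chromosome = List[bool]  # True = feature is offered to the forest


def evolve(
    n_features: int,
    fitness: Callable[[Chromosome], float],
    pop_size: int = 30,
    generations: int = 40,
    p_on: float = 0.05,
    p_mut: float = 0.01,
    seed: int = 0,
) -> Tuple[Chromosome, float]:
    """Hypothetical generic bitmask GA (not the published GenForest code):
    binary tournament selection, uniform crossover, bit-flip mutation
    and single-individual elitism."""
    rng = random.Random(seed)
    # Sparse random initial subsets: each feature is on with probability p_on.
    pop = [[rng.random() < p_on for _ in range(n_features)] for _ in range(pop_size)]
    scores = [fitness(c) for c in pop]

    def pick() -> Chromosome:
        # Keep the fitter of two randomly drawn individuals.
        a, b = rng.randrange(pop_size), rng.randrange(pop_size)
        return pop[a] if scores[a] >= scores[b] else pop[b]

    for _ in range(generations):
        best = max(range(pop_size), key=scores.__getitem__)
        children = [pop[best][:]]  # elitism: carry the best subset over unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            # Uniform crossover: each gene comes from either parent with p = 0.5.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            # Bit-flip mutation toggles a feature in or out of the subset.
            child = [not g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        scores = [fitness(c) for c in pop]

    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy stand-in for forest accuracy: hypothetical "informative" features earn
# credit, and every selected feature costs a small penalty.
INFORMATIVE = {3, 17, 42, 99}


def toy_fitness(c: Chromosome) -> float:
    hits = sum(1 for i in INFORMATIVE if c[i])
    return hits - 0.01 * sum(c)


if __name__ == "__main__":
    print(feature_location(5786))  # last descriptor of the ninth residue
    best, score = evolve(200, toy_fitness, seed=1)
    print(score, sorted(i for i, g in enumerate(best) if g))
```

Keeping the fitness as a plain callable means an expensive forest-training step could be swapped in without touching the GA loop; the small per-feature penalty in the toy fitness illustrates one common way to favor compact subsets when features far outnumber examples.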

Keywords: Decision trees, random forests, feature selection, genetic algorithms, evolutionary computation


Article Details

VOLUME: 12
ISSUE: 5
Year: 2009
Page: [514 - 519]
Pages: 6
DOI: 10.2174/138620709788488984
