Combinatorial Chemistry & High Throughput Screening


ISSN (Print): 1386-2073
ISSN (Online): 1875-5402

Gene Assessment and Sample Classification for Gene Expression Data Using a Genetic Algorithm / k-nearest Neighbor Method

Author(s): Leping Li, Thomas A. Darden, Clarice R. Weinberg, A. J. Levine and Lee G. Pedersen

Volume 4, Issue 8, 2001

Page: [727 - 739] Pages: 13

DOI: 10.2174/1386207013330733


Recent tools that analyze microarray expression data have exploited correlation-based approaches such as clustering analysis. We describe a new method for assessing the importance of genes for sample classification based on expression data. Our approach combines a genetic algorithm (GA) and the k-nearest neighbor (KNN) method to identify genes that jointly can discriminate between two types of samples (e.g. normal vs. tumor). First, many such subsets of differentially expressed genes are obtained independently using the GA. Then, the overall frequency with which genes were selected is used to deduce the relative importance of genes for sample classification. Sample heterogeneity is accommodated; that is, the method should be robust against the existence of distinct subtypes. We applied GA / KNN to expression data from normal versus tumor tissue from human colon. Two distinct clusters were observed when the 50 most frequently selected genes were used to classify all of the samples in the data sets studied, and the majority of samples were classified correctly. Identification of a set of differentially expressed genes could aid in tumor diagnosis and could also serve to identify disease subtypes that may benefit from distinct clinical approaches to treatment.
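
The two-stage idea in the abstract (run the GA many times to collect discriminating gene subsets, then rank genes by how often they were selected) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so every name and parameter here (`knn_loo_accuracy`, `ga_select_subset`, `gene_selection_frequency`, `make_toy_data`, `pop_size`, `generations`, `subset_size`) is hypothetical. The sketch assumes majority-vote KNN scored by leave-one-out accuracy as the fitness, and a deliberately simplified GA that uses truncation selection and single-gene mutation with no crossover. The data are synthetic, not the colon data set.

```python
import random
from collections import Counter


def knn_loo_accuracy(X, y, genes, k=3):
    """Leave-one-out accuracy of majority-vote k-NN using only the columns in `genes`."""
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample.
        dists = [
            (sum((X[i][g] - X[j][g]) ** 2 for g in genes), y[j])
            for j in range(len(X)) if j != i
        ]
        dists.sort(key=lambda t: t[0])
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


def ga_select_subset(X, y, subset_size, rng, pop_size=20, generations=15, k=3):
    """Simplified GA (assumption: truncation selection + mutation, no crossover)."""
    n_genes = len(X[0])

    def fitness(genes):
        return knn_loo_accuracy(X, y, genes, k)

    # Each individual is a list of distinct gene (column) indices.
    pop = [rng.sample(range(n_genes), subset_size) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == 1.0:  # stop once a subset classifies every sample
            break
        survivors = pop[: pop_size // 2]
        children = []
        for parent in survivors:
            child = list(parent)
            pos = rng.randrange(subset_size)
            # Mutate one position to a gene not already in the subset.
            child[pos] = rng.choice([g for g in range(n_genes) if g not in child])
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)


def gene_selection_frequency(X, y, n_runs=30, subset_size=3, seed=0):
    """Run the GA independently `n_runs` times and count how often each gene is picked."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        counts.update(ga_select_subset(X, y, subset_size, rng))
    return counts


def make_toy_data(n_samples=20, n_genes=30, informative=(0, 1), seed=1):
    """Synthetic two-class data: only the `informative` genes differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 4.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    freq = gene_selection_frequency(X, y)
    # Genes ranked by selection frequency across independent GA runs.
    print(freq.most_common(5))
```

Ranking by frequency across many independent runs, rather than trusting a single best subset, is what lets the approach assign a relative importance to each gene; in this toy setting one would expect the informative genes to be selected more often than the noise genes, though the sketch makes no claim about the published results.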

Keywords: Gene Expression, Genetic Algorithm (GA), K-nearest neighbor (KNN), Pattern recognition, Gene selection, High-dimensional, Microarray

© 2024 Bentham Science Publishers