Abstract
Motivation: Microarrays allow the expression levels of thousands of genes or proteins to be measured simultaneously. The resulting data sets consist of a small number of observations (e.g., 20-100 samples) on a very large number of variables (e.g., 10,000 genes or proteins). The observations often carry additional attributes, such as a class label denoting the pathology of the subject. Finding the genes or proteins that are correlated with these attributes is difficult because most of the variables carry no information about the pathology and can therefore mask the identity of the relevant features. We describe a genetic algorithm (GA) that employs both supervised and unsupervised learning to mine gene expression and proteomic data. The pattern recognition GA selects features that increase clustering while simultaneously searching for features that optimize the separation of the classes in a plot of the two or three largest principal components of the data. Because the largest principal components capture the bulk of the variance in the data, the features chosen by the GA contain information primarily about differences between classes in the data set. The principal component analysis routine embedded in the fitness function of the GA acts as an information filter: it restricts the search to feature sets whose principal component plots show clustering on the basis of class, which significantly reduces the size of the search space. The algorithm integrates aspects of artificial intelligence and evolutionary computation to yield a smart one-pass procedure for feature selection, clustering, classification, and prediction.
Keywords: genetic algorithms, pattern recognition
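The idea of embedding principal component analysis in a GA fitness function can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring rule, the genetic operators, or any parameter values, so the nearest-neighbour agreement score, the uniform crossover, the bit-flip mutation, and every name below (`pc_scores`, `fitness`, `run_ga`, `k`, `p_mut`) are assumptions chosen only to show the general shape of such a search.

```python
import numpy as np


def pc_scores(X, n_components=2):
    """Project samples onto the largest principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # mean-centre each selected feature
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def fitness(X, y, mask, n_components=2, k=3):
    """Assumed scoring rule: fraction of k nearest neighbours in the
    principal component plot that share the sample's class label."""
    if mask.sum() < n_components:
        return 0.0  # too few features to build the plot
    T = pc_scores(X[:, mask], n_components)
    d = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    return float((y[nn] == y[:, None]).mean())


def run_ga(X, y, pop_size=30, n_gen=40, p_mut=0.02, seed=0):
    """Evolve boolean feature masks; all operators here are illustrative."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.2  # sparse random masks
    for _ in range(n_gen):
        scores = np.array([fitness(X, y, m) for m in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]  # keep the better half
        i = rng.integers(0, len(parents), size=pop_size)
        j = rng.integers(0, len(parents), size=pop_size)
        cross = rng.random((pop_size, n_feat)) < 0.5  # uniform crossover
        children = np.where(cross, parents[i], parents[j])
        children ^= rng.random((pop_size, n_feat)) < p_mut  # bit-flip mutation
        children[0] = parents[0]  # elitism: carry the best mask forward
        pop = children
    scores = np.array([fitness(X, y, m) for m in pop])
    best = int(np.argmax(scores))
    return pop[best], float(scores[best])


if __name__ == "__main__":
    # Synthetic stand-in for microarray data: few samples, many variables,
    # with only the first five variables carrying class information.
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 20)
    X = rng.normal(size=(40, 200))
    X[y == 1, :5] += 6.0
    mask, score = run_ga(X, y)
    print("selected features:", np.flatnonzero(mask))
    print("fitness:", score)
```

Because each candidate feature set is scored only by how well its principal component plot separates the classes, feature sets whose dominant variance is unrelated to class receive low scores and are discarded early, which is the filtering effect the abstract describes.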
Combinatorial Chemistry & High Throughput Screening
Title: Machine Learning Based Pattern Recognition Applied to Microarray Data
Volume: 7 Issue: 2
Author(s): Barry K. Lavine, Charles E. Davidson and William S. Rayens
About this article
Cite this article as:
Lavine Barry K., Davidson Charles E. and Rayens William S., Machine Learning Based Pattern Recognition Applied to Microarray Data, Combinatorial Chemistry & High Throughput Screening 2004; 7 (2). https://dx.doi.org/10.2174/138620704773120801
DOI: https://dx.doi.org/10.2174/138620704773120801
Print ISSN: 1386-2073
Publisher Name: Bentham Science Publisher
Online ISSN: 1875-5402