Abstract
Motivation: Microarrays have allowed the expression level of thousands of genes or proteins to be measured simultaneously. Data sets generated by these arrays consist of a small number of observations (e.g., 20-100 samples) on a very large number of variables (e.g., 10,000 genes or proteins). The observations in these data sets often have other attributes associated with them, such as a class label denoting the pathology of the subject. Finding the genes or proteins that are correlated with these attributes is often a difficult task, since most of the variables do not contain information about the pathology and as such can mask the identity of the relevant features. We describe a genetic algorithm (GA) that employs both supervised and unsupervised learning to mine gene expression and proteomic data. The pattern recognition GA selects features that increase clustering, while simultaneously searching for features that optimize the separation of the classes in a plot of the two or three largest principal components of the data. Because the largest principal components capture the bulk of the variance in the data, the features chosen by the GA contain information primarily about differences between classes in the data set. The principal component analysis routine embedded in the fitness function of the GA acts as an information filter, significantly reducing the size of the search space, since it restricts the search to feature sets whose principal component plots show clustering on the basis of class. The algorithm integrates aspects of artificial intelligence and evolutionary computation to yield a smart one-pass procedure for feature selection, clustering, classification, and prediction.
Keywords: genetic algorithms, pattern recognition
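The idea in the abstract, a GA whose fitness function scores a feature subset by how well the classes separate in a plot of the largest principal components, can be illustrated with a minimal sketch. The abstract does not give the authors' fitness function or GA operators, so everything below is an assumption made for illustration: the function names (`pc_scores`, `fitness`, `select_features`), the nearest-neighbour score used as a stand-in for class separation, and the choice of truncation selection, uniform crossover, and bit-flip mutation.

```python
import numpy as np

def pc_scores(X, n_components=2):
    """Project samples onto the largest principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # mean-centre each feature
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, len(s))
    return U[:, :k] * s[:k]                      # sample scores on the top k PCs

def fitness(X, y, mask, k=3):
    """Assumed stand-in score: mean fraction of each sample's k nearest
    neighbours in the PC plot that share its class label (0 to 1)."""
    if mask.sum() < 2:                           # need at least 2 features for a 2-D plot
        return 0.0
    T = pc_scores(X[:, mask])
    d = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    return float((y[nn] == y[:, None]).mean())

def select_features(X, y, pop_size=30, n_gen=40, p_on=0.1, p_mut=0.02, seed=0):
    """Toy GA over boolean feature masks; operators are illustrative choices."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < p_on  # random sparse initial masks
    best_mask, best_fit = pop[0].copy(), -1.0
    for _ in range(n_gen):
        fits = np.array([fitness(X, y, m) for m in pop])
        i = int(fits.argmax())
        if fits[i] > best_fit:
            best_mask, best_fit = pop[i].copy(), float(fits[i])
        # truncation selection: the fitter half of the population become parents
        parents = pop[np.argsort(fits)[::-1][: pop_size // 2]]
        a = parents[rng.integers(len(parents), size=pop_size)]
        b = parents[rng.integers(len(parents), size=pop_size)]
        # uniform crossover, then bit-flip mutation
        pop = np.where(rng.random((pop_size, n_feat)) < 0.5, a, b)
        pop ^= rng.random((pop_size, n_feat)) < p_mut
        pop[0] = best_mask                       # elitism: keep the best mask seen so far
    return best_mask, best_fit
```

Calling `select_features(X, y)` with a samples-by-features array `X` and an integer label vector `y` returns a boolean mask over the features and its score. Because the score is computed only from the two-dimensional principal component plot, feature subsets whose dominant variance is unrelated to class receive low fitness, which is the filtering role the abstract attributes to the embedded PCA routine.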
Combinatorial Chemistry & High Throughput Screening
Title: Machine Learning Based Pattern Recognition Applied to Microarray Data
Volume: 7 Issue: 2
Author(s): Barry K. Lavine, Charles E. Davidson and William S. Rayens
Cite this article as:
Barry K. Lavine, Charles E. Davidson and William S. Rayens, Machine Learning Based Pattern Recognition Applied to Microarray Data, Combinatorial Chemistry & High Throughput Screening 2004; 7(2). https://dx.doi.org/10.2174/138620704773120801
DOI: https://dx.doi.org/10.2174/138620704773120801
Print ISSN: 1386-2073
Publisher Name: Bentham Science Publisher
Online ISSN: 1875-5402