Abstract
Motivation: Microarrays have allowed the expression levels of thousands of genes or proteins to be measured simultaneously. Data sets generated by these arrays consist of a small number of observations (e.g., 20-100 samples) on a very large number of variables (e.g., 10,000 genes or proteins). The observations in these data sets often have other attributes associated with them, such as a class label denoting the pathology of the subject. Finding the genes or proteins that are correlated with these attributes is often difficult, since most of the variables carry no information about the pathology and can therefore mask the identity of the relevant features. We describe a genetic algorithm (GA) that employs both supervised and unsupervised learning to mine gene expression and proteomic data. The pattern recognition GA selects features that increase clustering, while simultaneously searching for features that optimize the separation of the classes in a plot of the two or three largest principal components of the data. Because the largest principal components capture the bulk of the variance in the data, the features chosen by the GA contain information primarily about differences between classes in the data set. The principal component analysis routine embedded in the fitness function of the GA acts as an information filter, significantly reducing the size of the search space, since it restricts the search to feature sets whose principal component plots show clustering on the basis of class. The algorithm integrates aspects of artificial intelligence and evolutionary computation to yield a smart one-pass procedure for feature selection, clustering, classification, and prediction.
Keywords: genetic algorithms, pattern recognition
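The idea in the abstract, a GA whose fitness function scores a candidate feature subset by how well the classes separate in a plot of the largest principal components, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fitness function or the GA operators, so the nearest-neighbor class-agreement score, the uniform crossover, the bit-flip mutation, and every name and parameter below (`pc_scores`, `fitness`, `run_ga`, `pop_size`, `p_mut`, and so on) are assumptions chosen only for illustration.

```python
import numpy as np


def pc_scores(X, n_components=2):
    """Project samples (rows of X) onto the largest principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def fitness(X, y, mask, n_components=2):
    """Assumed stand-in fitness: fraction of samples whose nearest
    neighbor in the principal component plot has the same class label."""
    if mask.sum() < n_components:
        return 0.0  # too few features to form the plot
    Z = pc_scores(X[:, mask], n_components)
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)  # a sample is not its own neighbor
    nearest = D.argmin(axis=1)
    return float(np.mean(y[nearest] == y))


def run_ga(X, y, pop_size=30, n_generations=40, p_mut=0.02, seed=0):
    """Hypothetical GA over boolean feature masks (one bit per feature)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    pop = rng.random((pop_size, n_features)) < 0.1  # sparse initial masks
    best_mask, best_fit = pop[0].copy(), -1.0
    for _ in range(n_generations):
        fits = np.array([fitness(X, y, m) for m in pop])
        order = np.argsort(fits)[::-1]
        if fits[order[0]] > best_fit:
            best_fit, best_mask = float(fits[order[0]]), pop[order[0]].copy()
        parents = pop[order[: pop_size // 2]]  # keep the fitter half
        children = []
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            take_a = rng.random(n_features) < 0.5      # uniform crossover
            child = np.where(take_a, a, b)
            child ^= rng.random(n_features) < p_mut    # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    return best_mask, best_fit
```

Calling `run_ga(X, y)` on a samples-by-features matrix `X` and an integer label vector `y` returns the best mask found and its score. Because the score is computed only from the principal component projection, feature subsets dominated by class-irrelevant variance score poorly, which is the filtering effect the abstract attributes to the embedded PCA routine.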
Combinatorial Chemistry & High Throughput Screening
Title: Machine Learning Based Pattern Recognition Applied to Microarray Data
Volume: 7 Issue: 2
Author(s): Barry K. Lavine, Charles E. Davidson and William S. Rayens
Cite this article as:
Lavine Barry K., Davidson Charles E. and Rayens William S., Machine Learning Based Pattern Recognition Applied to Microarray Data, Combinatorial Chemistry & High Throughput Screening 2004; 7 (2). https://dx.doi.org/10.2174/138620704773120801
DOI: https://dx.doi.org/10.2174/138620704773120801
Print ISSN: 1386-2073
Publisher Name: Bentham Science Publisher
Online ISSN: 1875-5402