Abstract
The development of microarray technology has enabled scientists to measure the expression of thousands of genes simultaneously, sparking a surge of interest across many disciplines in biology and medicine. Although data clustering has been used for decades in image processing and pattern recognition, in recent years it has joined this wave of activity as a popular technique for analyzing microarrays. In genomics, clustering the genes in a set of microarray data groups together genes whose expression levels behave similarly across the samples, while clustering the samples offers the potential to discriminate pathologies based on their differential patterns of gene expression. Yet although clustering has been applied to gene expression microarrays for many years, it remains highly problematic. Choosing a clustering algorithm and a validation index is not trivial, particularly for high-throughput biological or medical data. Factors to consider when choosing an algorithm include the nature of the application, the characteristics of the objects to be analyzed, the expected number and shape of the clusters, and the complexity of the problem relative to the computational power available. In some cases a very simple algorithm is adequate, but many situations call for a more complex and powerful algorithm better suited to the task. In this paper, we cover the theoretical aspects of clustering, including error and learning, followed by an overview of popular clustering algorithms and classical validation indices. We also discuss the relative performance of these algorithms and indices and conclude with examples of the application of clustering to computational biology.
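To make the distinction between clustering genes and clustering samples concrete, here is a minimal Python sketch; it is not the authors' method. It runs Lloyd's k-means, a widely used clustering algorithm, on a small hypothetical expression matrix, first on the rows (genes) and then on the transposed matrix (samples), and scores the gene partition with the silhouette index as one example of a validation index. The toy data, the deterministic farthest-point seeding, the use of Euclidean distance, and the helper names `kmeans`, `transpose`, and `silhouette` are all illustrative assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns so samples, not genes, become the objects."""
    return [list(col) for col in zip(*m)]


def kmeans(points: Matrix, k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns one cluster label per row of `points`."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of objects")
    # Deterministic farthest-point seeding (an assumption; random seeding is common).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no object changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: Matrix, labels: List[int]) -> float:
    """Mean silhouette width (Euclidean); values near 1 mean compact, well-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.get(labels[i], [])
        others = [sum(d) / len(d) for lab, d in by_cluster.items() if lab != labels[i]]
        if not others:
            raise ValueError("silhouette needs at least two clusters")
        if not own:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        a = sum(own) / len(own)  # mean distance within the object's own cluster
        b = min(others)          # mean distance to the nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 6 genes (rows) x 4 samples (columns).
expression = [
    [9.0, 8.5, 1.0, 1.2],
    [8.8, 9.1, 0.9, 1.1],
    [9.2, 8.9, 1.3, 0.8],
    [1.1, 0.9, 8.7, 9.0],
    [0.8, 1.2, 9.1, 8.8],
    [1.0, 1.0, 9.0, 9.2],
]

gene_labels = kmeans(expression, k=2)               # [0, 0, 0, 1, 1, 1]
sample_labels = kmeans(transpose(expression), k=2)  # [0, 0, 1, 1]
print(gene_labels, sample_labels, round(silhouette(expression, gene_labels), 3))
```

Because genes are often compared by the shape rather than the magnitude of their expression profiles, a correlation-based dissimilarity is a common alternative to the Euclidean distance assumed here.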
Current Genomics
Title: Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics
Volume: 10 Issue: 6
Author(s): Lori Dalton, Virginia Ballarin and Marcel Brun
About this article
Cite this article as:
Dalton Lori, Ballarin Virginia and Brun Marcel, Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics, Current Genomics 2009; 10(6). https://dx.doi.org/10.2174/138920209789177601
DOI: https://dx.doi.org/10.2174/138920209789177601
Print ISSN: 1389-2029
Publisher Name: Bentham Science Publisher
Online ISSN: 1875-5488