Abstract
The development of microarray technology has enabled scientists to measure the expression of thousands of genes simultaneously, resulting in a surge of interest in several disciplines throughout biology and medicine. While data clustering has been used for decades in image processing and pattern recognition, in recent years it has joined this wave of activity as a popular technique for analyzing microarrays. In genomics, clustering applied to the genes in a set of microarray data groups together those genes whose expression levels behave similarly across the samples; applied to the samples, it offers the potential to discriminate pathologies based on their differential patterns of gene expression. Although clustering has now been used for many years in the context of gene expression microarrays, it remains highly problematic. Choosing a clustering algorithm and a validation index is not trivial, especially for high-throughput biological or medical data. Factors to consider when choosing an algorithm include the nature of the application, the characteristics of the objects to be analyzed, the expected number and shape of the clusters, and the complexity of the problem relative to the available computational power. In some cases a very simple algorithm may be appropriate, but many situations may require a more complex and powerful algorithm better suited to the job at hand. In this paper, we cover the theoretical aspects of clustering, including error and learning, followed by an overview of popular clustering algorithms and classical validation indices. We also discuss the relative performance of these algorithms and indices and conclude with examples of the application of clustering to computational biology.
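The distinction between clustering genes and clustering samples can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method. It assumes the following:

- The clustering algorithm is k-means (Lloyd's algorithm) with Euclidean distance and deterministic farthest-first seeding.
- The validation index is the mean silhouette width.
- The data are a small hypothetical 6-by-4 expression matrix.

The names `kmeans`, `silhouette`, and `expression` are illustrative only. Genes are clustered using the rows of the matrix, and samples are clustered using its transpose.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-first seeding (an assumed choice).

    Returns one cluster label in 0..k-1 per input point.
    """
    # Seeding: start at the first point, then repeatedly add the point that is
    # farthest from its nearest already-chosen center.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(list(far))

    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: List[Vector], labels: List[int]) -> float:
    """Mean silhouette width (needs at least two clusters).

    Values near 1 indicate compact, well-separated clusters.
    """
    total = 0.0
    for i, p in enumerate(points):
        same = [euclidean(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            continue  # a singleton cluster contributes s(i) = 0
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(euclidean(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Hypothetical toy data: rows are genes, columns are samples.
expression = [
    [1.0, 1.2, 5.0, 5.1],
    [0.9, 1.1, 4.8, 5.2],
    [1.1, 0.8, 5.3, 4.9],
    [5.0, 5.2, 1.0, 0.9],
    [4.9, 5.1, 1.2, 1.1],
    [5.2, 4.8, 0.8, 1.0],
]

gene_labels = kmeans(expression, k=2)               # cluster genes (rows)
samples = [list(col) for col in zip(*expression)]   # transpose: one row per sample
sample_labels = kmeans(samples, k=2)                # cluster samples (columns)

print(gene_labels)    # [0, 0, 0, 1, 1, 1]
print(sample_labels)  # [0, 0, 1, 1]
print(round(silhouette(expression, gene_labels), 3))
```

The same `kmeans` function serves both tasks; only the orientation of the matrix changes. The silhouette score is one of many possible validation indices. On real microarray data, the choice of distance, seeding, and number of clusters k strongly affects the result.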
Current Genomics
Title: Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics
Volume: 10 Issue: 6
Author(s): Lori Dalton, Virginia Ballarin and Marcel Brun
Cite this article as:
Dalton Lori, Ballarin Virginia and Brun Marcel, Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics, Current Genomics 2009; 10 (6). https://dx.doi.org/10.2174/138920209789177601
DOI https://dx.doi.org/10.2174/138920209789177601
Print ISSN 1389-2029
Publisher Name Bentham Science Publisher
Online ISSN 1875-5488