Abstract
Efficient target selection methods are an important prerequisite for increasing the success rate and reducing the cost of high-throughput structural genomics efforts. There is a high demand for sequence-based methods capable of predicting experimentally tractable proteins and filtering out potentially difficult targets at different stages of the structural genomics pipeline. Simple empirical rules based on anecdotal evidence are increasingly being superseded by rigorous machine learning algorithms. Although the simplicity of less advanced methods makes them easier for humans to understand, more sophisticated formalized algorithms possess superior classification power. The quickly growing corpus of experimental success and failure data gathered by structural genomics consortia creates a unique opportunity for retrospective data mining using machine learning techniques and results in higher-quality classifiers. For example, current solubility prediction methods reach accuracies of over 70%. Furthermore, automated feature selection leads to better insight into the nature of the correlation between amino acid sequence and experimental outcome. In this review we summarize methods for predicting experimental success in cloning, expression, soluble expression, purification and crystallization of proteins, with a special focus on publicly available resources. We also describe experimental data repositories and machine learning techniques used for classification and feature selection.
Keywords: Structural genomics, machine learning, experimental success rate, target selection
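As a rough illustration of the sequence-based classification and feature selection that the abstract refers to, the following minimal sketch computes amino acid composition features, trains a nearest-centroid classifier, and ranks residues by how strongly their frequencies differ between classes. Everything in it is an assumption made for illustration: the function names, the choice of nearest-centroid classification, the centroid-difference ranking (a crude stand-in for real feature selection), and the toy sequences, which are invented and do not represent experimental data or any method covered in the review.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(labelled):
    """Build a nearest-centroid model from (sequence, label) pairs."""
    by_label = {}
    for seq, label in labelled:
        by_label.setdefault(label, []).append(composition(seq))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, seq):
    """Return the label whose centroid is closest (squared Euclidean)."""
    x = composition(seq)

    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return min(model, key=lambda label: dist2(model[label]))


def rank_features(model, label_a, label_b):
    """Rank residues by absolute frequency difference between two classes."""
    diffs = [
        (abs(p - q), a)
        for a, p, q in zip(AMINO_ACIDS, model[label_a], model[label_b])
    ]
    return [a for _, a in sorted(diffs, reverse=True)]


# Hypothetical toy data: invented sequences with invented labels.
TOY_DATA = [
    ("DEKKDEEKRD", "soluble"),
    ("EEKDKRDEKE", "soluble"),
    ("LLIVFFLLVI", "insoluble"),
    ("VVILFLIVLF", "insoluble"),
]

model = train(TOY_DATA)
```

With this toy model, predict(model, some_sequence) returns one of the two labels, and rank_features(model, "soluble", "insoluble") lists the residues whose frequencies separate the classes most. A realistic classifier would be trained on success and failure records from the kind of experimental repositories the review describes and evaluated on held-out data.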
Current Protein & Peptide Science
Title: Predicting Experimental Properties of Proteins from Sequence by Machine Learning Techniques
Volume: 8 Issue: 2
Author(s): Pawel Smialowski, Antonio J. Martin-Galiano, Jurgen Cox and Dmitrij Frishman
Cite this article as:
Smialowski Pawel, Martin-Galiano Antonio J., Cox Jurgen and Frishman Dmitrij, Predicting Experimental Properties of Proteins from Sequence by Machine Learning Techniques, Current Protein & Peptide Science 2007; 8(2). https://dx.doi.org/10.2174/138920307780363398
DOI: https://dx.doi.org/10.2174/138920307780363398
Print ISSN: 1389-2037
Publisher Name: Bentham Science Publisher
Online ISSN: 1875-5550