Abstract
Efficient target selection methods are an important prerequisite for increasing the success rate and reducing the cost of high-throughput structural genomics efforts. There is a high demand for sequence-based methods capable of predicting experimentally tractable proteins and filtering out potentially difficult targets at different stages of the structural genomics pipeline. Simple empirical rules based on anecdotal evidence are increasingly being superseded by rigorous machine-learning algorithms. Although the simplicity of less advanced methods makes them easier for humans to understand, more sophisticated formalized algorithms possess superior classification power. The quickly growing corpus of experimental success and failure data gathered by structural genomics consortia creates a unique opportunity for retrospective data mining using machine-learning techniques and results in higher-quality classifiers. For example, current solubility prediction methods reach accuracies of over 70%. Furthermore, automated feature selection leads to better insight into the nature of the correlation between amino acid sequence and experimental outcome. In this review we summarize methods for predicting experimental success in cloning, expression, soluble expression, purification and crystallization of proteins, with a special focus on publicly available resources. We also describe experimental data repositories and machine-learning techniques used for classification and feature selection.
Keywords: Structural genomics, machine learning, experimental success rate, target selection
Current Protein & Peptide Science
Title: Predicting Experimental Properties of Proteins from Sequence by Machine Learning Techniques
Volume: 8 Issue: 2
Author(s): Pawel Smialowski, Antonio J. Martin-Galiano, Jurgen Cox and Dmitrij Frishman
Cite this article as:
Smialowski Pawel, Martin-Galiano Antonio J., Cox Jurgen and Frishman Dmitrij, Predicting Experimental Properties of Proteins from Sequence by Machine Learning Techniques, Current Protein & Peptide Science 2007; 8(2). https://dx.doi.org/10.2174/138920307780363398
DOI: https://dx.doi.org/10.2174/138920307780363398
Print ISSN: 1389-2037
Publisher Name: Bentham Science Publisher
Online ISSN: 1875-5550