Abstract
Efficient target selection methods are an important prerequisite for increasing the success rate and reducing the cost of high-throughput structural genomics efforts. There is a high demand for sequence-based methods capable of predicting experimentally tractable proteins and filtering out potentially difficult targets at different stages of the structural genomics pipeline. Simple empirical rules based on anecdotal evidence are increasingly being superseded by rigorous machine learning algorithms. Although simpler methods are easier for humans to interpret, more sophisticated formalized algorithms possess superior classification power. The rapidly growing corpus of experimental success and failure data gathered by structural genomics consortia creates a unique opportunity for retrospective data mining using machine learning techniques and improves the quality of classifiers. For example, current solubility prediction methods reach accuracies of over 70%. Furthermore, automated feature selection provides better insight into the correlation between amino acid sequence and experimental outcome. In this review, we summarize methods for predicting experimental success in cloning, expression, soluble expression, purification and crystallization of proteins, with a special focus on publicly available resources. We also describe experimental data repositories and machine learning techniques used for classification and feature selection.
Keywords: Structural genomics, machine learning, experimental success rate, target selection
Current Protein & Peptide Science
Title: Predicting Experimental Properties of Proteins from Sequence by Machine Learning Techniques
Volume: 8 Issue: 2
Author(s): Pawel Smialowski, Antonio J. Martin-Galiano, Jurgen Cox and Dmitrij Frishman
Cite this article as:
Smialowski Pawel, Martin-Galiano Antonio J., Cox Jurgen and Frishman Dmitrij, Predicting Experimental Properties of Proteins from Sequence by Machine Learning Techniques, Current Protein & Peptide Science 2007; 8(2). https://dx.doi.org/10.2174/138920307780363398
DOI https://dx.doi.org/10.2174/138920307780363398
Print ISSN 1389-2037
Publisher Name Bentham Science Publisher
Online ISSN 1875-5550