Abstract
Aim and Objective: The rapid growth in available protein sequence data creates an urgent need for novel computational algorithms to analyze and compare these sequences. This study was undertaken to develop an efficient computational approach for rapidly encoding protein sequences and extracting the information hidden in them.
Methods: Based on two physicochemical properties of amino acids, a protein primary sequence was converted into a three-letter sequence, and then a graph without loops and multiple edges and its geometric line adjacency matrix were obtained. A generalized PseAAC (pseudo amino acid composition) model was thus constructed to characterize a protein sequence numerically.
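The encoding pipeline described in Methods can be loosely illustrated in Python. This is only a sketch under stated assumptions: the abstract does not name the two physicochemical properties, the three classes, or the rule that defines the graph, so the grouping in `GROUPS` and the edge rule in `simple_graph_adjacency` are hypothetical stand-ins, and the result is a plain 0/1 adjacency matrix rather than the paper's geometric line adjacency matrix.

```python
from typing import Dict, List

# Hypothetical three-class grouping of the 20 amino acids (an assumption;
# the abstract does not specify the classes used by the authors).
GROUPS: Dict[str, str] = {
    "a": "AVLIMFWC",  # placeholder class 1
    "b": "GSTYNQP",   # placeholder class 2
    "c": "DEKRH",     # placeholder class 3
}
LETTER_OF: Dict[str, str] = {
    aa: letter for letter, residues in GROUPS.items() for aa in residues
}


def reduce_sequence(protein: str) -> str:
    """Map a primary sequence onto the three-letter alphabet {a, b, c}.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return "".join(LETTER_OF[aa] for aa in protein.upper())


def simple_graph_adjacency(reduced: str) -> List[List[int]]:
    """Build a 0/1 adjacency matrix over sequence positions.

    Illustrative edge rule (an assumption): join each position to its
    predecessor and to the previous occurrence of the same letter.
    Assigning 1 is idempotent and i != j always holds, so the graph has
    no multiple edges and no loops.
    """
    n = len(reduced)
    adj = [[0] * n for _ in range(n)]
    last_seen: Dict[str, int] = {}
    for i, letter in enumerate(reduced):
        if i > 0:
            adj[i - 1][i] = adj[i][i - 1] = 1
        j = last_seen.get(letter)
        if j is not None:
            adj[j][i] = adj[i][j] = 1
        last_seen[letter] = i
    return adj


if __name__ == "__main__":
    reduced = reduce_sequence("AKGA")
    print(reduced)
    for row in simple_graph_adjacency(reduced):
        print(row)
```

Numerical descriptors of the kind used in a PseAAC-style feature vector would then be derived from such a matrix; which invariants the authors actually use is not stated in the abstract.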
Results: Using the proposed mathematical descriptor of a protein sequence, similarity comparisons were made among the β-globin proteins of 17 species and among 72 coronavirus spike proteins. The resulting clusters agreed well with the established taxonomic groups. In addition, a generalized PseAAC-based SVM (support vector machine) model was developed to identify DNA-binding proteins. Experimental results showed that our method outperformed DNAbinder, DNA-Prot, iDNA-Prot and enDNA-Prot by 3.29-10.44% in terms of ACC, 0.056-0.206 in terms of MCC, and 1.45-15.76% in terms of F1M. When the benchmark dataset was expanded with negative samples, the presented approach outperformed the four previous methods by 2.49-19.12% in terms of ACC, 0.05-0.32 in terms of MCC, and 3.82-33.85% in terms of F1M.
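For readers unfamiliar with the reported measures, the sketch below computes them from a binary confusion matrix using the standard textbook definitions. It assumes that ACC is accuracy, MCC is the Matthews correlation coefficient, and F1M is the F1-measure (the abstract does not expand these abbreviations); the function name `binary_metrics` and the example counts are illustrative and are not taken from the paper.

```python
import math
from typing import Tuple


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (ACC, MCC, F1) for a binary classifier.

    tp/fp/tn/fn are the counts of true positives, false positives,
    true negatives and false negatives. Undefined ratios fall back to 0.0.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0

    # Matthews correlation coefficient, ranging from -1 to 1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # F1 is the harmonic mean of precision and recall.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return acc, mcc, f1


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    acc, mcc, f1 = binary_metrics(tp=40, fp=10, tn=45, fn=5)
    print(f"ACC={acc:.4f} MCC={mcc:.4f} F1={f1:.4f}")
```

Note that ACC and F1 are usually quoted as percentages while MCC is a correlation on a -1 to 1 scale, which matches the mixed units of the improvements quoted above.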
Conclusion: These results suggested that the generalized PseAAC model was very efficient for comparison and analysis of protein sequences, and very competitive in identifying DNA-binding proteins.
Keywords: Adjacency matrix, Generalized PseAAC, graph, identification of DNA-binding proteins, phylogenetic analysis, protein sequences.
Combinatorial Chemistry & High Throughput Screening
Title: Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation
Volume: 21 Issue: 2
Author(s): Chun Li*, Jialing Zhao, Changzhong Wang and Yuhua Yao
Affiliation:
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
Cite this article as:
Li Chun*, Zhao Jialing, Wang Changzhong and Yao Yuhua, Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation, Combinatorial Chemistry & High Throughput Screening 2018; 21(2). https://dx.doi.org/10.2174/1386207321666180130100838
DOI: https://dx.doi.org/10.2174/1386207321666180130100838
Print ISSN: 1386-2073
Publisher Name: Bentham Science Publisher
Online ISSN: 1875-5402