Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

GenNBPSeq: Online Web Server to Generate Never Born Protein Sequences Using Toeplitz Matrix Approach with Structure Analysis

Author(s): Muthugobal Bagayalakshmi Karuna Nidhi, Ramesh Ganapathy, Parthasarathy Subbiah*, Suvaithenamudhan Suvaiyarasan and Muthuvel Prasath Karuppasamy

Volume 17, Issue 7, 2022

Published on: 25 August, 2022

Page: [565 - 577] Pages: 13

DOI: 10.2174/1574893617666220519110154

Abstract

Background: In biology, genetic information is translated into the corresponding protein sequences using the Universal Genetic Code. Of all the possible combinations of the 20 amino acids, only those that occur naturally are found as proteins. This leaves a large number of unexplored protein sequences, which include the Never Born Proteins. A Never Born Protein is a theoretically possible protein that does not occur in nature, or that may yet be selected by evolution in the future.

Objective: In this study, the “GenNBPSeq” online web server is developed to generate Never Born Protein Sequences and to analyze their sequence and structural stability.

Methods: The “GenNBPSeq” server is based on the Gray Code and Partitioned Gray Code representations of the Universal Genetic Code, combined with the novel Toeplitz matrix approach. Sequence and structure analyses of sample Never Born Protein sequences are carried out using various bioinformatics tools.
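
To make the two mathematical ingredients named in the Methods concrete, the following minimal Python sketch generates a binary-reflected Gray code and builds a generic Toeplitz matrix (one whose entries are constant along each diagonal). It is an illustration only, not the server's implementation: the function names `gray_code` and `toeplitz` are hypothetical, and the abstract does not state how Gray code words are assigned to nucleotides or codons, so no such mapping is assumed here.

```python
def gray_code(n_bits):
    """Return the binary-reflected Gray code as a list of bit strings.

    Consecutive code words differ in exactly one bit.
    """
    return [format(i ^ (i >> 1), f"0{n_bits}b") for i in range(2 ** n_bits)]


def toeplitz(first_col, first_row):
    """Build a Toeplitz matrix from its first column and first row.

    Entry (i, j) depends only on i - j, so every diagonal is constant.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if first_col[0] != first_row[0]:
        raise ValueError("first_col[0] and first_row[0] must be equal")
    return [
        [first_col[i - j] if i >= j else first_row[j - i]
         for j in range(len(first_row))]
        for i in range(len(first_col))
    ]


if __name__ == "__main__":
    print(gray_code(2))
    for row in toeplitz(["A", "B", "C"], ["A", "D", "E"]):
        print(row)
```

Plain lists are used rather than a numerical library so the sketch has no dependencies and works equally well with symbols (such as nucleotide or amino acid letters) as with numbers.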

Results: The “GenNBPSeq” server is available at http://bioinfo.bdu.ac.in/nbps, where users can generate Never Born Protein sequences and download them in FASTA format. The Never Born Protein sequences obtained by the Toeplitz matrix approach share the same amino acid composition. They also form protein secondary and 3-dimensional structures with intrinsic stability.
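
As a rough illustration of the two properties mentioned in the Results, the sketch below writes sequences in FASTA format and checks whether two sequences have the same amino acid composition. The helper names `to_fasta` and `same_composition`, the header `nbp_1`, and the example sequences are all made up for this example; they are not output of the server.

```python
from collections import Counter


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping at `width`."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def same_composition(seq_a, seq_b):
    """Return True if both sequences contain each residue equally often."""
    return Counter(seq_a) == Counter(seq_b)


if __name__ == "__main__":
    # Made-up sequences used purely to exercise the helpers.
    print(to_fasta([("nbp_1", "MKVLA")]), end="")
    print(same_composition("MKVLA", "ALVKM"))
```

Two sequences can have identical composition while differing in residue order, which is why composition is compared with residue counts rather than by string equality.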

Conclusion: This study conjectures that the Never Born Protein Sequences generated by the “GenNBPSeq” server using the Toeplitz matrix approach may exhibit intrinsic structural stability. Synthesizing these Never Born Proteins and analyzing their biological applications are major research areas in Systems and Synthetic Biology.

Keywords: Gray code, universal genetic code, Toeplitz matrices, never born protein sequences, molecular modelling, molecular dynamics simulations.


© 2024 Bentham Science Publishers