Generalized String Pseudo-Folding Lattices in Bioinformatics: State-of-Art Review, New Model for Enzyme Sub-Classes, and Study of ESTs on Trichinella spiralis
Lazaro G. Perez-Montoto,
Florencio M. Ubeira,
Several graph representations have been introduced for different data in theoretical biology. For instance, Complex Networks based on Graph theory are used to represent the structure and/or dynamics of large biological systems such as protein-protein interaction networks. In addition, Randic, Liao, Nandy, Basak, and many others developed some special types of graph-based representations. This special type of graph includes geometrical constraints on node positioning (sequence pseudo-folding rules) in 2D space and adopts final geometrical shapes that resemble lattice-like patterns. Lattice networks have been used to visually depict DNA and protein sequences, but they are flexible enough to go further: the same technique can be used to create string pseudo-folding lattice representations for any kind of string data. However, despite the proven efficacy of the new lattice-like graphs/networks for representing diverse systems, most works focus on only one specific type of biological data. In this work, we review both classic and generalized types of lattice graphs, as well as examples that illustrate how to use them to represent and compare biological data from different sources. The examples reviewed include the following cases: Protein sequences; Mass Spectra (MS) of protein Peptide Mass Fingerprints (PMF); Molecular Dynamics Trajectories (MDTs) from structural studies; mRNA Microarray data; Single Nucleotide Polymorphisms (SNPs); 1D or 2D-Electrophoresis studies of protein Polymorphisms; and Protein-research patent and/or copyright information. We used data available from public sources for some examples; for others, we used experimental results reported herein for the first time. This work may break new ground for the application of graph theory in theoretical biology and other areas of biomedical sciences. In addition, we carried out a statistical analysis of 50,000+ cases to seek and validate a new QSAR-like predictor for enzyme sub-classes. The model uses as inputs the spectral moments of pseudo-folding lattice graphs. Lastly, we illustrate the use of this model to study 4,000+ ESTs of protein sequences present in the parasite Trichinella spiralis.
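As a rough illustration of the two ideas named above (placing the nodes of a string on a 2D lattice by a pseudo-folding rule, then computing spectral moments of the resulting graph), the following minimal Python sketch may help. It is not the authors' implementation. The four residue classes, their mapping to lattice directions, the use of an unweighted adjacency matrix, and the function names `lattice_adjacency` and `spectral_moments` are all assumptions made for this example. The spectral moment of order k is taken here as the trace of the k-th power of the adjacency matrix.

```python
import numpy as np

# Hypothetical pseudo-folding rule (an assumption for illustration only):
# each residue class moves the walk one step in a fixed lattice direction.
CLASS_STEPS = {
    "KRH": (0, 1),         # basic    -> up
    "DE": (0, -1),         # acidic   -> down
    "STNQYC": (1, 0),      # polar    -> right
    "AVLIMFWPG": (-1, 0),  # nonpolar -> left
}
STEP = {aa: step for group, step in CLASS_STEPS.items() for aa in group}


def lattice_adjacency(sequence):
    """Walk the sequence on a 2D lattice and return the adjacency matrix.

    The walk starts at the origin. Residues that land on an already
    visited coordinate share that node, which is what makes the graph
    compact and lattice-like.
    """
    pos = (0, 0)
    nodes = {pos: 0}  # coordinate -> node index
    edges = set()
    prev = 0
    for aa in sequence.upper():
        if aa not in STEP:
            raise ValueError(f"unsupported residue: {aa!r}")
        dx, dy = STEP[aa]
        pos = (pos[0] + dx, pos[1] + dy)
        idx = nodes.setdefault(pos, len(nodes))
        edges.add((min(prev, idx), max(prev, idx)))
        prev = idx
    adj = np.zeros((len(nodes), len(nodes)), dtype=int)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1
    return adj


def spectral_moments(adj, max_order):
    """Return [tr(A^1), ..., tr(A^max_order)] for adjacency matrix A."""
    moments = []
    power = np.eye(adj.shape[0], dtype=int)
    for _ in range(max_order):
        power = power @ adj
        moments.append(int(np.trace(power)))
    return moments


if __name__ == "__main__":
    adj = lattice_adjacency("KSDA")
    print(spectral_moments(adj, 4))
```

Vectors of moments computed this way could then serve as numerical inputs to a classifier, which is the role the abstract assigns to spectral moments in the enzyme sub-class model. A weighted matrix (for example, with residue properties on the diagonal) would change the values but not the overall procedure.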
Keywords: Graph theory, Complex Networks, Proteomics, Mass Spectrometry, Leishmaniosis, 2D Electrophoresis, Parasite population Polymorphism, Single Nucleotide Polymorphism, Schizophrenia, Microarray, Cancer, Patents & Copyright studies