Computational Approaches to Predict the Non-canonical DNAs

Author(s): Nazia Parveen, Amen Shamim, Seunghee Cho, Kyeong Kyu Kim*

Journal Name: Current Bioinformatics

Volume 14, Issue 6, 2019

Abstract:

Background: Although most nucleotides in the genome form canonical double-stranded B-DNA, many repeated sequences transiently adopt non-canonical conformations (non-B DNA) such as triplexes, quadruplexes, Z-DNA, cruciforms, and slipped structures/hairpins. These non-canonical DNAs (ncDNAs) are not only associated with many genetic events such as replication, transcription, and recombination, but are also linked to the genetic instability that predisposes to disease. Because ncDNAs play crucial roles in cellular and genetic functions, various computational methods have been developed to predict the sequence motifs that form ncDNA.

Objective: Here, we review strategies for the identification of ncDNA motifs across the whole genome, which is necessary for further understanding and investigation of the structure and function of ncDNAs.

Conclusion: There is a great demand for computational prediction of the non-canonical DNAs that play key functional roles in gene expression and genome biology. In this review, we survey the currently available computational methods for predicting non-canonical DNAs in the genome. These studies not only provide insight into computational methods for predicting the secondary structures of DNA but also improve our understanding of the roles of non-canonical DNA in the genome.

Keywords: Noncanonical DNA, G-quadruplex, Z-DNA, cruciform, triplex, hairpin.

Article Details

VOLUME: 14
ISSUE: 6
Year: 2019
Page: [470 - 479]
Pages: 10
DOI: 10.2174/1574893614666190126143438