Genome Wide Approaches to Identify Protein-DNA Interactions

Author(s): Tao Ma, Zhenqing Ye, Liguo Wang*.

Journal Name: Current Medicinal Chemistry

Volume 26, Issue 42, 2019


Abstract:

Background: Transcription factors are DNA-binding proteins that play key roles in many fundamental biological processes. Unraveling their interactions with DNA is essential for identifying their target genes and understanding the regulatory network. Genome-wide identification of their binding sites has become feasible thanks to recent progress in experimental and computational approaches. ChIP-chip, ChIP-seq, and ChIP-exo are three widely used techniques to demarcate genome-wide transcription factor binding sites.

Objective: This review aims to provide an overview of these three techniques, including their experimental procedures, computational approaches, and popular analytic tools.

Conclusion: ChIP-chip, ChIP-seq, and ChIP-exo have been the major techniques for studying genome-wide in vivo protein-DNA interactions. Owing to the rapid development of next-generation sequencing technology, array-based ChIP-chip has been deprecated, and ChIP-seq has become the most widely used technique for identifying transcription factor binding sites genome-wide. The newly developed ChIP-exo further improves the spatial resolution to a single nucleotide. Numerous tools have been developed to analyze ChIP-chip, ChIP-seq, and ChIP-exo data. However, different programs may employ different mechanisms or underlying algorithms, so each inherently carries its own set of statistical assumptions and biases. Choosing the most appropriate analytic program for a given experiment therefore requires careful consideration. Moreover, most programs provide only a command-line interface, so their installation and use require basic computational expertise in Unix/Linux.

Keywords: Transcription factor, transcription factor binding site, chromatin immunoprecipitation, microarrays, next-generation sequencing, ChIP-chip, ChIP-seq, ChIP-exo, data analysis.





Article Details

VOLUME: 26
ISSUE: 42
Year: 2019
Page: [7641 - 7654]
Pages: 14
DOI: 10.2174/0929867325666180530115711