Defind: Detecting Genomic Deletions by Integrating Read Depth, GC Content, Mapping Quality and Paired-end Mapping Signatures of Next Generation Sequencing Data

Author(s): Xin Wang*, Huan Zhang, Xiaojing Liu.

Journal Name: Current Bioinformatics

Volume 14, Issue 2, 2019

Background: Accurate and exhaustive identification of genomic deletion events is the basis for understanding their roles in phenotype variation. Developing effective algorithms to identify deletions using next generation sequencing (NGS) data remains a challenge.

Objective: To present Defind, a new approach that detects deletions using NGS data from a single sample mapped to a reference genome sequence.

Method: Defind detects medium- and large-sized deletions by simultaneously inspecting the depth of coverage, GC content, mapping quality, and paired-end information of NGS data. It is implemented in Perl and R and runs on Linux. We compared Defind in detail with other deletion detection methods using both simulated and real data.
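As an illustration only, the way these four signals can be combined might be sketched as follows. This is not Defind's actual implementation (which the abstract states is written in Perl and R); the window dictionaries, the function names, and every threshold (`gc_bin`, `max_ratio`, `min_mapq`, `min_pairs`) are hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import median


def gc_normalized_depth(windows, gc_bin=0.05):
    """Divide each window's read count by the median count of windows with similar GC.

    `windows` is a list of dicts with keys start, end, count, gc, mapq
    (a hypothetical layout chosen for this sketch).
    """
    by_bin = defaultdict(list)
    for w in windows:
        by_bin[round(w["gc"] / gc_bin)].append(w["count"])
    bin_median = {b: median(counts) for b, counts in by_bin.items()}
    ratios = []
    for w in windows:
        m = bin_median[round(w["gc"] / gc_bin)]
        ratios.append(w["count"] / m if m > 0 else 0.0)
    return ratios


def candidate_deletions(windows, discordant_pairs,
                        max_ratio=0.25, min_mapq=20, min_pairs=2):
    """Return (start, end, support) for low-depth runs backed by paired-end evidence.

    `discordant_pairs` is a list of (left, right) coordinates of read pairs whose
    mapped distance is larger than expected; all thresholds are assumptions.
    """
    ratios = gc_normalized_depth(windows)
    regions, current = [], None
    for w, ratio in zip(windows, ratios):
        # A window with no reads has no mean mapping quality (None); accept it.
        # Otherwise require confident mapping so that low depth caused by
        # ambiguous alignment is not mistaken for a deletion.
        mapq_ok = w["mapq"] is None or w["mapq"] >= min_mapq
        if ratio <= max_ratio and mapq_ok:
            if current is not None and current[1] == w["start"]:
                current[1] = w["end"]          # extend the adjacent low-depth run
            else:
                current = [w["start"], w["end"]]
                regions.append(current)
        else:
            current = None
    calls = []
    for start, end in regions:
        # Count discordant pairs that span the whole low-depth region.
        support = sum(1 for left, right in discordant_pairs
                      if left <= start and right >= end)
        if support >= min_pairs:
            calls.append((start, end, support))
    return calls
```

Requiring both a depth drop and spanning read pairs is one plausible way to trade a little sensitivity for fewer false positives; a real caller would also need to handle multiple chromosomes and estimate the expected insert size from the data.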

Results: In simulation studies, Defind could retrieve more deletions than other methods at low to medium sequencing coverage (e.g., 5 to 10×) with no false positives. Using real data, 94% of deletions commonly detected by at least two other methods were also detected by Defind. In addition, 90% of the deletions detected by Defind using the real data were positively supported by comparative genomic hybridization results, demonstrating the efficiency of Defind.
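A concordance figure such as the share of multi-method deletions that a given tool also recovers can be computed along the following lines. The abstract does not say how calls from different methods were matched, so the 50% reciprocal-overlap criterion, the function names, and the single-chromosome `(start, end)` representation are all assumptions of this sketch.

```python
def reciprocal_overlap(a, b):
    """Overlap length divided by the length of the longer of two (start, end) intervals."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[1] - a[0], b[1] - b[0])


def is_supported(call, callset, min_ro=0.5):
    """True if any call in `callset` reciprocally overlaps `call` by at least `min_ro`."""
    return any(reciprocal_overlap(call, other) >= min_ro for other in callset)


def consensus_recall(target, others, min_methods=2, min_ro=0.5):
    """Fraction of deletions found by at least `min_methods` of `others`
    that are also present in `target`. Returns None if there is no consensus call."""
    consensus = []
    for calls in others:
        for call in calls:
            # Number of methods (including the one that made the call) reporting it.
            n = sum(is_supported(call, other, min_ro) for other in others)
            # Skip calls already represented in the consensus set.
            if n >= min_methods and not is_supported(call, consensus, min_ro):
                consensus.append(call)
    if not consensus:
        return None
    hits = sum(is_supported(call, target, min_ro) for call in consensus)
    return hits / len(consensus)
```

Here `target` would hold the calls of the method under evaluation and `others` one list of calls per comparison method.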

Conclusion: In the simulation study, Defind performed robustly across different sequencing coverages and read lengths. Our results also provide practical guidance for selecting appropriate methods to detect genomic deletions using NGS data.

Keywords: Defind, genomic deletions, NGS data, phenotype, algorithms, hybridization.



Article Details

Year: 2019
Page: [130 - 138]
Pages: 9
DOI: 10.2174/1574893613666180703110126
