Current Bioinformatics


ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Metagenome Assembly Validation: Which Metagenome Contigs are Bona Fide?

Author(s): Yan Ji, Yi-Xue Li, Yu-Dong Cai and Kuo-Chen Chou

Volume 8, Issue 4, 2013

Page: [511 - 523] Pages: 13

DOI: 10.2174/1574893611308040013


In metagenomics, long metagenome contigs can improve metagenome gene prediction as well as metagenome sequence binning. They can also make gene function annotation more accurate, because they provide a large amount of genomic context information. However, repetitive sequences within a genome or shared between genomes mean that metagenome contigs may be wrongly assembled, so a method for validating them is essential. Here, we propose a computational method to validate metagenome contigs. After realigning the raw sequencing reads onto a contig, we compute a contig-ECDF (empirical cumulative distribution function) and, using a simulation-based method, its corresponding reference. Because the reference for a contig-ECDF is fixed once certain parameters are given, we use the difference between the contig-ECDF and its reference to check whether a contig is bona fide: the smaller the difference, the more likely the contig is bona fide. On simulated metagenome datasets, the method showed a good capacity to identify wrongly assembled contigs. We then applied it to a real metagenome dataset sequenced from an in vitro-simulated microbial community whose constituent genomes are known. There, the method showed a strong ability to identify bona fide contigs, and small differences between contig-ECDFs and their references were significantly correlated with bona fide contigs. In summary, the method assigns each metagenome contig a score, and the smaller the score, the more likely the contig is bona fide. Validation on both simulated and real datasets showed good performance.
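
The scoring idea described above can be illustrated with a short Python sketch. It is not the authors' implementation: the abstract does not say which quantity the contig-ECDF is built from, how the reference is simulated, or how the difference is measured. The sketch assumes the ECDF is taken over read start positions along the contig, that the reference is the average ECDF under uniform random read placement, and that the score is the largest absolute gap between the two curves. All function and parameter names (`ecdf`, `simulated_reference`, `contig_score`, `n_sims`) are hypothetical.

```python
import bisect
import random


def ecdf(values, grid):
    """Return the fraction of values <= g for each grid point g."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect.bisect_right(ordered, g) / n for g in grid]


def simulated_reference(contig_len, read_len, n_reads, n_sims, grid, seed=0):
    """Average ECDF of read start positions over n_sims simulations.

    Assumption: reads are placed uniformly at random along the contig.
    """
    rng = random.Random(seed)
    max_start = contig_len - read_len
    totals = [0.0] * len(grid)
    for _ in range(n_sims):
        starts = [rng.randint(0, max_start) for _ in range(n_reads)]
        for i, value in enumerate(ecdf(starts, grid)):
            totals[i] += value
    return [t / n_sims for t in totals]


def contig_score(read_starts, contig_len, read_len, n_sims=200, seed=0):
    """Largest absolute gap between the observed ECDF and the reference.

    Assumption: a maximum-gap distance stands in for the paper's
    unspecified measure of difference. Smaller means closer to reference.
    """
    grid = list(range(contig_len - read_len + 1))
    observed = ecdf(read_starts, grid)
    reference = simulated_reference(
        contig_len, read_len, len(read_starts), n_sims, grid, seed
    )
    return max(abs(o - r) for o, r in zip(observed, reference))


if __name__ == "__main__":
    rng = random.Random(1)
    # Reads spread evenly over a 1000 bp contig (read length 100).
    even = [rng.randint(0, 900) for _ in range(300)]
    # Reads piled into the first third, as a misjoin might produce.
    skewed = [rng.randint(0, 300) for _ in range(300)]
    print("even coverage score:  ", contig_score(even, 1000, 100))
    print("skewed coverage score:", contig_score(skewed, 1000, 100))
```

Under these assumptions, a contig with evenly spread reads receives a smaller score than one whose reads pile up in one region, which matches the abstract's rule that a smaller score indicates a more plausible contig.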

Keywords: Bona fide contigs, computational method, datasets, metagenome contigs, metagenomics, simulated metagenome.

© 2022 Bentham Science Publishers