Current Bioinformatics

Yi-Ping Phoebe Chen
Department of Computer Science and Information Technology
La Trobe University
Melbourne
Australia


Metagenome Assembly Validation: Which Metagenome Contigs are Bona Fide?

Author(s): Yan Ji, Yi-Xue Li, Yu-Dong Cai, Kuo-Chen Chou.

Abstract:

In metagenomics, long contigs can improve metagenome gene prediction and sequence binning. They can also make gene function annotation more accurate because they provide genomic context. However, because of repetitive sequences within and between genomes, metagenome contigs may be misassembled, so a method for validating them is essential. Here, we propose a computational method to validate metagenome contigs. After realigning the raw sequencing reads onto a contig, we compute a contig-ECDF (empirical cumulative distribution function) and its corresponding reference using a simulation-based method. Because the reference for a contig-ECDF is fixed once certain parameters are given, we use the difference between the two to check whether a contig is bona fide: the smaller the difference, the more likely the contig is bona fide. On simulated metagenome datasets, our method showed a good capacity to identify misassembled contigs. Applied to a real metagenome dataset, sequenced from an in vitro-simulated microbial community whose constituent genomes are known, the method showed a strong ability to identify bona fide contigs, and small differences between contig-ECDFs and their references were significantly correlated with bona fide contigs. In summary, we developed a computational method that assigns each metagenome contig a score; the smaller the score, the more likely the contig is bona fide. Validation on both simulated and real datasets showed that the method performs well.
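The scoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which quantity the contig-ECDF is built from, how the reference is simulated, or how the difference is measured, so everything below is an assumption made for illustration: the ECDF is taken over read start positions on the contig, the reference is the average ECDF of reads placed uniformly at random, and the score is the maximum absolute difference between the two curves (a Kolmogorov-Smirnov-style statistic). The function names `ecdf`, `simulated_reference`, and `contig_score` are hypothetical and do not come from the paper.

```python
import random
from bisect import bisect_right


def ecdf(values, grid):
    """Empirical CDF: fraction of values <= g, for each g in grid."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, g) / n for g in grid]


def simulated_reference(contig_len, read_len, n_reads, n_sims=200, seed=0):
    """Average ECDF of read starts under uniform random placement.

    Assumption: reads from a correctly assembled contig start at
    uniformly distributed positions. The paper's simulation may differ.
    """
    rng = random.Random(seed)
    grid = range(contig_len - read_len + 1)
    total = [0.0] * len(grid)
    for _ in range(n_sims):
        starts = [rng.randint(0, contig_len - read_len) for _ in range(n_reads)]
        for i, value in enumerate(ecdf(starts, grid)):
            total[i] += value
    return [t / n_sims for t in total]


def contig_score(read_starts, contig_len, read_len, n_sims=200, seed=0):
    """Maximum absolute gap between the observed ECDF and the reference.

    Smaller scores mean the read placement looks more like the reference.
    """
    grid = range(contig_len - read_len + 1)
    observed = ecdf(read_starts, grid)
    reference = simulated_reference(
        contig_len, read_len, len(read_starts), n_sims, seed
    )
    return max(abs(o - r) for o, r in zip(observed, reference))


# Toy data (invented for illustration): a 1000 bp contig with 100 bp reads.
even_starts = list(range(0, 901, 3))  # evenly covered contig
# Half the reads pile up near position 400, as a collapsed repeat might cause.
piled_starts = list(range(0, 901, 6)) + [400 + (i % 21) for i in range(150)]

good = contig_score(even_starts, contig_len=1000, read_len=100)
bad = contig_score(piled_starts, contig_len=1000, read_len=100)
print(f"evenly covered contig score: {good:.3f}")
print(f"piled-up contig score:       {bad:.3f}")
```

In this toy setting the evenly covered contig receives the smaller score, which matches the abstract's statement that a smaller difference indicates a contig is more likely to be bona fide. A real analysis would take the read positions from an actual alignment and would need the reference model and parameters described in the full paper.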

Keywords: Bona fide contigs, computational method, datasets, metagenome contigs, metagenomics, simulated metagenome.


Article Details

VOLUME: 8
ISSUE: 4
Year: 2013
Page: [511 - 523]
Pages: 13
DOI: 10.2174/1574893611308040013