Background: Many forms of variation exist in the genome and are the main causes
of individual phenotypic differences. Detecting variants, especially those located in the
tumor genome, remains challenging because of the complexity of genome structure. Thus,
an assessment of the performance of variation detection tools for next-generation
sequencing data is urgently needed.
Method: We have created a software package called the Multi-Variation Simulator of Cancer
genomes (MVSC) to simulate common genomic variants, including single nucleotide
polymorphisms, small insertion and deletion polymorphisms, and structural variations (SVs), which
are analogous to human somatically acquired variations. Three sets of variations, embedded in
the genomic sequences in different periods, were simulated dynamically and sequentially.
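The sequential scheme described above can be illustrated with a minimal sketch. This is not MVSC's implementation: it assumes, purely for illustration, that each of the three periods applies one variant class (SNPs, then small indels, then SVs) and that each period operates on the sequence produced by the previous one. All function names, variant counts, and sizes below are hypothetical.

```python
import random

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def add_snps(seq, n, rng):
    """Substitute n distinct positions with a different base (hypothetical helper)."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n):
        chars[pos] = rng.choice([b for b in BASES if b != chars[pos]])
    return "".join(chars)


def add_indels(seq, n, max_len, rng):
    """Apply n small insertions or deletions at random positions."""
    for _ in range(n):
        pos = rng.randrange(len(seq))
        size = rng.randint(1, max_len)
        if rng.random() < 0.5:
            inserted = "".join(rng.choice(BASES) for _ in range(size))
            seq = seq[:pos] + inserted + seq[pos:]
        else:
            seq = seq[:pos] + seq[pos + size:]
    return seq


def add_svs(seq, n, sv_len, rng):
    """Apply n structural variants: deletion, tandem duplication, or inversion."""
    for _ in range(n):
        start = rng.randrange(len(seq) - sv_len)
        segment = seq[start:start + sv_len]
        kind = rng.choice(["del", "dup", "inv"])
        if kind == "del":
            seq = seq[:start] + seq[start + sv_len:]
        elif kind == "dup":
            seq = seq[:start] + segment + segment + seq[start + sv_len:]
        else:
            # An inversion is modeled as the reverse complement of the segment.
            inverted = segment.translate(COMPLEMENT)[::-1]
            seq = seq[:start] + inverted + seq[start + sv_len:]
    return seq


def simulate(reference, seed=0):
    """Run three periods in order; each one mutates the previous period's output."""
    rng = random.Random(seed)
    stage1 = add_snps(reference, 20, rng)
    stage2 = add_indels(stage1, 10, 5, rng)
    stage3 = add_svs(stage2, 3, 100, rng)
    return stage1, stage2, stage3


if __name__ == "__main__":
    ref_rng = random.Random(42)
    reference = "".join(ref_rng.choice(BASES) for _ in range(2000))
    for label, genome in zip(("period 1", "period 2", "period 3"), simulate(reference)):
        print(label, len(genome))
```

Because later periods act on already-mutated sequence, variants from different periods can overlap in the same region, which is the property the sequential design is meant to capture.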
Results: In cancer genome simulation, complex SVs are important because this type of variation is
characteristic of the tumor genome structure. Overlapping variations of different sizes can also
coexist in the same genomic regions, adding to the complexity of cancer genome architecture. Our
results show that MVSC can efficiently simulate a variety of genomic variants that cannot be
simulated by existing software packages.
Conclusion: The MVSC-simulated variants can be used to assess the performance of existing tools
designed to detect SVs in next-generation sequencing data. We also find that MVSC is memory-
and time-efficient compared with similar software packages.