Annotating Viral Genomes - A Cannon is Needed to Kill Mosquitoes
Affiliation: Bioinformatics Department, J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
Most viruses have small genomes. However, these small genomes often have complex gene features
that involve transcriptional and translational exceptions, for instance, overlapping genes, alternative splicing, RNA editing,
ribosomal slippage and stop codon read-through. These complex features and exceptions increase gene density and
improve the coding efficiency of viral genomes. They also pose immense challenges to gene prediction algorithms.
Most gene prediction programs for eukaryotic and prokaryotic genomes cannot detect or predict these exceptions
correctly. It is critical to predict these complex features and exceptions with high precision and accuracy in order to
interpret viral genomic data correctly.
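
As an illustration of why such exceptions defeat a simple open reading frame scan, the following minimal sketch compares a naive in-frame scan with a scan that models a -1 ribosomal slippage event. The sequence, the slip position and the function names are hypothetical and chosen only for this toy example; they are not taken from any of the programs reviewed here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Toy sequence (hypothetical, not a real viral gene).
TOY_SEQ = "ATGGCATTTAAATGAGCCGGTAAC"
SLIP_POS = 12  # assumed slip site: the ribosome backs up one base here


def naive_orf_end(seq, start=0):
    """Return the index just past the first in-frame stop codon, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


def frameshifted_cds(seq, slip_pos):
    """Model a -1 slippage: the base before slip_pos is read twice."""
    return seq[:slip_pos] + seq[slip_pos - 1:]


# A naive scan stops at the in-frame TGA right after the slip site.
naive_end = naive_orf_end(TOY_SEQ)

# With the -1 shift, translation continues into the overlapping frame.
shifted_end = naive_orf_end(frameshifted_cds(TOY_SEQ, SLIP_POS))

print("codons read without slippage:", naive_end // 3)
print("codons read with -1 slippage:", shifted_end // 3)
```

In this toy case the naive scan terminates early, while the slippage-aware scan reads through into the -1 frame. A predictor that does not model the exception would therefore report a truncated product.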
This review describes the most commonly used programs for viral gene prediction, focusing on the ab initio and
similarity-based gene prediction programs, including GeneMarkS, ZCURVE_V, FgenesV, Phylo-HMM, MLOGD,
GATU, VirGen, FLAN, VIGOR and others. The complex features of viral genomes and the basic algorithms of the gene
prediction programs are introduced briefly, along with their advantages and disadvantages, followed by a list of
application scopes and specific features. Gene prediction programs for bacteriophages and viral metagenomic sequences
are reviewed separately. The last section of this review presents the future directions and challenges for viral gene
prediction program development.
Keywords: Mature peptide prediction, viral gene complex feature, viral gene prediction, viral genome annotation, VIGOR.