Current Genomics

ISSN (Print): 1389-2029
ISSN (Online): 1875-5488

Research Article

GAAP: A GUI-based Genome Assembly and Annotation Package

Author(s): Deepak Singla* and Inderjit Singh Yadav

Volume 23, Issue 2, 2022

Published on: 18 March, 2022

Page: [77 - 82] Pages: 6

DOI: 10.2174/1389202923666220128155537

Abstract

Background: Next-generation sequencing (NGS) technologies continuously generate high-throughput sequencing data, and analysing these data calls for easy-to-use, GUI-based software. Such software could run in parallel with sequencing to automate data analysis. At present, very few such packages are available and most of them are commercial, which creates a gap between data generation and data analysis.

Methods: GAAP is built on the Node.js platform, with an HTML and JavaScript front end for user interaction. FastQC and Trimmomatic are integrated for quality checking and quality control, and Velvet and Prodigal for genome assembly and gene prediction. Annotation is performed through remote NCBI BLAST and InterProScan (IPR-Scan) searches. On the back end, Perl and JavaScript handle data processing. To evaluate the performance of GAAP, we assembled a viral (SRR11621811), a bacterial (SRR17153353) and a human (SRR16845439) genome.
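As an illustration only, the order of the steps named above (quality check, trimming, assembly, gene prediction) can be sketched as a list of command lines. This is not GAAP's code, which according to the abstract uses Perl and JavaScript on the back end. The function name build_pipeline, the file names, the k-mer size of 31, the single-end read layout and the Trimmomatic settings are all assumptions chosen for the example, and the sketch assumes that fastqc, trimmomatic (the wrapper name used by some package managers), velveth, velvetg and prodigal are on the PATH.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def build_pipeline(reads: str, workdir: str, kmer: int = 31) -> List[List[str]]:
    """Return command lines for a QC -> trim -> assemble -> predict run.

    Nothing is executed here; the caller decides how to run the commands.
    All paths and parameter values are illustrative assumptions.
    """
    if kmer % 2 == 0:
        # Velvet expects an odd k-mer (hash) length.
        raise ValueError("k-mer length must be odd")

    out = PurePosixPath(workdir)
    trimmed = out / "trimmed.fastq"   # hypothetical file name
    assembly = out / "velvet"         # Velvet writes contigs.fa in here

    return [
        # 1. Quality report on the raw reads.
        ["fastqc", reads, "-o", str(out)],
        # 2. Quality trimming (single-end mode, example thresholds).
        ["trimmomatic", "SE", "-phred33", reads, str(trimmed),
         "SLIDINGWINDOW:4:15", "MINLEN:36"],
        # 3. De novo assembly: hash the reads, then build contigs.
        ["velveth", str(assembly), str(kmer), "-fastq", "-short", str(trimmed)],
        ["velvetg", str(assembly)],
        # 4. Gene prediction on the assembled contigs.
        ["prodigal", "-i", str(assembly / "contigs.fa"),
         "-a", str(out / "proteins.faa"), "-o", str(out / "genes.gbk")],
    ]


if __name__ == "__main__":
    for command in build_pipeline("reads.fastq", "run1"):
        print(shlex.join(command))
```

Building the commands as lists rather than shell strings keeps file names with spaces safe if they are later passed to subprocess.run. The annotation step is left out because it depends on remote services.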

Results: We used GAAP to assemble and annotate a COVID-19 genome on a desktop computer, which yielded a single contig of 27,994 bp covering 99.57% of the reference genome. Eleven genes were predicted from this assembly, of which 10 were annotated with the annotation module of GAAP. We also assembled a bacterial and a human genome into 138 and 194,281 contigs, with N50 values of 100,399 and 610, respectively.
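For readers unfamiliar with the N50 statistic reported above, the following minimal sketch computes it under the usual definition: the length of the contig at which the cumulative length, summed from the longest contig downwards, first reaches half of the total assembly length. The function name n50 and the toy contig lengths are assumptions for illustration; the abstract does not state how GAAP calculates the value.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half of the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running sum always reaches the total")


if __name__ == "__main__":
    # Toy example: total = 100, half = 50; 40 + 30 = 70 >= 50, so N50 = 30.
    print(n50([10, 20, 30, 40]))
    # A single-contig assembly has an N50 equal to that contig's length.
    print(n50([27994]))
```

Under this definition a fragmented assembly with many short contigs gives a low N50, which is consistent with the human assembly of 194,281 contigs having a much lower N50 than the bacterial assembly of 138 contigs.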

Conclusion: In this study, we have developed GAAP, a freely available, platform-independent genome assembly and annotation package (www.deepaklab.com/gaap). The software serves as a complete data analysis package, with modules for quality check, quality control, de novo genome assembly, gene prediction and annotation (BLAST, Pfam, GO terms, pathway and enzyme mapping).

Keywords: NGS, software, GUI, genome assembly, gene prediction, annotation.

