Journal article

ReferenceSeeker: rapid determination of appropriate reference genomes


Author list: Schwengers, O; Hain, T; Chakraborty, T; Goesmann, A

Year of publication: 2020

Journal: The Journal of Open Source Software

Volume: 5

Issue: 46

DOI link: https://doi.org/10.21105/joss.01994

Publisher: Open Journals


Abstract

The enormous success and ubiquitous application of next- and third-generation sequencing has led to a large number of high-quality draft and complete microbial genomes being available in public databases. Today, release 90 of the NCBI RefSeq database alone contains 11,060 complete bacterial genomes (Haft et al., 2018). Concurrently, the selection of appropriate reference genomes (RGs) is increasingly important, as it has enormous implications for routine in-silico analyses, for example the detection of single nucleotide polymorphisms, scaffolding of draft assemblies, comparative genomics and metagenomic tasks. A rigorously selected RG is therefore a prerequisite for the accurate and successful application of these bioinformatic analyses. To address this issue, several new databases, methods and tools have been published in recent years, e.g. RefSeq, DNA-DNA hybridization (Meier-Kolthoff, Auch, Klenk, & Göker, 2013), average nucleotide identity (ANI) as well as percentage of conserved DNA (conDNA) values (Goris et al., 2007) and Mash (Ondov et al., 2016). Nevertheless, the sheer number of currently available databases and potential RGs contained therein, together with the plethora of tools available, often requires manual selection of the most suitable RGs. To the best of the authors' knowledge, there is currently no tool providing both an integrated, highly specific workflow and a scalable and rapid implementation. ReferenceSeeker was designed to overcome this bottleneck. As a novel command line tool, it combines a fast k-mer profile-based lookup of candidate reference genomes (CRGs) from high-quality databases with rapid computation of (mutual) highly specific ANI and conserved DNA values.








Citation styles

Harvard citation style: Schwengers, O., Hain, T., Chakraborty, T. and Goesmann, A. (2020) ReferenceSeeker: rapid determination of appropriate reference genomes, The Journal of Open Source Software, 5(46), Article 1994. https://doi.org/10.21105/joss.01994

APA citation style: Schwengers, O., Hain, T., Chakraborty, T., & Goesmann, A. (2020). ReferenceSeeker: rapid determination of appropriate reference genomes. The Journal of Open Source Software, 5(46), Article 1994. https://doi.org/10.21105/joss.01994


Last updated 2025-05-21 at 17:09