Journal article

ReferenceSeeker: rapid determination of appropriate reference genomes


Authors list: Schwengers, O; Hain, T; Chakraborty, T; Goesmann, A

Publication year: 2020

Journal: The Journal of Open Source Software

Volume number: 5

Issue number: 46

DOI Link: https://doi.org/10.21105/joss.01994

Publisher: Open Journals


Abstract

The enormous success and ubiquitous application of next- and third-generation sequencing has led to a large number of available high-quality draft and complete microbial genomes in the public databases. Today, the NCBI RefSeq database release 90 alone contains 11,060 complete bacterial genomes (Haft et al., 2018). Concurrently, selection of appropriate reference genomes (RGs) is increasingly important as it has enormous implications for routine in-silico analyses, for example in the detection of single nucleotide polymorphisms, scaffolding of draft assemblies, comparative genomics and metagenomic tasks. Therefore, a rigorously selected RG is a prerequisite for the accurate and successful application of the aforementioned bioinformatic analyses. To address this issue, several new databases, methods and tools have been published in recent years, e.g. RefSeq, DNA-DNA hybridization (Meier-Kolthoff, Auch, Klenk, & Göker, 2013), average nucleotide identity (ANI) as well as percentage of conserved DNA (conDNA) values (Goris et al., 2007) and Mash (Ondov et al., 2016). Nevertheless, the sheer number of currently available databases and potential RGs contained therein, together with the plethora of tools available, often requires manual selection of the most suitable RGs. To the best of the authors’ knowledge, there is currently no tool providing both an integrated, highly specific workflow and a scalable, rapid implementation. ReferenceSeeker was designed to overcome this bottleneck. As a novel command line tool, it combines a fast k-mer profile-based lookup of candidate reference genomes (CRGs) from high-quality databases with rapid computation of (mutual) highly specific ANI and conserved DNA values.
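The first stage named in the abstract, a k-mer profile-based lookup of candidate reference genomes, can be illustrated with a toy Python sketch. This is not ReferenceSeeker's implementation: the function names (`kmer_profile`, `jaccard`, `shortlist_candidates`), the tiny `k`, the use of exact k-mer sets and the example sequences are all assumptions made for illustration. The second stage (ANI and conserved DNA computation) is not shown.

```python
def kmer_profile(sequence, k=4):
    """Return the set of all k-mers occurring in a DNA sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shortlist_candidates(query, references, k=4, top_n=2):
    """Rank reference sequences by k-mer similarity to the query.

    Hypothetical helper: returns the top_n (name, similarity) pairs,
    most similar first. A real tool would compare compact sketches of
    whole genomes rather than exact k-mer sets of short strings.
    """
    query_profile = kmer_profile(query, k)
    scored = [
        (name, jaccard(query_profile, kmer_profile(seq, k)))
        for name, seq in references.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Made-up toy sequences, standing in for database genomes.
references = {
    "ref_identical": "ACGTACGTGGCCTTAA",
    "ref_similar": "ACGTACGTGGCCTTAT",
    "ref_distant": "TTTTTTTTTTTTTTTT",
}
query = "ACGTACGTGGCCTTAA"
hits = shortlist_candidates(query, references)
```

In this sketch only the shortlisted candidates in `hits` would be passed on to the slower, more specific comparison, which is the point of a cheap lookup step ahead of the ANI and conserved DNA calculations.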




Citation Styles

Harvard Citation style: Schwengers, O., Hain, T., Chakraborty, T. and Goesmann, A. (2020) ReferenceSeeker: rapid determination of appropriate reference genomes, The Journal of Open Source Software, 5(46), Article 1994. https://doi.org/10.21105/joss.01994

APA Citation style: Schwengers, O., Hain, T., Chakraborty, T., & Goesmann, A. (2020). ReferenceSeeker: rapid determination of appropriate reference genomes. The Journal of Open Source Software, 5(46), Article 1994. https://doi.org/10.21105/joss.01994

