Published April 27, 2022 | Version 2.0.0
Journal article Open

Naming the unnamed: Over 65,000 Candidatus names for unnamed Archaea and Bacteria in the Genome Taxonomy Database

  • 1. University of East Anglia
  • 2. Quadram Institute Bioscience
  • 3. University of Innsbruck

Description

Thousands of new bacterial and archaeal species and higher-level taxa are discovered each year through the analysis of genomes and metagenomes. The Genome Taxonomy Database (GTDB) provides hierarchical sequence-based descriptions and classifications for new and as-yet-unnamed taxa. However, bacterial nomenclature, as currently configured, cannot keep up with the need for new well-formed names. Instead, microbiologists have been forced to use hard-to-remember alphanumeric placeholder labels. Here, we exploit an approach to the generation of well-formed arbitrary Latinate names at a scale sufficient to name tens of thousands of unnamed taxa within GTDB. These newly created names represent an important resource for the microbiology community, facilitating communication between bioinformaticians, microbiologists and taxonomists, while populating the emerging landscape of microbial taxonomic and functional discovery with accessible and memorable linguistic labels.

Presented here are input and output files associated with the scripts used in this project. Note that the file simple.txt was too large to upload but can be downloaded from https://hosted-datasets.gbif.org/datasets/backbone/backbone-current-simple.txt.gz

Scripts are available from 

This version of the files and associated names for bacteria supersedes those published at https://zenodo.org/record/5652886, which were associated with this preprint: https://www.preprints.org/manuscript/202111.0557/v1

Files

ar53_r207_corrected_ar_genus_names_table.txt

Files (1.6 GB)

Checksum Size
md5:54f039701da66979f0c4bbc4a2aa2e3d 186.2 kB
md5:2522604eebfb4916e1fff2eaf876ceeb 2.7 MB
md5:cd99ab8c55fffac51ab05c9510b07f67 7.5 MB
md5:4ad832c518cab008f564c6390321664e 7.6 MB
md5:53d543e9ca5c7702a7aa33600e904320 23.7 kB
md5:0ddb826e7a50aeb910465e65f5b52bc9 59.3 kB
md5:1aae30db6a49f2c6200ac4c3130aeff6 861.7 kB
md5:5efbd975c5b414dfd495d3180c2de77e 865.1 kB
md5:e92650544e3636f091f87f364be36823 901.4 kB
md5:23d5dd1a8c3549d054d847d52a2f73ca 883.6 kB
md5:dc67cdcc4636e5fdcba023b678a2d9f2 915.0 kB
md5:7337868a413092cd36fe8a7bc6733a77 16.8 kB
md5:c2a587c9b52a368cfbda844c49d823c8 39 Bytes
md5:2f18a7998f4737c467872fc208e4c128 13.5 kB
md5:a318c0d3e6b92fc978acc87e032b0868 2.1 MB
md5:d54fd953953dc4018b09dc5d67c8ae41 3.3 MB
md5:f46495420129e04010288321110b15eb 422.5 MB
md5:e5adb0f4ce37b8f52e7a873a0032ced5 422.9 MB
md5:1fa7fe585e71b0296dddcbf04cc7151b 225.2 kB
md5:f10fbda3fdea3852b1d26d8adc772ff1 994.3 kB
md5:29853293a2be8f0fe16ad1c206e7de3e 45.2 MB
md5:cbb2d614511691b994125e64470c73cf 45.4 MB
md5:c9fa60ad17a662b53c1d720b376f814a 45.6 MB
md5:960a96365083bd895baebeb186ff1998 45.3 MB
md5:712c0ab4f3b32c887082791cbc5b26fa 45.8 MB
md5:5954bba32ebfe519638c5056f9541421 380.4 kB
md5:eb99cb339087b4b965d6329c9c1d094b 329 Bytes
md5:0f77fbfdef4b154d83372a1841f8aec8 299.2 kB
md5:231aa43ede0f66ff53827d3bb88a9e74 27.5 MB
md5:b088ae134ef21258afaae6adf3aed9dd 46.3 MB
md5:7905332d1844fecd61579c72a29a4ac3 259.3 MB
md5:f34072d152df8b53acc681cb829851b3 12.8 kB
md5:63e8cf1b862ae04d5b5d2125fa2b2b8d 99.8 kB
md5:f4e1bd167af427882fe7380f4f6929df 50.0 MB
md5:443192562714719285e50a20f2efa135 7.9 kB
md5:2006f92025c335c0fe67bf7ccf6cc514 2.5 kB
md5:37fec70840678e28fa6641107442df7f 10.4 kB
md5:d9d2c5d54a083262a03348e005f8eccd 46.5 MB
md5:faa04bd376eb09d9940487a514f66cdf 46.2 MB
md5:6079fd25143ffb37346c226cb648ce0c 780 Bytes
md5:3e20c43f8cfb70bc43d6e1f9bffb8511 143.7 kB
md5:25e523ba42b7663a9f9949a28f6270d8 132.3 kB
md5:315d623ebbec105f079e205d25c548bb 2.7 MB
md5:ad75d98bb0783ff7102d6b129734c031 3.5 MB
md5:9d5c67d65e32788e7afc1bf5fac7b4ed 298 Bytes
md5:454e27dfa275918feea1e222ffd74244 2.7 MB
md5:a0d05ac8ebe16e250f01b8545a45632a 12.5 kB
md5:a4e3145901544e80ef8214fdb4fd1df1 183.0 kB
md5:2295c6c2398bf3ce4b1f87e2d2b50189 2.3 MB
md5:6c9811826b2f42fbca7368336259e03f 150.7 kB
md5:2c581200ad7b3c5e1011265a8299d5e4 432.0 kB
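
The md5 values listed above can be used to check that a downloaded file arrived intact. The following is a minimal sketch in Python using only the standard library. The function names md5_of_file and matches_listed_md5 are hypothetical helpers introduced here for illustration and are not part of the project's scripts.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for the larger files
    (several hundred MB) in this record.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a local file against a listed value such as 'md5:54f0...'.

    The 'md5:' prefix is optional and the comparison ignores case.
    """
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

To use it, pass the local path of a downloaded file together with the checksum shown for that file on the record page, for example matches_listed_md5(local_path, listed_value). A result of False suggests an incomplete or corrupted download, or that the checksum belongs to a different file.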