Published October 25, 2023 | Version v2
Dataset Open

skDER Representative Genomes for Select Bacterial Taxa

  • 1. University of Wisconsin - Madison
  • 2. University of Wisconsin - Madison; McMaster University

Description

Genomes belonging to a single genus or order were gathered using a loose search of the taxonomic classifications in GTDB R214. By "loose", we mean that the string 'g__{GENUSNAME}' only had to occur somewhere in the taxonomic info column provided by GTDB. This also gathers associated genera (which GTDB suggests are different, but which the literature and domain experts have yet to rename).
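The loose search described above amounts to a substring test on the GTDB lineage string. The following is a minimal sketch of that idea, not the code used to build this dataset. The function name `loose_taxon_match`, the `rank_prefix` parameter, and the example lineage strings are assumptions made for illustration; the lineages follow GTDB's rank-prefixed format but were not copied from R214.

```python
def loose_taxon_match(taxonomy: str, name: str, rank_prefix: str = "g__") -> bool:
    """Return True if the rank-prefixed name occurs anywhere in the lineage string.

    Because this is a substring test, 'g__Pseudomonas' also matches
    suffixed placeholder genera such as 'g__Pseudomonas_E'.
    """
    return f"{rank_prefix}{name}" in taxonomy


# Illustrative lineage strings (hypothetical genome IDs, not taken from GTDB R214).
lineages = {
    "genome_1": "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
                "o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas;"
                "s__Pseudomonas aeruginosa",
    "genome_2": "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
                "o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas_E;"
                "s__Pseudomonas_E fluorescens",
    "genome_3": "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
                "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;"
                "s__Escherichia coli",
}

# Keep every genome whose lineage contains the loose genus string.
selected = [gid for gid, tax in lineages.items() if loose_taxon_match(tax, "Pseudomonas")]
print(selected)
```

A stricter search would instead split the lineage on ';' and compare the genus field exactly, which would exclude the suffixed genus in `genome_2`.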

Genomes belonging to each taxon were dereplicated using skDER (v1.0.7) in "greedy" clustering mode with default parameter values (99% average nucleotide identity (ANI) cutoff, 90% aligned fraction (AF) cutoff).
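To make the role of the two cutoffs concrete, here is a generic greedy dereplication sketch. It is not skDER's implementation: how skDER ranks genomes and in which direction it evaluates AF are not described here, so the ranked input list, the `similarity` lookup, and the function name `greedy_dereplicate` are all assumptions for illustration.

```python
def greedy_dereplicate(ranked_genomes, similarity, ani_cutoff=99.0, af_cutoff=90.0):
    """Pick representatives greedily from a best-first ranked list of genomes.

    similarity maps (query, reference) -> (ANI in %, aligned fraction in %).
    A genome is treated as redundant if it meets both cutoffs against any
    representative chosen so far; missing pairs count as dissimilar.
    """
    representatives = []
    for genome in ranked_genomes:
        redundant = False
        for rep in representatives:
            ani, af = similarity.get((genome, rep), (0.0, 0.0))
            if ani >= ani_cutoff and af >= af_cutoff:
                redundant = True
                break
        if not redundant:
            representatives.append(genome)
    return representatives


# Toy input: four hypothetical genomes, already ranked best-first.
ranked = ["A", "B", "C", "D"]
toy_similarity = {
    ("B", "A"): (99.5, 95.0),  # passes both cutoffs, so B is redundant with A
    ("C", "A"): (99.2, 80.0),  # ANI passes but AF fails, so C stays distinct
    ("D", "C"): (99.9, 99.0),  # passes both cutoffs, so D is redundant with C
}
reps = greedy_dereplicate(ranked, toy_similarity)
print(reps)
```

In this toy case both cutoffs must be met for a genome to be dropped, which is why C is kept despite its high ANI to A.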

Overview of Files:

- The 'Genome_Dereplication_Overview.tsv' file contains details of all the genomes considered as potential representatives for each taxonomic group, along with their GTDB R214 taxonomic classifications.

- 18 '_Clustering_Information.txt' files, which contain the relationship of each non-representative genome to its nearest representative genome. These were generated using the `-n` argument in skDER v1.0.7.

- 18 tar.gz compressed directories are provided. Each compressed directory features representative genomes in FASTA format determined for a particular taxon using skDER with greedy clustering and default cutoffs. Genome assemblies are renamed to feature both the GTDB taxonomic classification and the GCA identifier.
        - Acinetobacter - 1,643 rep genomes (17.8% of 9,221 total genomes considered)
        - Bacillales - 3,150 rep genomes (35.9% of 8,766 total genomes considered)
        - Corynebacterium - 726 rep genomes (43.0% of 1,688 total genomes considered)
        - Cutibacterium - 27 rep genomes (5.4% of 502 total genomes considered)
        - Enterobacter - 878 rep genomes (19.9% of 4,408 total genomes considered)
        - Enterococcus - 937 rep genomes (14.6% of 6,426 total genomes considered)
        - Escherichia - 2,436 rep genomes (7.1% of 34,358 total genomes considered)
        - Klebsiella - 1,022 rep genomes (5.6% of 18,145 total genomes considered)
        - Lactobacillus - 541 rep genomes (30.9% of 1,747 total genomes considered)
        - Listeria - 353 rep genomes (6.9% of 5,062 total genomes considered)
        - Micromonospora - 211 rep genomes (73.3% of 288 total genomes considered)
        - Mycobacterium - 744 rep genomes (6.9% of 10,657 total genomes considered)
        - Neisseria - 414 rep genomes (12.8% of 3,235 total genomes considered)
        - Pseudomonas - 2,666 rep genomes (18.9% of 14,066 total genomes considered)
        - Salmonella - 308 rep genomes (2.2% of 14,109 total genomes considered)
        - Staphylococcus - 496 rep genomes (2.5% of 19,627 total genomes considered)
        - Streptococcus - 2,452 rep genomes (13.3% of 18,492 total genomes considered)
        - Streptomyces - 1,555 rep genomes (57.7% of 2,697 total genomes considered)
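As a sketch of how one of the compressed directories might be inspected after download, the snippet below lists FASTA members of a tar.gz archive and pulls a GCA accession out of each name. The exact file naming scheme and FASTA extensions inside the archives are not specified above, so the suffix list, the helper names, and any archive path passed in are assumptions. The pattern relies only on the standard GenBank assembly accession format (GCA_, nine digits, a dot, and a version number).

```python
import re
import tarfile

# GenBank assembly accession: 'GCA_' + 9 digits + '.' + version number.
GCA_PATTERN = re.compile(r"GCA_\d{9}\.\d+")


def list_fasta_members(archive_path, suffixes=(".fasta", ".fna", ".fa")):
    """Return names of regular files in a tar.gz archive that look like FASTA files.

    The suffixes are a guess; adjust them to match the archive contents.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        return [m.name for m in tar.getmembers()
                if m.isfile() and m.name.endswith(suffixes)]


def extract_gca(name):
    """Return the first GCA accession found in a file name, or None if absent."""
    match = GCA_PATTERN.search(name)
    return match.group(0) if match else None
```

For example, `extract_gca(name)` could be mapped over the result of `list_fasta_members(...)` to join representative genomes back to rows in the overview table by accession.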

Files

Files (28.9 GB)

Checksum                              Size
md5:b21ebfa973ddd34379e0227edd117114  1.8 GB
md5:a0ca11ae1c3aa8730b55436faedc3f8a  2.9 MB
md5:b9a957c732c91cbf107e1e7fe4152aa6  4.5 GB
md5:e347e338bd7b2aac840b627af8935476  2.7 MB
md5:45655cd51af3d2b741ad93a173c38620  566.1 MB
md5:0fd89d7730d9a9c940722e49fd7daaad  550.9 kB
md5:7002ea61d796621d668d86ae726d7579  20.9 MB
md5:696c83453ba8545bff203980f36f8d36  129.8 kB
md5:979d9876ee2be5b8fb69cc1e54b72692  1.3 GB
md5:1de9689d7cdf4fe7f70d291678aea964  1.4 MB
md5:b92127108af14f4d89f46dcea2f148d5  886.8 MB
md5:fec5285509b0bf2be0fbd87ec2e30ed2  2.0 MB
md5:385ea1630bef502e3c14ee5c57af0e21  3.9 GB
md5:f591fd168e0981e99991f9bd5b92814f  10.2 MB
md5:e7926e8ef5d37a9f1acda8f6509d7639  31.3 MB
md5:81d0aac441fddeca7e28fc2dee3e06a9  1.8 GB
md5:ce7bd2c71c112826095d7c6ccc195ce3  5.5 MB
md5:3986f2256b2dfc2d359f3ecc54a68381  299.4 MB
md5:d63ccab538f6ed0aec89a7898caecda1  550.3 kB
md5:af980ec2b34ffea4652c87ca7c8b8e6c  331.4 MB
md5:60e69e202ef9e675a8ab0c36ed0ec404  1.5 MB
md5:dd17b550c0bd51b4f977e2016740c6ae  421.4 MB
md5:570f08aed844d77a99ad40ba9687e9f4  91.8 kB
md5:9b137763229c9f0d81c1e80fd63ee52e  1.3 GB
md5:61ed53ed8c3090d046d15818571b08fd  3.4 MB
md5:33b0f94fe882e53c45bedf3633e02fda  284.1 MB
md5:45b28a3ea9180dcaaebf1c1105453e8d  1.1 MB
md5:4803bc0e9b6cdfae6f48a122285364b9  5.0 GB
md5:b23a996d7fd1a93b9f7a99e2e350d1bc  4.4 MB
md5:1c6e3ed07c42e7f1c8cb6918b46a3bb8  473.8 MB
md5:3b2ffda23c9f1d238a293719e5bdad3a  4.2 MB
md5:eec05397c789aea39ed368986a3cbd65  404.5 MB
md5:d11d9b000b46c1b292bb9ed044e93245  6.2 MB
md5:87a619f3b8e900decc266e7f33abd0ef  1.6 GB
md5:fb2c6868001e22452ecf311a8a012539  5.8 MB
md5:f2ea81da7884798fcebf9b387a112bea  3.9 GB
md5:a7d1c886b747d37c8f8bf37b649c3757  842.1 kB
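
Since each file is listed with an MD5 checksum, a download can be checked against it before use. The following is a minimal sketch using Python's standard `hashlib`; the helper names are made up for illustration, and which checksum belongs to which file has to be taken from the record itself, because the file names are not shown in the listing above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Chunked reading matters here because several of the archives are multiple gigabytes in size.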