There is a newer version of the record available.

Published October 28, 2024 | Version v1.0.0
Dataset Open

BBS phase 1 & phase 2 high quality E. coli bin assembled genomes

  • 1. University of Helsinki
  • 2. University of Oslo
  • 3. Wellcome Sanger Institute

Description

1,402 Escherichia coli bin assembled genomes derived from the metagenome data collected as part of the BabyBiome study (BBS) phase 1 & phase 2.

The data in this upload was first published as part of "Group 2 and 3 ABC-transporter dependant K-antigen loci contribute significantly to variation in the invasive potential of Escherichia coli" (Gladstone et al. 2024, to be released).

Files

Assembly data:

  • BBS_E_coli_BAGs.tar: Archive containing sequences of the 1,402 bin assembled genomes.
  • BBS_E_coli_metadata.tsv: Table linking the sequence assemblies to the subject data.

Capsule predictions:

  • BBS_E_coli_Kaptive_output.csv: Capsule predictions for all sequence data.
  • BBS_E_coli_deduplicated_sequences_IDs.txt: Filenames for assemblies that constitute the 873 deduplicated sequences analysed in Gladstone et al. 2024.
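As a minimal sketch of how the deduplicated subset could be selected from the full archive, the following assumes that the ID file lists one assembly filename per line and that those names match the base filenames inside the archive. Both are assumptions about the file layout, and the function names and sample filenames are hypothetical.

```python
from pathlib import PurePosixPath


def read_id_list(lines):
    """Return the set of non-empty, whitespace-stripped entries from an iterable of lines."""
    return {line.strip() for line in lines if line.strip()}


def select_deduplicated(member_names, dedup_ids):
    """Keep the archive members whose base filename appears in the ID list.

    Assumption: IDs are plain filenames, so any directory prefix on the
    archive member is ignored when matching.
    """
    return [name for name in member_names if PurePosixPath(name).name in dedup_ids]


# Hypothetical example data; real names come from the ID file and the archive.
ids = read_id_list(["sample_A.fasta\n", "\n", "sample_C.fasta\n"])
members = ["BAGs/sample_A.fasta", "BAGs/sample_B.fasta", "BAGs/sample_C.fasta"]
selected = select_deduplicated(members, ids)
print(selected)
```

In practice the lines would come from opening BBS_E_coli_deduplicated_sequences_IDs.txt and the member names from listing the extracted archive.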

Quality control data:

  • BBS_E_coli_demix_check_scores.tsv: Output from demix_check for the sequence assemblies.
  • BBS_E_coli_checkm_results.tsv: Output from checkm.
  • BBS_E_coli_gunc_results.tsv: Output from gunc.

Methods

Bin assembled genomes

Source data:

The data was produced using the mSWEEP and mGEMS pipeline (Mäklin et al. 2020 & Mäklin et al. 2021) following the steps described in Khawaja, Mäklin, Kallonen, et al. 2024.

Quality control

The BAGs in this upload were filtered with demix_check (https://github.com/harry-thorpe/demix_check) and only those with a quality score of 1 or 2 are included. For the capsule type annotations, contigs shorter than 5,000 bp were removed (the short contigs are still present in the uploaded files). Further QC data is available from the checkm (Parks et al. 2015) and gunc (Orakov et al. 2022) results.
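Because the uploaded assemblies still contain the short contigs, reproducing the input used for capsule typing requires applying the length filter again. The record does not state which tool was used for this step, so the following is only a sketch in plain Python under the assumption that the assemblies are standard FASTA files; the function names are hypothetical.

```python
MIN_CONTIG_LENGTH = 5000  # threshold stated in the record (5,000 bp)


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_short_contigs(records, min_length=MIN_CONTIG_LENGTH):
    """Drop contigs shorter than min_length; contigs of exactly min_length are kept."""
    return [(header, seq) for header, seq in records if len(seq) >= min_length]


# Hypothetical two-contig assembly: one long contig and one short contig.
example = [">contig_1", "A" * 6000, ">contig_2", "ACGT" * 10]
kept = filter_short_contigs(parse_fasta(example))
print([header for header, _ in kept])
```

For a real assembly, the lines would come from an open file handle on one of the extracted FASTA files.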

Multilocus sequence typing

Sequence type (ST) was determined using fastmlst (Guerrero-Araya et al. 2021) with the `ecoli#1` database.

PopPUNK clustering

Sequence clusters (SC) correspond to the database available from https://zenodo.org/records/12528310 and were created using PopPUNK (Lees et al. 2019). Construction is described in Khawaja, Mäklin, Kallonen, et al. 2024.

Capsule type annotations

The capsule type annotations were created using Kaptive (Lam et al. 2022) with an E. coli-specific database available from https://github.com/rgladstone/EC-K-typing and described in Gladstone et al. 2024.

Files (2.0 GB)

Checksum Size
md5:e9d027be34ee48364c59d8e1437da03d 2.0 GB
md5:6b6ab48eeb660d94ff98af64d05624e5 165.3 kB
md5:402f40a955b55b18b618ab1e7e9fa250 50.5 kB
md5:74cc2f3e752cb986b42a63c0548a6d03 253.4 kB
md5:8af9535b9c6a2d04dd36ebcf477117d8 158.0 kB
md5:9c9f08cd4acefc074ceeae1753b83fbe 978.7 kB
md5:8241e11d574eed90a92e0568797b3315 207.6 kB
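
The MD5 checksums listed above can be used to confirm that a download completed without corruption. A minimal sketch using Python's standard hashlib module follows; the helper names are hypothetical, and which checksum belongs to which file has to be read from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value, with or without the 'md5:' prefix."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum("downloaded_file", "md5:<value from the list>")` returns True when the local file matches the published checksum; both arguments here are placeholders.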

Additional details

Related works

Continues
Journal article: 10.1038/s41467-022-35178-5 (DOI)
Is derived from
Journal article: 10.1038/s41586-019-1560-1 (DOI)
Journal article: 10.1038/s41564-024-01804-9 (DOI)

Dates

Available
2024-10-28