There is a newer version of the record available.

Published March 4, 2026 | Version 1.3.0
Dataset Open

Loci ID correspondence and loci subsets for the schemas deposited in Chewie-NS

Description

Dataset contents

This dataset includes files with the loci subsets, such as core loci subsets for cgMLST analysis, and the correspondence between the legacy loci IDs used by the discontinued Chewie-NS instance and the loci IDs used by the latest Chewie-NS instance. Each ZIP archive includes files for the schemas of a specific species. The contents of each ZIP archive are the following:

  • species1_Spyogenes.zip -- contains the files for the schemas of the species with ID=1 (Streptococcus pyogenes).
    • species1_Spyogenes_schema1 -- contains the files for the schema with ID=1.
      • species1_Spyogenes_schema1_loci_IDs_mapping.tsv -- contains the loci ID correspondence between the loci IDs used by the current instance of Chewie-NS and the original loci IDs.
      • species1_Spyogenes_schema1_cgMLST95_loci_IDs.txt -- contains the list of loci IDs used by the current instance of Chewie-NS for the core loci defined based on a loci presence threshold of 95%.
      • species1_Spyogenes_schema1_cgMLST95_loci_IDs_mapping.tsv -- contains the correspondence between the loci IDs for the core loci, defined based on a loci presence threshold of 95%, used by the current instance of Chewie-NS and the original loci IDs.
      • species1_Spyogenes_schema1_cgMLST99_loci_IDs.txt -- contains the list of loci IDs used by the current instance of Chewie-NS for the core loci defined based on a loci presence threshold of 99%.
      • species1_Spyogenes_schema1_cgMLST99_loci_IDs_mapping.tsv -- contains the correspondence between the loci IDs for the core loci, defined based on a loci presence threshold of 99%, used by the current instance of Chewie-NS and the original loci IDs.
      • species1_Spyogenes_schema1_cgMLST100_loci_IDs.txt -- contains the list of loci IDs used by the current instance of Chewie-NS for the core loci defined based on a loci presence threshold of 100%.
      • species1_Spyogenes_schema1_cgMLST100_loci_IDs_mapping.tsv -- contains the correspondence between the loci IDs for the core loci, defined based on a loci presence threshold of 100%, used by the current instance of Chewie-NS and the original loci IDs.
      • species1_Spyogenes_schema1_Transcriptional_Regulators_loci_IDs.txt -- contains the list of loci IDs used by the current instance of Chewie-NS for a set of transcriptional regulators.
      • species1_Spyogenes_schema1_Transcriptional_Regulators_loci_IDs_mapping.tsv -- contains the correspondence between the loci IDs for the transcriptional regulators used by the current instance of Chewie-NS and the original loci IDs.
      • species1_Spyogenes_schema1_Virulence_Factors_loci_IDs.txt -- contains the list of loci IDs used by the current instance of Chewie-NS for a set of virulence factors.
      • species1_Spyogenes_schema1_Virulence_Factors_loci_IDs_mapping.tsv -- contains the correspondence between the loci IDs for the virulence factors used by the current instance of Chewie-NS and the original loci IDs.
  • species10_Ecoli.zip -- contains the files for the schemas of the species with ID=10 (Escherichia coli).
    • species10_Ecoli_schema1 -- contains the files for the schema with ID=1 (more information about the schema creation process and the definition of the loci subsets is available here).
      • species10_Ecoli_schema1_loci_IDs_mapping.tsv -- contains the loci ID correspondence between the loci IDs used by the current instance of Chewie-NS, the original loci IDs, and the loci IDs used in the first instance of Chewie-NS (discontinued in July 2025).
      • species10_Ecoli_schema1_cgMLST99_loci_IDs.txt -- contains the list of loci IDs used by the current instance of Chewie-NS for the core loci defined based on a loci presence threshold of 99%.
      • species10_Ecoli_schema1_cgMLST99_loci_IDs_mapping.tsv -- contains the correspondence between the loci IDs for the core loci, defined based on a loci presence threshold of 99% as described here, used by the current instance of Chewie-NS, the original loci IDs, and the loci IDs used in the first instance of Chewie-NS (discontinued in July 2025).
  • species14_Senterica.zip -- contains the files for the schemas of the species with ID=14 (Salmonella enterica).
    • species14_Senterica_schema1 -- contains the files for the schema with ID=1 (more information about the schema creation process and the definition of the loci subsets is available here).
      • species14_Senterica_schema1_loci_IDs_mapping.tsv -- contains the loci ID correspondence between the loci IDs used by the current instance of Chewie-NS, the original loci IDs, and the loci IDs used in the first instance of Chewie-NS (discontinued in July 2025).
      • species14_Senterica_schema1_cgMLST99_loci_IDs.txt -- contains the list of loci IDs used by the current instance of Chewie-NS for the core loci defined based on a loci presence threshold of 99%.
      • species14_Senterica_schema1_cgMLST99_loci_IDs_mapping.tsv -- contains the correspondence between the loci IDs for the core loci, defined based on a loci presence threshold of 99% as described here, used by the current instance of Chewie-NS, the original loci IDs, and the loci IDs used in the first instance of Chewie-NS (discontinued in July 2025).
  • species18_Lmonocytogenes.zip -- contains the files for the schemas of the species with ID=18 (Listeria monocytogenes).
    • species18_Lmonocytogenes_schema1 -- contains the files for the schema with ID=1 (corresponding to the Institut Pasteur Listeria monocytogenes cgMLST schema described in Moura et al., 2016, available at https://bigsdb.pasteur.fr/listeria/).
      • species18_Lmonocytogenes_schema1_loci_IDs_mapping.tsv -- contains the correspondence between the loci IDs for the core loci used by the current instance of Chewie-NS, the original loci IDs, and the loci IDs used in the first instance of Chewie-NS (discontinued in July 2025).
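
The loci subset files listed above can be used to restrict a matrix of allelic profiles to a given set of loci, for example the cgMLST99 core loci. The following is a minimal sketch and not part of the dataset. It assumes that each *_loci_IDs.txt file holds one locus ID per line and that the profile matrix is tab-delimited, with sample identifiers in the first column and loci IDs in the header row that match the IDs in the list exactly. The function names read_loci_list and subset_profiles are hypothetical, and the toy data stands in for real files.

```python
import csv
import io


def read_loci_list(handle):
    """Return the non-empty lines of a loci list as a set of loci IDs.

    Assumption: the list has one locus ID per line.
    """
    return {line.strip() for line in handle if line.strip()}


def subset_profiles(profiles_handle, loci, out_handle):
    """Copy a tab-delimited profile matrix, keeping only selected loci.

    The first column (sample identifiers) is always kept. Every other
    column is kept only if its header is in `loci`.
    Returns the number of loci columns that were kept.
    """
    reader = csv.reader(profiles_handle, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    header = next(reader)
    keep = [0] + [i for i, name in enumerate(header) if i > 0 and name in loci]
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])
    return len(keep) - 1


if __name__ == "__main__":
    # Toy data standing in for a loci list and a profile matrix.
    loci_list = io.StringIO("locusA\nlocusC\n")
    profiles = io.StringIO(
        "FILE\tlocusA\tlocusB\tlocusC\n"
        "Genome1\t1\t2\t1\n"
        "Genome2\t2\t2\t2\n"
    )
    output = io.StringIO()
    kept = subset_profiles(profiles, read_loci_list(loci_list), output)
    print(f"Kept {kept} loci")
    print(output.getvalue(), end="")
```

With real files, the in-memory handles would be replaced by files opened with open(). If the number of loci kept is lower than the number of IDs in the list, the header of the profile matrix probably uses a different ID scheme than the list.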

Converting legacy loci IDs to the latest loci IDs

Legacy loci IDs in results files generated with schemas downloaded from the discontinued Chewie-NS instance can be converted to the loci IDs used by the latest Chewie-NS instance with the convert_ids.py Python script included in this dataset. The script accepts a single results file (e.g., files generated by chewBBACA's AlleleCall module, such as the results_alleles.tsv or loci_summary_stats.tsv files) and a TSV file with the loci ID correspondence. The following files contain the loci ID correspondence between the legacy and latest schemas for each species:

  • Escherichia coli (species10_Ecoli_schema1_loci_IDs_mapping.tsv)
  • Salmonella enterica (species14_Senterica_schema1_loci_IDs_mapping.tsv)
  • Listeria monocytogenes (species18_Lmonocytogenes_schema1_loci_IDs_mapping.tsv)

As an example, to convert legacy loci IDs in a results file for E. coli, such as the results_alleles.tsv file containing allelic profiles, with the following contents:

FILE INNUENDO_wgMLST-00016024 INNUENDO_wgMLST-00016025 INNUENDO_wgMLST-00016026
Genome1 1 2 1
Genome2 2 2 2
Genome3 1 1 1

All that is necessary is to run the following command:

python convert_ids.py -i results_alleles.tsv -it species10_Ecoli_schema1_loci_IDs_mapping.tsv

The script will replace all legacy loci IDs with the loci IDs used by the latest instance of Chewie-NS, resulting in the following file contents:

FILE wgMLST-00027274 wgMLST-00027275 wgMLST-00027276
Genome1 1 2 1
Genome2 2 2 2
Genome3 1 1 1

The script can be used to convert loci IDs in any file that includes legacy loci IDs. It is also possible to convert back to the legacy loci IDs by providing the --invert option. To view the full usage instructions for the script, run the following command:

python convert_ids.py -h
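
To illustrate the idea behind the conversion, the following is a minimal sketch of a header remapping in Python. It is not the implementation of convert_ids.py and it does not read the mapping TSV files, whose column layout is not described here. The mapping is written out as a dictionary using the three loci IDs from the E. coli example above, paired by their order in the two headers. The function names convert_header and invert_map are hypothetical, and the sketch assumes tab-separated fields.

```python
def convert_header(header_line, id_map):
    """Replace each tab-separated field found in id_map.

    Fields without an entry in id_map (such as FILE) are left unchanged.
    """
    fields = header_line.rstrip("\n").split("\t")
    return "\t".join(id_map.get(field, field) for field in fields)


def invert_map(id_map):
    """Swap keys and values to convert in the opposite direction."""
    inverted = {new: old for old, new in id_map.items()}
    if len(inverted) != len(id_map):
        raise ValueError("mapping is not one-to-one and cannot be inverted")
    return inverted


if __name__ == "__main__":
    # Assumed pairing, taken from the order of the IDs in the example above.
    legacy_to_latest = {
        "INNUENDO_wgMLST-00016024": "wgMLST-00027274",
        "INNUENDO_wgMLST-00016025": "wgMLST-00027275",
        "INNUENDO_wgMLST-00016026": "wgMLST-00027276",
    }
    legacy_header = "\t".join(["FILE", *legacy_to_latest])
    latest_header = convert_header(legacy_header, legacy_to_latest)
    print(latest_header)
    # Converting back with the inverted mapping restores the legacy header.
    restored = convert_header(latest_header, invert_map(legacy_to_latest))
    print(restored == legacy_header)
```

A dictionary lookup per field avoids partial matches that plain text substitution could produce when one ID is a prefix of another. The sketch only touches a single header line, so files that list loci IDs in other positions would need the same lookup applied to the relevant column.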

Files

species10_Ecoli.zip

Files (261.0 kB)

Checksum Size
md5:73ce0f140d8002dd9248fda66d4d63a9 2.5 kB
md5:541b82d853b9e82c7b3a4db618fe97cd 89.9 kB
md5:6009b1678acb6eba41a1b0b4b451c3fb 105.2 kB
md5:dbefa35d7d5bf653e72110ba3d95aecf 14.5 kB
md5:6ceb942af30ea9da74bc32ab7623f7ea 48.9 kB
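
The MD5 checksums listed above can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard hashlib module. The function names md5_of_file and matches_checksum are hypothetical, and the demonstration uses a temporary file rather than one of the dataset files.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Demonstration with a temporary file standing in for a download.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example contents")
    try:
        listed = "md5:" + hashlib.md5(b"example contents").hexdigest()
        print(matches_checksum(tmp.name, listed))
    finally:
        os.remove(tmp.name)
```

For a real download, pass the path of the file and the checksum shown next to that file on the record page. MD5 is suitable for detecting accidental corruption during transfer, but it is not a defense against deliberate tampering.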