Published June 19, 2018 | Version 1.0.2
Dataset
Open
sistr_cmd v1.0.2 serotyping databases
Description
Salmonella In Silico Typing Resource (SISTR) sistr_cmd version 1.0.2 serotyping databases
File structure tree for the sistr_cmd `data` folder:
.
|-- [4.0K] antigens
| |-- [1.0M] fliC.fasta
| |-- [210K] fljB.fasta
| |-- [126K] wzx.fasta
| `-- [ 60K] wzy.fasta
|-- [4.0K] cgmlst
| |-- [7.4M] cgmlst-centroid.fasta
| |-- [ 96M] cgmlst-full.fasta
| |-- [134M] cgmlst-profiles.hdf
| `-- [ 803] README.md
|-- [1.1M] genomes-to-serovar.txt
|-- [1.0M] genomes-to-subspecies.txt
|-- [118K] Salmonella-serotype_serogroup_antigen_table-WHO_2007.csv
`-- [ 92M] sistr.msh
2 directories, 12 files
Description of files:
genomes-to-serovar.txt
: Tab-delimited mapping of each genome ID to its serovar designation for the 52,790 Salmonella genomes.
genomes-to-subspecies.txt
: Tab-delimited mapping of each genome ID to its subspecies designation for the 52,790 Salmonella genomes.
Salmonella-serotype_serogroup_antigen_table-WHO_2007.csv
: Serovar and antigenic formula information table used by `sistr_cmd` for looking up serovar designations from antigen results.
sistr.msh
: Mash sketch file of 11,840 Salmonella genomes for Mash-based serotyping.
antigens
: Allele files for antigen gene search-based serotyping.
fliC.fasta
: fliC gene alleles for H1-antigen typing.
fljB.fasta
: fljB gene alleles for H2-antigen typing.
wzx.fasta
: wzx gene alleles for O-antigen typing.
wzy.fasta
: wzy gene alleles for O-antigen typing.
cgmlst
: Files for core-genome multilocus sequence typing (cgMLST) and cgMLST-based serotyping.
cgmlst-profiles.hdf
: HDF5 file with cgMLST allelic profiles of the 52,790 Salmonella genomes.
- read in with Pandas, e.g. `pd.read_hdf(CGMLST_PROFILES_PATH, key='cgmlst')`
cgmlst-centroid.fasta
: "Centroid" or representative alleles of the 52,790 Salmonella genomes for rapid NCBI BLAST+ blastn searching. Centroid alleles were selected for each locus from the full set of alleles for the 52,790 Salmonella genomes as follows:
- group alleles by length
- group the length-grouped alleles by their ends (28 bp at allele start and end; 28 is the word size of blastn megablast)
- perform hierarchical clustering of the length- and end-grouped alleles
- form flat clusters at 2.5% distance
- within each cluster, pick the allele with the least distance to the others in the cluster
cgmlst-full.fasta
: Full set of alleles for the 52,790 Salmonella genomes.
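The centroid selection steps listed for cgmlst-centroid.fasta can be illustrated with a minimal Python sketch. This is not the code used to build the database. The function names `hamming_fraction` and `pick_centroids` are hypothetical, and two details that the description leaves open are assumptions here: distance is taken as the fraction of mismatched positions (alleles in a group have equal length), and the hierarchical clustering is single-linkage, so cutting at 2.5% is equivalent to taking connected components of alleles within 2.5% of each other.

```python
from collections import defaultdict
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pick_centroids(alleles, end_len=28, max_dist=0.025):
    """Return representative alleles for one locus (illustrative only)."""
    # Group alleles by length, then by their first and last `end_len` bases.
    groups = defaultdict(list)
    for seq in alleles:
        groups[(len(seq), seq[:end_len], seq[-end_len:])].append(seq)

    centroids = []
    for seqs in groups.values():
        # Assumed single-linkage clustering cut at `max_dist`: merge every
        # pair of alleles whose distance is within the threshold (union-find).
        parent = list(range(len(seqs)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i, j in combinations(range(len(seqs)), 2):
            if hamming_fraction(seqs[i], seqs[j]) <= max_dist:
                parent[find(i)] = find(j)

        clusters = defaultdict(list)
        for i, seq in enumerate(seqs):
            clusters[find(i)].append(seq)

        # Within each cluster, keep the allele with the least total distance
        # to the other members.
        for members in clusters.values():
            centroids.append(
                min(members, key=lambda s: sum(hamming_fraction(s, t) for t in members))
            )
    return centroids
```

Calling `pick_centroids` once per locus on that locus's alleles yields one representative per cluster. A different linkage method or distance measure would give different clusters, so the result should not be expected to reproduce cgmlst-centroid.fasta exactly.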
Files
(348.7 MB)
md5 checksum | Size |
---|---|
md5:3459c4cb1d459d4670cef246b497914f | 7.7 MB |
md5:86f28499099b3ec10525ffe5ae287012 | 100.7 MB |
md5:073ae146e9f729cbea59d27e6639024a | 140.2 MB |
md5:69854c38bf25873afc0bf48e26b1eda4 | 1.1 MB |
md5:aea4912a7bfd01c1117cecc16a5170ed | 214.5 kB |
md5:1f262c4c2c7ed9cfdc8bda9b010f3279 | 1.2 MB |
md5:1e03cded94ea74c6910fde53914edd73 | 1.1 MB |
md5:4243da7ec8ab7bb2ab43860433c43603 | 120.4 kB |
md5:eaab468877783b83346efa11202d84fe | 96.2 MB |
md5:89172f9516eadc7cbb8538a2fd1be6f9 | 129.4 kB |
md5:001792c2cb15dd0ea40114539309854e | 61.0 kB |
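Since each file is listed with an md5 checksum, a downloaded copy can be checked against its listed value. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_listed_md5` are hypothetical and not part of `sistr_cmd`, and which checksum belongs to which file has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file with a listed value written as 'md5:<hex digest>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use low for the larger files in this record, such as the cgMLST profiles and full allele set.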