Published May 2, 2023 | Version 217
Dataset | Open Access

Mash Sketch of RefSeq Bacterial Representative Genomes v217

  • UPHL

Description

This sketch was created to provide an up-to-date Mash reference database. The script that builds it uses the NCBI Datasets command-line tools (datasets, dataformat) and Mash (https://github.com/UPHL-BioNGS/Grandeur/blob/main/bin/new_mash_ref.sh).

It was created on April 27, 2023, and corresponds to RefSeq v217.
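The script assumes several command-line tools are already installed. A minimal sketch of a prerequisite check, meant to run before any download starts, might look like this. require_tools is a hypothetical helper, not part of the original script or of the tools themselves, and it assumes the tools are on PATH under these command names.

```bash
#!/bin/bash
# Hypothetical helper: report every listed command that is not on PATH.
# Returns 0 if all are found, 1 if any are missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_tools datasets dataformat unzip mash || exit 1` near the top of the script would stop it before anything is downloaded.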

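```bash
#!/bin/bash
# Stop on errors, unset variables, and failures anywhere in a pipeline
set -euo pipefail

out=mash_db

mkdir -p "$out"
cd "$out"

# Summarize bacterial reference genomes, keep rows whose RefSeq category
# mentions "representative", save those rows, and write the accessions
# (first column) to genome_ids.txt
datasets summary genome taxon bacteria --reference --as-json-lines | \
  dataformat tsv genome --fields accession,assminfo-refseq-category,organism-name --elide-header | \
  grep representative | \
  tee representative_genomes.txt | \
  cut -f 1 > genome_ids.txt

echo "$(date): Downloading genomes for ids"
datasets download genome accession --inputfile genome_ids.txt --filename rep-genomes.zip

echo "$(date): Decompressing zip file"
unzip rep-genomes.zip

echo "$(date): Creating file for mash"
# Concatenate all genome FASTA files; replace spaces with underscores and
# drop commas (sequence lines contain neither, so only headers change)
cat ncbi_dataset/data/*/*.fna | sed -e 's/ /_/g' -e 's/,//g' > rep-genomes.fasta

echo "$(date): Sketching rep-genomes.fasta"
# -i: sketch each sequence individually; -p 20: use 20 threads
mash sketch -i -p 20 rep-genomes.fasta -o rep-genomes

############################################################

echo "$(date): File preparation is complete"
ls -alh rep-genomes.fasta
ls -alh rep-genomes.msh
```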
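The header clean-up step is easy to check on a toy record. This sketch assumes that, when sketching with -i, Mash takes the first whitespace-delimited word of each FASTA header as the sequence ID and keeps the rest as a comment, so replacing spaces keeps the organism name in the ID that mash dist reports. The script does not say why commas are removed; avoiding trouble in comma-delimited downstream output is only a guess. sanitize_headers is a hypothetical wrapper around the same sed expressions, and the header below is made up.

```bash
#!/bin/bash
# Hypothetical wrapper around the script's header clean-up:
# spaces become underscores, then commas are removed.
# Like the original, it runs on every line of the input.
sanitize_headers() {
  sed -e 's/ /_/g' -e 's/,//g'
}

# Made-up two-line FASTA record, for illustration only
printf '>NZ_X1.1 Escherichia coli, strain K-12\nACGT\n' | sanitize_headers
# prints:
# >NZ_X1.1_Escherichia_coli_strain_K-12
# ACGT
```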

Files (6.9 GB)

md5:0d184d0634182e19b1dfdf1f8e8350d1 (6.9 GB)
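The page lists an MD5 checksum for the download. A minimal sketch for checking it, assuming GNU coreutils md5sum is available (macOS ships md5 instead). verify_md5 is a hypothetical helper, and because the file name is not shown here, the usage line below uses a placeholder path.

```bash
#!/bin/bash
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Returns 0 on match, 1 on mismatch, 2 if the file cannot be read.
verify_md5() {
  local file=$1 expected=$2 actual
  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 2
  fi
  # Read from stdin so md5sum never escapes unusual file names in its output
  actual=$(md5sum < "$file" | cut -d ' ' -f 1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}
```

For example: `verify_md5 path/to/downloaded_file 0d184d0634182e19b1dfdf1f8e8350d1`.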