Mash Sketch of RefSeq Bacterial Representative Genomes v217
Description
This sketch was created to provide an up-to-date Mash reference. The script that builds it uses NCBI datasets and mash (https://github.com/UPHL-BioNGS/Grandeur/blob/main/bin/new_mash_ref.sh).
It was created on April 27, 2023, and corresponds to RefSeq v217.
```bash
#!/bin/bash
# Build a Mash sketch of the RefSeq bacterial representative genomes.
out=mash_db
mkdir -p "$out"
cd "$out" || exit 1
# List bacterial reference genomes, keep the representative ones, and save their accessions.
datasets summary genome taxon bacteria --reference --as-json-lines | \
dataformat tsv genome --fields accession,assminfo-refseq-category,organism-name --elide-header | \
grep representative | \
tee representative_genomes.txt | \
cut -f 1 > genome_ids.txt
echo "$(date): Downloading genomes for ids"
datasets download genome accession --inputfile genome_ids.txt --filename rep-genomes.zip
echo "$(date): Decompressing zip file"
unzip rep-genomes.zip
echo "$(date): Creating file for mash"
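# Combine all genomes into one FASTA; replace spaces with underscores and drop commas in the headers.
cat ncbi_dataset/data/*/*.fna | sed 's/ /_/g' | sed 's/,//g' > rep-genomes.fasta
echo "$(date): Sketching rep-genomes.fasta"
# -i sketches each sequence individually; -p sets the number of threads.
mash sketch -i -p 20 rep-genomes.fasta -o rep-genomes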
############################################################
echo "$(date): File preparation is complete"
ls -alh rep-genomes.fasta
ls -alh rep-genomes.msh
```
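Once built, the sketch can be used as the reference for `mash dist`. The following is a minimal sketch of that, not part of the original script: `query.fasta` and the helper `top_hits` are hypothetical names, and the thread count is arbitrary. It relies on `mash dist` printing tab-separated columns (reference ID, query ID, distance, p-value, shared hashes). Because the reference was sketched with `-i`, each reference ID is an individual sequence header with spaces replaced by underscores.

```bash
#!/bin/bash
# top_hits (hypothetical helper): read mash dist output on stdin and
# print the N lines with the smallest distance (column 3). Default N is 5.
top_hits() {
  local n="${1:-5}"
  # -g (general numeric) also orders values written in scientific notation.
  sort -t $'\t' -k3,3g | head -n "$n"
}

# Run mash only when it is installed and both files are present.
# query.fasta is an assumed file name for the genome being identified.
if command -v mash >/dev/null 2>&1 && [ -f rep-genomes.msh ] && [ -f query.fasta ]; then
  mash dist -p 4 rep-genomes.msh query.fasta | top_hits 5
fi
```

Keeping the sorting in a separate function means it can be checked on a small hand-made table before running against the full 6.9 GB reference.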
Files
(6.9 GB)
MD5 checksum | Size |
---|---|
0d184d0634182e19b1dfdf1f8e8350d1 | 6.9 GB |
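After downloading, the file can be checked against the MD5 checksum listed above. This is a minimal sketch under some assumptions: the downloaded file is taken to be named `rep-genomes.msh` (the name is not stated in the listing), `verify_md5` is a hypothetical helper, and `md5sum` from GNU coreutils is assumed to be available.

```bash
#!/bin/bash
# verify_md5 FILE EXPECTED (hypothetical helper):
# succeeds when the MD5 of FILE equals EXPECTED.
verify_md5() {
  local file="$1" expected="$2" actual
  # md5sum prints "<hash>  <file>"; keep only the hash.
  actual=$(md5sum "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}

# Checksum taken from the file listing above.
expected=0d184d0634182e19b1dfdf1f8e8350d1

# rep-genomes.msh is an assumed name for the downloaded file.
if [ -f rep-genomes.msh ]; then
  if verify_md5 rep-genomes.msh "$expected"; then
    echo "checksum OK"
  else
    echo "checksum mismatch"
  fi
fi
```

On systems without `md5sum` (for example macOS), a different checksum tool would be needed in place of that call.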