Published September 13, 2023 | Version v1
Dataset Open

Kraken2 Human Pangenome Reference Consortium database

  • 1. Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Australia

Description

A kraken2 database built from the genome assemblies used by the Human Pangenome Reference Consortium (HPRC; https://projects.ensembl.org/hprc/). This archive contains the three files required by kraken2 (hash.k2d, opts.k2d, and taxo.k2d), along with inspect.txt, which is obtained by running kraken2-inspect on the database, and ktaxonomy.tsv, which contains the taxonomy information of the database (obtained by running make_ktaxonomy.py from KrakenTools; https://github.com/jenniferlu717/KrakenTools#make_ktaxonomypy).
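After unpacking the archive, it can be worth confirming that the three files kraken2 needs are actually present before pointing any tool at the directory. The following is a minimal sketch; the function name check_kraken2_db and the directory name used in the usage note are hypothetical and not part of this dataset.

```shell
# Sketch only: check that a directory holds the three files kraken2 requires.
# check_kraken2_db is an illustrative helper, not something shipped with kraken2.
check_kraken2_db() {
  # $1: directory holding the unpacked database files
  for f in hash.k2d opts.k2d taxo.k2d; do
    # -s is true only if the file exists and is not empty
    if [ ! -s "$1/$f" ]; then
      echo "missing or empty: $1/$f" >&2
      return 1
    fi
  done
  echo "ok: $1 contains hash.k2d, opts.k2d and taxo.k2d"
}
```

For example, `check_kraken2_db hprc_db/` (assuming the archive was unpacked into a directory called hprc_db/) prints a confirmation line or returns a non-zero status naming the first missing file.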

The genomes for this database were downloaded using the assembly summary text file included in this dataset and genome_updater.sh (v0.6.3; https://github.com/pirovc/genome_updater):

genome_updater.sh -m -a -f "genomic.fna.gz" -t 8 -e "hprc_assembly_summary.txt" -o HPRC_genomes/

The Python script prepare_kraken_fasta.py was then used to prepare the assemblies for use in kraken2 with the following command:

python prepare_kraken_fasta.py -r -T 9606 -o HPRC.fna HPRC_genomes/
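The internals of prepare_kraken_fasta.py are not described here. As background only: kraken2 accepts library sequences whose FASTA header carries an explicit taxon tag of the form `|kraken:taxid|<taxid>` appended to the sequence ID, and 9606 is the NCBI taxonomy ID for Homo sapiens. The sketch below shows one way such tagging could be done. It is an assumption about the general technique, not a reproduction of the script, and the function name tag_fasta_headers is hypothetical.

```shell
# Sketch only: append a kraken2 taxid tag to the sequence ID of every FASTA header.
# Reads FASTA on stdin and writes tagged FASTA on stdout.
tag_fasta_headers() {
  # $1: NCBI taxonomy ID to assign (for example 9606 for human)
  awk -v taxid="$1" '
    /^>/ {
      id = $1                            # ">" plus the sequence ID (first token)
      rest = substr($0, length(id) + 1)  # remaining description, if any
      print id "|kraken:taxid|" taxid rest
      next
    }
    { print }                            # sequence lines pass through unchanged
  '
}
```

With hypothetical file names, `tag_fasta_headers 9606 < input.fna > tagged.fna` would turn a header such as `>chr1 desc` into `>chr1|kraken:taxid|9606 desc`.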

The database was then built with kraken2 using the following commands:

kraken2-build --download-taxonomy --db db/
kraken2-build --add-to-library HPRC.fna --db db/
kraken2-build --build --db db/ --threads 16
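Once built (or downloaded from this record), the database can be passed to kraken2 like any other. The following is a minimal sketch of classifying paired-end reads against it and writing the unclassified read pairs to separate files. The function name, the THREADS variable, and all file names are hypothetical; only the kraken2 options themselves (--db, --threads, --paired, --report, --output, --unclassified-out) are standard.

```shell
# Sketch only: classify paired-end reads against a kraken2 database.
# classify_paired_reads is an illustrative wrapper, not part of kraken2.
classify_paired_reads() {
  # $1: database directory, $2: forward reads, $3: reverse reads, $4: output prefix
  # THREADS is an optional environment variable; 8 is an arbitrary default.
  kraken2 --db "$1" --threads "${THREADS:-8}" --paired \
    --report "$4.kreport" \
    --output "$4.kraken" \
    --unclassified-out "$4_unclassified#.fq" \
    "$2" "$3"
}
```

For example, `classify_paired_reads db/ sample_R1.fq sample_R2.fq sample` would write sample.kreport, sample.kraken, and the read pairs that did not match the database to sample_unclassified_1.fq and sample_unclassified_2.fq (kraken2 replaces the `#` with the mate number).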

Files

hprc_assembly_summary.txt

Files (3.6 GB)

Checksum Size
md5:38cea079d27781328731bc7c6974dd4b 36.9 kB
md5:87275d884181cfb6b46fdb883195dacb 3.6 GB
md5:1f63449cd8043a879f73ab5a9f445d57 2.9 kB
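The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch, assuming GNU coreutils md5sum is available (on macOS the equivalent tool is md5, with different output). The function name verify_md5 and the file name in the usage note are hypothetical, since the listing above does not show file names next to the checksums.

```shell
# Sketch only: compare a file's MD5 digest with an expected value.
# Assumes GNU coreutils md5sum; verify_md5 is an illustrative helper.
verify_md5() {
  # $1: expected hex digest (without the "md5:" prefix), $2: path to the file
  actual=$(md5sum "$2" | awk '{print $1}')
  if [ "$actual" = "$1" ]; then
    echo "OK $2"
  else
    echo "MISMATCH $2: expected $1, got $actual" >&2
    return 1
  fi
}
```

For example, `verify_md5 87275d884181cfb6b46fdb883195dacb downloaded_archive` checks a file against the digest listed next to the 3.6 GB entry, where downloaded_archive stands in for whatever name the file was saved under.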

Additional details

Related works

Is described by
Preprint: 10.1101/2023.09.18.558339 (DOI)