Kraken2 Human Pangenome Reference Consortium database
Creators
- 1. Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Australia
Description
A Kraken2 database built from the genome assemblies used by the Human Pangenome Reference Consortium (https://projects.ensembl.org/hprc/). This archive contains the three files required by Kraken2 (hash.k2d, opts.k2d, and taxo.k2d), along with inspect.txt, obtained by running kraken2-inspect on the database, and ktaxonomy.tsv, which contains the taxonomy information of the database (obtained by running make_ktaxonomy.py from KrakenTools; https://github.com/jenniferlu717/KrakenTools#make_ktaxonomypy).
The genomes for this database were downloaded using the assembly summary text file included in this dataset (hprc_assembly_summary.txt) and genome_updater.sh (v0.6.3; https://github.com/pirovc/genome_updater):
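Since Kraken2 needs all three .k2d files to load a database, a quick sanity check after unpacking the archive can save a confusing error later. The following is a minimal sketch; the function name check_kraken2_db and the directory name db/ are illustrative assumptions, not part of this dataset.

```shell
# Hypothetical helper (not part of this dataset): confirm that a directory
# holds the three non-empty files Kraken2 requires.
# Usage: check_kraken2_db DIR
check_kraken2_db() {
  status=0
  for f in hash.k2d opts.k2d taxo.k2d; do
    # -s is true only if the file exists and is larger than zero bytes
    if [ ! -s "$1/$f" ]; then
      echo "missing or empty: $1/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_kraken2_db db/ && echo "database looks complete"` prints the message only when all three files are present.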
genome_updater.sh -m -a -f "genomic.fna.gz" -t 8 -e "hprc_assembly_summary.txt" -o HPRC_genomes/
The Python script prepare_kraken_fasta.py was then used to prepare the assemblies for use with Kraken2, using the following command:
python prepare_kraken_fasta.py -r -T 9606 -o HPRC.fna HPRC_genomes/
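The internals of prepare_kraken_fasta.py are not described here, so the following is only a rough sketch of the kind of preparation such a step may involve. It relies on the convention from the Kraken2 manual that a sequence added to a custom library can carry its taxonomy ID in the header as `kraken:taxid|<id>`. The function name tag_fasta_taxid is hypothetical, and this sketch is not a substitute for the actual script or its -r option.

```shell
# Hypothetical helper (an assumption, not the author's script): append a
# Kraken2 taxid tag to the sequence ID of every FASTA header.
# Usage: tag_fasta_taxid TAXID < in.fna > out.fna
tag_fasta_taxid() {
  awk -v taxid="$1" '
    /^>/ {
      # $1 is ">seqid"; everything after it is the optional description
      rest = substr($0, length($1) + 1)
      print $1 "|kraken:taxid|" taxid rest
      next
    }
    { print }  # sequence lines pass through unchanged
  '
}
```

With 9606 (the NCBI taxonomy ID for Homo sapiens) as the argument, a header such as `>chr1 Homo sapiens` becomes `>chr1|kraken:taxid|9606 Homo sapiens`.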
The database was then built with Kraken2 using the following commands:
kraken2-build --download-taxonomy --db db/
kraken2-build --add-to-library HPRC.fna --db db/
kraken2-build --build --db db/ --threads 16
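Once built (or unpacked from this archive), the database can be passed to kraken2 with --db. As an illustration only, the sketch below classifies paired-end reads and writes the pairs Kraken2 leaves unclassified to separate files. The wrapper name classify_against_hprc, the thread count, and all file names are assumptions chosen for the example.

```shell
# Hypothetical wrapper: classify paired-end reads against a Kraken2 database
# and write unclassified pairs to PREFIX_1.fq and PREFIX_2.fq.
# Usage: classify_against_hprc DB_DIR READS_1 READS_2 PREFIX
classify_against_hprc() {
  # With --paired, kraken2 replaces "#" in the output name with _1 and _2
  kraken2 --db "$1" --threads 16 --paired \
    --unclassified-out "${4}#.fq" \
    --report "${4}.kreport" \
    --output "${4}.kraken" \
    "$2" "$3"
}
```

A call such as `classify_against_hprc db/ reads_1.fq reads_2.fq sample` would also produce sample.kreport (the per-taxon summary) and sample.kraken (the per-read assignments).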
Files
(3.6 GB)
MD5 | Size |
---|---|
md5:38cea079d27781328731bc7c6974dd4b | 36.9 kB |
md5:87275d884181cfb6b46fdb883195dacb | 3.6 GB |
md5:1f63449cd8043a879f73ab5a9f445d57 | 2.9 kB |
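The MD5 checksums listed above can be used to confirm that a download completed intact. A minimal sketch, assuming GNU md5sum is available; the function name verify_md5 and the file path in the usage note are placeholders, since the listing does not pair file names with checksums.

```shell
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Accepts the expected value with or without the "md5:" prefix.
# Usage: verify_md5 FILE EXPECTED
verify_md5() {
  expected=${2#md5:}                       # strip an optional "md5:" prefix
  actual=$(md5sum "$1" | awk '{print $1}') # first field is the hex digest
  if [ "$actual" = "$expected" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (got $actual)" >&2
    return 1
  fi
}
```

For instance, `verify_md5 path/to/downloaded_file md5:87275d884181cfb6b46fdb883195dacb` would check a file against the checksum of the 3.6 GB entry.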
Additional details
Related works
- Is described by
- Preprint: 10.1101/2023.09.18.558339 (DOI)