kaiju_mycobacterium_pre-compiled
Description
Kaiju database – Mycobacterium pre-compiled subset (2024 release)
This dataset provides a pre-compiled Kaiju database containing protein sequences exclusively from the genus Mycobacterium, extracted from the NCBI NR/RefSeq repositories (August 2024).
The database was built to speed up taxonomic classification of sequencing reads from Mycobacterium tuberculosis and related species. Because Kaiju loads its entire index into memory, restricting the index to a single genus substantially reduces memory and disk requirements compared with the full Kaiju NR database (~100 GB).
Unlike raw FASTA-based subsets, which must first be indexed (for example with kaiju-mkbwt and kaiju-mkfmi), this release distributes the final Kaiju index file already built, so it can be used in analysis pipelines immediately without a database-construction step.
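Because the index is pre-built, a classification run only needs the .fmi file and nodes.dmp passed to kaiju. The Python sketch below assembles such a command line. It assumes both files sit in the working directory under the names given in Contents; the helper build_kaiju_command and the read file names are hypothetical. The flags used (-t, -f, -i, -j, -z, -o) are standard kaiju options, but check them against `kaiju -h` for your installed version.

```python
import shlex
from typing import List, Optional


def build_kaiju_command(
    reads_1: str,
    output: str,
    reads_2: Optional[str] = None,
    fmi: str = "kaiju_db_mycobacterium_2024.fmi",
    nodes: str = "nodes.dmp",
    threads: int = 1,
) -> List[str]:
    """Assemble the argument list for a kaiju run against the pre-built index.

    Hypothetical helper: file names default to the ones listed in this record.
    """
    # -t: taxonomy tree, -f: FM-index database, -i: (first) read file
    cmd = ["kaiju", "-t", nodes, "-f", fmi, "-i", reads_1]
    if reads_2 is not None:
        # -j: second read file of a paired-end run
        cmd += ["-j", reads_2]
    # -z: number of threads, -o: output file
    cmd += ["-z", str(threads), "-o", output]
    return cmd


if __name__ == "__main__":
    cmd = build_kaiju_command(
        "sample_R1.fastq", "sample.kaiju.out",
        reads_2="sample_R2.fastq", threads=8,
    )
    # Print a shell-safe rendering; to run it, use subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Returning an argument list rather than a single shell string lets a pipeline pass it straight to `subprocess.run` without quoting problems.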
This subset includes protein sequences from representative genomes of Mycobacterium tuberculosis, M. bovis, M. africanum, M. smegmatis, and other clinically or environmentally relevant species within the genus.
Contents:
- kaiju_db_mycobacterium_2024.fmi: Kaiju database index (FM-index)
- nodes.dmp, names.dmp: NCBI taxonomy files (taxonomic tree and taxon names)
Total size: ~1 GB
Kaiju version: compatible with Kaiju 1.9.0 or later
Reference source: NCBI NR/RefSeq (retrieved August 2024)
Use case:
Designed for pipelines that perform taxonomic classification and contamination screening of Mycobacterium sequencing data, enabling faster execution while retaining species-level taxonomic resolution within the genus.
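Kaiju writes one tab-separated line per read whose first three columns are the classification status (C or U), the read name, and the assigned NCBI taxon ID (0 for unclassified reads). A minimal screening sketch, assuming that output format, counts classified reads per taxon and reports the unclassified fraction. Treating a high unclassified fraction as a hint of non-Mycobacterium content is an assumption of this example, not a documented property of the database, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_kaiju(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Count total reads, classified reads, and classified reads per taxon ID."""
    total = classified = 0
    per_taxon: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        # Only the first three columns are used; extra columns, if any, are ignored.
        status, _read_name, tax_id = line.rstrip("\r\n").split("\t")[:3]
        total += 1
        if status == "C":
            classified += 1
            per_taxon[int(tax_id)] += 1
    return total, classified, per_taxon


def unclassified_fraction(total: int, classified: int) -> float:
    """Share of reads left unclassified (0.0 for an empty file)."""
    return 0.0 if total == 0 else (total - classified) / total


if __name__ == "__main__":
    # Toy output lines; in practice iterate over open("sample.kaiju.out").
    toy = [
        "C\tread1\t1773\n",
        "C\tread2\t1773\n",
        "U\tread3\t0\n",
        "C\tread4\t1763\n",
    ]
    total, classified, per_taxon = summarize_kaiju(toy)
    print(total, classified, per_taxon.most_common())
    print(f"unclassified: {unclassified_fraction(total, classified):.1%}")
```

For per-rank summaries with taxon names, Kaiju also ships the kaiju2table tool (for example `kaiju2table -t nodes.dmp -n names.dmp -r species -o summary.tsv sample.kaiju.out`).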
Recommended citation:
Kaiju database – Mycobacterium subset (2024 release). Zenodo. https://doi.org/10.5281/zenodo.17554952
Menzel, P., Ng, K. L., & Krogh, A. (2016). Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications, 7, 11257. https://doi.org/10.1038/ncomms11257
Files
| Size | MD5 checksum |
|---|---|
| 300.9 MB | 6f087a7deed64f176c3fd02eed3ecfb9 |
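The MD5 checksum can be used to confirm that the download is complete and uncorrupted. A minimal sketch using only Python's standard hashlib module; the function names are illustrative, and the path to the downloaded file has to be supplied on the command line because this record does not list the file name.

```python
import hashlib

# Checksum listed in the Files section of this record.
EXPECTED_MD5 = "6f087a7deed64f176c3fd02eed3ecfb9"


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large archives never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected checksum (case-insensitive)."""
    return md5sum(path) == expected.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify_md5.py <path-to-downloaded-file>
    if len(sys.argv) > 1:
        ok = verify_download(sys.argv[1])
        print("checksum OK" if ok else "checksum MISMATCH")
```

Chunked reading keeps memory use constant regardless of file size, which matters little at 300.9 MB but makes the same helper usable for larger database files.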
Additional details
Identifiers
- Other: kaiju-mycobacterium-precompiled-2024
Related works
- Is supplement to: Publication 10.1038/ncomms11257 (DOI)
Dates
- Available: 2025-12-26