Published February 24, 2025 | Version 6.0
Dataset Open

Bakta database

  • 1. Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen, 35392, Germany; Institute of Medical Microbiology, Justus Liebig University Giessen, Giessen, 35392, Germany; German Centre for Infection Research (DZIF), partner site Giessen-Marburg-Langen, Giessen, Germany

Description

This data repository contains the mandatory database for Bakta.

It is available in two versions: the default (db.tar.gz) and a lightweight alternative (db-light.tar.gz).

Bakta is a tool for the rapid & standardized local annotation of bacterial genomes & plasmids. It provides dbxref-rich and sORF-including annotations in machine-readable JSON & bioinformatics standard file formats for automatic downstream analysis: https://github.com/oschwengers/bakta

This database provides protein sequence hash digests and lengths of UniProt's UniRef100 clusters, UniParc and NCBI RefSeq sequences for ultra-fast identification & lookups. It has been pre-annotated with several specialized databases and enriched with dbxrefs. Furthermore, seed sequences of UniProt's UniRef90 clusters are stored for fallback homology searches via Diamond sequence alignments. All conducted pre-annotations are logged and provided in the db.log.gz file.
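The idea of identifying proteins by hash digest and length can be illustrated with a minimal sketch. It is not Bakta's implementation: the use of MD5, the in-memory dictionary, the function names, and the example sequences and labels are all assumptions made for illustration.

```python
import hashlib

def seq_key(seq):
    """Return (hex digest, length) for a protein sequence.

    MD5 is an assumption here; the description above only says "hash digests".
    """
    seq = seq.strip().upper()
    return hashlib.md5(seq.encode("ascii")).hexdigest(), len(seq)

# Hypothetical toy index; the sequences and labels are made up.
known = {
    "MKTAYIAKQR": "hypothetical cluster A",
    "MSLNEQ": "hypothetical cluster B",
}
index = {seq_key(seq): label for seq, label in known.items()}

def lookup(seq):
    """Exact-match lookup: return the stored label, or None if unknown."""
    return index.get(seq_key(seq))
```

A query that matches a stored sequence exactly is resolved with a single dictionary access and no alignment. A sequence that differs by even one residue returns `None`, which is the situation in which a fallback homology search would be needed.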

External DB versions:

  • NCBI AMRFinderPlus: 2024-12-18.1
  • COG: 2024
  • DoriC: 12
  • ISFinder: 2019-09-25
  • Mob-suite: 3
  • Pfam: 37.2
  • RefSeq: r228
  • Rfam: 15
  • UniProtKB/Swiss-Prot: 2025_01
  • VFDB: 2025-02-14
     

Files

db-versions.json

Files (55.6 GB)

Size      Checksum
1.3 GB    md5:4a6e059ded39e9c5537ef4137d2f5648
2.8 kB    md5:d4f0c8bf796547a0ad783d962e97d799
22.4 GB   md5:78c6be98dcf5c7571df881812632b1fd
31.9 GB   md5:4c1115e40abfa2b464ae5dd988bdd88e
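The MD5 checksums listed with the files can be used to verify a download before unpacking it. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the file path and expected checksum must be supplied by the user for the file that was actually downloaded.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that multi-gigabyte archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected):
    """Compare a file's digest with a listed value such as 'md5:4a6e...'."""
    expected_hex = expected.removeprefix("md5:").strip().lower()
    return md5_of_file(path) == expected_hex
```

Call `matches_checksum` with the path of the downloaded file and the checksum shown for that file. A result of `False` suggests an incomplete or corrupted download. Note that `str.removeprefix` requires Python 3.9 or later.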

Additional details

Related works

Is cited by
Journal article: 10.1099/mgen.0.000685 (DOI)
Is required by
Software: https://github.com/oschwengers/bakta (URL)

Software

Repository URL
https://github.com/oschwengers/bakta
Programming language
Python
Development Status
Active