Published October 30, 2025 | Version v1
Publication Open

GeTPrev pre-built Default Database v1.0: Complete Genomes of Seven Enterobacteriaceae Genus

  • 1. USDA-ARS Roman L Hruska US Meat Animal Research Center
  • 2. Agricultural Research Service

Description

This dataset contains the pre-built default genome database used by the GeTPrev (Gene Taxonomic Prevalence) pipeline. It includes complete genomes from seven Enterobacteriaceae genera, curated from NCBI RefSeq, and serves as the default reference panel for gene prevalence estimation. The database was built on September 26, 2025; users can rebuild or update it locally as needed. Instructions for building and updating the database are provided in the GeTPrev GitHub repository README. Metadata files listing genome accessions and taxonomic information are available in the same repository under the metadata/ directory.

The dataset was generated with the following command:

bash build_EB_complete_genomes_db.sh

This command was executed in a high-performance computing (HPC) environment without a job scheduler. Users working under a job scheduling system (e.g., SLURM) should specify the appropriate scheduler parameters when constructing or updating the database. Detailed instructions are available in the GeTPrev GitHub repository README.

Files

Files (48.3 GB)

Name Size
md5:f883d75afe85afe5905c3da20e3b17e1 48.3 GB
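
Because the archive is large (48.3 GB), it may be worth confirming that a download completed intact before using it. The following is a minimal sketch, not part of the GeTPrev pipeline: it compares a file's MD5 digest with the checksum listed above. The helper name verify_md5 is hypothetical, the archive path is passed as an argument because the file name is not given here, and md5sum is assumed to be available (GNU coreutils; macOS ships a different tool, md5).

```shell
#!/usr/bin/env bash
# Sketch: check a downloaded file against the MD5 checksum listed in this record.
# Assumptions: md5sum (GNU coreutils) is installed; verify_md5 is a
# hypothetical helper name, not something provided by GeTPrev.

# Checksum copied from the Files table of this record.
EXPECTED_MD5="f883d75afe85afe5905c3da20e3b17e1"

# verify_md5 FILE EXPECTED
# Returns 0 if the digests match, 1 if they differ, 2 if FILE is missing.
verify_md5() {
    local file="$1" expected="$2" actual
    [ -f "$file" ] || return 2
    # md5sum prints "<digest>  <filename>"; keep only the digest field.
    actual="$(md5sum "$file" | awk '{print $1}')"
    [ "$actual" = "$expected" ]
}

# Run the check only when a path is supplied on the command line.
if [ "$#" -gt 0 ]; then
    if verify_md5 "$1" "$EXPECTED_MD5"; then
        echo "checksum OK: $1"
    else
        echo "checksum mismatch or file missing: $1" >&2
        exit 1
    fi
fi
```

Saved under a name of your choice, the script takes the path to the downloaded archive as its only argument. Hashing a file of this size can take several minutes.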

Additional details