GeTPrev Pre-built Default Database v1.0: Complete Genomes of Seven Enterobacteriaceae Genera
Authors/Creators
Description
This dataset contains the pre-built default genome database used by the GeTPrev (Gene Taxonomic Prevalence) pipeline. It includes complete genomes from seven Enterobacteriaceae genera, curated from NCBI RefSeq, and serves as the default reference panel for gene prevalence estimation. The database was built on September 26, 2025; users can rebuild or update it locally as needed by following the instructions in the GeTPrev GitHub repository README. Metadata files containing genome accessions and taxonomic information are available in the repository under the metadata/ directory.
The dataset was generated with the following command:
bash build_EB_complete_genomes_db.sh
This command was executed in a high-performance computing (HPC) environment without a job scheduler. Users working under a job scheduling system (e.g., SLURM) should specify the appropriate scheduler parameters when constructing or updating the database. Detailed instructions are available in the GeTPrev GitHub repository README.
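For users on a SLURM cluster, one generic way to run the build is to wrap the command in a batch script. The sketch below only writes such a script to disk and does not submit it. The job script name and every resource value (CPUs, memory, wall time) are placeholders chosen for illustration, not requirements stated by GeTPrev; the repository README may describe its own scheduler options, which should take precedence.

```shell
#!/usr/bin/env bash
# Write a SLURM batch script that wraps the build command from this record.
# Assumption: all resource values below are placeholders, not documented needs.
JOB_SCRIPT="getprev_build_db.sbatch"   # hypothetical file name

cat > "$JOB_SCRIPT" <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=getprev_db
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=24:00:00
#SBATCH --output=getprev_db_%j.log

# Run from the directory that contains the GeTPrev build script.
bash build_EB_complete_genomes_db.sh
EOF

echo "Wrote $JOB_SCRIPT; submit it with: sbatch $JOB_SCRIPT"
```

Adjust the `#SBATCH` lines to match the partitions and limits of the local cluster before submitting.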
Files
(48.3 GB)
| Name | Size |
|---|---|
| md5:f883d75afe85afe5905c3da20e3b17e1 | 48.3 GB |
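Because the archive is large (48.3 GB), it is worth checking the download against the MD5 checksum listed above before using it. The following is a minimal sketch that assumes GNU `md5sum` is available (on macOS, `md5 -q` is the usual substitute). The helper name `verify_md5` and the archive path in the usage comment are hypothetical; only the checksum value comes from this record.

```shell
#!/usr/bin/env bash
# Expected checksum, copied from the Files table of this record.
EXPECTED_MD5="f883d75afe85afe5905c3da20e3b17e1"

# verify_md5 FILE EXPECTED
# Returns 0 when the MD5 digest of FILE equals EXPECTED, 1 otherwise.
verify_md5() {
    local file="$1" expected="$2" actual
    if [ ! -f "$file" ]; then
        echo "missing file: $file" >&2
        return 1
    fi
    # Reading from stdin makes md5sum print "<digest>  -"; keep the digest only.
    actual="$(md5sum < "$file" | awk '{print $1}')"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Hypothetical usage; replace the path with the actual downloaded archive:
# verify_md5 path/to/downloaded_archive "$EXPECTED_MD5"
```

A mismatch usually indicates an interrupted or corrupted transfer, in which case the file should be downloaded again.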
Additional details
Software
- Repository URL: https://github.com/Weifanwu66/GeTPrev