
Published March 12, 2026 | Version v1.1.0

grp-bork/gunc: v1.1.0

  • 1. European Molecular Biology Laboratory
  • 2. Queensland University of Technology
  • 3. EMBL
  • 4. Harvard University
  • 5. Boston Children's Hospital
  • 6. University College Cork
  • 7. Max Delbrück Center

Description

v1.0.7

Summary

This release adds support for two new reference databases (ProGenomes 3, GTDB r214) and a custom database option. A new gunc check subcommand validates your environment before submitting a long job, and gunc rescore is introduced as a clearer alias for gunc summarise. In addition, a test_data database type has been added, comprising a minimal test set (sample, db, taxonomy) that can be used in CI/CD pipelines. A warning is now emitted when genomes have low reference representation scores. Packaging has been modernised to pyproject.toml and the CI pipeline updated.

Features

  • Added support for progenomes_3 and gtdb_214 reference databases.
  • Added support for the test_data set, a minimal set of data that can be used in CI/CD pipelines.
  • Added --custom_genome2taxonomy option to allow use of a custom reference database.
  • Diamond version pinned to 2.1.24; enforced at startup with a clear error message. Set GUNC_SKIP_DIAMOND_VERSION_CHECK=1 to bypass.
  • Added test_data option to gunc download_db (--db test_data): downloads a minimal diamond database and two test genomes (chimeric and clean) that can be used to verify a GUNC installation end-to-end.
  • Added gunc rescore as the preferred name for the summarise subcommand; gunc summarise remains as a backward-compatible alias.
  • Added gunc check subcommand to validate environment (tool dependencies, database file, custom genome-to-taxonomy TSV format, output directory write access) without running the pipeline.
  • All subcommands (run, plot, merge_checkm, summarise) now log the output file path on completion.
  • --file_suffix error message now suggests the correct flag usage when no files are found.
  • Fixed metavar="\\b" hack in summarise argparse definitions; replaced with meaningful placeholders (FILE, DIR, FLOAT).
  • Documentation: added gunc summarise section with worked example; fixed --file_suffix incorrectly listed as required; fixed --gunc_file help referencing gunc_scores.tsv (actual filename is GUNC.{db}.maxCSS_level.tsv); added --custom_genome2taxonomy file format spec; added output column definitions table; updated DB names to underscore convention throughout.
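
The rescore/summarise aliasing and the FILE, DIR, FLOAT placeholders described above can be illustrated with standard argparse features. This is a minimal sketch, not GUNC's actual parser: the program name and the option names (--input_file, --out_dir, --cutoff) and their defaults are hypothetical, and only the subcommand names and the three placeholders come from these notes.

```python
import argparse


def build_parser():
    """Build a toy CLI with a preferred subcommand name and a legacy alias."""
    # "example" is a stand-in program name, not the real gunc entry point.
    parser = argparse.ArgumentParser(prog="example")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # "rescore" is the preferred name; "summarise" still resolves to the
    # same sub-parser, so existing scripts keep working.
    rescore = subparsers.add_parser(
        "rescore",
        aliases=["summarise"],
        help="Rescore genomes with a different cutoff.",
    )

    # Hypothetical options. The metavar values give readable placeholders
    # in --help output instead of an invisible "\b" character.
    rescore.add_argument("--input_file", metavar="FILE", required=True,
                         help="Scores table to rescore.")
    rescore.add_argument("--out_dir", metavar="DIR", default=".",
                         help="Directory for the rescored table.")
    rescore.add_argument("--cutoff", metavar="FLOAT", type=float, default=0.05,
                         help="Contamination cutoff to apply.")

    # One canonical handler name, whichever spelling was typed.
    rescore.set_defaults(handler="rescore")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.handler, args.input_file, args.out_dir, args.cutoff)
```

Dispatching on the handler default rather than on the typed subcommand name keeps the alias from leaking into downstream logic.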

Bugfixes

  • Fixed summarise subcommand incorrectly marking all genomes as passing GUNC.
  • Fixed pass.GUNC column being silently converted to strings in output TSV; summarise now uses proper NaN detection instead of string comparison.
  • Fixed summarise not rescoring genomes with boolean False in pass.GUNC; previously only the string "False" was matched, so boolean values (the normal case) were silently skipped.
  • Fixed genome identity corruption in split_diamond_output when contig names contain /; now uses rsplit to always extract the genome name from the last path segment.
  • Fixed DB detection logic duplicated across three code paths with subtly different ordering; extracted into single detect_db_from_filename() function.
  • Fixed prodigal() leaving partial output files on disk when gene calling fails; partial files are now removed so the caller's size check correctly excludes failed genomes.
  • Fixed extract_node_data() in visualisation missing colour entries for class and order tax levels, causing KeyError when non-default --tax_levels are used.
  • Extracted plot=True path from chim_score() into dedicated get_base_data_for_plotting() function; chim_score() now has a single consistent return type.
  • Fixed empty diamond output files not being named correctly when a genome fails to map (thanks to @pamelaferretti).
  • Fixed edge case where contamination score was incorrectly calculated when contamination portion was NaN.
  • Fixed crash when no genes were called or mapped to the reference database.
  • Fixed shell injection risk in get_record_count_in_fasta.
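
As an illustration of the last item, one common way to remove a shell injection risk when counting FASTA records is to avoid the shell entirely and count header lines in Python. The sketch below is an assumption about the general approach, not the code used in get_record_count_in_fasta; the function name count_fasta_records and the gzip handling are hypothetical.

```python
import gzip


def count_fasta_records(path):
    """Count FASTA records by counting lines that start with '>'.

    The path is only ever passed to open(), never interpolated into a
    shell command, so characters such as ';' or '$' in a file name are
    treated as ordinary characters.
    """
    # Assumption for this sketch: gzipped input is recognised by a ".gz" suffix.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

A file named `genome; echo unsafe.fa` is counted like any other file, because no command string is ever built from the name.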

Other

  • Removed versioneer; version is now statically set.
  • Fixed 8 flake8 errors: import ordering in get_scores.py and visualisation.py, trailing whitespace in gunc.py, spurious f-string prefixes in gunc_database.py.
  • Extracted CSS_CHIMERIC_THRESHOLD = 0.45 and TAX_LEVELS as named constants in get_scores.py; replaced all three scattered hardcoded copies of the threshold and tax level list across gunc.py, checkm_merge.py, and visualisation.py.
  • Fixed all sys.exit(string) calls in visualisation.py and get_scores.py to use logger.error() + sys.exit(1) consistently with the rest of the codebase; added module-level logger to get_scores.py.
  • Fixed add_empty_diamond_output() using print() for progress output; now uses logger.info().
  • Fixed check_diamond_version() using shell=True; now uses list-form subprocess call.
  • Added guard against empty gunc_output list before pd.concat() in run_gunc() to give a clear error instead of a cryptic ValueError.
  • Reference data files renamed to reflect database version (e.g. genome2taxonomy_pg2.1ref.tsv).
  • Documentation updated: diamond version, all four database options, --custom_genome2taxonomy flag.
  • Migrated packaging from setup.py + setup.cfg + MANIFEST.in + requirements.txt to a single pyproject.toml (PEP 621); fixed package_data paths, license field (GPLv3), dropped universal=1, and added minimum version pins for numpy (>=1.20), scipy (>=1.7), and plotly (>=5.0).
  • Replaced all from module import * in test files with explicit named imports; marked network-dependent tests in test_gunc_database.py with @pytest.mark.integration; added conftest.py registering the integration marker.
  • Added tests for summarise(), get_scores_using_supplied_cont_cutoff(), read_genome2taxonomy_reference() (all 4 DBs + custom + unknown), split_diamond_output() round-trip, and detect_db_from_filename().
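
The Diamond version pin, its environment-variable bypass, and the list-form subprocess call mentioned in these notes could fit together roughly as follows. This is a sketch under stated assumptions rather than GUNC's check_diamond_version(): the helper names are hypothetical, and the output of `diamond version` is assumed to contain a string of the form "diamond version X.Y.Z". Only the pinned version 2.1.24 and the variable name GUNC_SKIP_DIAMOND_VERSION_CHECK come from the release notes.

```python
import os
import re
import subprocess

PINNED_DIAMOND_VERSION = "2.1.24"  # pin stated in the release notes
SKIP_ENV_VAR = "GUNC_SKIP_DIAMOND_VERSION_CHECK"


def parse_diamond_version(output):
    """Return the first 'X.Y.Z' found in the text, or None if absent."""
    match = re.search(r"(\d+\.\d+\.\d+)", output)
    return match.group(1) if match else None


def get_diamond_version(executable="diamond"):
    """Query the installed version using an argument list, not shell=True."""
    result = subprocess.run(
        [executable, "version"],  # list form: no shell parsing of arguments
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_diamond_version(result.stdout)


def version_is_acceptable(found, environ=None):
    """Accept the pinned version, or any version when the bypass is set."""
    environ = os.environ if environ is None else environ
    if environ.get(SKIP_ENV_VAR) == "1":
        return True
    return found == PINNED_DIAMOND_VERSION
```

Keeping the parsing and the comparison separate from the subprocess call lets both be unit-tested without Diamond installed.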

New Contributors

  • @pamelaferretti made their first contribution in https://github.com/grp-bork/gunc/pull/53

Full Changelog: https://github.com/grp-bork/gunc/compare/v1.0.6...v1.1.0

Files

grp-bork/gunc-v1.1.0.zip

Files (4.6 MB)

Name Size
grp-bork/gunc-v1.1.0.zip (md5:ce0a688eb74f6afbe15673122fd628f3) 4.6 MB

Additional details

Related works

Is supplement to
Software: https://github.com/grp-bork/gunc/tree/v1.1.0 (URL)

Software