Published May 21, 2026 | Version v1.14.0
Software Open

experimaestro/datamaestro: v1.14.0 — pluggable hf_resolver helpers + env-driven local mirror

Authors/Creators

  • 1. CNRS

Description

Highlights

  • New datamaestro.helpers module: pluggable helpers contributed by third-party packages through the datamaestro.helpers entry-point group.
  • HFResolver Protocol — redirect HF Hub repos to a local directory (e.g. an HPC cluster's shared model/dataset mirror).
  • Built-in _EnvHFResolver driven by colon-separated env vars. It is always registered, so the env-driven flow requires no plugin:
    • DATAMAESTRO_HF_MODELS_CACHE (model repos: <root>/<repo_id>/ containing config.json)
    • DATAMAESTRO_HF_DATASETS_CACHE (dataset repos: <root>/<repo_id>/)
  • HFDownloader and HFSnapshotDownloader consult get_helpers("hf_resolver") before reaching the network. No symlinks, no copies — the mirror directory is exposed as the resource path directly.
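
The lookup rule described in the bullets above can be illustrated with a small standalone sketch. It is not datamaestro's implementation: the class names HFResolverSketch and EnvResolverSketch, the resolve() method and its signature are assumptions made for illustration. Only the two environment variable names, the colon separator, the <root>/<repo_id>/ layout and the config.json requirement for models come from the notes.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Protocol, runtime_checkable


@runtime_checkable
class HFResolverSketch(Protocol):
    """Hypothetical resolver shape; the real HFResolver Protocol may differ."""

    def resolve(self, repo_id: str, repo_type: str) -> Optional[Path]:
        ...


class EnvResolverSketch:
    """Searches colon-separated roots taken from environment variables."""

    VARS = {
        "model": "DATAMAESTRO_HF_MODELS_CACHE",
        "dataset": "DATAMAESTRO_HF_DATASETS_CACHE",
    }

    def __init__(self, env: Optional[Mapping[str, str]] = None):
        # Accept an explicit mapping so the lookup can be exercised in isolation
        self.env = os.environ if env is None else env

    def resolve(self, repo_id: str, repo_type: str) -> Optional[Path]:
        var = self.VARS.get(repo_type)
        if var is None:
            return None
        # Empty segments (unset variable, stray colons) are skipped
        for root in filter(None, self.env.get(var, "").split(":")):
            candidate = Path(root) / repo_id
            if not candidate.is_dir():
                continue
            # A model directory only counts when it contains config.json
            if repo_type == "model" and not (candidate / "config.json").is_file():
                continue
            return candidate
        return None
```

In this sketch the first root that contains a matching directory wins, and a model directory without config.json is skipped so that later roots still get a chance.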

Use case

On HPC clusters with a pre-mirrored HF cache (e.g. Jean-Zay's $DSDIR/HuggingFace_Models and $DSDIR/HuggingFace), set:

export DATAMAESTRO_HF_MODELS_CACHE=$DSDIR/HuggingFace_Models
export DATAMAESTRO_HF_DATASETS_CACHE=$DSDIR/HuggingFace

With these variables set, HF lookups are served from the shared mirror at no cost to the user's disk quota. A distinctive INFO log line is emitted on every hit.
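
The "mirror first, network second" behaviour can be sketched as follows. This is an illustration under stated assumptions, not the code of HFDownloader or HFSnapshotDownloader: the function fetch(), the Resolver callable type, the logger name and the log message wording are all hypothetical.

```python
import logging
from pathlib import Path
from typing import Callable, Iterable, Optional

# Hypothetical logger name, chosen for this sketch only
logger = logging.getLogger("hf_resolver_sketch")

# Assumed resolver shape: repo id in, local directory or None out
Resolver = Callable[[str], Optional[Path]]


def fetch(
    repo_id: str,
    resolvers: Iterable[Resolver],
    download: Callable[[str], Path],
) -> Path:
    """Return a local path for repo_id, consulting resolvers before downloading."""
    for resolver in resolvers:
        path = resolver(repo_id)
        if path is not None:
            # On a hit, the mirror directory itself is returned: no copy, no symlink
            logger.info("HF mirror hit for %s: %s", repo_id, path)
            return path
    # No resolver matched, so fall back to the network download
    return download(repo_id)
```

The download callable is only invoked when every resolver returns None, which is the short-circuit property the release describes.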

Tests

Adds 13 tests covering the env resolver (single & colon-separated roots, missing-config-json case, dataset lookup, Protocol compliance), plugin layering via entry points, and short-circuit behaviour of both downloaders.

Files

experimaestro/datamaestro-v1.14.0.zip (333.8 kB)
md5:bafdcc58267985b6980f86adf5e79fc2

Additional details

Related works