Published May 21, 2026
| Version v1.14.0
Software
Open
experimaestro/datamaestro: v1.14.0 — pluggable hf_resolver helpers + env-driven local mirror
Description
Highlights
- New `datamaestro.helpers` module: pluggable helpers contributed by third-party packages through the `datamaestro.helpers` entry-point group.
- `HFResolver` Protocol: redirects HF Hub repos to a local directory (e.g. an HPC cluster's shared model/dataset mirror).
- Built-in `_EnvHFResolver` driven by colon-separated env vars:
  - `DATAMAESTRO_HF_MODELS_CACHE` (model repos: `<root>/<repo_id>/` with `config.json`)
  - `DATAMAESTRO_HF_DATASETS_CACHE` (dataset repos: `<root>/<repo_id>/`)

  Always registered; no plugin is required for the env-driven flow.
- `HFDownloader` and `HFSnapshotDownloader` consult `get_helpers("hf_resolver")` before reaching the network. No symlinks, no copies: the mirror directory is exposed as the resource path directly.
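The env-driven lookup described above can be pictured with a minimal sketch. It is not datamaestro's implementation: the names `MirrorResolver`, `EnvMirrorResolver` and the `resolve(repo_id, repo_type)` signature are hypothetical, chosen only for illustration. Only the two environment variable names, the colon-separated roots, the `<root>/<repo_id>/` layout and the `config.json` requirement for model repos come from the release notes.

```python
import os
from pathlib import Path
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class MirrorResolver(Protocol):
    """Hypothetical stand-in for a resolver protocol (assumed signature)."""

    def resolve(self, repo_id: str, repo_type: str) -> Optional[Path]: ...


class EnvMirrorResolver:
    """Looks up repos under colon-separated roots read from env vars."""

    # Variable names are taken from the release notes.
    ENV_VARS = {
        "model": "DATAMAESTRO_HF_MODELS_CACHE",
        "dataset": "DATAMAESTRO_HF_DATASETS_CACHE",
    }

    def resolve(self, repo_id: str, repo_type: str = "model") -> Optional[Path]:
        raw = os.environ.get(self.ENV_VARS[repo_type], "")
        # Skip empty entries such as those produced by "a::b" or an unset variable.
        for root in filter(None, raw.split(":")):
            candidate = Path(root) / repo_id
            if not candidate.is_dir():
                continue
            # Model repos only count as mirrored when config.json is present.
            if repo_type == "model" and not (candidate / "config.json").is_file():
                continue
            return candidate
        return None  # Not mirrored: the caller falls back to the network.
```

Returning the mirror directory itself, rather than linking or copying it, matches the "no symlinks, no copies" behaviour stated in the notes; roots are tried in the order they appear in the variable.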
Use case
On HPC clusters with a pre-mirrored HF cache (e.g. Jean-Zay's $DSDIR/HuggingFace_Models / $DSDIR/HuggingFace):
export DATAMAESTRO_HF_MODELS_CACHE=$DSDIR/HuggingFace_Models
export DATAMAESTRO_HF_DATASETS_CACHE=$DSDIR/HuggingFace
HF lookups are then served from the shared mirror with zero user-quota cost. Distinctive INFO log lines fire on every hit.
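The short-circuit behaviour (check local resolvers first, log at INFO on a hit, otherwise fall back to the network) can be sketched as follows. This is an illustration under assumptions, not the code of `HFDownloader`: the function `fetch`, the `Resolver` callable type, the logger name and the log message are all hypothetical.

```python
import logging
from pathlib import Path
from typing import Callable, Iterable, Optional

logger = logging.getLogger("mirror_sketch")  # hypothetical logger name

# A resolver maps a repo id to a local directory, or None when not mirrored.
Resolver = Callable[[str], Optional[Path]]


def fetch(
    repo_id: str,
    resolvers: Iterable[Resolver],
    download: Callable[[str], Path],
) -> Path:
    """Return a local path for repo_id, downloading only if no resolver hits."""
    for resolver in resolvers:
        local = resolver(repo_id)
        if local is not None:
            # One INFO line per hit makes mirror usage easy to spot in logs.
            logger.info("mirror hit for %s: %s", repo_id, local)
            return local
    # No resolver knew the repo: fall back to the (injected) download function.
    return download(repo_id)
```

Injecting `download` as a parameter keeps the sketch runnable without network access and makes the short-circuit easy to test: on a hit, the download callable is never invoked.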
Tests
Adds 13 tests covering the env resolver (single & colon-separated roots, missing-config-json case, dataset lookup, Protocol compliance), plugin layering via entry points, and short-circuit behaviour of both downloaders.
Files
| Name | Size |
|---|---|
| experimaestro/datamaestro-v1.14.0.zip (md5:bafdcc58267985b6980f86adf5e79fc2) | 333.8 kB |
Additional details
Related works
- Is supplement to
- Software: https://github.com/experimaestro/datamaestro/tree/v1.14.0 (URL)
Software
- Repository URL
- https://github.com/experimaestro/datamaestro