Published November 17, 2025 | Version v1
Dataset Open

Dataset from the MIST framework: Mutual Information estimation via Supervised Training

  • 1. Université Grenoble Alpes
  • 2. New York University

Description

This dataset consists of meta-datasets built for supervised training and evaluation of meta-learned mutual information (MI) estimators. Using the BMI library, we generate complex distributions with known MI by applying invertible transformations to simple base distributions. The base distribution families are split into disjoint training and test sets, so that the training and test meta-distributions have different supports. Each meta-datapoint contains paired samples drawn from a joint distribution, together with the corresponding MI value. The datasets cover data dimensions from 2 to 32 and sample sizes from 10 to 500.

The training meta-dataset M_train includes 16 base distribution families and 625k meta-datapoints.

Two test sets are provided: a small one (M_test, 2.3k meta-datapoints) for evaluating slow baselines, and an extended one (M_test_extended, 806k meta-datapoints) that includes both seen and unseen distribution families.
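As a rough illustration of how a meta-datapoint with known MI can be constructed, the sketch below draws correlated Gaussian pairs (whose MI has a closed form) and then applies invertible component-wise maps, which leave MI unchanged. This is not the BMI library's API or the authors' generation code: the function name make_meta_datapoint, the choice of base distribution, and the two transformations are assumptions made for illustration only.

```python
import math
import random


def make_meta_datapoint(rho, n_samples, dim, seed=0):
    """Return (x, y, mi): paired samples and their exact MI in nats.

    Hypothetical helper for illustration; not part of BMI or MIST.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_samples):
        # Base distribution: `dim` independent pairs of standard Gaussians,
        # each pair having correlation `rho`.
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        e = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = [rho * zi + math.sqrt(1.0 - rho**2) * ei for zi, ei in zip(z, e)]
        # Invertible maps applied separately to x and y preserve MI:
        # asinh is a bijection on the reals, and t -> t**3 + t is strictly
        # increasing, hence also a bijection.
        xs.append([math.asinh(zi) for zi in z])
        ys.append([yi**3 + yi for yi in y])
    # Closed-form MI of the base distribution, summed over independent pairs.
    mi = -0.5 * dim * math.log(1.0 - rho**2)
    return xs, ys, mi


# Example: one meta-datapoint with dimension 2 and 500 samples.
x, y, mi = make_meta_datapoint(rho=0.5, n_samples=500, dim=2)
```

The transformed samples no longer look Gaussian, yet the label mi remains exact, which is what makes supervised training of an estimator possible.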

Files

dataset_intro.txt

Files (39.3 GB)

MD5 checksum                            Size
md5:34b12b31393565cd352935d9d218f6e5    716 Bytes
md5:6eb8a08baba5a575e52c8be99137acbc    379 Bytes
md5:cf4b7d59617241774ca69fb1607a4047    12.9 GB
md5:e323b2179a6262affcc8f45369115565    74.1 MB
md5:dc27ba1b3609d4c2b950139d5d2dbc71    424 Bytes
md5:cc5c5bf7d99762fcc5839eb43d4f62fe    15.1 GB
md5:4e170ad77fd3870e59198361181b887e    87.4 MB
md5:a00b92b7889e4fcbdd42caf3dd94c8c5    405 Bytes
md5:fd86b9185a70b21e3e0e85de87201915    38.4 MB
md5:1b58dbbb81cdbb3793cc23330483b934    212.6 kB
md5:18d6429d5dfa1e3792942197af231867    450 Bytes
md5:1bab99ea45155504d7ae7a31297b11a7    45.0 MB
md5:4d525f206f19bc8ac3729f6fcb42cab0    250.8 kB
md5:3332d3849ec55006ab616c3387003091    842 Bytes
md5:f66a9fedfeee1dddfdd774ee1418e6e8    11.0 GB
md5:5ba6db6be97ee4313aa4d2bbcf447dcc    75.0 MB
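The MD5 checksums listed above can be used to check that a download completed without corruption. The following is a minimal sketch using only the Python standard library; the helper names md5_of_file and matches_listed_checksum are hypothetical, and the file path is a placeholder because the file names are not shown in the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Chunked reads keep memory use flat for the multi-GB archives.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, matches_listed_checksum("path/to/downloaded_file", "md5:34b12b31393565cd352935d9d218f6e5") would return True only if the placeholder path points at an intact copy of the 716-byte file.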

Additional details

Related works

Is published in
Dataset: arXiv:2502.12088 (arXiv)

Funding

U.S. National Science Foundation
Graduate Research Fellowship DGE-2039655
Ministry of Science and ICT
Global AI Frontier Lab International Collaborative Research
Samsung Advanced Institute of Technology (South Korea)
Next Generation Deep Learning: From Pattern Recognition to AI 1922658
Centre National de la Recherche Scientifique
Grant under ANR CPJ2 program ANR-22-CPJ2-0036-01
Grand Équipement National de Calcul Intensif (France)
Access to HPC resources of IDRIS 2025-AD011014834

Software

Repository URL
https://github.com/grgera/mist
Programming language
Python
Development Status
Active