Dataset from the MIST framework: Mutual Information estimation via Supervised Training
Description
This dataset consists of meta-datasets built for supervised training and evaluation of meta-learned mutual information (MI) estimators. Using the BMI library, we generate complex distributions with known MI by applying invertible transformations to simple base distributions. The base distribution families are split into disjoint training and test sets, so the training and test meta-distributions have different supports. Each meta-datapoint contains paired samples drawn from a joint distribution together with the corresponding MI value. The datasets cover dimensions from 2 to 32 and sample sizes from 10 to 500.
The training meta-dataset M_train includes 16 base distribution families and 625k meta-datapoints.
Two test sets are provided: a small one (M_test, 2.3k meta-datapoints) for evaluating slow baselines, and an extended one (M_test_extended, 806k meta-datapoints) that includes both seen and unseen distribution families.
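The construction described above relies on the fact that MI is unchanged when each variable is passed through its own invertible map, so a transformed sample inherits the known MI of its base distribution. The following is a minimal sketch of that idea using only NumPy, not the BMI library and not the code that produced these files. The function names, the dictionary keys, the one-dimensional variables and the two transformations are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_mi(rho: float) -> float:
    """Closed-form MI (in nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * np.log(1.0 - rho ** 2)


def half_cube(u: np.ndarray) -> np.ndarray:
    """Invertible map u -> sign(u) * |u|^(3/2); illustrative choice."""
    return np.sign(u) * np.abs(u) ** 1.5


def make_meta_datapoint(rho: float, n_samples: int, rng: np.random.Generator) -> dict:
    """Hypothetical helper: paired samples plus their ground-truth MI.

    Samples (X, Y) from a correlated Gaussian base distribution, then applies
    a separate invertible map to X and to Y. Because both maps are invertible,
    the MI of the transformed pair equals the MI of the base pair.
    """
    cov = np.array([[1.0, rho], [rho, 1.0]])
    base = rng.multivariate_normal(np.zeros(2), cov, size=n_samples)
    x = np.arcsinh(base[:, :1])   # invertible map applied to X only
    y = half_cube(base[:, 1:])    # invertible map applied to Y only
    return {"x": x, "y": y, "mi": gaussian_mi(rho)}


rng = np.random.default_rng(0)
datapoint = make_meta_datapoint(rho=0.8, n_samples=100, rng=rng)
print(datapoint["x"].shape, datapoint["y"].shape, datapoint["mi"])
```

The layout of the returned dictionary is an assumption for this sketch and may not match how the meta-datapoints are stored in the released files.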
Files
dataset_intro.txt
Total size: 39.3 GB
| MD5 | Size |
|---|---|
| 34b12b31393565cd352935d9d218f6e5 | 716 Bytes |
| 6eb8a08baba5a575e52c8be99137acbc | 379 Bytes |
| cf4b7d59617241774ca69fb1607a4047 | 12.9 GB |
| e323b2179a6262affcc8f45369115565 | 74.1 MB |
| dc27ba1b3609d4c2b950139d5d2dbc71 | 424 Bytes |
| cc5c5bf7d99762fcc5839eb43d4f62fe | 15.1 GB |
| 4e170ad77fd3870e59198361181b887e | 87.4 MB |
| a00b92b7889e4fcbdd42caf3dd94c8c5 | 405 Bytes |
| fd86b9185a70b21e3e0e85de87201915 | 38.4 MB |
| 1b58dbbb81cdbb3793cc23330483b934 | 212.6 kB |
| 18d6429d5dfa1e3792942197af231867 | 450 Bytes |
| 1bab99ea45155504d7ae7a31297b11a7 | 45.0 MB |
| 4d525f206f19bc8ac3729f6fcb42cab0 | 250.8 kB |
| 3332d3849ec55006ab616c3387003091 | 842 Bytes |
| f66a9fedfeee1dddfdd774ee1418e6e8 | 11.0 GB |
| 5ba6db6be97ee4313aa4d2bbcf447dcc | 75.0 MB |
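Since several of the files are over 10 GB, it can be worth checking a download against its listed MD5 checksum before use. The sketch below uses Python's standard `hashlib` module and reads the file in chunks so large files do not need to fit in memory. The helper names and the file path in the usage comment are hypothetical; only the checksum string is taken from the listing above.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage (the path is a placeholder, not an actual file name from this record):
# ok = matches_checksum("path/to/downloaded_file", "md5:cf4b7d59617241774ca69fb1607a4047")
```

MD5 is adequate here for detecting corrupted or truncated downloads; it is not a defense against deliberate tampering.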
Additional details
Related works
- Is published in
- Dataset: arXiv:2502.12088 (arXiv)
Funding
- U.S. National Science Foundation: Graduate Research Fellowship DGE-2039655
- Ministry of Science and ICT: Global AI Frontier Lab International Collaborative Research
- Samsung Advanced Institute of Technology (South Korea): Next Generation Deep Learning: From Pattern Recognition to AI 1922658
- Centre National de la Recherche Scientifique: Grant under ANR CPJ2 program ANR-22-CPJ2-0036-01
- Grand Équipement National de Calcul Intensif (France): Access to HPC resources of IDRIS 2025-AD011014834
Software
- Repository URL: https://github.com/grgera/mist
- Programming language: Python
- Development Status: Active