Published January 18, 2025 | Version v1
Dataset
Open
Label Memorization of CIFAR-10N
Contributors
Data collector:
Description
These files contain results from running held-out estimation on the CIFAR-10N dataset with the rand1 noise type, using 1500 ResNet models. The cifar-rand1-human-infl-mem.npz file contains results for human noisy labels, while cifar-rand1-syn-infl-mem.npz contains results for synthetic noisy labels generated using the same class transition matrix as the human labels.
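As an illustration of what "synthetic noisy labels generated using the same class transition matrix" can mean, here is a minimal sketch. It assumes a row-stochastic matrix T with T[i, j] approximating P(noisy = j | clean = i), estimated from clean and human labels, and samples one synthetic label per example from the row of its clean class. The function names and the toy data are hypothetical; the procedure actually used to build the synthetic file may differ.

```python
import numpy as np

def estimate_transition_matrix(clean, noisy, num_classes):
    """Row-normalised co-occurrence counts: T[i, j] ~ P(noisy = j | clean = i)."""
    counts = np.zeros((num_classes, num_classes), dtype=np.float64)
    np.add.at(counts, (clean, noisy), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Classes that never occur fall back to an identity row (label unchanged).
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1.0), np.eye(num_classes))

def sample_synthetic_labels(clean, T, rng):
    """Draw one noisy label per example from the row of T for its clean class."""
    cdf = np.cumsum(T, axis=1)
    u = rng.random(len(clean))
    # Inverse-CDF sampling: count the columns whose cumulative mass is <= u.
    noisy = (u[:, None] >= cdf[clean]).sum(axis=1)
    return np.minimum(noisy, T.shape[0] - 1)  # guard against rounding in the CDF

# Toy data standing in for clean and human labels (hypothetical, 3 classes).
rng = np.random.default_rng(0)
clean = rng.integers(0, 3, size=1000)
human = clean.copy()
flip = rng.random(1000) < 0.2
human[flip] = rng.integers(0, 3, size=int(flip.sum()))

T = estimate_transition_matrix(clean, human, 3)
synthetic = sample_synthetic_labels(clean, T, rng)
```

Synthetic labels drawn this way match the class-level noise rates of the human labels but not which individual images are mislabeled, which is what makes the two files a useful comparison.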
Dictionary Keys
1. total_runs
- Type: Integer
- Description: The total number of training runs included in this aggregated result.
2. trainset_mask
- Type: array (Boolean, shape: [total_runs, train_size])
- Description: A mask indicating which training examples were used (True) or held out (False) during each training run.
3. trainset_correctness
- Type: array (Boolean, shape: [total_runs, train_size])
- Description: Whether the model correctly predicted the label for each training example during each run.
4. trainset_predictions
- Type: array (Integer, shape: [total_runs, train_size])
- Description: The predicted class labels for each training example during each run.
5. testset_correctness
- Type: array (Boolean, shape: [total_runs, test_size])
- Description: Whether the model correctly predicted the label for each test example during each run.
6. testset_predictions
- Type: array (Integer, shape: [total_runs, test_size])
- Description: The predicted class labels for each test example during each run.
7. memorization
- Type: array (Float, shape: [train_size])
- Description: Memorization score of each training example, computed across all runs.
8. influence
- Type: array (Float, shape: [test_size, train_size])
- Description: Influence scores of each training example on each test example.
9. memorization_inclusion_prob
- Type: array (Float, shape: [train_size])
- Description: Estimated probability that a training example is predicted correctly by models whose training set included it.
10. memorization_exclusion_prob
- Type: array (Float, shape: [train_size])
- Description: Estimated probability that a training example is predicted correctly by models whose training set excluded it.
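The relationship between the keys above can be sketched as follows. This assumes the standard held-out estimator, in which memorization is the difference between the inclusion and exclusion probabilities and influence is the analogous difference in test correctness; whether the stored arrays were computed in exactly this way is an assumption, and the function names and toy arrays are hypothetical.

```python
import numpy as np

def heldout_memorization(mask, correct):
    """mask, correct: boolean arrays of shape [total_runs, train_size].

    Returns (memorization, inclusion_prob, exclusion_prob), each [train_size].
    """
    inc = mask.astype(np.float64)
    exc = 1.0 - inc
    hit = correct.astype(np.float64)
    # np.maximum avoids division by zero if an example is never in/out.
    p_in = (inc * hit).sum(axis=0) / np.maximum(inc.sum(axis=0), 1.0)
    p_out = (exc * hit).sum(axis=0) / np.maximum(exc.sum(axis=0), 1.0)
    return p_in - p_out, p_in, p_out

def heldout_influence(mask, test_correct):
    """mask: [total_runs, train_size]; test_correct: [total_runs, test_size].

    Returns an array of shape [test_size, train_size].
    """
    inc = mask.astype(np.float64)
    exc = 1.0 - inc
    hit = test_correct.astype(np.float64)
    acc_in = hit.T @ inc / np.maximum(inc.sum(axis=0), 1.0)
    acc_out = hit.T @ exc / np.maximum(exc.sum(axis=0), 1.0)
    return acc_in - acc_out

# Toy example: 4 runs, 2 training examples, 1 test example.
mask = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], dtype=bool)
train_correct = np.array([[1, 0], [1, 1], [0, 1], [1, 0]], dtype=bool)
test_correct = np.array([[1], [1], [0], [0]], dtype=bool)

mem, p_in, p_out = heldout_memorization(mask, train_correct)
infl = heldout_influence(mask, test_correct)
# mem  -> [0.5, 1.0]
# infl -> [[1.0, 0.0]]
```

On the real files the same functions would take trainset_mask with trainset_correctness or testset_correctness as inputs, which gives a way to check the stored scores against the per-run arrays.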
Usage
To load the file and access its contents:
import numpy as np
# Load the file (adjust the name to match the file you downloaded)
data = np.load('cifar-rand1-human-agg-infl-mem.npz')
# Access individual components
total_runs = int(data['total_runs'])  # convert the stored value to a Python int
trainset_mask = data['trainset_mask']
memorization = data['memorization']
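A slightly longer sketch of working with the loaded arrays follows. To stay runnable without the 2.9 GB download, it first writes a tiny stand-in .npz with a few of the same key names; the stand-in path and all values in it are hypothetical. np.load reads each array of an .npz archive only when its key is accessed, so requesting the influence matrix is the step that may need a lot of memory on the real files.

```python
import os
import tempfile

import numpy as np

# Build a tiny stand-in file (hypothetical values, same key names as above).
rng = np.random.default_rng(0)
runs, train_size, test_size = 8, 5, 3
path = os.path.join(tempfile.mkdtemp(), "toy-infl-mem.npz")
np.savez(
    path,
    total_runs=runs,
    trainset_mask=rng.random((runs, train_size)) < 0.7,
    memorization=np.array([0.1, 0.9, 0.0, 0.5, 0.3]),
    influence=rng.normal(size=(test_size, train_size)),
)

# Arrays are read from disk only when their key is accessed.
with np.load(path) as data:
    keys = list(data.files)
    total_runs = int(data["total_runs"])
    memorization = data["memorization"]
    influence = data["influence"]

# Indices of the three training examples with the highest memorization score.
top_memorized = np.argsort(memorization)[::-1][:3]

# For each test example, the training example with the largest influence on it.
most_influential = influence.argmax(axis=1)
```

With the real files, replace the stand-in path with the downloaded file name and skip the np.savez step.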
Notes
- Our results were aggregated over 1500 training runs.
For more details or questions, feel free to reach out!
Files
(5.8 GB)
Name | Size | Download all |
---|---|---|
md5:09fbb3f6b23725dbc1461ac8f8d98286 | 2.9 GB | Download |
md5:8f472aef4a487d43289657ae53865871 | 2.9 GB | Download |
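The md5 values listed above can be used to check a download for corruption. A minimal sketch using Python's standard hashlib follows; the helper name md5_of_file is hypothetical, and since the listing does not show which checksum belongs to which file, the comparison accepts either value.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Checksums copied from the file listing above.
LISTED_MD5 = {
    "09fbb3f6b23725dbc1461ac8f8d98286",
    "8f472aef4a487d43289657ae53865871",
}

def matches_listed_checksum(path):
    """True if the file's MD5 equals one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5
```

Reading in chunks keeps memory use small even for multi-gigabyte files.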
Additional details
Software
- Repository URL: https://github.com/gordon-lim/soft-neighbor-labels