Label Memorization of CIFAR-10N

Lim, Yi Yang Gordon

doi:10.5281/zenodo.14687826

Published January 18, 2025 | Version v1

Dataset Open

Lim, Yi Yang Gordon (Data collector)¹

1. University of Michigan–Ann Arbor

Contributors

Data collector:

Lim, Gordon

These files contain the results of held-out estimation on the CIFAR-10N dataset with the rand1 noise type, aggregated over 1500 ResNet models. The file cifar-rand1-human-infl-mem.npz holds the results for human noisy labels; cifar-rand1-syn-infl-mem.npz holds the results for synthetic noisy labels generated from the same class transition matrix.

Dictionary Keys

1. `total_runs`

Type: Integer
Description: The total number of training runs included in this aggregated result.

2. `trainset_mask`

Type: array (Boolean, shape: [total_runs, train_size])
Description: A mask indicating which training examples were used (True) or held out (False) during each training run.

3. `trainset_correctness`

Type: array (Boolean, shape: [total_runs, train_size])
Description: Whether the model correctly predicted the label for each training example during each run.

4. `trainset_predictions`

Type: array (Integer, shape: [total_runs, train_size])
Description: The predicted class labels for each training example during each run.

5. `testset_correctness`

Type: array (Boolean, shape: [total_runs, test_size])
Description: Whether the model correctly predicted the label for each test example during each run.

6. `testset_predictions`

Type: array (Integer, shape: [total_runs, test_size])
Description: The predicted class labels for each test example during each run.

7. `memorization`

Type: array (Float, shape: [train_size])
Description: Memorization score of each training example, computed across all runs.

8. `influence`

Type: array (Float, shape: [test_size, train_size])
Description: Influence scores of each training example on each test example.

9. `memorization_inclusion_prob`

Type: array (Float, shape: [train_size])
Description: Estimated probability that each training example is predicted correctly by models whose training set included it.

10. `memorization_exclusion_prob`

Type: array (Float, shape: [train_size])
Description: Estimated probability that each training example is predicted correctly by models whose training set excluded it (i.e., it was held out).
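
The keys above fit together in the way a held-out estimator usually works: per-run masks and correctness arrays are averaged separately over the runs that included an example and the runs that excluded it. The sketch below is a minimal illustration on made-up toy arrays. It assumes that `memorization` is the difference between the inclusion and exclusion probabilities and that `influence` is the analogous difference in test accuracy; this record does not state the exact formulas, so treat both as assumptions. The function names are hypothetical.

```python
import numpy as np

def masked_mean(values, mask, eps=1e-10):
    """Average `values` over runs (axis 0), counting only entries where `mask` is set."""
    values = values.astype(np.float32)
    mask = mask.astype(np.float32)
    return (values * mask).sum(axis=0) / np.maximum(mask.sum(axis=0), eps)

def estimate_memorization(trainset_mask, trainset_correctness):
    """Return (memorization, inclusion_prob, exclusion_prob), each of shape [train_size].

    Assumption: memorization = P(correct | included) - P(correct | excluded).
    """
    excluded = np.logical_not(trainset_mask)
    inclusion_prob = masked_mean(trainset_correctness, trainset_mask)
    exclusion_prob = masked_mean(trainset_correctness, excluded)
    return inclusion_prob - exclusion_prob, inclusion_prob, exclusion_prob

def estimate_influence(trainset_mask, testset_correctness, eps=1e-10):
    """Return an array of shape [test_size, train_size].

    Assumption: entry (j, i) is the accuracy on test example j over runs that
    included training example i, minus the accuracy over runs that excluded it.
    """
    correct = testset_correctness.T.astype(np.float32)  # [test_size, total_runs]
    included = trainset_mask.astype(np.float32)         # [total_runs, train_size]
    excluded = 1.0 - included
    acc_in = correct @ included / np.maximum(included.sum(axis=0, keepdims=True), eps)
    acc_out = correct @ excluded / np.maximum(excluded.sum(axis=0, keepdims=True), eps)
    return acc_in - acc_out

# Toy stand-ins (made-up values): 4 runs, 3 training examples, 2 test examples.
toy_mask = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 1]], dtype=bool)
toy_train_correct = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 0, 1]], dtype=bool)
toy_test_correct = np.array([[1, 0], [1, 1], [0, 1], [0, 1]], dtype=bool)

toy_mem, toy_p_in, toy_p_out = estimate_memorization(toy_mask, toy_train_correct)
toy_infl = estimate_influence(toy_mask, toy_test_correct)
```

With the real files, passing `data['trainset_mask']` and `data['trainset_correctness']` to `estimate_memorization` should give values comparable to the stored `memorization` array if the assumption above holds.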

Usage

To load the file and access its contents:

import numpy as np

# Load the file
data = np.load('cifar-rand1-human-infl-mem.npz')

# Access individual components
total_runs = data['total_runs']
trainset_mask = data['trainset_mask']
memorization = data['memorization']
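
Because each archive is 2.9 GB and `influence` has shape [test_size, train_size], it can help to inspect the smaller arrays first. `np.load` on an `.npz` file returns a lazy archive object that reads an array only when its key is accessed, so skipping a key avoids loading it. The sketch below is a minimal example of that pattern; the helper names are hypothetical, and the toy archive is a made-up stand-in for the real file.

```python
import os
import tempfile

import numpy as np

def summarize_npz(path, skip=("influence",)):
    """Return {key: (dtype, shape)} for every array in the archive except those in `skip`."""
    summary = {}
    with np.load(path) as data:  # an array is read only when its key is accessed
        for key in data.files:
            if key in skip:
                continue
            arr = data[key]
            summary[key] = (str(arr.dtype), arr.shape)
    return summary

def load_scalar_int(path, key="total_runs"):
    """Read a single-element array (such as total_runs) as a plain Python int."""
    with np.load(path) as data:
        return int(data[key].item())

# Toy archive standing in for cifar-rand1-human-infl-mem.npz (made-up contents).
toy_path = os.path.join(tempfile.mkdtemp(), "toy-infl-mem.npz")
np.savez(
    toy_path,
    total_runs=np.array(3),
    memorization=np.zeros(5, dtype=np.float32),
    influence=np.zeros((2, 5), dtype=np.float32),
)

toy_summary = summarize_npz(toy_path)
toy_runs = load_scalar_int(toy_path)
```

For the real data, replace `toy_path` with the path to the downloaded file.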

Notes

Our results were aggregated over 1500 training runs.

For more details or questions, feel free to reach out!

Files (5.8 GB)

Name	MD5	Size
cifar-rand1-human-infl-mem.npz	09fbb3f6b23725dbc1461ac8f8d98286	2.9 GB
cifar-rand1-syn-infl-mem.npz	8f472aef4a487d43289657ae53865871	2.9 GB
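
Given the file sizes, it is worth checking a download against the MD5 checksums listed above before loading it. The sketch below is one way to do that with Python's standard `hashlib`, reading the file in chunks so that it is never held in memory all at once. The helper names and the chunk size are illustrative choices, not part of the dataset.

```python
import hashlib
import os

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "cifar-rand1-human-infl-mem.npz": "09fbb3f6b23725dbc1461ac8f8d98286",
    "cifar-rand1-syn-infl-mem.npz": "8f472aef4a487d43289657ae53865871",
}

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the listed checksum for its file name."""
    name = os.path.basename(path)
    if name not in expected:
        raise KeyError(f"no checksum listed for {name}")
    return md5_of_file(path) == expected[name]
```

For example, `verify_download('cifar-rand1-human-infl-mem.npz')` returns `True` when the local copy matches the listed checksum.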

Additional details

Repository URL: https://github.com/gordon-lim/soft-neighbor-labels

