Published January 18, 2025 | Version v1
Dataset Open

Label Memorization of CIFAR-10N

  • 1. University of Michigan–Ann Arbor

Contributors

Data collector:

Description

These files contain results from running held-out estimation on the CIFAR-10N dataset with the rand1 noise type, using 1500 ResNet models. The file cifar-rand1-human-infl-mem.npz contains results for the human noisy labels, while cifar-rand1-syn-infl-mem.npz contains results for synthetic noisy labels generated using the same class transition matrix.

 

Dictionary Keys

1. total_runs

  • Type: Integer
  • Description: The total number of training runs included in this aggregated result.

2. trainset_mask

  • Type: array (Boolean, shape: [total_runs, train_size])
  • Description: A mask indicating which training examples were used (True) or held out (False) during each training run.

3. trainset_correctness

  • Type: array (Boolean, shape: [total_runs, train_size])
  • Description: Whether the model correctly predicted the label for each training example during each run.

4. trainset_predictions

  • Type: array (Integer, shape: [total_runs, train_size])
  • Description: The predicted class labels for each training example during each run.

5. testset_correctness

  • Type: array (Boolean, shape: [total_runs, test_size])
  • Description: Whether the model correctly predicted the label for each test example during each run.

6. testset_predictions

  • Type: array (Integer, shape: [total_runs, test_size])
  • Description: The predicted class labels for each test example during each run.

7. memorization

  • Type: array (Float, shape: [train_size])
  • Description: Memorization score of each training example, computed across all runs.

8. influence

  • Type: array (Float, shape: [test_size, train_size])
  • Description: Influence scores of each training example on each test example.

9. memorization_inclusion_prob

  • Type: array (Float, shape: [train_size])
  • Description: Estimated probability that each training example is predicted correctly in runs where it was included in the training set (trainset_mask is True).

10. memorization_exclusion_prob

  • Type: array (Float, shape: [train_size])
  • Description: Estimated probability that each training example is predicted correctly in runs where it was held out of the training set (trainset_mask is False).
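
How the mask, correctness, and memorization fields relate can be sketched on toy data. The snippet below assumes the scores follow the usual held-out estimator, in which memorization is the inclusion probability minus the exclusion probability. This page does not state the exact formula, so the helper names `masked_mean` and `heldout_memorization` are hypothetical and the code is an illustration, not the authors' implementation.

```python
import numpy as np

def masked_mean(correct, mask, eps=1e-8):
    """Per-example mean of `correct` over the runs selected by `mask`."""
    correct = correct.astype(np.float32)
    mask = mask.astype(np.float32)
    # Sum over runs (axis 0), guarding against examples with no selected runs.
    return (correct * mask).sum(axis=0) / np.maximum(mask.sum(axis=0), eps)

def heldout_memorization(trainset_correctness, trainset_mask):
    """Assumed estimator: P(correct | included) - P(correct | held out)."""
    trainset_mask = np.asarray(trainset_mask, dtype=bool)
    inclusion_prob = masked_mean(trainset_correctness, trainset_mask)
    exclusion_prob = masked_mean(trainset_correctness, ~trainset_mask)
    return inclusion_prob - exclusion_prob, inclusion_prob, exclusion_prob

# Toy stand-in data: 4 runs, 3 training examples (not the real dataset).
mask = np.array([[True,  True,  False],
                 [True,  False, True],
                 [False, True,  True],
                 [False, False, True]])
correct = np.array([[True,  True,  True],
                    [True,  False, True],
                    [False, True,  True],
                    [False, True,  True]])

mem, p_in, p_out = heldout_memorization(correct, mask)
print(mem)  # one memorization estimate per training example
```

Under this assumption, an example that is only predicted correctly when it was trained on gets a score near 1, while an example that is predicted correctly either way gets a score near 0.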

Usage

To load the file and access its contents:

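import numpy as np

# Load the archive (file name as given in the Description above)
data = np.load('cifar-rand1-human-infl-mem.npz')

# Access individual components; each array is read from disk when indexed
total_runs = int(data['total_runs'])    # scalar stored as an array
trainset_mask = data['trainset_mask']   # Boolean, [total_runs, train_size]
memorization = data['memorization']     # Float, [train_size]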
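
Because each file is about 2.9 GB, it can help to read only the arrays you need. `np.load` on an `.npz` archive returns a lazy `NpzFile` object whose arrays are read only when indexed, so the sketch below checks the key names first and then loads a chosen subset. The helper name `load_selected` is hypothetical, and the demo runs on a tiny synthetic archive rather than the real files.

```python
import os
import tempfile

import numpy as np

def load_selected(path, keys):
    """Load only the requested arrays from an .npz archive."""
    with np.load(path) as archive:
        # `archive.files` lists the stored key names without reading any data.
        missing = [k for k in keys if k not in archive.files]
        if missing:
            raise KeyError(f"keys not found in {path}: {missing}")
        # Indexing the archive reads that one array fully into memory.
        return {k: archive[k] for k in keys}

# Demo on a tiny stand-in archive (synthetic values, not the real data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npz")
    np.savez(
        demo_path,
        total_runs=4,
        memorization=np.array([0.1, 0.9, 0.5]),
        influence=np.zeros((2, 3)),
    )
    subset = load_selected(demo_path, ["total_runs", "memorization"])

print(sorted(subset))
```

With the real files, skipping the influence key avoids loading the largest array, which has one entry per (test example, training example) pair.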

Notes

  • Our results were aggregated over 1500 training runs.
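
One way to use the aggregated influence array is to look up, for a given test example, the training examples with the largest estimated influence on it. The following is a minimal sketch on stand-in data. `top_influential` is a hypothetical helper, and the [test_size, train_size] layout is taken from the key description above.

```python
import numpy as np

def top_influential(influence, test_index, k=5):
    """Indices and scores of the k training examples with the highest
    influence on one test example, sorted from highest to lowest."""
    row = influence[test_index]
    k = min(k, row.shape[0])
    # argpartition finds the k largest entries without a full sort.
    top = np.argpartition(-row, k - 1)[:k]
    # Order just those k entries by descending influence.
    top = top[np.argsort(-row[top])]
    return top, row[top]

# Stand-in influence matrix: 2 test examples, 4 training examples.
influence = np.array([[0.1, -0.2, 0.7, 0.0],
                      [0.3,  0.9, -0.1, 0.4]], dtype=np.float32)

idx, scores = top_influential(influence, test_index=1, k=2)
print(idx, scores)
```

The partial sort keeps the lookup cheap when a row holds tens of thousands of training examples and only a few top entries are needed.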

For more details or questions, feel free to reach out!

Files

Files (5.8 GB)

Checksum Size
md5:09fbb3f6b23725dbc1461ac8f8d98286 2.9 GB
md5:8f472aef4a487d43289657ae53865871 2.9 GB

Additional details