Published June 10, 2022 | Version 1.0
Dataset Open

Model Zoo: A Dataset of Diverse Populations of Neural Network Models - STL10 - Raw Datasets

  • 1. AIML Lab, University of St.Gallen
  • 2. AI Lab Montreal, Samsung Advanced Institute of Technology
  • 3. Image Processing Group, Universitat Politècnica de Catalunya

Description

Abstract

In recent years, neural networks have evolved from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches to (i) analyze models, (ii) discover unknown learning dynamics, (iii) learn rich representations of such populations, or (iv) exploit the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47’360 unique neural network models, resulting in over 2’415’360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned before.

Dataset

This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from STL10. All zoos with extensive information and code can be found at www.modelzoos.cc.

This repository contains the raw model zoos as collections of models (file names beginning with "cifar_"). Zoos are trained with small and large CNN models in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). Due to their large file size, the preprocessed datasets are hosted in a separate repository. The index_dict.json files contain information on how to read the vectorized models.
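Because the internal layout of the index files is not described on this page, a reasonable first step is to inspect one before relying on any particular field. The following is a minimal sketch that assumes only that the file is valid JSON; the helper name summarize_index is hypothetical and not part of the published code.

```python
import json
from pathlib import Path


def summarize_index(path):
    """Load a JSON index file and map each top-level key to its value's type name.

    Assumption: the file is plain JSON. Nothing is assumed about which keys
    it contains; the function only reports what it finds.
    """
    with Path(path).open("r", encoding="utf-8") as fh:
        index = json.load(fh)
    if isinstance(index, dict):
        return {key: type(value).__name__ for key, value in index.items()}
    # Fall back gracefully if the root is not a JSON object.
    return {"<root>": type(index).__name__}
```

For example, calling summarize_index("index_dict_large.json") on a downloaded copy lists the top-level entries, which can then be compared against the loading code provided at www.modelzoos.cc.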

For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.

Notes

This dataset is part of a larger collection of model zoo datasets. Further information and references to all zoos can be found at www.modelzoos.cc.

Files

index_dict_large.json

Files (32.6 GB)

Checksum    Size
md5:fa003be90f91dd98505407304470374e    741 Bytes
md5:7b8b159345676adc3b208d3a8fcc4cd0    733 Bytes
md5:003cd2142b18134fd5051d960a67f2f8    11.1 GB
md5:d1ef25b010564ea1ec2436d277e34993    11.1 GB
md5:90a650f4cd9fa791bc1b354ed97a85dc    2.4 GB
md5:5cd2888b7c19f3d5ce77667b23c1f0fd    3.6 GB
md5:bb4db0f5f4d278fbcc2f0194a7e3de3b    3.6 GB
md5:6b81b38ffdf767f317210ab3f68d9753    767.1 MB
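
Given the size of the archives, it may be worth checking a download against the MD5 checksums listed above before unpacking it. The following is a minimal sketch using Python's standard hashlib module; the function names and the local file path are hypothetical, and the file is read in chunks so that multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as matches_listing("path/to/download", "md5:6b81b38ffdf767f317210ab3f68d9753") returns True only if the local file's digest equals the listed value; which file each checksum belongs to has to be taken from the repository itself.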