There is a newer version of the record available.

Published June 8, 2021 | Version 1.0
Dataset Open

PASS: An ImageNet replacement for self-supervised pretraining without humans

  • 1. University of Oxford

Description

Computer vision has long relied on ImageNet and other large datasets of images sampled from the Internet for pretraining models. However, these datasets have ethical and technical shortcomings, such as containing personal information taken without consent, unclear licensing, biases, and, in some cases, even problematic image content. On the other hand, state-of-the-art pretraining is nowadays obtained with unsupervised methods, meaning that labelled datasets such as ImageNet may not be necessary, or perhaps not even optimal, for model pretraining. We thus propose an unlabelled dataset, PASS: Pictures without humAns for Self-Supervision. PASS contains only images with a CC-BY license and complete attribution metadata, addressing the copyright issue. Most importantly, it contains no images of people at all, and also avoids other types of images that are problematic for data protection or ethics. We show that PASS can be used for pretraining with methods such as MoCo-v2, SwAV and DINO. In the transfer learning setting, it yields downstream performance similar to that of ImageNet pretraining, even on tasks that involve humans, such as human pose estimation. PASS does not make existing datasets obsolete; for instance, it is insufficient for benchmarking. However, it shows that model pretraining is often possible while using safer data, and it also provides the basis for a more robust evaluation of pretraining methods.

A simple download script is here: https://github.com/yukimasano/PASS/blob/main/download.sh
Visit our webpage here: https://www.robots.ox.ac.uk/~vgg/research/pass/

Notes

Note: each tar archive can be extracted and used individually.
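
Since each archive is usable on its own, a single part can be unpacked without downloading the rest. The following is a minimal sketch in Python, not part of the official tooling: the function name extract_archive and the example paths are hypothetical, and the internal layout of the archives is not described on this page.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, out_dir):
    """Extract one tar archive into out_dir; return the number of regular files.

    Hypothetical helper for illustration only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (if any) automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        members = tar.getmembers()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse unsafe members (absolute paths, etc.).
            tar.extractall(out_dir, filter="data")
        else:
            tar.extractall(out_dir)
    return sum(1 for member in members if member.isfile())
```

For example, extract_archive("part.tar", "pass_images") would unpack one downloaded part into a local folder; both names are placeholders for whatever the downloaded file and target directory are actually called.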

Files

pass_metadata.csv

Files (179.7 GB)

Checksum                                Size
md5:a19eb0894e457d4e9c2b5517bbde8d7d    18.7 GB
md5:c5769de320f4cfb5623a1bed55995091    18.7 GB
md5:70b54214c7f2cc57f6069a275f375354    18.7 GB
md5:b647812c558da3ab45f5cab4f866c91f    18.7 GB
md5:451d1355a3fc8a3a742ed83d10380508    18.7 GB
md5:af6a7cf2644650ca56a6669b22c4f3ee    18.7 GB
md5:4fcb80fa597730799bd5419ce99124de    18.7 GB
md5:51ee83dac260ca11e8059ff33b3fdad1    18.7 GB
md5:882b7f10c14a67d3fe47624b2340ea49    18.7 GB
md5:e978af85e64d10f55f7e2710757c2234    11.3 GB
md5:b2d8938da2ddfcaecf93d1af81f0ec0a    135.7 MB
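
The MD5 values listed above can be used to check that a download completed without corruption. The following is a minimal sketch in Python, assuming the file is already on disk; the helper names md5_of_file and matches_listing are hypothetical, and which checksum belongs to which file name has to be taken from the record itself, since the names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed checksum such as 'md5:a19eb0...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Reading in chunks matters here because most of the archives are 18.7 GB each; loading one fully into memory just to hash it would be impractical on most machines.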