3D IQ Test Task (3D-IQTT) - A Dataset for Quantitative Evaluation of 3D Reconstruction from 2D Images
Authors/Creators
- 1. Mila
- 2. McGill
- 3. ElementAI
Description
3D reconstruction is mostly evaluated qualitatively. With this dataset, we introduce a new, difficult quantitative task: the 3D IQ test task (3D-IQTT).
It is designed to resemble the mental rotation questions found in some IQ tests. Each element in the dataset consists of 4 images: a reference object and answers 1-3. One of the answers shows the reference object, randomly rotated. For every question, dataset users have to use their model to pick the rotated reference object out of the 3 possible answers.
The dataset encourages semi-supervised or unsupervised 3D reconstruction because it contains a large corpus of unlabeled data and only a small set of labeled data where the correct answer is known.
All the images are of blocky 3D shapes floating in space in front of a black background.
Demo scripts for loading/processing the dataset can be found at https://github.com/fgolemo/3D-IQTT
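Since every question has exactly one correct answer out of three, a natural way to score a model is plain accuracy, with random guessing landing near 1/3. The following is a minimal sketch of such a scoring function, independent of the demo scripts linked above. It assumes answers are encoded as the indices 0, 1, 2 (as in the /answers arrays of this dataset); the function name `accuracy` and the random stand-in data are hypothetical and only illustrate the chance baseline.

```python
import numpy as np


def accuracy(predicted, correct):
    """Return the fraction of questions whose predicted index equals the correct one."""
    predicted = np.asarray(predicted)
    correct = np.asarray(correct)
    if predicted.shape != correct.shape:
        raise ValueError("predicted and correct must have the same shape")
    return float(np.mean(predicted == correct))


# Hypothetical stand-in data (NOT taken from the dataset):
# 10,000 random answer keys and 10,000 random guesses, each in {0, 1, 2}.
rng = np.random.default_rng(0)
answers = rng.integers(0, 3, size=10_000, dtype=np.uint8)
guesses = rng.integers(0, 3, size=10_000, dtype=np.uint8)

# Random guessing should score close to 1/3; a useful model has to beat that.
print(f"random-guess accuracy: {accuracy(guesses, answers):.3f}")
```

Because the test answers are not public, a function like this can only be applied locally to the labeled training data and the validation set.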
The dataset consists of:
- 3diqtt-v2-train.h5 (XZ-compressed) (Training Dataset)
  - /labeled
    - /questions
      format: [10,000 x 4 x 128 x 128 x 3], corresponding to (10k items) x (reference + 3 answers) x (img width) x (img height) x (RGB), np.float32 in range [0,1]
    - /answers
      format: [10,000], corresponding to (10k answers), np.uint8, one of the following three items: [0,1,2]
  - /unlabeled
    - /questions
      format: [100,000 x 4 x 128 x 128 x 3], corresponding to (100k items) x (reference + 3 answers) x (img width) x (img height) x (RGB), np.float32 in range [0,1]
- 3diqtt-v2-test.h5 (Test Dataset)
  - /questions
    format: [10,000 x 4 x 128 x 128 x 3], corresponding to (10k items) x (reference + 3 answers) x (img width) x (img height) x (RGB), np.float32 in range [0,1].
    Important! This is what you have to evaluate yourself on. We have the correct answers but they are not public.
- 3diqtt-v2-val.h5 (Validation Dataset)
  - /questions
    format: [10,000 x 4 x 128 x 128 x 3], corresponding to (10k items) x (reference + 3 answers) x (img width) x (img height) x (RGB), np.float32 in range [0,1]
  - /answers
    format: [10,000], corresponding to (10k answers), np.uint8, one of the following three items: [0,1,2]
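The layout above can be read in small slices rather than all at once, which matters given the file sizes. The following is a minimal sketch using the third-party `h5py` and `numpy` packages, not the official demo scripts. The helper names `load_labeled_batch` and `split_question` are hypothetical, and the sketch assumes that index 0 along the second axis is the reference image and indices 1-3 are the answers, following the order "(reference + 3 answers)" in the format description.

```python
import h5py
import numpy as np


def load_labeled_batch(path, start, stop, group=""):
    """Read questions[start:stop] and answers[start:stop] from one HDF5 file.

    group: "" for the validation file (datasets at the root),
           "labeled" for the training file (datasets under /labeled).
    """
    with h5py.File(path, "r") as f:
        node = f[group] if group else f
        # Slicing an h5py dataset reads only the requested rows from disk.
        questions = node["questions"][start:stop]
        answers = node["answers"][start:stop]
    return questions, answers


def split_question(question):
    """Split one [4, H, W, 3] item into (reference, answers).

    Assumption: index 0 is the reference image, indices 1..3 are the answers.
    """
    return question[0], question[1:]


# Example usage (assumes the file has been downloaded to the working directory):
#   questions, answers = load_labeled_batch("3diqtt-v2-val.h5", 0, 32)
#   reference, candidates = split_question(questions[0])
#   correct_index = int(answers[0])  # one of 0, 1, 2
```

For the training file, the same call with `group="labeled"` would read from /labeled; the /unlabeled group has no /answers dataset, so it needs a separate read of /questions only.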
Important: Before use, the main training dataset (3diqtt-v2-train.h5.xz) must be decompressed. This can take up to 24h depending on your hardware; we apologize for the inconvenience. The uncompressed file has a size of ~74GB. The file was compressed because of a restriction on the size of individual files. On Unix machines, the command for decompression is "unxz 3diqtt-v2-train.h5.xz".
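On machines without the `unxz` tool, the same decompression can be sketched with Python's standard-library `lzma` module. This is an illustrative alternative, not the authors' recommended procedure; the function name `decompress_xz` and the chunk size are hypothetical choices, and there is no reason to expect it to be faster than `unxz`. The output location needs enough free space for the ~74GB uncompressed file.

```python
import lzma
import shutil


def decompress_xz(src_path, dst_path, chunk_size=16 * 1024 * 1024):
    """Stream-decompress an .xz file to dst_path in fixed-size chunks.

    The data is copied chunk by chunk, so the whole file never has to fit in memory.
    """
    with lzma.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, length=chunk_size)


# Example usage (assumes the compressed file is in the working directory):
#   decompress_xz("3diqtt-v2-train.h5.xz", "3diqtt-v2-train.h5")
```

Unlike `unxz`, this sketch leaves the compressed source file in place, so both files occupy disk space until the `.xz` file is deleted manually.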
If you use this dataset, please cite it.
Files
(61.7 GB)
| MD5 checksum | Size |
|---|---|
| c22b4dc9e7c3eba9cbd38051f4e995de | 7.9 GB |
| e1162bab0eae078524e3f0c32ff87101 | 46.0 GB |
| ece1705bf8b7425058f630c20855615f | 7.9 GB |
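With downloads this large, it can be worth checking a file against its listed MD5 checksum before spending hours on decompression. The following is a minimal sketch using Python's standard-library `hashlib`; the function name `md5_of_file`, the chunk size, and the path in the usage comment are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=8 * 1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example usage (hypothetical path; the checksum is the one listed for the 46.0 GB file):
#   expected = "e1162bab0eae078524e3f0c32ff87101"
#   if md5_of_file("path/to/downloaded/file") != expected:
#       print("Checksum mismatch: the download may be incomplete or corrupted.")
```

A mismatch usually points to an interrupted or corrupted transfer, in which case downloading the file again is the simplest fix.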