Towards Explainability in Monocular Depth Estimation
Authors/Creators
- Democritus University of Thrace, School of Engineering
Description
VDC_RelSize
The Visual Depth Cue Dataset (VDC) is a synthetic image dataset which focuses on monocular depth estimation tasks. VDC is inspired by the research of Cutting & Vishton about the human depth perception system.
Relative Size
The real world is perceived with perspective, which causes objects that are closer to the observer to appear larger than those that are farther away. As a result, the visual depth cue of relative size is always present.
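One common way to quantify this effect is the visual angle an object subtends: an object of size s viewed frontally from distance d spans an angle of 2·atan(s / (2d)). The sketch below is only an illustration of that formula. The function name `visual_angle_deg` and the example values are chosen here for illustration and are not taken from the dataset's source code.

```python
import math


def visual_angle_deg(size_m: float, distance_m: float) -> float:
    """Visual angle in degrees subtended by an object of the given size
    when viewed frontally from the given distance."""
    if size_m <= 0 or distance_m <= 0:
        raise ValueError("size and distance must be positive")
    return math.degrees(2.0 * math.atan(size_m / (2.0 * distance_m)))


# The same 1 m object seen from a near and a far distance.
near = visual_angle_deg(1.0, 2.5)
far = visual_angle_deg(1.0, 30.0)
print(f"near: {near:.2f} deg, far: {far:.2f} deg")
```

The nearer placement yields the larger angle, which is the relative size cue in numerical form.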
Number of images
The dataset consists of 23,839 RGB images, each paired with one corresponding depth map, for a total of 23,839 image and depth map pairs.
Image Resolution
The pairs of 8-bit RGB images and their corresponding 16-bit depth maps have a resolution of 1920 x 1080 pixels.
RGB image: Each PNG image uses 8 bits per channel, so the maximum pixel value for each color channel is 255 (2^8-1). Each color channel can therefore take a value between 0 and 255, resulting in a total of 24 bits per pixel (8 bits per channel for red, green, and blue).
Depth map: A 16-bit PNG image has a maximum pixel value of 65535 (2^16-1). This means that each pixel in the image can have a value between 0 and 65535, where 0 represents black and 65535 represents white in a grayscale image.
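The arithmetic above can be checked with a short sketch. The helper names `max_pixel_value` and `normalize` are hypothetical. Note that the description does not state how a depth pixel value maps to a distance in meters, so the sketch only rescales raw values to [0, 1] and makes no assumption about metric depth.

```python
def max_pixel_value(bits: int) -> int:
    """Largest unsigned integer that fits in the given number of bits."""
    return 2 ** bits - 1


def normalize(value: int, bits: int) -> float:
    """Rescale a raw pixel value to the range [0, 1]."""
    max_value = max_pixel_value(bits)
    if not 0 <= value <= max_value:
        raise ValueError(f"value must lie in [0, {max_value}]")
    return value / max_value


WIDTH, HEIGHT = 1920, 1080

# Uncompressed sizes in bytes; PNG files on disk are compressed and smaller.
rgb_bytes = WIDTH * HEIGHT * 3 * 1    # 3 channels, 1 byte (8 bits) each
depth_bytes = WIDTH * HEIGHT * 1 * 2  # 1 channel, 2 bytes (16 bits)

print(max_pixel_value(8), max_pixel_value(16))
print(rgb_bytes, depth_bytes)
```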
Format
PNG is the file format used for both the RGB images and their corresponding depth maps.
Channels
- RGB images (3 channels)
- Grayscale depth maps (1 channel)
Backgrounds
All the images have a uniform, solid white background.
Objects
The images feature solid black cylinders, inspired by Nagata.
Object Variations
The objects visible in each scene fall into one of the following size categories (in meters):
- 0.2
- 1
- 5
- 30
Every scene contains at least one object from one of these categories.
Object placements
If we consider the horizontal direction as X, the vertical direction as Y, and the depth direction as Z, then the objects are located as follows:
- X axis: random
- Y axis: 0 (ground)
- Z axis: predefined
| size [meters] | placement range [meters] |
|---|---|
| 0.2 | [0.5, 3] |
| 1 | [2.5, 30] |
| 5 | [13, 300] |
| 30 | [80, 600] |
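The placement rules and the table above can be encoded as a small lookup with a validity check, which may be useful when sanity-checking scene metadata. This is a sketch under stated assumptions: the names `PLACEMENT_RANGE_M` and `is_valid_placement` are hypothetical, the ranges are treated as inclusive, and no constraint is placed on X because the description only says it is random.

```python
# Size category (meters) -> allowed Z placement range (meters), from the table.
PLACEMENT_RANGE_M = {
    0.2: (0.5, 3.0),
    1.0: (2.5, 30.0),
    5.0: (13.0, 300.0),
    30.0: (80.0, 600.0),
}


def is_valid_placement(size_m: float, position: tuple) -> bool:
    """Check an (x, y, z) position against the stated placement rules.

    X is unconstrained here, Y must be 0 (ground), and Z must fall inside
    the range listed for the object's size category (assumed inclusive).
    """
    if size_m not in PLACEMENT_RANGE_M:
        raise KeyError(f"unknown size category: {size_m}")
    _x, y, z = position
    z_min, z_max = PLACEMENT_RANGE_M[size_m]
    return y == 0.0 and z_min <= z <= z_max


print(is_valid_placement(1.0, (0.3, 0.0, 10.0)))
print(is_valid_placement(0.2, (0.0, 0.0, 5.0)))
```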
Lighting conditions
The scenes are rendered without any light sources; illumination comes only from light-emitting materials, a setup commonly known as "emission-only lighting."
Noise
No noise of any kind is added to the images.
Source code
The images in the dataset are generated by Python code that is executed within the Blender environment.
Citing
If you use this dataset in your work, please cite the following paper:
Arampatzakis, V., Pavlidis, G., Pantoglou, K., Mitianoudis, N., Papamarkos, N. (2025). Towards Explainability in Monocular Depth Estimation. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2134. Springer, Cham. https://doi.org/10.1007/978-3-031-74627-7_34
Licensing
This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. You are free to share and adapt the dataset for any purpose, including commercial use, as long as you provide appropriate credit to the authors. For more details, see the full terms at https://creativecommons.org/licenses/by/4.0/.
Files
(2.9 GB)
| Checksum | Size |
|---|---|
| md5:0d080413ec55257545750b53d065c8da | 1.0 GB |
| md5:296a698bff12b33aad97be1043924385 | 633.9 MB |
| md5:4c16371ca6eea3d26ebae45fc146f773 | 803.7 MB |
| md5:b3780d55dad808f5b5c89d8af7ec1272 | 419.8 MB |
| md5:b780e6822ad6abd9a518e654e193733b | 27.1 kB |
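After downloading, each file can be checked against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the path in the usage note is a placeholder because the listing does not pair file names with checksums.

```python
import hashlib


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks so that
    large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed: str) -> bool:
    """Compare a file against a listed checksum such as 'md5:0d08...'.

    An optional 'md5:' prefix is stripped before the comparison.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum("path/to/archive.zip", "md5:0d080413ec55257545750b53d065c8da")` returns `True` only if the placeholder path points to the file with that checksum.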
Additional details
Related works
- Is supplement to: Conference paper, 10.1007/978-3-031-74627-7_34 (DOI)
Dates
- Accepted: 2023-07-14