Towards Explainability in Monocular Depth Estimation

Arampatzakis, Vasileios; Pavlidis, George; Pantoglou, Kyriakos; Mitianoudis, Nikolaos; Papamarkos, Nikos

doi:10.5281/zenodo.14755931

Published January 1, 2025 | Version v2

Conference paper Open

1. Democritus University of Thrace, School of Engineering

VDC_RelSize

The Visual Depth Cue Dataset (VDC) is a synthetic image dataset for monocular depth estimation tasks. VDC is inspired by the research of Cutting & Vishton on human depth perception.

Relative Size

The real world is perceived with perspective, which causes objects that are closer to the observer to appear larger than those that are farther away. As a result, the visual depth cue of relative size is always present.
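
The relative size cue can be illustrated with a minimal sketch. It assumes an ideal pinhole camera, which the dataset description does not specify; the function name and the focal length of 1000 pixels are hypothetical values chosen only for illustration.

```python
def projected_height_px(object_height_m: float,
                        distance_m: float,
                        focal_length_px: float = 1000.0) -> float:
    """Apparent image height of an object under an ideal pinhole model.

    The pinhole model and the default focal length are assumptions made
    for illustration; they are not taken from the dataset description.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    # Perspective projection: image size scales with height / distance.
    return focal_length_px * object_height_m / distance_m


# The same 1 m object seen at two different distances.
near = projected_height_px(1.0, 2.5)
far = projected_height_px(1.0, 30.0)
print(f"near: {near:.1f} px, far: {far:.1f} px, ratio: {near / far:.1f}")
```

Under this model the apparent size depends only on the ratio of object size to distance, so the ratio between the two projections does not depend on the assumed focal length.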

Number of images

The dataset consists of 23,839 RGB images, each paired with a corresponding depth map (23,839 depth maps in total).

Image Resolution

The pairs of 8-bit RGB images and their corresponding 16-bit depth maps have a resolution of 1920 x 1080 pixels.

RGB image: Each channel of the PNG image uses 8 bits, so the maximum value per color channel is 255 (2^8-1). Each color channel can therefore take a value between 0 and 255, for a total of 24 bits per pixel (8 bits each for red, green, and blue).

Depth map: A 16-bit PNG image has a maximum pixel value of 65535 (2^16-1). This means that each pixel in the image can have a value between 0 and 65535, where 0 represents black and 65535 represents white in a grayscale image.
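
The bit-depth arithmetic above can be checked with a short sketch. The helper names are hypothetical. The description does not state how 16-bit depth values map to meters, so the sketch only rescales raw values to the range [0, 1] and makes no claim about metric depth.

```python
def max_value(bits: int) -> int:
    """Largest unsigned integer representable with the given number of bits."""
    return (1 << bits) - 1


def normalize(raw: int, bits: int) -> float:
    """Rescale a raw pixel value to [0, 1] for the given bit depth.

    This is a plain rescaling; converting depth-map values to meters
    would need a mapping that the dataset description does not give.
    """
    top = max_value(bits)
    if not 0 <= raw <= top:
        raise ValueError(f"value {raw} is outside the {bits}-bit range")
    return raw / top


rgb_bits_per_pixel = 3 * 8  # three 8-bit channels
print(max_value(8), max_value(16), rgb_bits_per_pixel)
print(normalize(255, 8), normalize(65535, 16))
```

Normalizing both image types to the same range is a common preprocessing choice, not something the dataset requires.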

Format

PNG is the file format used for both the RGB images and their corresponding depth maps.

Channels

RGB images (3 channels)
Grayscale depth maps (1 channel)

Backgrounds

All the images have a uniform, solid white background.

Objects

The images feature solid black cylinders, inspired by Nagata.

Object Variations

Each object visible in a scene has one of the following sizes: 0.2, 1, 5, or 30 meters. Every scene contains at least one object from one of these categories.

Object placements

If we consider the horizontal direction as X, the vertical direction as Y, and the depth direction as Z, then the objects are located as follows:

X axis: random
Y axis: 0 (ground)
Z axis: predefined

size [meters]	placement range [meters]
0.2	[0.5, 3]
1	[2.5, 30]
5	[13, 300]
30	[80, 600]
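
The table above can be encoded as a small lookup, sketched below. The names are hypothetical, and the sketch only checks whether a depth lies inside the listed closed interval and computes the size-to-distance ratio; it does not reproduce how the Z positions were actually chosen, which the description only calls "predefined".

```python
# Placement ranges along Z in meters, copied from the table above.
PLACEMENT_RANGE_M = {
    0.2: (0.5, 3.0),
    1.0: (2.5, 30.0),
    5.0: (13.0, 300.0),
    30.0: (80.0, 600.0),
}


def is_valid_depth(size_m: float, z_m: float) -> bool:
    """Return True if z_m lies inside the listed range for this object size."""
    near, far = PLACEMENT_RANGE_M[size_m]
    return near <= z_m <= far


def size_to_distance_ratio(size_m: float, z_m: float) -> float:
    """Size divided by distance; a larger ratio means a larger apparent size."""
    if not is_valid_depth(size_m, z_m):
        raise ValueError(f"{z_m} m is outside the range for size {size_m} m")
    return size_m / z_m


for size_m, (near, far) in PLACEMENT_RANGE_M.items():
    at_near = size_to_distance_ratio(size_m, near)
    at_far = size_to_distance_ratio(size_m, far)
    print(f"size {size_m} m: ratio {at_near:.3f} at {near} m, {at_far:.3f} at {far} m")
```

By this arithmetic, the ratio at the nearest placement falls between 0.375 and 0.4 for all four size categories.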

Lighting conditions

The scenes are rendered without any light sources; only light emission materials provide illumination, an approach commonly known as "emission only lighting."

Noise

No noise of any kind is added to the images.

Source code

The images in the dataset are generated by Python code that is executed within the Blender environment.

Citing

If you use this dataset in your work, please cite the following paper:

Arampatzakis, V., Pavlidis, G., Pantoglou, K., Mitianoudis, N., Papamarkos, N. (2025). Towards Explainability in Monocular Depth Estimation. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2134. Springer, Cham. https://doi.org/10.1007/978-3-031-74627-7_34

Licensing

This dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. You are free to share and adapt the dataset for any purpose, including commercial use, as long as you provide appropriate credit to the authors. For more details, see the full terms at https://creativecommons.org/licenses/by/4.0/.

Files

Files (2.9 GB)

Name	MD5	Size
0.2.zip	0d080413ec55257545750b53d065c8da	1.0 GB
1.zip	296a698bff12b33aad97be1043924385	633.9 MB
30.zip	4c16371ca6eea3d26ebae45fc146f773	803.7 MB
5.zip	b3780d55dad808f5b5c89d8af7ec1272	419.8 MB
Abstract.pdf	b780e6822ad6abd9a518e654e193733b	27.1 kB
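
The listed MD5 checksums can be used to check downloaded files. The following is a minimal sketch using Python's standard hashlib module; the function names and the assumption that all files sit in a single local directory are hypothetical.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing above.
EXPECTED_MD5 = {
    "0.2.zip": "0d080413ec55257545750b53d065c8da",
    "1.zip": "296a698bff12b33aad97be1043924385",
    "30.zip": "4c16371ca6eea3d26ebae45fc146f773",
    "5.zip": "b3780d55dad808f5b5c89d8af7ec1272",
    "Abstract.pdf": "b780e6822ad6abd9a518e654e193733b",
}


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory) -> dict:
    """Map each listed file found in the directory to True if its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        candidate = Path(directory) / name
        if candidate.is_file():
            results[name] = md5_of_file(candidate) == expected
    return results
```

Calling verify_downloads(".") in the download directory returns, for each listed file that is present, whether its checksum matches. Reading in chunks keeps memory use low for the larger archives.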

Additional details

Is supplement to: Conference paper: 10.1007/978-3-031-74627-7_34 (DOI)

Accepted: 2023-07-14

