Published October 2, 2023 | Version 1.0
Dataset Open

A multi-scale labeled dataset for boulder segmentation and navigation on small bodies

  • 1. Politecnico di Milano

Description

The capability to detect boulders on the surface of small bodies is beneficial for vision-based applications such as hazard detection during critical operations, safety quantification, autonomous planning of scientific operations, and autonomous navigation. This task, however, is challenging due to the wide assortment of irregular shapes, the characteristics of the boulder population, and the rapid variability in illumination conditions. Moreover, the lack of publicly available labeled datasets hampers research on data-driven algorithms. This dataset has been designed and made publicly available to tackle these challenges. Its purpose is twofold. First, building on the lessons learned from previous datasets, to provide a multi-purpose, high-fidelity dataset with boulders scattered across the surface of a small body. Second, to exploit domain randomization, artificial noise addition, scaling, and post-processing, enabling the design of data-driven pipelines.

The methodology used to generate the dataset is described in the work "A multi-scale labeled dataset for boulder segmentation and navigation on small bodies" by Mattia Pugliatti and Michele Maestrini, presented at the 74th IAC (International Astronautical Congress), 2023, Baku, Azerbaijan.

The dataset contains the image-label pairs of 47502 samples, organized with the following structure: 

Dataset_PugliattiMaestrini_2023IAC
    --img
    --labels
    --masks
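
As a minimal sketch of how the three folders might be joined into samples, the Python snippet below groups files by their name without extension. It assumes that the files belonging to one sample share the same base name across "img", "labels", and "masks"; the image and mask file extensions are not stated here, so the code does not depend on them. The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

# Folder names taken from the dataset structure above.
FOLDERS = ("img", "labels", "masks")


def index_samples(root):
    """Map each sample name (file stem) to the files found for it per folder.

    Assumption: files of the same sample share one stem across folders.
    """
    samples = defaultdict(dict)
    for folder in FOLDERS:
        for path in sorted(Path(root, folder).glob("*")):
            if path.is_file():
                samples[path.stem][folder] = path
    return dict(samples)


def complete_samples(samples):
    """Keep only the samples that have a file in every folder."""
    return {
        name: files
        for name, files in samples.items()
        if all(folder in files for folder in FOLDERS)
    }
```

Calling `complete_samples(index_samples("Dataset_PugliattiMaestrini_2023IAC"))` on the extracted archive would then give one entry per sample with its image, label, and mask paths, provided the shared-name assumption holds.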

The "img" folder contains the input 512x512 grayscale images. The "labels" folder contains the .txt segmentation labels of the 15 most prominent boulders in each image, detected with the methodology described in the IAC paper. The "masks" folder contains the segmentation masks for all image layers, with values encoded between 0 and 17 as uint8. Samples are named XXXXXX_YYY, where XXXXXX is the image's original ID during rendering and YYY identifies the sub-split of the original image obtained at rendering:

    001 - Top-Left crop
    002 - Top-Right crop
    003 - Bottom-Left crop
    004 - Bottom-Right crop
    005 - Whole, resized
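
The naming convention above can be decoded programmatically. The following Python sketch splits a sample name into its rendering ID and sub-split code; the function and class names are hypothetical, and the code makes no assumption about the file extension or the number of digits in the ID.

```python
from pathlib import Path
from typing import NamedTuple

# Sub-split codes as listed in the dataset description.
CROPS = {
    "001": "Top-Left crop",
    "002": "Top-Right crop",
    "003": "Bottom-Left crop",
    "004": "Bottom-Right crop",
    "005": "Whole, resized",
}


class SampleName(NamedTuple):
    image_id: str   # XXXXXX: original image ID during rendering
    crop_code: str  # YYY: sub-split code
    crop: str       # human-readable meaning of the code


def parse_sample_name(name):
    """Split 'XXXXXX_YYY' (with or without a file extension) into its parts."""
    stem = Path(name).stem
    image_id, sep, crop_code = stem.rpartition("_")
    if not sep or not image_id or crop_code not in CROPS:
        raise ValueError(f"not a valid sample name: {name!r}")
    return SampleName(image_id, crop_code, CROPS[crop_code])
```

For instance, `parse_sample_name("000123_004.txt")` yields the ID `"000123"` and the crop `"Bottom-Right crop"`. Grouping samples by `image_id` can help keep all five sub-splits of one rendered image in the same train or test partition.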

The file "10000_ub_2023-01-18 00.09.43.txt" contains all the values of the rendering inputs detailed in the IAC paper.

Files

Dataset_PugliattiMaestrini_2023IAC.zip (7.3 GB)
md5:5da7960e8ce1dde35b42241d7cedeb83
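
Since the archive is large, it may be worth checking the download against the MD5 checksum listed above before extracting it. The following is a minimal Python sketch using only the standard library; the function names and the local file path are assumptions.

```python
import hashlib

# Checksum as published on this record.
EXPECTED_MD5 = "5da7960e8ce1dde35b42241d7cedeb83"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file at 'path' matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use constant regardless of the archive size. A call such as `verify_archive("Dataset_PugliattiMaestrini_2023IAC.zip")` returns `False` for a truncated or corrupted download.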

Additional details

Funding

European Commission
Stardust-R - Stardust Reloaded 813644