Published November 30, 2024 | Version 1
Dataset Open

Pre-processed (in Detectron2 and YOLO format) planetary images and boulder labels collected during the BOULDERING Marie Skłodowska-Curie Global fellowship

  • 1. Stanford University
  • 2. University of Oslo
  • 3. Ponoma University

Description

This database contains 4976 planetary images of boulder fields located on Earth, Mars and the Moon. The data was collected during the BOULDERING Marie Skłodowska-Curie Global fellowship between October 2021 and 2024. The data is already split into train, validation and test datasets, but feel free to re-organize the labels at your convenience.

For each image, all of the boulder outlines within the image were carefully mapped in QGIS. More information about the labelling procedure can be found in the following manuscript (https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2023JE008013). This dataset differs from the previous dataset included along with the manuscript (https://zenodo.org/records/8171052), as it contains more mapped images, especially of boulder populations around young impact structures on the Moon (cold spots). In addition, the boulder outlines were pre-processed so that they can be ingested directly in YOLOv8.
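For orientation, the Ultralytics segmentation format stores one object per line: a class index followed by polygon coordinates normalized to the range 0 to 1. The snippet below is a minimal sketch of a parser for one such line, assuming the label files in this dataset follow that documented layout. The function name parse_yolo_seg_line and the image size used in the example are hypothetical.

```python
def parse_yolo_seg_line(line, width, height):
    """Convert one YOLO segmentation label line to (class_id, pixel polygon).

    Expected layout: "<class> x1 y1 x2 y2 ... xn yn", coordinates in [0, 1].
    """
    parts = line.split()
    class_id = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    # A polygon needs at least three (x, y) pairs, i.e. six values.
    if len(coords) < 6 or len(coords) % 2 != 0:
        raise ValueError("expected an even number of coordinates (>= 6)")
    # Scale normalized coordinates back to pixel coordinates.
    polygon = [(x * width, y * height)
               for x, y in zip(coords[0::2], coords[1::2])]
    return class_id, polygon


# Example with a made-up triangle on a hypothetical 100 x 200 pixel image.
class_id, polygon = parse_yolo_seg_line("0 0.25 0.25 0.75 0.25 0.5 0.75", 100, 200)
print(class_id, polygon)
```

Scaling back to pixels is only needed for plotting or quality checks; YOLO itself reads the normalized values directly.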

The README.txt file describes each file and explains how to load the custom datasets in Detectron2 and YOLO. The other files are mostly self-explanatory. Please see the previous dataset or the manuscript for more information. If you want more information about specific lunar and martian planetary images, the IDs of the images are still available in the file names. Use this ID to find more information (e.g., for M121118602_00875_image.png, the ID M121118602 can be used on https://pilot.wr.usgs.gov/). I will also upload the raw data from which this pre-processed dataset was generated (see https://zenodo.org/records/14250970).
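Recovering the image ID from a file name can be sketched as follows. This is a minimal sketch that assumes file names follow the pattern <image ID>_<tile number>_image.png, as in the example above. The helper name extract_image_id is hypothetical, and images whose names do not follow this pattern (possibly the Earth images) would need separate handling.

```python
from pathlib import Path


def extract_image_id(file_name):
    """Return the image ID from '<image ID>_<tile number>_image.<ext>'."""
    stem = Path(file_name).stem
    # Drop the trailing "_image" marker if present.
    if stem.endswith("_image"):
        stem = stem[: -len("_image")]
    # Drop the last underscore-separated field (assumed to be the tile number).
    image_id, sep, _tile = stem.rpartition("_")
    return image_id if sep else stem


print(extract_image_id("M121118602_00875_image.png"))  # M121118602
```

Splitting from the right keeps IDs that themselves contain underscores intact, under the assumption that only the last field is the tile number.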

Thanks to this database, you can easily train a Detectron2 Mask R-CNN or a YOLO instance segmentation model to automatically detect boulders.

How to cite:

Please refer to the "how to cite" section of the readme file of https://github.com/astroNils/YOLOv8-BeyondEarth.

Structure:

.
└── boulder2024/
  ├── jupyter-notebooks/
  │  └── REGISTERING_BOULDER_DATASET_IN_DETECTRON2.ipynb
  ├── test/
  │  ├── images/
  │  │  ├── <image_name>_image.png
  │  │  └── ...
  │  └── labels/
  │     ├── <image_name>_image.txt
  │     └── ...
  ├── train/
  │  ├── images/
  │  │  ├── <image_name>_image.png
  │  │  └── ...
  │  └── labels/
  │     ├── <image_name>_image.txt
  │     └── ...
  ├── validation/
  │  ├── images/
  │  │  ├── <image_name>_image.png
  │  │  └── ...
  │  └── labels/
  │     ├── <image_name>_image.txt
  │     └── ...
  ├── detectron2_inst_seg_boulder_dataset.json
  ├── README.txt
  └── yolo_inst_seg_boulder_dataset.yaml
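
After unzipping, the layout above can be checked with a short script. The following is a minimal sketch that counts images and label files per split and lists names that appear in only one of the two folders. The function name summarize_splits and the root path in the usage line are hypothetical. An image without a label file is not necessarily an error, since YOLO treats such images as background.

```python
from pathlib import Path


def summarize_splits(dataset_root, splits=("train", "validation", "test")):
    """Count image/label files per split and report unpaired file stems."""
    root = Path(dataset_root)
    summary = {}
    for split in splits:
        images = {p.stem for p in (root / split / "images").glob("*.png")}
        labels = {p.stem for p in (root / split / "labels").glob("*.txt")}
        summary[split] = {
            "images": len(images),
            "labels": len(labels),
            # Stems present in only one of the two folders.
            "unpaired": sorted(images ^ labels),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical location of the unzipped dataset.
    for split, info in summarize_splits("boulder2024").items():
        print(split, info["images"], info["labels"], len(info["unpaired"]))
```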

 

detectron2_inst_seg_boulder_dataset.json

is a JSON file containing the masks as expected by Detectron2 (see https://detectron2.readthedocs.io/en/latest/tutorials/datasets.html for more information on the format). To use this custom dataset, you need to register it before training. An example of how to do that is given in the jupyter-notebooks folder. You need to have detectron2 and all of its dependencies installed.
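
The registration step can be sketched as follows. This is not the code from the notebook, only a minimal sketch under several unconfirmed assumptions: the JSON top level is a list of Detectron2 dataset dicts, each dict has a "file_name" relative to the dataset root, and an optional "dataset" key holds the split name. The single class name "boulder" and the function names are also hypothetical. Check the README.txt and the notebook for the actual structure.

```python
import json
from pathlib import Path


def load_dataset_dicts(json_path, image_root, split=None):
    """Load Detectron2-style dicts; optionally keep only one split.

    Assumes a top-level list of dicts with "file_name" and, optionally,
    a "dataset" key naming the split (both are assumptions).
    """
    with open(json_path) as f:
        records = json.load(f)
    selected = []
    for rec in records:
        if split is not None and rec.get("dataset") != split:
            continue
        rec = dict(rec)  # copy so the loaded list is left untouched
        rec["file_name"] = str(Path(image_root) / rec["file_name"])
        selected.append(rec)
    return selected


def register_split(name, json_path, image_root, split=None):
    """Register one split under `name` in Detectron2's catalogs."""
    # Imported lazily so the loader above works without detectron2.
    from detectron2.data import DatasetCatalog, MetadataCatalog

    DatasetCatalog.register(
        name, lambda: load_dataset_dicts(json_path, image_root, split)
    )
    # "boulder" as the only class is an assumption.
    MetadataCatalog.get(name).set(thing_classes=["boulder"])
```

A call such as register_split("boulder_train", "boulder2024/detectron2_inst_seg_boulder_dataset.json", "boulder2024", "train") would then make the name "boulder_train" usable in a Detectron2 training config.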

yolo_inst_seg_boulder_dataset.yaml

can be used as is, but you need to update the paths in the .yaml file so that they point to the test, train and validation folders. More information about the YOLO format can be found here (https://docs.ultralytics.com/datasets/segment/).
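
Updating the paths can be sketched as follows. Rather than editing the provided file, this minimal sketch writes a fresh dataset file using the keys documented by Ultralytics (path, train, val, test, names). The contents of the provided .yaml may differ; the single class name "boulder", the function names, the starting weights and the training settings are assumptions to adapt.

```python
from pathlib import Path


def write_yolo_yaml(dataset_root, out_path, class_names=("boulder",)):
    """Write an Ultralytics dataset file pointing at the local folders."""
    root = Path(dataset_root).resolve()
    lines = [
        f"path: {root.as_posix()}",  # absolute dataset root
        "train: train/images",       # relative to `path`
        "val: validation/images",    # the folder is named 'validation'
        "test: test/images",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    out_path = Path(out_path)
    out_path.write_text("\n".join(lines) + "\n")
    return out_path


def train_segmentation(yaml_path, weights="yolov8n-seg.pt", epochs=50, imgsz=512):
    """Train a YOLOv8 segmentation model (settings are placeholders)."""
    # Imported lazily so write_yolo_yaml works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=str(yaml_path), epochs=epochs, imgsz=imgsz)
```

Ultralytics finds the labels by replacing images with labels in each image path, which matches the folder layout of this dataset.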

Files

bouldering_dataset_2024_YOLO_and_detectron2_format.zip

Files (601.4 MB)

Additional details

Funding

European Commission
BOULDERING - A Deep Learning approach for boulder detection –The key to understand planetary surfaces evolution and their crater statistics-based ages 101030364

Dates

Available
2024-11

Software

Programming language
Python