Published August 20, 2024 | Version v1.0.0
Dataset (Open Access)

PULP-Dronet v3 dataset

  • 1. University of Bologna
  • 2. Technology Innovation Institute

Contributors


  • 1. Swiss Federal Institute of Technology in Zurich
  • 2. Dalle Molle Institute for Artificial Intelligence Research
  • 3. Technology Innovation Institute
  • 4. Eidgenössische Technische Hochschule Zürich
  • 5. Università degli Studi di Bologna

Description

The PULP-Dronet v3 dataset

The Himax dataset was collected at the University of Bologna and the Technology Innovation Institute using a Himax ultra-low-power, grayscale, QVGA camera mounted on a Bitcraze Crazyflie nano-drone. The dataset was used for training and testing the PULP-Dronet v3 CNN, a neural network for autonomous visual-based navigation of nano-drones. This release includes the training and testing sets described in the paper.

Resources


Dataset Description

We collected a dataset of 77k images for nano-drone autonomous navigation, for a total of 600 MB of data.
We used the Bitcraze Crazyflie 2.1, collecting images from the AI-Deck's Himax HM01B0 monochrome camera.

The images in the PULP-Dronet v3 dataset have the following characteristics:
  • Resolution: each image has a QVGA resolution of 324x244 pixels.
  • Color: all images are grayscale, so they have 1 single channel.
  • Format: the images are stored in .jpeg format.

A human pilot manually flew the drone, collecting: i) images from the grayscale QVGA Himax camera sensor of the AI-deck; ii) the pilot's gamepad yaw-rate input, normalized to the [-1, +1] range; iii) the drone's estimated state; and iv) the distance between obstacles and the drone, measured by the front-looking ToF sensor.

After the data collection, we labeled every image with a binary collision label, set whenever an obstacle was in the line of sight and closer than 2 m. We recorded 301 sequences in 20 different environments. Each sequence is labeled with high-level characteristics, listed in characteristics.json (described below).
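The 2 m rule above can be sketched as a simple threshold on the front ToF reading. This is a hypothetical helper, not the authors' labeling script; the reading is assumed to be in millimeters, matching the `range.front` column of the state log:

```python
COLLISION_THRESHOLD_MM = 2000  # obstacle closer than 2 m -> collision = 1

def collision_label(front_range_mm):
    """Binary collision label from the front-looking ToF reading (in mm)."""
    return 1 if front_range_mm < COLLISION_THRESHOLD_MM else 0
```

Whether a reading of exactly 2 m counts as a collision is not specified in the text; the sketch treats it as no collision.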

For training our CNNs, we augmented the training images by applying random cropping, flipping, brightness augmentation, vignetting, and blur. The resulting dataset has 157k images, split as follows: 110k, 7k, 15k images for training, validation, and testing, respectively.
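The augmentation steps can be illustrated on a raw grayscale image represented as a list of pixel rows. This is a toy sketch under assumed parameters, not the released training pipeline; vignetting and blur are omitted, and the crop size matches the 200x200 network input mentioned below:

```python
import random

def augment(img, crop=200, brightness_range=(-30, 30)):
    """Toy augmentation sketch: random crop, random horizontal flip,
    and a random brightness shift on a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    # random crop to crop x crop pixels
    top = random.randint(0, h - crop)
    left = random.randint(0, w - crop)
    img = [row[left:left + crop] for row in img[top:top + crop]]
    # random horizontal flip with probability 0.5
    if random.random() < 0.5:
        img = [row[::-1] for row in img]
    # random brightness shift, clamped to the valid [0, 255] range
    delta = random.randint(*brightness_range)
    return [[min(255, max(0, p + delta)) for p in row] for row in img]
```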

To address the labels' bias toward the center of the [-1, +1] yaw-rate range, we balanced the testing dataset by selectively removing a portion of images with a yaw rate of 0. Specifically, we removed (only from the test set) some images having yaw_rate == 0 and collision == 1.
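The balancing step might look like the following sketch, where `keep_frac` is an assumed parameter (the actual fraction is whatever reduces the test set to the sizes in the table below):

```python
import random

def balance_test_set(rows, keep_frac=0.5, seed=0):
    """Drop a random fraction of test rows with yaw_rate == 0 and
    collision == 1, which are over-represented. Illustrative only."""
    rng = random.Random(seed)
    out = []
    for r in rows:
        centered = r["label_yaw_rate"] == 0.0 and r["label_collision"] == 1
        if centered and rng.random() > keep_frac:
            continue  # drop this over-represented sample
        out.append(r)
    return out
```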
Dataset                  Train Images  Validation Images  Test Images  Total
PULP-Dronet v3           53,830        7,798              15,790       77,418
PULP-Dronet v3 testing   53,830        7,798              3,071        64,699
PULP-Dronet v3 training  110,138       15,812             31,744       157,694
 
 
We use `PULP-Dronet v3 training` for training and `PULP-Dronet v3 testing` for validation and testing; this is the final split:

Dataset  Train Images  Validation Images  Test Images  Total
Final    110,138       7,798              3,071        121,007

Notes:
  • `PULP-Dronet v3` and `PULP-Dronet v3 testing` datasets: Images are in full QVGA resolution (324x244px), uncropped.
  • `PULP-Dronet v3 training` dataset: Images are cropped to 200x200px, matching the PULP-Dronet input resolution. Cropping was done randomly on the full-resolution images to create variations.

Dataset Structure

.
└── Dataset_PULP_Dronet_v3_*/
    ├── ETH finetuning/
    │       ├── acquisition1/
    │       │   ├── characteristics.json # metadata
    │       │   ├── images/ # images folder
    │       │   ├── labels_partitioned.csv # Labels for PULP-Dronet
    │       │   └── state_labels_DroneState.csv # raw data from the crazyflie
    |       ...
    │       └── acquisition39/
    ├── Lorenzo Bellone/
    │       ├── acquisition1/
    |       ...
    │       └── acquisition19/
    ├── Lorenzo Lamberti/
    │   ├── dataset-session1/
    |   │   ├── acquisition1/
    |   |   ...
    |   │   └── acquisition29/
    │   ├── dataset-session2/
    |   │   ├── acquisition1/
    |   |   ...
    |   │   └── acquisition55/
    │   ├── dataset-session3/
    |   │   ├── acquisition1/
    |   |   ...
    |   │   └── acquisition65/
    │   └── dataset-session4/
    |       ├── acquisition1/
    |       ...
    |       └── acquisition51/
    ├── Michal Barcis/
    │       ├── acquisition1/
    |       ...
    │       └── acquisition18/
    └── TII finetuning/
        ├── dataset-session1/
        │       ├── acquisition1/
        |       ...
        │       └── acquisition18/
        └── dataset-session2/
                ├── acquisition1/
                ...
                └── acquisition39/
This structure applies to all three sets mentioned above: `PULP_Dronet_v3`, `PULP_Dronet_v3_training`, and `PULP_Dronet_v3_testing`.

Dataset Labels

1. labels_partitioned.csv
This file contains the per-image labels of the PULP-Dronet v3 dataset.
The file includes the following columns:
  • filename: The name of the image file (e.g., 25153.jpeg).
  • label_yaw_rate: The yaw-rate label, representing the rotational velocity. Values are in the [-1, +1] range, where YawRate > 0 means a counter-clockwise turn (turn left) and YawRate < 0 means a clockwise turn (turn right).
  • label_collision: The collision label, in range [0,1]. 0 denotes no collision and 1 indicates a collision.
  • partition: The dataset partition, i.e., train, test, or valid.
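A minimal loader for labels_partitioned.csv using only the standard library. The column names follow the description above; the exact CSV layout (header row, comma separator) is an assumption:

```python
import csv

def load_labels(path):
    """Read labels_partitioned.csv into a list of dicts with typed fields."""
    rows = []
    with open(path, newline="") as f:
        for r in csv.DictReader(f):
            rows.append({
                "filename": r["filename"],
                "label_yaw_rate": float(r["label_yaw_rate"]),
                "label_collision": int(r["label_collision"]),
                "partition": r["partition"],
            })
    return rows
```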
2. characteristics.json
This file contains per-sequence metadata. It may help the user filter the dataset by specific characteristics, or partition the image types evenly:
  • scenario (i.e., indoor or outdoor);
  • path type (i.e., presence or absence of turns);
  • obstacle types (e.g., pedestrians, chairs);
  • flight height (i.e., 0.5, 1.0, 1.5 m);
  • behaviour in presence of obstacles (i.e., overpassing, stand still, n/a);
  • light conditions (dark, normal, bright, mixed);
  • a location name identifier;
  • acquisition date.
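For example, selecting only the acquisitions that match a given set of characteristics could be done as follows. This is a sketch; the exact key names inside characteristics.json are assumed to mirror the fields listed above:

```python
import json

def filter_acquisitions(char_paths, **criteria):
    """Return the characteristics.json paths whose contents match all
    criteria, e.g. filter_acquisitions(paths, scenario="indoor")."""
    selected = []
    for p in char_paths:
        with open(p) as f:
            chars = json.load(f)
        if all(chars.get(k) == v for k, v in criteria.items()):
            selected.append(p)
    return selected
```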
3. labeled_images.csv
The same as labels_partitioned.csv, but without the partition column. You can use this file to reproduce the partitioning into train, valid, and test sets.

4. state_labels_DroneState.csv
This is the raw data logged from the Crazyflie at ~100 samples/s.
The file includes the following columns:
  • timeTicks: The timestamp.
  • range.front: The distance measurement from the front VL53L1x ToF sensor [mm].
  • mRange.rangeStatusFront: The status code of the front range sensor (see the VL53L1x datasheet for details).
  • controller.yawRate: The yaw rate command given by the human pilot (in radians per second).
  • ctrltarget.yaw: The target yaw rate set by the control system (in radians per second).
  • stateEstimateZ.rateYaw: The estimated yaw rate from the drone's state estimator (in radians per second).

Data Processing Workflow

You can find the scripts at pulp-platform/pulp-dronet.
dataset_processing.py:
  • Input: state_labels_DroneState.csv
  • Output: labeled_images.csv
  • Function: matches each image (~10 Hz) to the nearest drone-state sample (~100 Hz) by timestamp, discarding the extra drone states.
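The matching step can be sketched as a nearest-timestamp lookup over the sorted state log. This is an illustration, not the released dataset_processing.py:

```python
from bisect import bisect_left

def match_states(image_ts, state_ts):
    """For each image timestamp (~10 Hz), return the index of the nearest
    drone-state sample (~100 Hz). state_ts must be sorted ascending."""
    matches = []
    for t in image_ts:
        i = bisect_left(state_ts, t)
        # compare the two neighbours around the insertion point
        candidates = [j for j in (i - 1, i) if 0 <= j < len(state_ts)]
        matches.append(min(candidates, key=lambda j: abs(state_ts[j] - t)))
    return matches
```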
dataset_partitioning.py:
  • Input: labeled_images.csv
  • Output: labels_partitioned.csv
  • Function: Partitions the labeled images into training, validation, and test sets.

License

We release this dataset as open source under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License; see LICENSE.CC.md.

Files

Dataset_PULP_Dronet_v3.zip

Files (2.3 GB):
  • md5:3aefdd526ab291aaa2ac01c0b4c0be47 (434.4 MB)
  • md5:e2fd204c258e246f55b4a20832a9317e (1.9 GB)

Additional details

Dates

Updated
2024-07

Software

Repository URL: https://github.com/pulp-platform/pulp-dronet
Programming language: Python
Development Status: Active