Danish Roadside Dataset for Camera-Assisted Monitoring of Invasive Plant Species
Description
This repository provides a unique dataset designed to evaluate and benchmark high-throughput image classification and object detection methods. The dataset includes high-resolution images captured by a high-speed camera mounted on a car, oriented perpendicular to the direction of travel along roadsides in Denmark.
Overview
The dataset consists of 14,838 high-resolution images (4024x3036 pixels) taken along Danish roadsides. Developed to monitor the spread of invasive plant species, it focuses on six meta-species with consolidated classes for similar sibling species.
Purpose
The primary purpose of this dataset is to facilitate the detection and classification of 6 meta-species of invasive plants, allowing efficient monitoring and control efforts in Denmark.
Classes (Meta-Species)
Due to the visual similarity within certain genera, sibling species are grouped into single classes or “meta-species.” The dataset focuses on the following 6 distinct taxa:
1. Cytisus scoparius (L.) Link
2. Lupinus polyphyllus Lindl.
3. Pastinaca sativa L.
4. Reynoutria spp. (Reynoutria japonica Houtt. and Reynoutria sachalinensis (F.Schmidt) Nakai)
5. Rosa rugosa Thunb.
6. Solidago spp. (Solidago canadensis L. and Solidago gigantea Aiton)
Dataset Structure
The dataset contains two types of images:
• Monolabel images: Images with a single species, organized by class in separate folders (6 classes for 6 meta-species).
• Multilabel images: Images that may contain multiple species, with annotations provided in annotations_mads_split_multilabel.csv.
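The folder-per-class layout of the monolabel images can be indexed with a few lines of standard-library Python. This is only a minimal sketch: the function name `index_monolabel_split`, the accepted file extensions, and the assumption that each split directory directly contains one sub-folder per class are illustrative and not confirmed by the dataset description.

```python
from pathlib import Path

# Assumed image file types; adjust to whatever the archive actually contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def index_monolabel_split(split_dir):
    """Return (image_path, class_name) pairs for a folder-per-class split.

    Assumes `split_dir` holds one sub-folder per meta-species (hypothetical layout).
    """
    split_dir = Path(split_dir)
    samples = []
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                # The folder name serves as the class label.
                samples.append((image_path, class_dir.name))
    return samples
```

Calling `index_monolabel_split` on a split directory would return a sorted list that can be fed to any image-loading pipeline; the multilabel images need the CSV annotations instead, since their folder does not encode a single class.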
Image Splits
• Monolabel Images:
  • Train: 10,302 images
  • Validation: 2,160 images
  • Test: 2,140 images
• Multilabel Images:
  • 206 images
• Quick Test Sets:
  • val_16: 16 images
  • test_14: 14 images
Note: These 30 images are duplicated in the main folders for quick testing purposes.
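As a quick sanity check, the split sizes listed above can be summed. The dictionary keys below are illustrative labels, not folder names from the archive. The total matches the 14,838 images quoted in the overview only when the 30 quick-test images are counted as separate files.

```python
# Image counts as listed in the Image Splits section (keys are illustrative labels).
SPLIT_COUNTS = {
    "monolabel_train": 10_302,
    "monolabel_validation": 2_160,
    "monolabel_test": 2_140,
    "multilabel": 206,
    "val_16": 16,
    "test_14": 14,
}

monolabel_total = sum(v for k, v in SPLIT_COUNTS.items() if k.startswith("monolabel"))
total = sum(SPLIT_COUNTS.values())

print(monolabel_total)  # 14602
print(total)            # 14838
```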
Image Resolution
All images have a resolution of 4024x3036 pixels, providing high-detail views suitable for robust classification.
Annotations
• Monolabel Images: Organized by species in folders.
• Multilabel Images: Annotation data provided in annotations_mads_split_multilabel.csv.
Due to Zenodo’s storage constraints, the images in this dataset have been recompressed to 70% quality. The original-quality images can be accessed here:
https://lab.plantnet.org/DanishRoadsData/0_datastore.tar
In addition, this link provides access to:
• Deep Features and Embeddings: A deep_features folder containing precomputed PyTorch tensors for each image, organized to mirror the image directory structure. These precomputed embeddings save processing time and ensure consistent feature extraction. Each high-resolution image was divided into 926 tiles across eight scales (described in the table below). Embeddings were extracted with a Vision Transformer (ViT), specifically BEiT (version 1) fine-tuned on PlantNet, at an input resolution of 384x384. The final embedding was obtained by averaging the local tokens from each tile.
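The token-averaging step can be illustrated with a small PyTorch sketch. It assumes the model output for a batch of tiles has shape `(n_tiles, 1 + n_patches, dim)` with a leading class token that is excluded from the average; the function name `average_local_tokens` and this tensor layout are assumptions for illustration, not the authors' published code.

```python
import torch


def average_local_tokens(tokens: torch.Tensor, has_cls_token: bool = True) -> torch.Tensor:
    """Average the patch ("local") tokens of each tile into one vector per tile.

    tokens: tensor of shape (n_tiles, n_tokens, dim) from a ViT-style encoder.
    Returns a tensor of shape (n_tiles, dim).
    """
    if has_cls_token:
        # Assumed layout: index 0 is the class token, the rest are patch tokens.
        tokens = tokens[:, 1:, :]
    return tokens.mean(dim=1)


# Tiny synthetic example: 2 tiles, 1 class token + 3 patch tokens, dimension 3.
example = torch.arange(2 * 4 * 3, dtype=torch.float32).reshape(2, 4, 3)
tile_embeddings = average_local_tokens(example)
print(tile_embeddings.shape)  # torch.Size([2, 3])
```

For a real 384x384 input with 16x16 patches, each tile would carry 576 patch tokens, so the same call would reduce a `(926, 577, dim)` tensor to `(926, dim)`.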
Tile Scale Distribution
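| Scale | Resized Image (px) | n tiles X | n tiles Y | n tiles Total |
|---|---|---|---|---|
| 1 | 384 | 2 | 1 | 2 |
| 2 | 768 | 4 | 3 | 12 |
| 3 | 1152 | 7 | 5 | 35 |
| 4 | 1536 | 10 | 7 | 70 |
| 5 | 1920 | 12 | 9 | 108 |
| 6 | 2304 | 15 | 11 | 165 |
| 7 | 2688 | 18 | 13 | 234 |
| 8 | 3036 | 20 | 15 | 300 |
| Total | - | - | - | 926 |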
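The tile counts in the table can be reproduced from the per-scale grid sizes, as the sketch below shows. The `tile_offsets` helper goes one step further and shows one plausible way to place 384-pixel tiles evenly along an axis. The even spacing, the fixed 384-pixel tile size, and the reading of "Resized Image (px)" as the resized image height are assumptions, not details stated in the dataset description.

```python
TILE_SIZE = 384  # assumed tile side, taken from the 384x384 model input

# scale -> (resized size in px, tiles along X, tiles along Y), copied from the table
TILE_GRID = {
    1: (384, 2, 1),
    2: (768, 4, 3),
    3: (1152, 7, 5),
    4: (1536, 10, 7),
    5: (1920, 12, 9),
    6: (2304, 15, 11),
    7: (2688, 18, 13),
    8: (3036, 20, 15),
}

tiles_per_scale = {scale: nx * ny for scale, (_, nx, ny) in TILE_GRID.items()}
print(sum(tiles_per_scale.values()))  # 926


def tile_offsets(length, n_tiles, tile=TILE_SIZE):
    """Evenly spaced start offsets for n_tiles tiles along an axis (assumed layout)."""
    if n_tiles == 1:
        return [(length - tile) // 2]  # a single tile is centered
    step = (length - tile) / (n_tiles - 1)
    return [round(i * step) for i in range(n_tiles)]
```

Under these assumptions, `tile_offsets(3036, 15)` yields 15 overlapping tile rows for scale 8, the first starting at pixel 0 and the last ending at the bottom edge of the image.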
Usage
This dataset supports:
• Training and testing of machine learning models for invasive plant detection and classification.
• Development of monitoring tools for tracking the spread of invasive species in various regions.
Files
Total size: 46.1 GB
md5:1f68ae54ef20e4407b08db300fbdd801
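After downloading, the archive can be checked against the MD5 checksum listed above. This is a minimal sketch using Python's standard `hashlib`; the helper names are illustrative, and the path of the downloaded file must be supplied by the user because the listing does not give a file name.

```python
import hashlib

# Checksum as shown in the file listing.
EXPECTED_MD5 = "1f68ae54ef20e4407b08db300fbdd801"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use constant, which matters for a 46.1 GB download.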