Published October 30, 2024 | Version v1
Image Open

Danish Roadside Dataset for Camera-Assisted Monitoring of Invasive Plant Species

  • 1. AI Lab ApS
  • 2. Aarhus University

Description

This repository provides a dataset for evaluating and benchmarking high-throughput image classification and object detection methods. It contains high-resolution images captured by a high-speed camera mounted on a car and oriented perpendicular to the direction of travel along roadsides in Denmark.

 

Overview

 

The dataset consists of 14,838 high-resolution images (4024x3036 pixels) taken along Danish roadsides. It was developed to monitor the spread of invasive plant species and covers six classes (“meta-species”), with visually similar sibling species merged into a single class.

 

Purpose

 

The primary purpose of this dataset is to facilitate the detection and classification of 6 meta-species of invasive plants, allowing efficient monitoring and control efforts in Denmark.

 

Classes (Meta-Species)

 

Due to the visual similarity within certain genera, sibling species are grouped into single classes or “meta-species.” The dataset focuses on the following 6 distinct taxa:

 

1. Cytisus scoparius (L.) Link

2. Lupinus polyphyllus Lindl.

3. Pastinaca sativa L.

4. Reynoutria spp. (Reynoutria japonica Houtt. and Reynoutria sachalinensis (F.Schmidt) Nakai)

5. Rosa rugosa Thunb.

6. Solidago spp. (Solidago canadensis L. and Solidago gigantea Aiton)
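
For labeling code, the grouping above can be written down as a small lookup table. The Python sketch below is illustrative only: the dictionary keys and the helper name meta_species_of are assumptions, and the class folder names actually used in the archive may differ.

```python
# Meta-species classes and the species each one groups, as listed above.
# ASSUMPTION: the key strings are illustrative; the archive's folder names may differ.
META_SPECIES = {
    "Cytisus scoparius": ["Cytisus scoparius"],
    "Lupinus polyphyllus": ["Lupinus polyphyllus"],
    "Pastinaca sativa": ["Pastinaca sativa"],
    "Reynoutria spp.": ["Reynoutria japonica", "Reynoutria sachalinensis"],
    "Rosa rugosa": ["Rosa rugosa"],
    "Solidago spp.": ["Solidago canadensis", "Solidago gigantea"],
}

# Reverse lookup: species binomial -> meta-species class.
SPECIES_TO_CLASS = {
    species: cls for cls, members in META_SPECIES.items() for species in members
}


def meta_species_of(species):
    """Return the meta-species class for a species binomial, or None if unknown."""
    return SPECIES_TO_CLASS.get(species)
```

With this table, meta_species_of("Reynoutria japonica") and meta_species_of("Reynoutria sachalinensis") both resolve to the same class, which mirrors how sibling species share one label in the dataset.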

 

Dataset Structure

 

The dataset contains two types of images:

 

Monolabel images: Images with a single species, organized by class in separate folders (6 classes for 6 meta-species).

Multilabel images: Images that may contain multiple species, with annotations provided in annotations_mads_split_multilabel.csv.
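
Because monolabel images are organized by class in separate folders, the folder name can serve as the label. The sketch below indexes one split directory under the assumption that its layout is <split>/<class>/<image>; the exact folder names and image file extensions are not stated in this description, so the function name, the accepted suffixes, and the example path are all hypothetical.

```python
from pathlib import Path

# ASSUMPTION: image file extensions are not stated in the description.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_class_folders(split_dir):
    """Return (class_names, samples) for a directory laid out as <class>/<image>.

    samples is a list of (image_path, class_index) pairs, where class_index
    is the position of the folder name in the sorted list of class names.
    """
    split_dir = Path(split_dir)
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(class_names)}
    samples = []
    for name in class_names:
        for image_path in sorted((split_dir / name).iterdir()):
            if image_path.is_file() and image_path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image_path, class_to_idx[name]))
    return class_names, samples
```

A call such as index_class_folders("path/to/train") (a placeholder path) would be expected to return six class names if the split follows the assumed layout.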

 

Image Splits

 

Monolabel Images:

Train: 10,302 images

Validation: 2,160 images

Test: 2,140 images

Multilabel Images:

206 images

Quick Test Sets:

val_16: 16 images

test_14: 14 images

Note: These 30 images are copies of images in the main folders, provided for quick testing.
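
As a consistency check, the counts above can be added up. The short Python sketch below uses only the numbers listed in this section; the variable names are illustrative. The sums suggest that the overall figure of 14,838 images counts the 30 quick-test copies as separate files.

```python
# Image counts as listed in this section.
monolabel = {"train": 10_302, "validation": 2_160, "test": 2_140}
multilabel = 206
quick_test = {"val_16": 16, "test_14": 14}

monolabel_total = sum(monolabel.values())               # 14,602
unique_total = monolabel_total + multilabel             # 14,808
files_total = unique_total + sum(quick_test.values())   # 14,838

# Share of each monolabel split.
for split, count in monolabel.items():
    print(f"{split}: {count} images ({count / monolabel_total:.1%} of monolabel images)")

print(f"monolabel total: {monolabel_total}")
print(f"unique images (monolabel + multilabel): {unique_total}")
print(f"files including quick-test copies: {files_total}")
```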

 

Image Resolution

 

All images have a resolution of 4024x3036 pixels, providing high-detail views suitable for robust classification.

 

Annotations

 

Monolabel Images: Organized by species in folders.

Multilabel Images: Annotation data provided in annotations_mads_split_multilabel.csv.
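
The column layout of annotations_mads_split_multilabel.csv is not documented in this description, so a reasonable first step is to inspect its header and a few rows before writing a loader. The sketch below does this with the Python standard library; the function name is hypothetical, and a comma delimiter and UTF-8 encoding are assumed.

```python
import csv


def peek_annotations(csv_path, n_rows=5):
    """Return (column_names, first_rows) from a CSV file.

    ASSUMPTIONS: comma-delimited, UTF-8 encoded, first line is a header.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = []
        for i, row in enumerate(reader):
            if i >= n_rows:
                break
            rows.append(row)
        return reader.fieldnames, rows
```

Calling peek_annotations("annotations_mads_split_multilabel.csv") shows which columns hold the image identifier and the species labels, which the rest of a loading script can then rely on.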

 

Due to Zenodo’s storage constraints, the images in this record have been compressed to 70% quality. The original-quality images can be accessed here:

https://lab.plantnet.org/DanishRoadsData/0_datastore.tar

In addition, this link provides access to:

Deep Features and Embeddings: A deep_features folder containing precomputed PyTorch tensors for each image, organized to match the image directory structure. These tensors are deep embeddings; using them saves processing time and ensures consistent feature extraction. Each high-resolution image was divided into 926 tiles across eight scales (described in the table below). Embeddings were extracted using a Vision Transformer (ViT) model, specifically BEiT (version 1) fine-tuned on PlantNet, with an input resolution of 384x384. The final embedding of each tile was obtained by averaging the tile's local tokens.
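
Since the deep_features folder is organized to match the image directory structure, the tensor file for an image can be located by mirroring its relative path. The sketch below illustrates this; the function name, the example folder names, and the ".pt" extension are assumptions, as the description does not state how the tensor files are named.

```python
from pathlib import Path


def feature_path_for(image_path, images_root, features_root, suffix=".pt"):
    """Map an image path to the mirrored path under the features folder.

    ASSUMPTION: the tensor file has the same relative path and stem as the
    image, with the image extension replaced by `suffix` (".pt" is a guess).
    """
    relative = Path(image_path).relative_to(images_root)
    return Path(features_root) / relative.with_suffix(suffix)


# Hypothetical layout, for illustration only.
example = feature_path_for(
    "images/train/some_class/img_0001.jpg", "images", "deep_features"
)
print(example)
```

If the files are standard PyTorch tensors, the resulting path can be passed to torch.load(path, map_location="cpu") to read the embeddings without a GPU.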

Tile Scale Distribution

Scale   Resized image (px)   Tiles (X)   Tiles (Y)   Tiles (total)
1       384                  2           1           2
2       768                  4           3           12
3       1152                 7           5           35
4       1536                 10          7           70
5       1920                 12          9           108
6       2304                 15          11          165
7       2688                 18          13          234
8       3036                 20          15          300
Total   -                    -           -           926
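
The tile counts can be checked programmatically. The Python sketch below encodes the table and confirms that the per-scale products sum to 926. It also tests one tiling rule that happens to reproduce every row: 384-pixel tiles placed with a step of about half a tile (192 pixels) on an image resized to the listed height with its aspect ratio preserved. This rule is an inference from the numbers, not a documented procedure; the stride, the reading of the "Resized image" column as the image height, and the function names are all assumptions.

```python
# (scale, resized image size in px, tiles along X, tiles along Y), from the table.
TILE_TABLE = [
    (1, 384, 2, 1),
    (2, 768, 4, 3),
    (3, 1152, 7, 5),
    (4, 1536, 10, 7),
    (5, 1920, 12, 9),
    (6, 2304, 15, 11),
    (7, 2688, 18, 13),
    (8, 3036, 20, 15),
]

ORIG_W, ORIG_H = 4024, 3036  # original image resolution
TILE = 384                   # model input resolution stated in the description
STRIDE = 192                 # ASSUMPTION: half-tile step, inferred from the table


def n_tiles(length_px):
    """Number of tiles along one axis under the assumed half-tile step."""
    return round((length_px - TILE) / STRIDE) + 1


def grid_for(resized_height):
    """Tile grid (nx, ny), ASSUMING the listed size is the resized height
    and the width is scaled to keep the original aspect ratio."""
    resized_width = ORIG_W * resized_height / ORIG_H
    return n_tiles(resized_width), n_tiles(resized_height)


total = 0
for scale, height, nx, ny in TILE_TABLE:
    # The inferred rule matches the table at every scale.
    assert grid_for(height) == (nx, ny), scale
    total += nx * ny

print(total)  # 926
```

The agreement at all eight scales makes the half-tile-overlap reading plausible, but the exact tile positions used to compute the released embeddings should be confirmed against the original-quality archive.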

 

Usage

This dataset supports:

 • Training and testing of machine learning models for invasive plant detection and classification.

 • Development of monitoring tools for tracking the spread of invasive species in various regions.

 

Files

Files (46.1 GB)

md5:1f68ae54ef20e4407b08db300fbdd801
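
After downloading the 46.1 GB file, its integrity can be checked against the MD5 checksum listed above. The sketch below reads the file in chunks so that it does not need to fit in memory; the function names are illustrative, and the local file path is left to the user because the archive's file name is not shown here.

```python
import hashlib

# Checksum listed in the Files section of this record.
EXPECTED_MD5 = "1f68ae54ef20e4407b08db300fbdd801"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """Return True if the file's MD5 equals the checksum listed for this record."""
    return md5_of_file(path) == EXPECTED_MD5
```

A False result from matches_listed_checksum usually indicates an incomplete or corrupted download, in which case the file should be fetched again.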