Published April 13, 2026 | Version v1
Resource type: Dataset | Access: Open

PlantAI – Part 1: Metadata and Primary Imagery (Right Camera)

  • 1. Forestry Research Institute of Sweden
  • 2. Luleå Tekniska Universitet
  • 3. KTH Royal Institute of Technology
  • 4. The Swedish Forestry Research Institute

Description

Introduction

This dataset is part of the PlantAI project and contains images and field inventory data of tree seedlings in Swedish forestry.

The dataset includes:

  • Field inventory data for individual seedlings
  • Site and environmental properties
  • Image metadata
  • Images from the right camera lens
  • Annotation of a subset of the right camera images

Images from the left camera lens are provided in a separate dataset. See the "Related works" section for links to associated datasets.

Dataset structure

The dataset is organized as follows:

  • Images are provided as zip archives, each containing approximately 3,000 images ordered by photo timestamp.
  • seedling_observations.csv:
    • The main data file, containing seedling inventory data, image metadata, and site properties. Each row corresponds to one image.
    • Every categorical variable encoded as numeric codes is accompanied by a corresponding "_label" column containing a human-readable description.
  • columns.csv:
    • Describes all columns in seedling_observations.csv, including their data types and definitions.
  • codebook.csv:
    • Lists and describes all categorical codes used in the dataset.
  • labelstudio_annotation_photo_angles_2-5.zip:
    • Annotations of a subset of the images
    • Only side views are included, that is, photo_angle values 2–5
    • Region of interest as a bounding box
    • Seedling as an oriented bounding box
    • Seedling pose keypoints
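The "_label" convention makes it possible to pair each code column with its description column directly from the CSV header. A minimal sketch using only the standard library, assuming seedling_observations.csv is comma-separated and UTF-8 encoded (both assumptions; the function names are illustrative):

```python
import csv
from typing import Dict, Iterable, List


def code_label_pairs(header: List[str]) -> Dict[str, str]:
    """Map each numeric code column to its "_label" companion column.

    Assumes the convention described above: a categorical column `x`
    is accompanied by a column named `x_label`.
    """
    columns = set(header)
    pairs = {}
    for name in header:
        if name.endswith("_label"):
            base = name[: -len("_label")]
            if base in columns:  # ignore "_label" columns without a code column
                pairs[base] = name
    return pairs


def read_observations(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse seedling_observations.csv; each returned dict is one image."""
    return list(csv.DictReader(lines))
```

Usage: open the file with `open("seedling_observations.csv", newline="", encoding="utf-8")`, pass the handle to `read_observations`, and call `code_label_pairs` on the header (`list(rows[0].keys())`).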

Column details

Detailed column descriptions are provided in columns.csv. This section highlights selected variables.

Column: `planting_spot`

The `planting_spot` variable describes the type of soil preparation at the individual planting location, as judged visually in the field. The codes are defined as follows:

Code | Description (SWE) | Description (ENG)
1 | Omvänd torva mineral | Inverted turf with mineral soil
2 | Omvänd torva utan mineral | Inverted turf without mineral soil
3 | Grop, högt läge | Pit, elevated position
4 | Grop, lågt läge | Pit, low position
5 | Gångjärn | Hinge
6 | Mineralfläck | Mineral spot
7 | Pytsning mineraljord | Spread of mineral soil
8 | Pytsning humus | Spread of humus
9 | Störd humus | Disturbed humus
10 | Körspår | Track rut
11 | Invers | Inversion
12 | Omarkberett | No preparation
13 | Odefinierad | Undefined
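Because each code column already has a "_label" companion, decoding is rarely needed; a lookup transcribed from the table above can still help when validating labels or working with the raw codes. A minimal sketch (the dictionary and function names are illustrative; codebook.csv remains the authoritative source):

```python
from typing import Optional

# English descriptions of `planting_spot` codes, transcribed from the table above.
PLANTING_SPOT = {
    1: "Inverted turf with mineral soil",
    2: "Inverted turf without mineral soil",
    3: "Pit, elevated position",
    4: "Pit, low position",
    5: "Hinge",
    6: "Mineral spot",
    7: "Spread of mineral soil",
    8: "Spread of humus",
    9: "Disturbed humus",
    10: "Track rut",
    11: "Inversion",
    12: "No preparation",
    13: "Undefined",
}


def describe_planting_spot(cell: str) -> Optional[str]:
    """Return the English description for a planting_spot value read from CSV.

    Returns None for empty cells, non-numeric values, or codes not in the table.
    """
    cell = cell.strip()
    if not cell:
        return None
    try:
        code = int(float(cell))  # tolerates values stored as e.g. "5.0"
    except (ValueError, OverflowError):
        return None
    return PLANTING_SPOT.get(code)
```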

Column: `site_preparation_method`

The `site_preparation_method` column describes the soil preparation method used on the site as a whole. The codes are defined as follows:

Code | Description (SWE) | Description (ENG)
0 | Ingen | None
1 | Okänd | Unknown
2 | Harvning | Scarification
3 | Grävmaskin | Excavator
4 | Grävmaskin, utan entreprenör | Excavator, non-entrepreneur
5 | Grävmaskin med vissa spår, liknande harvning | Excavator with some pulled tracks (similar to scarification)
6 | Högläggning | Mounding
7 | Fläck, högläggning | Spot mounding
8 | Delvis högläggning | Partial mounding

Soil types

Soil type information is taken from the map product Soil types 1:25 000–1:100 000 (version dated 2018-01-30), provided by SGU (Geological Survey of Sweden).

The GNSS position of each seedling was used to extract soil type information from two map layers:

  1. Parent Material – Base Layer (JG2): Represents the dominant soil type at ~0.5 m depth. This includes areas of exposed bedrock or thin soil layers. All data points have a value in this layer.
  2. Surface Layer – Thin or Discontinuous (JY1): Represents surface layers thinner than ~0.5 m, or discontinuous layers averaging 0.5–1 m. Common examples include thin peat or till on bedrock. If present, JY1 overlays JG2.

The following soil types appear in the dataset:

Code | Description (SWE) | Description (ENG)
5 | Kärrtorv | Fen peat
19 | Postglacial finlera | Postglacial clay
40 | Glacial lera | Glacial clay
48 | Glacial silt | Glacial silt
55 | Isälvssediment, sand | Glaciofluvial sediment, sand
75 | Torv | Peat
95 | Sandig morän | Sandy till
100 | Morän | Till
888 | Berg | Rock
890 | Urberg | Bedrock (Precambrian rock)
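Since JY1, where present, overlays JG2, the uppermost mapped soil type at a seedling is JY1 if it has a value and JG2 otherwise. A minimal sketch of that rule; how the two layers are named in seedling_observations.csv is not stated here, so the functions take the raw cell values and all names are illustrative:

```python
from typing import Optional

# Soil type codes appearing in the dataset, transcribed from the table above.
SOIL_TYPES = {
    5: "Fen peat",
    19: "Postglacial clay",
    40: "Glacial clay",
    48: "Glacial silt",
    55: "Glaciofluvial sediment, sand",
    75: "Peat",
    95: "Sandy till",
    100: "Till",
    888: "Rock",
    890: "Bedrock (Precambrian rock)",
}


def parse_code(cell: str) -> Optional[int]:
    """Convert a CSV cell to an integer code; empty cells become None."""
    cell = cell.strip()
    return int(float(cell)) if cell else None


def effective_soil_type(jg2: int, jy1: Optional[int]) -> int:
    """Return the uppermost mapped soil code: JY1 if present, otherwise JG2."""
    return jy1 if jy1 is not None else jg2
```

Usage: `SOIL_TYPES[effective_soil_type(parse_code(jg2_cell), parse_code(jy1_cell))]`, where `jg2_cell` and `jy1_cell` are the two layer values read for one row (JG2 always has a value).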

Label Studio annotation

The annotations are provided as exported from the open-source annotation tool Label Studio.
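The export format inside the archive is not stated. Assuming it is Label Studio's JSON export (a list of tasks, each with an `annotations` list whose `result` items store coordinates as percentages of the image size), the sketch below flattens it into pixel-unit records. Rectangle results carry `x`, `y`, `width`, `height` and `rotation` (degrees); keypoint results carry `x` and `y`. Function names are illustrative, and label names depend on the labeling configuration used for the project.

```python
import json
from typing import Any, Dict, List


def to_pixels(result: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one Label Studio result item from percent to pixel units."""
    img_w = result["original_width"]
    img_h = result["original_height"]
    value = result["value"]
    out = {
        "type": result["type"],
        # e.g. value["rectanglelabels"] or value["keypointlabels"]
        "labels": value.get(result["type"], []),
        "x": value["x"] / 100 * img_w,
        "y": value["y"] / 100 * img_h,
    }
    if "height" in value:  # rectangles, axis-aligned or oriented
        out["width"] = value["width"] / 100 * img_w
        out["height"] = value["height"] / 100 * img_h
        out["rotation_deg"] = value.get("rotation", 0)  # left as stored
    return out


def load_annotations(path: str) -> List[Dict[str, Any]]:
    """Flatten a Label Studio JSON export into one record per result item."""
    with open(path, encoding="utf-8") as f:
        tasks = json.load(f)
    records = []
    for task in tasks:
        image = task.get("data", {}).get("image")
        for annotation in task.get("annotations", []):
            for result in annotation.get("result", []):
                if "original_width" not in result:
                    continue  # skip items without image geometry
                records.append({"image": image, **to_pixels(result)})
    return records
```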

Data quality and uncertainty

Several variables in the dataset are based on field observations and manual classification, including species, planting spot type, seedling vitality, and vegetation within the planting spot.

These variables are subject to observer interpretation and may include classification uncertainty. 

Position data is provided together with the estimated horizontal and vertical errors in meters, as reported by the GNSS receiver.
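The reported errors make it possible to exclude seedlings with poor position fixes, for example before joining positions with other map layers. A minimal sketch, assuming rows were read with `csv.DictReader` and that the horizontal error is stored in a column called `horizontal_error_m` (a hypothetical name; columns.csv gives the actual one):

```python
from typing import Iterable, List, Mapping


def filter_by_horizontal_error(
    rows: Iterable[Mapping[str, str]],
    max_error_m: float,
    error_col: str = "horizontal_error_m",  # hypothetical column name
) -> List[Mapping[str, str]]:
    """Keep rows whose reported horizontal GNSS error is at most max_error_m.

    Rows with a missing or non-numeric error value are dropped.
    """
    kept = []
    for row in rows:
        try:
            error = float(row.get(error_col, ""))
        except (TypeError, ValueError):
            continue  # empty cell, None, or text that is not a number
        if error <= max_error_m:  # NaN compares False and is dropped too
            kept.append(row)
    return kept
```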

Methods (English)

Images and field data were collected using two identical Samsung Galaxy A52s smartphones (model SM-A528B/DS), mounted in a custom-built, 3D-printed stereo camera rig with a fixed lens separation of 95.1 mm. A synchronized shutter trigger was implemented using a reversed headset-button setup, allowing near-simultaneous image capture through a pinch gesture. The Open Camera app was used for image capture, enabling IMU-based orientation logging and support for headset-controlled shutter release. Data collection took place under varied weather conditions throughout the summer of 2023. Each seedling was photographed alongside an identifier rig featuring ArUco markers for automated vision-based tracking. Top-down images include a measurement frame that provides both scale and orientation reference. All images were linked to a unique seedling ID, which was used to associate image data with field measurements. Final metadata tables were constructed by combining field observations with environmental data from Skogsstyrelsen’s API, resulting in a unified dataset that includes both visual documentation and structured annotations.
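With a fixed, known baseline, the right-camera images here and the left-camera images in the companion dataset can in principle be used for stereo depth estimation. For a rectified stereo pair, depth is Z = f·B/d, where f is the focal length in pixels, B the baseline in meters and d the horizontal disparity in pixels. The sketch below assumes rectified images and a focal length from a separate calibration; no calibration parameters are given in this description, so the 3000 px value in the usage line is purely illustrative.

```python
BASELINE_M = 0.0951  # fixed lens separation of the rig: 95.1 mm


def depth_from_disparity(
    disparity_px: float, focal_length_px: float, baseline_m: float = BASELINE_M
) -> float:
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_length_px * baseline_m / disparity_px
```

Usage: `depth_from_disparity(285.3, focal_length_px=3000.0)` gives roughly 1.0 m. Because depth is inversely proportional to disparity, a fixed disparity error of one pixel causes a larger depth error for distant points than for close ones.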

Abstract (English)

This dataset was collected during the summer of 2023 as part of a forestry field study in southern Sweden. Data was gathered from 34 replanted clear-cut sites within three forest management areas operated by Södra, located near Norrköping, Kinna, and Växjö. A total of 2,780 seedlings were documented, resulting in 14,992 images. Each seedling was photographed from above and from the side in four different orientations.
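Since every image row carries a seedling ID and a `photo_angle`, the side-view coverage of each seedling can be checked by grouping rows. A minimal sketch, assuming rows read with `csv.DictReader`, a hypothetical `seedling_id` column name (see columns.csv for the actual one), and side views coded as photo_angle 2–5 as in the annotation description. Duplicate angles are tolerated, since the image count exceeds five per seedling.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set

SIDE_ANGLES = {2, 3, 4, 5}  # side views, as listed for the annotation subset


def missing_side_views(
    rows: Iterable[Mapping[str, str]], id_col: str = "seedling_id"  # hypothetical name
) -> Dict[str, Set[int]]:
    """Return {seedling ID: side angles with no image} for incomplete seedlings."""
    angles_by_seedling: Dict[str, Set[int]] = defaultdict(set)
    for row in rows:
        angles_by_seedling[row[id_col]].add(int(float(row["photo_angle"])))
    missing = {}
    for seedling, angles in angles_by_seedling.items():
        gap = SIDE_ANGLES - angles
        if gap:
            missing[seedling] = gap
    return missing
```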

The majority of seedlings are Norway spruce (Picea abies) and Scots pine (Pinus sylvestris), with a smaller number of silver birch (Betula pendula) and European larch (Larix decidua). Most (14,748) side-view images include a corresponding left-lens stereo image.

Seedling condition, height, and planting quality were assessed in the field by a single technician. Plant height was measured using a ruler, and planting angle was visually estimated with the aid of a protractor. A measurement frame placed in the top-down images provided scale and directional reference to support these assessments.

Site-level metadata, such as soil characteristics, was retrieved via API from the Swedish Forest Agency (Skogsstyrelsen). Information on prior forest treatment was also added from site preparation documents.

Files

Files (37.6 GB)

  • md5:10b86626a8f0e1dc91a2f4668db0907f, 2.5 kB
  • md5:1b797cd06232c0c559204dfa608c8599, 3.7 kB
  • md5:86832707d4a4b2c8f19d7268c2c03d2e, 7.5 GB
  • md5:d72bc88aea1cdf4dee3be4778da2cb1f, 7.6 GB
  • md5:94d6101c599841ff33c58e128e795a97, 7.4 GB
  • md5:ea445db0b8ff504c8596d06d387f276c, 7.6 GB
  • md5:9855325a09825a9a4fd508234eb22dd5, 7.5 GB
  • md5:eafdd2f0a2c7452b9c9f8eb872913e59, 5.1 MB
  • md5:17a4ba36797021ec643985c27145a954, 4.9 MB
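Each file is listed with an MD5 checksum. Given the size of the image archives, it is worth checking each download against its checksum before extracting. A minimal sketch with Python's standard `hashlib` module, reading in chunks so multi-gigabyte files are not loaded into memory at once (function names are illustrative):

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file against a checksum written as "md5:<hex>" or "<hex>"."""
    return md5sum(path) == expected.removeprefix("md5:").lower()
```

Usage: `verify("downloaded_archive.zip", "md5:86832707d4a4b2c8f19d7268c2c03d2e")`, where the file name is a placeholder for the local copy of the corresponding archive. `str.removeprefix` requires Python 3.9 or later.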

Additional details

Related works

Is supplemented by
Dataset: 10.5281/zenodo.19277183 (DOI)

Funding

  • Södra Skogsägarna (Sweden): Södra Research Foundation
  • VINNOVA: Autonomous forest regeneration for a sustainable bioeconomy (AutoPlant), award no. 2020-04202
  • VINNOVA: Autonomous forest regeneration for a sustainable bioeconomy (AutoPlant 3), award no. 2023-02747
  • Foundation for Strategic Environmental Research: Mistra Digital Forest

Dates

Collected: 2023-06-13 to 2023-08-04