Published August 16, 2024
| Version v1
Dataset
Restricted
FOR-species20K dataset and code for review
Authors/Creators
Description
Data and code (code.zip) corresponding to the manuscript entitled "Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset"
Code
The code folder contains all of the code needed to train models and generate predictions with the methods benchmarked in the manuscript.
Data split and usage
The data is split into:
- Development data (dev): this set includes 90% of the trees in the dataset and consists of individual tree point clouds (*.laz) named according to the treeID column in the tree_metadata_dev.csv file, which also provides the tree_species labels. These data are meant for model development and can therefore be further split into training and validation datasets.
- Test data (test): this set includes 10% of the trees (balanced sample) and consists of individual tree point clouds (*.laz) whose species labels are withheld for benchmarking purposes. To make use of the test data, users should predict the species of the test trees and output a table (.csv file) with one row per predicted tree and two columns (treeID and predicted_species). This table can then be used to create a new submission on the FOR-species20K Codabench benchmarking platform and obtain the evaluation metrics for the test data.
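The workflow described in the list above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the code shipped in code.zip. The helper names (stratified_split, write_submission), the 20% validation fraction, the "species" label column name, and the toy tree IDs and species names are all assumptions made for the example; check the actual header of tree_metadata_dev.csv before reusing it. Only the submission column names (treeID and predicted_species) come from the description above.

```python
import csv
import io
import random
from collections import defaultdict


def stratified_split(rows, label_key, val_fraction=0.2, seed=0):
    """Split metadata rows into (train, validation), keeping each label in both parts.

    `rows` is a list of dicts, e.g. as produced by csv.DictReader.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        if len(group) > 1:
            # At least one tree in validation, at least one left for training.
            n_val = min(max(1, round(len(group) * val_fraction)), len(group) - 1)
        else:
            n_val = 0  # a single tree cannot be split; keep it for training
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


def write_submission(predictions, handle):
    """Write (treeID, predicted_species) pairs as a two-column CSV table."""
    writer = csv.writer(handle)
    writer.writerow(["treeID", "predicted_species"])
    for tree_id, species in predictions:
        writer.writerow([tree_id, species])


# Toy stand-in for the rows of tree_metadata_dev.csv (hypothetical values).
dev_rows = [
    {"treeID": str(i), "species": "species_a" if i < 10 else "species_b"}
    for i in range(15)
]
train_rows, val_rows = stratified_split(dev_rows, label_key="species")
print(len(train_rows), len(val_rows))

# Placeholder predictions for two hypothetical test trees.
test_predictions = [("1001", "species_a"), ("1002", "species_b")]
buffer = io.StringIO()
write_submission(test_predictions, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of an in-memory buffer, open it with open(path, "w", newline="") so the csv module controls the line endings. Splitting per species keeps rare species represented in both the training and validation parts, which a purely random split would not guarantee.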