FOR-species20K dataset

Puliti, Stefano; Lines, Emily; Müllerová, Jana; Frey, Julian; Schindler, Zoe; Straker, Adrian; Allen, Matthew J.; Winiwarter, Lukas; Rehush, Nataliia; Hristova, Hristina; Murray, Brent; Calders, Kim; Terryn, Louise; Coops, Nicholas; Höfle, Bernhard; Junttila, Samuli; Krucek, Martin; Krok, Grzegorz; Král, Kamil; Levick, Shaun R.; Luck, Linda; Missarov, Azim; Mokroš, Martin; Owen, Harry; Stereńczak, Krzysztof; Pitkänen, Timo; Puletti, Nicola; Saarinen, Ninni; Hopkinson, Chris; Torresan, Chiara; Tomelleri, Enrico; Weiser, Hannah; Astrup, Rasmus

doi:10.5281/zenodo.13255198

Published August 7, 2024 | Version v1

Dataset Open

1. Norwegian Institute of Bioeconomy Research
2. University of Cambridge
3. Jan Evangelista Purkyně University in Ústí nad Labem
4. Albert-Ludwigs-Universität Freiburg
5. University of Freiburg
6. University of Göttingen
7. Universität Innsbruck
8. Swiss Federal Institute for Forest, Snow and Landscape Research
9. University of British Columbia
10. Ghent University
11. Heidelberg University
12. University of Eastern Finland
13. Silva Tarouca Research Institute for Landscape and Ornamental Gardening
14. Forest Research Institute
15. CSIRO Land and Water
16. Charles Darwin University
17. University College London
18. Natural Resources Institute Finland
19. Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria
20. University of Lethbridge
21. National Research Council
22. Free University of Bozen-Bolzano

Description

Data for benchmarking tree species classification from proximally-sensed laser scanning data.

Data split and usage

The data is split into:

Development data (dev): this set includes 90% of the trees in the dataset and consists of individual tree point clouds (*.laz) named according to the treeID column in the tree_metadata_dev.csv file, from which the tree_species labels are available. These data are meant for model development and can thus be further split into training and validation datasets.
Test data (test): this set contains 10% of the trees (balanced sample) as individual tree point clouds (*.laz); the species labels are withheld for benchmarking purposes. To make use of the test data, users should predict species for the test trees and output a table (.csv file) with one row per predicted tree and two columns (treeID and predicted_species). This table can then be used to create a new submission on the FOR-species20K Codabench benchmarking platform and obtain the evaluation metrics for the test data.
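The workflow described above (split the development metadata into training and validation subsets, then write test predictions to a two-column table) could be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' own tooling: the function names read_metadata, stratified_split and write_submission are hypothetical, and the name of the species column in tree_metadata_dev.csv is passed in as a parameter because it is an assumption to check against the actual file. Only the submission column names treeID and predicted_species come from the description above.

```python
import csv
import random
from collections import defaultdict


def read_metadata(path):
    """Load a metadata CSV as a list of dicts, one per tree."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def stratified_split(rows, label_key, val_fraction=0.2, seed=0):
    """Split rows into (train, val), sampling each species separately.

    label_key is the species column name; which column that is in the
    real metadata file is an assumption the caller has to verify.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_val = int(round(len(group) * val_fraction))
        if len(group) > 1:
            # Hold out at least one tree for every species with two or more trees.
            n_val = max(1, n_val)
        else:
            # A species with a single tree stays in the training set.
            n_val = 0
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


def write_submission(predictions, path):
    """Write (treeID, predicted_species) pairs as a two-column CSV."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["treeID", "predicted_species"])
        for tree_id, species in predictions:
            writer.writerow([tree_id, species])
```

A typical call would be stratified_split(read_metadata("tree_metadata_dev.csv"), label_key=...) with the species column name filled in, followed by write_submission(...) once a model has produced one prediction per test tree. Splitting species by species keeps rare species represented in both subsets, which seems a reasonable default given that the test set is described as a balanced sample.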

Cite

Any scientific publication using the data should cite the following paper:

Puliti, S., Lines, E., Müllerová, J., Frey, J., Schindler, Z., Straker, A., Allen, M.J., Winiwarter, L., Rehush, N., Hristova, H., Murray, B., Calders, K., Terryn, L., Coops, N., Höfle, B., Krůček, M., Krok, G., Král, K., Luck, L., Levick, S.R., Missarov, A., Mokroš, M., Owen, H., Stereńczak, K., Pitkänen, T.P., Puletti, N., Saarinen, N., Hopkinson, C., Torresan, C., Tomelleri, E., Weiser, H., Junttila, S., and Astrup, R. (2025) Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset. Methods in Ecology and Evolution, 00, 1–18. https://doi.org/10.1111/2041-210X.14503

Files

Files (27.2 GB)

Name	MD5	Size
dev.zip	ecb6196d9d630b095e6f6249b46efdd7	25.4 GB
test.zip	087eaed01d45bf83643b4d7edcef33f8	1.8 GB
tree_metadata_dev.csv	603c91ff58c8eab486717ffc82a1b21f	1.2 MB
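
Because the archives are large, it may be worth checking each download against the MD5 checksums listed in the file table before unpacking. The following is a minimal sketch using Python's standard hashlib module; the names EXPECTED_MD5, md5_of_file and verify_download are hypothetical helpers introduced here for illustration, and the local file paths are whatever the user chose when downloading.

```python
import hashlib

# MD5 checksums copied from the file table of this record.
EXPECTED_MD5 = {
    "dev.zip": "ecb6196d9d630b095e6f6249b46efdd7",
    "test.zip": "087eaed01d45bf83643b4d7edcef33f8",
    "tree_metadata_dev.csv": "603c91ff58c8eab486717ffc82a1b21f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives are never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, name):
    """Return True if the file at `path` matches the listed checksum for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, verify_download("dev.zip", "dev.zip") should return True for an intact download saved under that name in the working directory; a False result suggests the file is incomplete or corrupted and should be fetched again.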

Additional details

Available: 2024

Repository URL: https://github.com/stefp/FOR-species
Programming language: Python
Development Status: Active

	All versions	This version
Views	4,820	4,820
Downloads	2,969	2,969
Data volume	75.9 TB	75.9 TB

DOI: 10.5281/zenodo.13255198
Resource type: Dataset
Publisher: Zenodo

Published in

Puliti, S., Lines, E., Müllerová, J., Frey, J., Schindler, Z., Straker, A., Allen, M.J., Winiwarter, L., Rehush, N., Hristova, H., Murray, B., Calders, K., Terryn, L., Coops, N., Höfle, B., Krůček, M., Krok, G., Král, K., Luck, L., Levick, S.R., Missarov, A., Mokroš, M., Owen, H., Stereńczak, K., Pitkänen, T.P., Puletti, N., Saarinen, N., Hopkinson, C., Torresan, C., Tomelleri, E., Weiser, H., Junttila, S., and Astrup, R. (2025) Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset. Methods in Ecology and Evolution, 00, 1–18. https://doi.org/10.1111/2041-210X.14503. ISSN: 2041-210X.

License: GNU General Public License v3.0 or later

Permissions of this strong copyleft license are conditioned on making available complete source code of licensed works and modifications, which include larger works using a licensed work, under the same license. Copyright and license notices must be preserved. Contributors provide an express grant of patent rights.

License: GNU Affero General Public License v3.0 or later

Technical metadata

Created: August 12, 2024
Modified: February 3, 2025
