Dataset Open Access
Schulz, Christian;
Ahlswede, Steve;
Gava, Christiano;
Helber, Patrick;
Bischke, Benjamin;
Arias, Florencia;
Förster, Michael;
Hees, Jörn;
Demir, Begüm;
Kleinschmit, Birgit
Context and Aim
Deep learning in Earth Observation requires large image archives with highly reliable labels for model training and testing. However, a preferable quality standard for forest applications in Europe has not yet been determined. The TreeSatAI consortium investigated numerous sources for annotated datasets as an alternative to manually labeled training datasets.
We found that the federal forest inventory of Lower Saxony, Germany, is a largely untapped source of annotated samples for training data generation. The corresponding 20-cm color-infrared (CIR) imagery, which is used for forestry management through visual interpretation, constitutes an excellent baseline for deep learning tasks such as image segmentation and classification.
Description
The data archive is highly suitable for benchmarking as it represents the real-world data situation of many German forest management services. On the one hand, it has a high number of samples, which are supported by the high-resolution aerial imagery. On the other hand, this data archive presents challenges, including class label imbalances between the different forest stand types.
The TreeSatAI Benchmark Archive contains:
50,381 image triplets (aerial, Sentinel-1, Sentinel-2)
synchronized time steps and locations
all original spectral bands/polarizations from the sensors
20 species classes (single labels)
12 age classes (single labels)
15 genus classes (multi labels)
60 m and 200 m patches
fixed split for train (90%) and test (10%) data
additional single labels such as English species name, genus, forest stand type, foliage type, land cover
The GeoTIFF and GeoJSON files are readable in any GIS software, such as QGIS. For further information, we refer to the PDF document in the archive and the publications in the reference section.
Version history
v1.0.2 - Minor bug fix in multi label JSON file
v1.0.1 - Minor bug fixes in multi label JSON file and description file
v1.0.0 - First release
Citation
Ahlswede, S., Schulz, C., Gava, C., Helber, P., Bischke, B., Förster, M., Arias, F., Hees, J., Demir, B., and Kleinschmit, B.: TreeSatAI Benchmark Archive: a multi-sensor, multi-label dataset for tree species classification in remote sensing, Earth Syst. Sci. Data, 15, 681–695, https://doi.org/10.5194/essd-15-681-2023, 2023.
GitHub
Full code examples and pre-trained models from the dataset article (Ahlswede et al. 2022) using the TreeSatAI Benchmark Archive are published on the GitLab and GitHub repositories of the Remote Sensing Image Analysis (RSiM) Group (https://git.tu-berlin.de/rsim/treesat_benchmark) and the Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI) (https://github.com/DFKI/treesatai_benchmark). Code examples for the sampling strategy can be made available by Christian Schulz via email request.
Folder structure
We refer to the proposed folder structure in the PDF file.
Folder “aerial” contains the aerial imagery patches derived from summertime orthophotos of the years 2011 to 2020. Patches are available in 60 x 60 m (304 x 304 pixels). Band order is near-infrared, red, green, and blue. Spatial resolution is 20 cm.
Folder “s1” contains the Sentinel-1 imagery patches derived from summertime mosaics of the years 2015 to 2020. Patches are available in 60 x 60 m (6 x 6 pixels) and 200 x 200 m (20 x 20 pixels). Band order is VV, VH, and VV/VH ratio. Spatial resolution is 10 m.
Folder “s2” contains the Sentinel-2 imagery patches derived from summertime mosaics of the years 2015 to 2020. Patches are available in 60 x 60 m (6 x 6 pixels) and 200 x 200 m (20 x 20 pixels). Band order is B02, B03, B04, B08, B05, B06, B07, B8A, B11, B12, B01, and B09. Spatial resolution is 10 m.
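As a minimal sketch of how the band order above might be used, the following Python snippet maps Sentinel-2 band names to zero-based channel indices and computes the NDVI for a single pixel. The names `S2_BAND_ORDER`, `band_index`, and `ndvi`, as well as the sample pixel values, are hypothetical and not part of the archive. Reading the actual GeoTIFF patches would require a raster library, which is not shown here.

```python
from typing import Sequence

# Band order of the "s2" patches as stated in the folder description.
S2_BAND_ORDER = ["B02", "B03", "B04", "B08", "B05", "B06",
                 "B07", "B8A", "B11", "B12", "B01", "B09"]


def band_index(name: str) -> int:
    """Return the zero-based channel index of a Sentinel-2 band."""
    return S2_BAND_ORDER.index(name)


def ndvi(pixel: Sequence[float]) -> float:
    """Compute NDVI = (NIR - red) / (NIR + red) for one 12-band pixel."""
    red = pixel[band_index("B04")]
    nir = pixel[band_index("B08")]
    total = nir + red
    # Guard against division by zero for empty or masked pixels.
    return (nir - red) / total if total != 0 else 0.0


# Hypothetical reflectance values, one per band in the order above.
sample_pixel = [0.03, 0.05, 0.04, 0.40, 0.10, 0.25,
                0.30, 0.35, 0.15, 0.08, 0.02, 0.12]
print(band_index("B04"), band_index("B08"), round(ndvi(sample_pixel), 3))
```

Keeping the band order in one list avoids hard-coded channel numbers, which are easy to get wrong because the bands are not stored in ascending wavelength order.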
The folder “labels” contains a JSON file which was used for multi-labeling of the training patches. For example, an image sample with proportions of 94% Abies and 6% Larix is encoded as: "Abies_alba_3_834_WEFL_NLF.tif": [["Abies", 0.93771], ["Larix", 0.06229]]
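The label entry above could be turned into a multi-hot training target roughly as follows. This is only a sketch: the function `to_multi_hot`, the three-genus class list, and the threshold value of 0.07 are assumptions made for illustration, not values prescribed by this description.

```python
import json

# Example entry copied from the folder description, wrapped in braces
# so that it forms a complete JSON object.
EXAMPLE = (
    '{"Abies_alba_3_834_WEFL_NLF.tif": '
    '[["Abies", 0.93771], ["Larix", 0.06229]]}'
)


def to_multi_hot(entries, classes, threshold=0.07):
    """Turn [genus, proportion] pairs into a multi-hot vector over `classes`.

    A genus counts as present when its proportion exceeds `threshold`
    (an assumed cut-off, chosen here for illustration only).
    """
    proportions = {genus: share for genus, share in entries}
    return [1 if proportions.get(c, 0.0) > threshold else 0 for c in classes]


labels = json.loads(EXAMPLE)
# Hypothetical subset of the 15 genus classes.
classes = ["Abies", "Larix", "Picea"]
for filename, entries in labels.items():
    print(filename, to_multi_hot(entries, classes))
```

With the assumed threshold, the minor Larix share in this sample is dropped. Lowering the threshold keeps it, so the cut-off directly controls how many labels each patch receives.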
The two files “train_filenames.lst” and “test_filenames.lst” define the filenames used for the train (90%) and test (10%) split. We recommend using this fixed split for better reproducibility and comparability.
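A short sketch of how the fixed split might be loaded and sanity-checked is given below. It assumes that each .lst file holds one filename per line, which this description does not state explicitly. The helpers `read_file_list` and `check_split` are hypothetical, and the demo uses throwaway files instead of the real lists.

```python
import tempfile
from pathlib import Path


def read_file_list(path):
    """Read one filename per line, skipping blank lines (assumed format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def check_split(train_names, test_names):
    """Return the test share and fail if the two lists overlap."""
    overlap = set(train_names) & set(test_names)
    if overlap:
        raise ValueError(f"{len(overlap)} filenames appear in both lists")
    return len(test_names) / (len(train_names) + len(test_names))


# Demo with throwaway files; replace the paths with the real .lst files.
with tempfile.TemporaryDirectory() as tmp:
    train_path = Path(tmp) / "train_filenames.lst"
    test_path = Path(tmp) / "test_filenames.lst"
    train_path.write_text("a.tif\nb.tif\nc.tif\n", encoding="utf-8")
    test_path.write_text("d.tif\n", encoding="utf-8")
    train_names = read_file_list(train_path)
    test_names = read_file_list(test_path)

test_share = check_split(train_names, test_names)
print(len(train_names), len(test_names), test_share)
```

On the real lists, the returned test share should be close to 0.1 if the files match the stated 90%/10% split.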
The folder “geojson” contains GeoJSON files with all the samples chosen for training patch generation (point, 60 m bounding box, 200 m bounding box).
CAUTION: As we could not upload the aerial patches as a single zip file on Zenodo, you need to download the 20 single species files (aerial_60m_…zip) separately. Then, unzip them into a folder named “aerial” with a subfolder named “60m”. This structure is recommended for better reproducibility and comparability to the experimental results of Ahlswede et al. (2022).
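The download-and-unzip step could be scripted along the following lines. This is a sketch under assumptions: the helpers `md5_of` and `extract_verified` are hypothetical, the expected checksum is meant to be the md5 value listed next to each file in the file table, and whether the zip files already contain a top-level folder is not stated here, so the resulting layout should be checked after extraction. The demo runs on a throwaway archive rather than a real download.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def extract_verified(zip_path, expected_md5, target_dir):
    """Extract `zip_path` into `target_dir` only if its checksum matches."""
    actual = md5_of(zip_path)
    if actual != expected_md5:
        raise ValueError(f"checksum mismatch for {zip_path}: {actual}")
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)


# Demo on a throwaway archive standing in for a downloaded species file.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("patch.tif", b"not a real image")
    target = Path(tmp) / "aerial" / "60m"
    extract_verified(demo_zip, md5_of(demo_zip), target)
    extracted = sorted(p.name for p in target.iterdir())
print(extracted)
```

For a real file, the call would look like `extract_verified("aerial_60m_abies_alba.zip", "4298b1c9fbf6d0d85f7aa208ff5fe0c9", "aerial/60m")`, repeated for each of the 20 aerial archives.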
Join the archive
Model training, benchmarking, algorithm development… many applications are possible! Feel free to add samples from other regions in Europe or even worldwide. Additional remote sensing data from Lidar, UAVs or aerial imagery from different time steps are very welcome. This helps the research community to develop better deep learning and machine learning models for forest applications. If you have questions or want to share code, results, or publications that use this archive, feel free to contact the authors.
Project description
This work was part of the project TreeSatAI (Artificial Intelligence with Satellite data and Multi-Source Geodata for Monitoring of Trees at Infrastructures, Nature Conservation Sites and Forests). Its overall aim is the development of AI methods for the monitoring of forests and woody features on a local, regional and global scale. Based on freely available geodata from different sources (e.g., remote sensing, administration maps, and social media), prototypes will be developed for the deep learning-based extraction and classification of tree and tree stand features. These prototypes deal with real cases from the monitoring of managed forests, nature conservation and infrastructures. The development of the resulting services by three enterprises (liveEO, Vision Impulse and LUP Potsdam) will be supported by three research institutes (German Research Center for Artificial Intelligence, TUB Remote Sensing Image Analysis Group, TUB Geoinformation in Environmental Planning Lab).
Publications
Ahlswede, S., Schulz, C., Gava, C., Helber, P., Bischke, B., Förster, M., Arias, F., Hees, J., Demir, B., and Kleinschmit, B. (2022): TreeSatAI Benchmark Archive: A multi-sensor, multi-label dataset for tree species classification in remote sensing, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2022-312, 2022.
Ahlswede, S., Nimisha, T.M., and Demir, B. (2022, in revision): Embedded Self-Enhancement Maps for Weakly Supervised Tree Species Mapping in Remote Sensing Images. IEEE Trans Geosci Remote Sens.
Conference contributions
S. Ahlswede, N. T. Madam, C. Schulz, B. Kleinschmit and B. Demir, “Weakly Supervised Semantic Segmentation of Remote Sensing Images for Tree Species Classification Based on Explanation Methods”, IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022.
C. Schulz, M. Förster, S. Vulova, T. Gränzig and B. Kleinschmit, “Exploring the temporal fingerprints of mid-European forest types from Sentinel-1 RVI and Sentinel-2 NDVI time series”, IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022.
C. Schulz, M. Förster, S. Vulova and B. Kleinschmit, “The temporal fingerprints of common European forest types from SAR and optical remote sensing data”, AGU Fall Meeting, New Orleans, USA, 2021.
B. Kleinschmit, M. Förster, C. Schulz, F. Arias, B. Demir, S. Ahlswede, A. K. Aksoy, T. Ha Minh, J. Hees, C. Gava, P. Helber, B. Bischke, P. Habelitz, A. Frick, R. Klinke, S. Gey, D. Seidel, S. Przywarra, R. Zondag and B. Odermatt, “Artificial Intelligence with Satellite data and Multi-Source Geodata for Monitoring of Trees and Forests”, Living Planet Symposium, Bonn, Germany, 2022.
C. Schulz, M. Förster, S. Vulova, T. Gränzig and B. Kleinschmit, “Exploring the temporal fingerprints of sixteen mid-European forest types from Sentinel-1 and Sentinel-2 time series”, ForestSAT, Berlin, Germany, 2022.
Name | MD5 | Size |
---|---|---|
220629_doc_TreeSatAI_benchmark_archive.pdf | 4d6b87bde2e20bef81f325ca62ccbf22 | 2.1 MB |
aerial_60m_abies_alba.zip | 4298b1c9fbf6d0d85f7aa208ff5fe0c9 | 310.3 MB |
aerial_60m_acer_pseudoplatanus.zip | 7c31d7ddea841f6509deece8f984a79e | 857.7 MB |
aerial_60m_alnus_spec.zip | 34ea107f43c6172c6d2652dbf26306af | 791.4 MB |
aerial_60m_betula_spec.zip | 69de9373739a027692a823846434fa0c | 886.4 MB |
aerial_60m_cleared.zip | 8dffbb2f6aad17ef83721cffa5b52d96 | 1.2 GB |
aerial_60m_fagus_sylvatica.zip | 77b277e69e90bfbd3c5fd15a73d228fe | 2.0 GB |
aerial_60m_fraxinus_excelsior.zip | 9a88a8e6821f8a54ded950de9238831f | 815.0 MB |
aerial_60m_larix_decidua.zip | aa0bc5b091b099018a078536ef429031 | 417.7 MB |
aerial_60m_larix_kaempferi.zip | 429df073f69f8bbf60aef765e1c925ba | 550.5 MB |
aerial_60m_picea_abies.zip | edb9b1bc9a5a7b405f4cbb0d71cedf54 | 1.8 GB |
aerial_60m_pinus_nigra.zip | 96bf1798ef82f712ea46c2963ddb7083 | 124.5 MB |
aerial_60m_pinus_strobus.zip | 0ff818c6d31f59b8488880e49b300c7a | 156.3 MB |
aerial_60m_pinus_sylvestris.zip | 298cbaac4d9f07a204e1e74e8446798d | 2.0 GB |
aerial_60m_populus_spec.zip | 46fcff76b119cc24f3caf938a0bb433a | 144.4 MB |
aerial_60m_prunus_spec.zip | fb1c570d3ea925a049630224ccb354bc | 91.5 MB |
aerial_60m_pseudotsuga_menziesii.zip | 2d05511ceabf4037b869eca928f3c04e | 838.7 MB |
aerial_60m_quercus_petraea.zip | 31f573fb0419b2b453ed7da1c4d2a298 | 808.1 MB |
aerial_60m_quercus_robur.zip | bcd90506509de26692c043f4c8d73af0 | 1.1 GB |
aerial_60m_quercus_rubra.zip | 71d8495725ed1b4f27d9e382409fcc5e | 576.3 MB |
aerial_60m_tilia_spec.zip | f81558c9c7189ac8a257d041ee43c1c9 | 64.1 MB |
geojson.zip | aa749718f3cb76c1dfc9cddc2ed201db | 8.1 MB |
labels.zip | 656f1b68ec9ab70afd02bb127b75bb24 | 581.1 kB |
s1.zip | bed4fc8cb65da46a24ec1bc6cea2763c | 320.2 MB |
s2.zip | 453ba69056aa33a3c6b97afb7b6afadb | 510.0 MB |
test_filenames.lst | 2166903d947f0025f61e342da466f917 | 184.9 kB |
train_filenames.lst | a1a0148e8120b0268f76d2e98a68436f | 1.7 MB |