Published April 6, 2024 | Version v4
Dataset Open

NEON Tree Species Predictions

Description

# Individual Tree Predictions for 100 million trees in the National Ecological Observatory Network

Preprint: https://www.biorxiv.org/content/10.1101/2023.10.25.563626v1

## Manuscript Abstract

The ecology of forest ecosystems depends on the composition of trees. Capturing fine-grained information on individual trees at broad scales allows an unprecedented view of forest ecosystems, forest restoration and responses to disturbance. To create detailed maps of tree species, airborne remote sensing can cover areas containing millions of trees at high spatial resolution. Individual tree data at wide extents promises to increase the scale of forest analysis, biogeographic research, and ecosystem monitoring without losing details on individual species composition and abundance. Computer vision using deep neural networks can convert raw sensor data into predictions of individual tree species using ground truthed data collected by field researchers. Using over 40,000 individual tree stems as training data, we create landscape-level species predictions for over 100 million individual trees for 24 sites in the National Ecological Observatory Network. Using hierarchical multi-temporal models fine-tuned for each geographic area, we produce open-source data available as 1km^2 shapefiles with individual tree species prediction, as well as crown location, crown area and height of 81 canopy tree species. Site-specific models had an average performance of 79% accuracy covering an average of six species per site, ranging from 3 to 15 species. All predictions were uploaded to Google Earth Engine to benefit the ecology community and overlay with other remote sensing assets. These data can be used to study forest macro-ecology, functional ecology, and responses to anthropogenic change.

## Data Summary

Each NEON site has a single zip archive containing tree predictions for all available data. For site abbreviations, see: https://www.neonscience.org/field-sites/explore-field-sites. For each site, there is a .zip and a .csv. The .zip is a set of 1 km .shp tiles. The .csv contains all trees in a single file.
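
As a minimal sketch of working with a site archive, the snippet below lists the .shp tiles inside a zip using only the Python standard library. The function name `list_shapefile_tiles` and the member names in the demo archive are hypothetical; the real tile names inside each site archive are not documented here.

```python
import io
import zipfile


def list_shapefile_tiles(archive):
    """Return the sorted .shp member names inside a site archive.

    `archive` may be a path or a file-like object, as accepted by
    zipfile.ZipFile.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            name for name in zf.namelist() if name.lower().endswith(".shp")
        )


# Build a tiny in-memory archive so the sketch runs without downloading data.
# These member names are made up for illustration only.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    for member in ("tile_a.shp", "tile_a.dbf", "tile_a.shx", "tile_b.shp"):
        zf.writestr(member, b"")
demo.seek(0)

tiles = list_shapefile_tiles(demo)
print(tiles)
```

For a downloaded archive, passing its path (for example the BART archive) in place of `demo` should work the same way.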

## Prediction metadata

*Geometry*

A four-point bounding box in UTM coordinates.

*indiv_id*

A unique crown identifier that combines the year, site, and geoindex of the NEON airborne tile. The geoindex (e.g. 732000_4707000) is the UTM coordinate of the top left of the tile.

*sci_name*

The full Latin name of the predicted species, aligned with NEON's taxonomic nomenclature.

*ens_score*

The confidence score of the species prediction. This score is the output of the multi-temporal model for the ensemble hierarchical model. 

*bleaf_taxa*

Highest predicted category for the broadleaf submodel

*bleaf_score*

The confidence score for the broadleaf taxa submodel 

*oak_taxa*

Highest predicted category for the oak model 

*dead_label*

A two class alive/dead classification based on the RGB data. 0=Alive/1=Dead.

*dead_score*

The confidence score of the Alive/Dead prediction. 

*site_id*

The four letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.

*conif_taxa*

Highest predicted category for the conifer model

*conif_score*

The confidence score for the conifer taxa submodel

*dom_taxa*

Highest predicted category for the dominant taxa submodel

*dom_score*

The confidence score for the dominant taxa submodel
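
The fields above can be combined to subset a site's .csv. The sketch below keeps trees predicted alive (`dead_label` of 0) whose `ens_score` meets a threshold, then counts species. The 0.5 threshold, the function name, and the three sample rows are assumptions for illustration, not recommendations from the dataset authors.

```python
import csv
import io
from collections import Counter


def confident_live_trees(rows, min_score=0.5):
    """Yield rows predicted alive with ens_score >= min_score.

    `min_score` is an arbitrary illustrative cutoff.
    """
    for row in rows:
        # int(float(...)) tolerates labels written as "0" or "0.0".
        is_alive = int(float(row["dead_label"])) == 0
        if is_alive and float(row["ens_score"]) >= min_score:
            yield row


# Three made-up records standing in for a site .csv.
sample = io.StringIO(
    "indiv_id,sci_name,ens_score,dead_label,site_id\n"
    "a,Acer rubrum,0.91,0,BART\n"
    "b,Acer rubrum,0.32,0,BART\n"
    "c,Fagus grandifolia,0.88,1,BART\n"
)

kept = list(confident_live_trees(csv.DictReader(sample), min_score=0.5))
species_counts = Counter(row["sci_name"] for row in kept)
print(species_counts)
```

Streaming rows through `csv.DictReader` avoids loading a whole site file into memory, which may matter for the larger sites.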

## Training data

The crops.zip archive contains pre-cropped files. The 369-band hyperspectral files are numpy arrays. RGB crops are .tif files. The naming format is <individualID>_<year>_<sensor>. For example, "NEON.PLA.D07.GRSM.00583_2022_RGB.tif" is the RGB crop of the predicted crown for a NEON individual from Great Smoky Mountains National Park (GRSM), flown in 2022.
Along with the crops are .csv files for various train-test split experiments for the manuscript.
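
The naming format can be unpacked with a short helper. This is a sketch that assumes the year and sensor fields never contain underscores, so it splits from the right; `CropName` and `parse_crop_name` are hypothetical names.

```python
from pathlib import Path
from typing import NamedTuple


class CropName(NamedTuple):
    individual_id: str
    year: int
    sensor: str


def parse_crop_name(filename):
    """Split '<individualID>_<year>_<sensor>.<ext>' into its three parts."""
    stem = Path(filename).stem  # drops only the final extension
    # Split from the right so dots or underscores in the ID are preserved.
    individual_id, year, sensor = stem.rsplit("_", 2)
    return CropName(individual_id, int(year), sensor)


crop = parse_crop_name("NEON.PLA.D07.GRSM.00583_2022_RGB.tif")
print(crop)
```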

### Crop metadata

There are 30,042 individuals in the annotations.csv file. We keep all data, but we recommend filtering to species with at least 20 records to reduce the chance of taxonomic or data cleaning errors. This leaves 132 species.
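
One way to apply the recommended filter is sketched below, assuming the annotations have been read into a list of dictionaries keyed by column name. The function name and the toy records are illustrative only.

```python
from collections import Counter


def filter_rare_species(records, min_records=20, key="taxonID"):
    """Keep only records whose species has at least `min_records` entries."""
    counts = Counter(record[key] for record in records)
    return [record for record in records if counts[record[key]] >= min_records]


# Toy data: one species above the cutoff, one below it.
records = [{"taxonID": "ACRU"}] * 25 + [{"taxonID": "PIST"}] * 3

kept = filter_rare_species(records)
print(len(kept), {record["taxonID"] for record in kept})
```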

*score*

This was the DeepForest crown score for the crop.

*taxonID*
Letter species code. See the NEON plant taxonomy for the scientific name: https://data.neonscience.org/taxonomic-lists

*individual*
Unique individual identifier for a given field record and crown crop.

*siteID*
The four letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.

*plotID*

NEON plot ID within the site. For more information on NEON sampling see: https://www.neonscience.org/data-samples/data-collection/observational-sampling/site-level-sampling-design

*CHM_height*

The LiDAR-derived height for the field sampling point.

*image_path*

Relative path to the hyperspectral array, which can be read with numpy.load. The array format is 369 bands * Height * Width.

*tile_year* 

Flight year of the sensor data

*RGB_image_path*

Relative path to the RGB image, which can be read with rasterio.open()
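
To illustrate the hyperspectral layout described under *image_path*, the sketch below round-trips a synthetic array through `numpy.save` and `numpy.load` and moves the band axis last, which some plotting and image tools expect. The 11 x 9 crop size and the helper name are made up; real crops vary in size, and a real file would be loaded by passing its path to `numpy.load`.

```python
import io

import numpy as np


def to_channels_last(cube):
    """Reorder a (bands, height, width) array to (height, width, bands)."""
    if cube.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {cube.shape}")
    return np.transpose(cube, (1, 2, 0))


# Synthetic stand-in for one hyperspectral crop: 369 bands, 11 x 9 pixels.
buffer = io.BytesIO()
np.save(buffer, np.zeros((369, 11, 9), dtype=np.float32))
buffer.seek(0)

cube = np.load(buffer)
image = to_channels_last(cube)
print(cube.shape, image.shape)
```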

# Code repository

The predictions were made using the DeepTreeAttention repo: https://github.com/weecology/DeepTreeAttention
Key files include model definition for a [single year model](https://github.com/weecology/DeepTreeAttention/blob/main/src/models/Hang2020.py) and [Data preprocessing](https://github.com/weecology/DeepTreeAttention/blob/cae13f1e4271b5386e2379068f8239de3033ec40/src/utils.py#L59).

Files

BART.zip

Files (10.5 GB)

Checksum Size
md5:4912c3164bcb13ccaeacc6b7aca44bd7 400.6 MB
md5:146de541343ce79ec23a6a80f526fdb7 213.7 MB
md5:74c834cf38322c3d57c07cdd423e1e86 721.8 MB
md5:d49102527d44c590f0cd06855f73c575 157.0 MB
md5:ea8b7b4174227a8d2f5a26d04003d461 2.1 GB
md5:fe33dadf70e7ca432c947742104b6e40 208.3 MB
md5:e30b64ed4e22e1cad72106a83d18a570 323.7 MB
md5:0ee933fff5327f4e230ae6a3d3894b9f 185.1 MB
md5:8bd3078e4374549f34256f39abd65c64 917.6 MB
md5:5c1d9dc4046dcc32c70ea5c12fa242ce 268.8 MB
md5:b68291326663d5a5c208e1ca2133b5b6 330.9 MB
md5:645d9318926db7ea5fa91de083df66cb 342.8 MB
md5:c4cc43e8db06014478a8bba2824a5a28 388.9 MB
md5:7815642d9c540539218bfb059eda5fe4 689.3 MB
md5:3d01df35c06ddace8a1f56f869b5327d 199.7 MB
md5:6d7da3864ddec71ec5b2db13d3227e84 68.9 MB
md5:1d13b049cb6eb95ba952aa6c5cc40581 233.3 MB
md5:6cee3e9c0f63349722317d47ec8b4d2a 310.7 MB
md5:61a1e51554c031fa64cb2616b5fe06a4 281.9 MB
md5:261ccd2238944a7fb43e17d6f6474174 728.6 MB
md5:55e0484211522ce150cc3b3056d1a39b 129.3 MB
md5:fbb65190adf1fbe294196303fa251884 479.9 MB
md5:69c804dbee11a88a5d09c2b36a944acf 554.6 MB
md5:55e4bef0c071f8251260ca9feaca27b6 289.4 MB
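
A downloaded file can be checked against its listed checksum with the standard library. This is a minimal sketch: the helper names are hypothetical, and the demo uses a small temporary file rather than one of the archives, since the listing above does not pair checksums with file names.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum given with or without 'md5:'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demo on a throwaway file containing the bytes b"hello".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name

ok = matches_listing(demo_path, "md5:5d41402abc4b2a76b9719d911017c592")
os.remove(demo_path)
print(ok)
```

Reading in chunks keeps memory use flat, which helps for the multi-gigabyte archives.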

Additional details

Funding

MRA: Disentangling cross-scale influences on tree species, traits, and diversity from individual trees to continental scales 1926542
National Science Foundation