There is a newer version of the record available.

Published February 8, 2024 | Version v3
Dataset Open

NEON Tree Species Predictions

Description

# Individual Tree Predictions for 100 million trees in the National Ecological Observatory Network

Preprint: https://www.biorxiv.org/content/10.1101/2023.10.25.563626v1

## Manuscript Abstract

The ecology of forest ecosystems depends on the composition of trees. Capturing fine-grained information on individual trees at broad scales allows an unprecedented view of forest ecosystems, forest restoration and responses to disturbance. To create detailed maps of tree species, airborne remote sensing can cover areas containing millions of trees at high spatial resolution. Individual tree data at wide extents promises to increase the scale of forest analysis, biogeographic research, and ecosystem monitoring without losing details on individual species composition and abundance. Computer vision using deep neural networks can convert raw sensor data into predictions of individual tree species using ground truthed data collected by field researchers. Using over 40,000 individual tree stems as training data, we create landscape-level species predictions for over 100 million individual trees for 24 sites in the National Ecological Observatory Network. Using hierarchical multi-temporal models fine-tuned for each geographic area, we produce open-source data available as 1km^2 shapefiles with individual tree species prediction, as well as crown location, crown area and height of 81 canopy tree species. Site-specific models had an average performance of 79% accuracy covering an average of six species per site, ranging from 3 to 15 species. All predictions were uploaded to Google Earth Engine to benefit the ecology community and overlay with other remote sensing assets. These data can be used to study forest macro-ecology, functional ecology, and responses to anthropogenic change.

## Data Summary

Each NEON site has a .zip archive and a .csv file with tree predictions for all available data. The .zip is a set of 1 km .shp tiles. The .csv contains all trees for the site in a single file. For site abbreviations see: https://www.neonscience.org/field-sites/explore-field-sites.
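As a rough illustration of working with a site archive, the sketch below lists the .shp members of a zip file using only the Python standard library. The member names (`tile_a.shp` and so on) are made up for the demo; the actual tile names inside each site archive are not described here.

```python
import io
import zipfile


def list_shapefile_tiles(archive):
    """Return the sorted names of all .shp members in a zip archive.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(name for name in zf.namelist() if name.lower().endswith(".shp"))


# Demo on an in-memory archive with hypothetical member names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in ("tile_a.shp", "tile_a.dbf", "tile_b.shp"):
        zf.writestr(name, b"")

tiles = list_shapefile_tiles(buffer)
print(tiles)
```

Each listed tile could then be opened with a GIS library of your choice.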

## Prediction metadata

*Geometry*

A four-point bounding box in UTM coordinates.

*indiv_id*

A unique crown identifier that combines the year, site and geoindex of the NEON airborne tile. The geoindex (e.g. 732000_4707000) is the UTM coordinate of the top left of the tile.
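A minimal sketch of splitting a geoindex into its two UTM components. It assumes the geoindex is always two integers joined by an underscore, as in the example above; the function and class names are illustrative, and how the geoindex is embedded in the full identifier is not assumed.

```python
from typing import NamedTuple


class TileCorner(NamedTuple):
    easting: int   # UTM easting in metres
    northing: int  # UTM northing in metres


def parse_geoindex(geoindex: str) -> TileCorner:
    """Split a geoindex such as '732000_4707000' into easting and northing."""
    easting_text, northing_text = geoindex.split("_")
    return TileCorner(int(easting_text), int(northing_text))


corner = parse_geoindex("732000_4707000")
print(corner.easting, corner.northing)
```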

*sci_name*

The full Latin name of the predicted species, aligned with NEON's taxonomic nomenclature.

*ens_score*

The confidence score of the species prediction. This score is the output of the multi-temporal model for the ensemble hierarchical model. 

*bleaf_taxa*

Highest predicted category for the broadleaf submodel.

*bleaf_score*

The confidence score for the broadleaf taxa submodel.

*oak_taxa*

Highest predicted category for the oak model.

*dead_label*

A two class alive/dead classification based on the RGB data. 0=Alive/1=Dead.

*dead_score*

The confidence score of the Alive/Dead prediction. 

*site_id*

The four letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.

*conif_taxa*

Highest predicted category for the conifer model.

*conif_score*

The confidence score for the conifer taxa submodel.

*dom_taxa*

Highest predicted category for the dominant taxa submodel.

*dom_score*

The confidence score for the dominant taxa submodel.
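The fields above can be combined to subset a site .csv. The sketch below keeps trees labelled alive whose species score passes a threshold. The 0.5 cutoff is an arbitrary illustration, not a recommended value, and the sample rows are synthetic placeholders rather than real predictions.

```python
import csv
import io


def confident_live_trees(rows, min_score=0.5):
    """Yield rows with dead_label == 0 (alive) and ens_score >= min_score."""
    for row in rows:
        is_alive = int(float(row["dead_label"])) == 0
        if is_alive and float(row["ens_score"]) >= min_score:
            yield row


# Synthetic rows using the column names documented above.
sample_csv = io.StringIO(
    "indiv_id,sci_name,ens_score,dead_label,dead_score,site_id\n"
    "a,Acer rubrum,0.91,0,0.98,BART\n"
    "b,Fagus grandifolia,0.42,0,0.95,BART\n"
    "c,Tsuga canadensis,0.88,1,0.77,BART\n"
)

kept = list(confident_live_trees(csv.DictReader(sample_csv)))
print([row["sci_name"] for row in kept])
```

For a real site file, replace `sample_csv` with an open file handle to the downloaded .csv.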

## Training data

The crops.zip archive contains pre-cropped files. The 369-band hyperspectral crops are numpy arrays; the RGB crops are .tif files. The naming format is <individualID>_<year>_<sensor>. For example, "NEON.PLA.D07.GRSM.00583_2022_RGB.tif" is the RGB crop of the predicted crown for a NEON individual from Great Smoky Mountains National Park (GRSM), flown in 2022.
Along with the crops are .csv files for the various train-test split experiments in the manuscript.
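A small sketch of parsing a crop file name into its parts. It assumes the individualID itself contains no underscores, as in the example above, so the name is split from the right; `CropName` and `parse_crop_name` are illustrative names.

```python
from pathlib import Path
from typing import NamedTuple


class CropName(NamedTuple):
    individual_id: str
    year: int
    sensor: str


def parse_crop_name(filename: str) -> CropName:
    """Parse '<individualID>_<year>_<sensor>.<ext>' into its three parts."""
    stem = Path(filename).stem  # drops only the final extension
    individual_id, year, sensor = stem.rsplit("_", 2)
    return CropName(individual_id, int(year), sensor)


crop = parse_crop_name("NEON.PLA.D07.GRSM.00583_2022_RGB.tif")
print(crop)
```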

### Crop metadata

There are 30,042 individuals in the annotations.csv file. We keep all data, but we recommend filtering to species with at least 20 records to reduce the chance of taxonomic or data cleaning errors. This leaves 132 species.
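One way to apply the recommended filter is sketched below, assuming each annotation is a dict keyed by the column names in this section (for example, rows from `csv.DictReader`). The demo records are synthetic.

```python
from collections import Counter


def filter_rare_species(records, min_records=20):
    """Keep only records whose taxonID occurs at least min_records times."""
    counts = Counter(record["taxonID"] for record in records)
    return [record for record in records if counts[record["taxonID"]] >= min_records]


# Synthetic demo: 25 records of one taxon, 3 of another.
records = [{"taxonID": "ACRU"}] * 25 + [{"taxonID": "QURU"}] * 3
filtered = filter_rare_species(records)
print(len(filtered), {record["taxonID"] for record in filtered})
```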

*score*

This was the DeepForest crown score for the crop.

*taxonID*

Four letter species code. See the NEON plant taxonomy for the scientific name: https://data.neonscience.org/taxonomic-lists

*individual*

Unique individual identifier for a given field record and crown crop.

*siteID*

The four letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.

*plotID*

NEON plot ID within the site. For more information on NEON sampling see: https://www.neonscience.org/data-samples/data-collection/observational-sampling/site-level-sampling-design

*CHM_height*

The LiDAR derived height for the field sampling point.

*image_path*

Relative pathname for the hyperspectral array, which can be read by numpy.load. The array shape is 369 bands * Height * Width.
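A hedged sketch of loading a hyperspectral crop and moving the band axis last, which some image libraries expect. It assumes the array is stored band-first as (bands, height, width); the demo writes a synthetic array to a temporary file rather than reading a real crop, and `load_crop_channels_last` is an illustrative name.

```python
import os
import tempfile

import numpy as np


def load_crop_channels_last(path):
    """Load a (bands, height, width) crop and return it as (height, width, bands)."""
    crop = np.load(path)
    if crop.ndim != 3:
        raise ValueError(f"expected a 3D array, got shape {crop.shape}")
    return np.moveaxis(crop, 0, -1)


# Synthetic stand-in for a real crop: 369 bands, 11 x 9 pixels.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic_crop.npy")
    np.save(path, np.zeros((369, 11, 9), dtype=np.float32))
    crop = load_crop_channels_last(path)

print(crop.shape)
```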

*tile_year* 

Flight year of the sensor data

*RGB_image_path*

Relative pathname for the RGB array, can be read by rasterio.open()

# Code repository

The predictions were made using the DeepTreeAttention repo: https://github.com/weecology/DeepTreeAttention
Key files include model definition for a [single year model](https://github.com/weecology/DeepTreeAttention/blob/main/src/models/Hang2020.py) and [Data preprocessing](https://github.com/weecology/DeepTreeAttention/blob/cae13f1e4271b5386e2379068f8239de3033ec40/src/utils.py#L59).

Files (17.6 GB)

Checksum Size
md5:0a0ea0359e660564eb4f2d7f37559599 735.7 MB
md5:78e3c494e9458d37650623ddb355b272 393.9 MB
md5:f4874ddfd76411679db5cd06360cc12a 1.3 GB
md5:4f52299478760ed912e1e635a08ac123 289.6 MB
md5:4a3037baae38f2b050af74605039589f 2.0 GB
md5:a37a0e7e23c988b2c202b5655901539f 382.9 MB
md5:83ea22c2bef5d085ce34f0a40636fe79 595.9 MB
md5:ba0156215e1f0f4414bb8567d83e37f5 339.6 MB
md5:35afb9b2931d32ed16b51798853b616b 1.7 GB
md5:0ef4f91425e2fd595e91ac0ed4e1de30 493.8 MB
md5:5cdcd617d612ae7a1a8acc835e3dc030 609.0 MB
md5:cd4ba6eec4759f59dd029bca0da732dd 627.6 MB
md5:d382c15655a499f96b12e18c494b07a7 718.9 MB
md5:70e7eee908f4c95baee3d541a6541252 1.3 GB
md5:52d0ade7d38ffd4a00a26b5354d180ad 367.1 MB
md5:015656a18fd40788f1de30a12a1bc0ad 126.4 MB
md5:bf2a864fc888a1f65b77e8d24e9a0048 429.8 MB
md5:6bb2b1474fbf748f0afd06ebc4b1a522 570.6 MB
md5:e958fd85206be0a07dbbb959f3b66e35 518.6 MB
md5:f3bd8fcce690d4e7c78a0fccce48bc51 1.3 GB
md5:ef3c1946504138082ddc4e65b37437ff 237.8 MB
md5:5d95cd9854b28fc60c66fc5f46ffc53c 886.2 MB
md5:b71bffbcc75e636d9ce5e7a391c1bdc5 1.0 GB
md5:b0e93cb5fcde9d3a4f9f222ee2bb3873 531.2 MB
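
A downloaded file can be checked against its listed checksum. The sketch below uses Python's standard hashlib and reads in chunks so that multi-gigabyte archives do not need to fit in memory. The function names are illustrative, and the demo hashes a small temporary file rather than one of the archives. Pair each download with the checksum shown for that file on the record page.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks of chunk_size bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a listed value such as 'md5:0a0e...'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demo on a small temporary file containing b"hello".
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello")
    demo_digest = md5_of_file(demo_path)
    demo_ok = matches_listing(demo_path, "md5:5d41402abc4b2a76b9719d911017c592")

print(demo_digest, demo_ok)
```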

Additional details

Funding

U.S. National Science Foundation
MRA: Disentangling cross-scale influences on tree species, traits, and diversity from individual trees to continental scales 1926542