Published April 6, 2024 | Version v4
Dataset Open

NEON Tree Species Predictions


# Individual Tree Predictions for 100 million trees in the National Ecological Observatory Network


## Manuscript Abstract

The ecology of forest ecosystems depends on the composition of trees. Capturing fine-grained information on individual trees at broad scales allows an unprecedented view of forest ecosystems, forest restoration and responses to disturbance. To create detailed maps of tree species, airborne remote sensing can cover areas containing millions of trees at high spatial resolution. Individual tree data at wide extents promises to increase the scale of forest analysis, biogeographic research, and ecosystem monitoring without losing details on individual species composition and abundance. Computer vision using deep neural networks can convert raw sensor data into predictions of individual tree species using ground truthed data collected by field researchers. Using over 40,000 individual tree stems as training data, we create landscape-level species predictions for over 100 million individual trees for 24 sites in the National Ecological Observatory Network. Using hierarchical multi-temporal models fine-tuned for each geographic area, we produce open-source data available as 1km^2 shapefiles with individual tree species prediction, as well as crown location, crown area and height of 81 canopy tree species. Site-specific models had an average performance of 79% accuracy covering an average of six species per site, ranging from 3 to 15 species. All predictions were uploaded to Google Earth Engine to benefit the ecology community and overlay with other remote sensing assets. These data can be used to study forest macro-ecology, functional ecology, and responses to anthropogenic change.

## Data Summary

Each NEON site has a .zip archive and a .csv file with tree predictions for all available data. The .zip is a set of 1 km .shp tiles. The .csv contains all trees for the site in a single file. For site abbreviations see:
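As a minimal sketch of how a site archive might be inspected, the following lists the shapefile tiles inside a .zip using only the Python standard library. The member names in the demo archive (`tile_a.shp` and so on) are made up for illustration. The real layout of each site archive should be checked after download.

```python
import io
import zipfile


def list_shapefile_tiles(archive):
    """Return the sorted names of all .shp members in a zip archive.

    `archive` can be a filesystem path or a file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(".shp"))


# Build a tiny in-memory archive with hypothetical member names so the
# example runs without downloading any data.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    for name in ("tile_a.shp", "tile_a.dbf", "tile_b.shp"):
        zf.writestr(name, b"")
demo.seek(0)

print(list_shapefile_tiles(demo))
```

Only the `.shp` members are returned. The sidecar files that accompany a shapefile (such as `.dbf`) stay in the archive and are needed when a tile is opened with a GIS library.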

## Prediction metadata


A four-point bounding box location in UTM coordinates.


A unique crown identifier that combines the year, site, and geoindex of the NEON airborne tile. The geoindex (e.g. 732000_4707000) is the UTM coordinate of the top left of the tile.


The full Latin name of the predicted species, aligned with NEON's taxonomic nomenclature.


The confidence score of the species prediction. This score is the output of the multi-temporal model for the ensemble hierarchical model. 


Highest predicted category for the broadleaf submodel


The confidence score for the broadleaf taxa submodel 


Highest predicted category for the oak model 


A two-class alive/dead classification based on the RGB data. 0 = Alive, 1 = Dead.


The confidence score of the Alive/Dead prediction. 


The four letter code for the NEON site. See for site locations.


Highest predicted category for the conifer model


The confidence score for the conifer taxa submodel


Highest predicted category for the dominant taxa submodel


The confidence score for the dominant taxa submodel
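The fields above can be combined to subset a site's predictions. The sketch below keeps trees labeled alive (0) whose species confidence score meets a threshold, and pulls the tile geoindex out of the crown identifier. Every column name (`crown_id`, `species`, `species_score`, `dead_label`), the crown identifier layout, the sample rows, and the 0.5 threshold are placeholders invented for this example. The geoindex pattern assumes a 6-digit easting and a 7-digit northing, as in 732000_4707000.

```python
import csv
import io
import re

# NOTE: every column name below is a placeholder chosen for this sketch.
# Replace them with the actual field names in the downloaded .csv.
ID_COL = "crown_id"
SPECIES_COL = "species"
SPECIES_SCORE_COL = "species_score"
DEAD_LABEL_COL = "dead_label"

# Assumption: a geoindex is a 6-digit easting and a 7-digit northing
# joined by an underscore, e.g. 732000_4707000.
GEOINDEX_RE = re.compile(r"(?<!\d)(\d{6})_(\d{7})(?!\d)")


def parse_geoindex(text):
    """Return (easting, northing) for the first geoindex in text, or None."""
    match = GEOINDEX_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def confident_live_trees(rows, min_score=0.5):
    """Keep rows labeled alive (0) with a species score >= min_score."""
    kept = []
    for row in rows:
        is_alive = int(row[DEAD_LABEL_COL]) == 0
        if is_alive and float(row[SPECIES_SCORE_COL]) >= min_score:
            kept.append(row)
    return kept


# Made-up rows with a hypothetical crown identifier layout.
sample = io.StringIO(
    "crown_id,species,species_score,dead_label\n"
    "2019_HARV_732000_4707000_1,Acer rubrum,0.91,0\n"
    "2019_HARV_732000_4707000_2,Quercus rubra,0.42,0\n"
    "2019_HARV_732000_4707000_3,Tsuga canadensis,0.88,1\n"
)
rows = list(csv.DictReader(sample))

for row in confident_live_trees(rows, min_score=0.5):
    print(row[ID_COL], row[SPECIES_COL], parse_geoindex(row[ID_COL]))
```

The threshold is a free choice. A higher value trades coverage for fewer low-confidence species labels.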

## Training data

The training data archive contains pre-cropped files. The 369-band hyperspectral crops are numpy arrays, and the RGB crops are .tif files. The naming format is
`<individualID>_<year>_<sensor>`. For example, "NEON.PLA.D07.GRSM.00583_2022_RGB.tif" is the RGB crop of the predicted crown for a NEON individual from Great Smoky Mountains National Park (GRSM), flown in 2022.
Along with the crops are .csv files for the various train-test split experiments in the manuscript.
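The naming format can be split back into its parts with a few lines of Python. This is a sketch based only on the single example filename above. The `CropName` type and `parse_crop_name` helper are names introduced here, and reading the site code from the fourth dot-separated field of the individualID is an assumption drawn from that one example.

```python
from pathlib import PurePath
from typing import NamedTuple


class CropName(NamedTuple):
    individual_id: str
    year: int
    sensor: str


def parse_crop_name(filename):
    """Split '<individualID>_<year>_<sensor>.<ext>' into its three parts."""
    stem = PurePath(filename).stem  # drop the file extension
    # Split from the right so the individualID is kept whole.
    individual_id, year, sensor = stem.rsplit("_", 2)
    return CropName(individual_id, int(year), sensor)


crop = parse_crop_name("NEON.PLA.D07.GRSM.00583_2022_RGB.tif")
print(crop)

# Assumption: the site code is the fourth dot-separated field of the
# individualID, as in the GRSM example.
print(crop.individual_id.split(".")[3])
```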

### Crop metadata

There are 30,042 individuals in the annotations.csv file. We keep all data, but we recommend a filtering step of at least 20 records per species to reduce the chance of taxonomic or data cleaning errors. This leaves 132 species.
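One way to apply the recommended filter is sketched below with the standard library. The record key `taxon` and the toy records are placeholders for this example. With the real annotations.csv, the rows would come from `csv.DictReader` and the key would be the species code column.

```python
from collections import Counter


def filter_rare_species(records, species_key, min_records=20):
    """Drop records whose species has fewer than min_records entries."""
    counts = Counter(r[species_key] for r in records)
    return [r for r in records if counts[r[species_key]] >= min_records]


# Toy records with a hypothetical key: 25 of one species, 3 of another.
records = [{"taxon": "AAAA"} for _ in range(25)]
records += [{"taxon": "BBBB"} for _ in range(3)]

kept = filter_rare_species(records, species_key="taxon", min_records=20)
print(len(kept), sorted({r["taxon"] for r in kept}))
```

Counting first and filtering second keeps the original record order, which matters if the rows are later matched against the train-test split files.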


This was the DeepForest crown score for the crop.

Species letter code. See the NEON plant taxonomy for the scientific name:

Unique individual identifier for a given field record and crown crop.

The four letter code for the NEON site. See for site locations.


NEON plot ID within the site. For more information on NEON sampling see:


The LiDAR derived height for the field sampling point.


Relative pathname for the hyperspectral array. It can be read by numpy.load and has the format 369 bands * Height * Width.


Flight year of the sensor data


Relative pathname for the RGB array, can be read by
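For the hyperspectral crops described above, a minimal loading sketch with `numpy.load` might look like the following. The `load_hsi_crop` helper, the file name `demo_crop.npy`, and the synthetic 11 x 9 array are invented for this example so it runs without the dataset. Only the 369-band, bands-first layout comes from the description.

```python
import os
import tempfile

import numpy as np

N_BANDS = 369  # number of hyperspectral bands stated in the metadata


def load_hsi_crop(path):
    """Load a hyperspectral crop and check it is (bands, height, width)."""
    arr = np.load(path)
    if arr.ndim != 3 or arr.shape[0] != N_BANDS:
        raise ValueError(f"expected ({N_BANDS}, H, W), got {arr.shape}")
    return arr


# Write a small synthetic array so the example runs without the dataset.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_crop.npy")
    np.save(demo_path, np.zeros((N_BANDS, 11, 9), dtype=np.float32))
    crop = load_hsi_crop(demo_path)

bands, height, width = crop.shape
print(bands, height, width)

# Many image libraries expect channels last; moveaxis gives (H, W, bands).
channels_last = np.moveaxis(crop, 0, -1)
print(channels_last.shape)
```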

## Code repository

The predictions were made using the DeepTreeAttention repo:
Key files include the model definition for a single year model and the data preprocessing code.


Files (10.5 GB)


Additional details


MRA: Disentangling cross-scale influences on tree species, traits, and diversity from individual trees to continental scales 1926542
National Science Foundation