CROPSIGHT-US: An Object-Based Crop Type Ground Truth Dataset Using Street View and Sentinel-2 Satellite Imagery across the Contiguous United States
Description
CropSight-US is a novel national-scale, object-based crop type ground truth dataset for the contiguous United States (CONUS) covering 2013 to 2023. It contains records for 124,419 cropland fields across 17 major crop types and 294 Agricultural Statistics Districts, each accompanied by uncertainty information. The most current version is `cropsight-us_app_dat_v1.0.0.zip`.
Our crop type ground truth dataset includes the following 17 crop types: alfalfa, almond, canola, cereal, corn, cotton, grape, orange, peanut, pistachio, potato, sorghum, soybean, sugarbeet, sugarcane, sunflower, walnut.
Each crop type ground truth record is represented by a polygon (the delineated cropland field boundary) and contains the following attributes:
- Label: the predicted crop type label
- Lat: latitude of the original street view image location
- Lon: longitude of the original street view image location
- Heading: direction of view of the original street view image towards the target field
- Month: month when the original street view image was captured
- Year: year when the original street view image was captured
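The Heading attribute links each street view location (Lat, Lon) to the field it looks at. As a rough illustration, the sketch below computes the initial great-circle bearing from one point to another in degrees clockwise from north, the convention used by the `heading` parameter of the Google Street View Static API (0 = north, 90 = east). Whether CropSight-US headings follow exactly this convention is an assumption, and the field centroid in the usage example is hypothetical, since the dataset stores a polygon rather than a centroid.

```python
import math


def initial_bearing(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are in decimal degrees; the result is in degrees clockwise from
    true north, normalized to [0, 360).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    # atan2 returns (-180, 180]; the modulo maps it onto a 0-360 compass scale.
    return math.degrees(math.atan2(x, y)) % 360.0


# Hypothetical example: a street view point and the centroid of a field just east of it.
gsv_lat, gsv_lon = 40.1000, -88.2000
field_lat, field_lon = 40.1000, -88.1990
print(round(initial_bearing(gsv_lat, gsv_lon, field_lat, field_lon), 1))  # prints 90.0
```

Comparing such a bearing with the stored Heading is one simple way to check which side of the road a labeled field lies on.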
Each entry in the CropSight-US ground truth dataset includes the predicted crop type, the delineated cropland field boundary, the year and month when the original Google Street View (GSV) image was captured, and the associated confidence metrics (entropy, variance, and confidence level):
- Entropy: measures the randomness in class probabilities across Monte Carlo (MC) simulations. Higher entropy indicates greater uncertainty in crop type prediction.
- Entropy_P: the percentile rank of a polygon’s entropy relative to the entire dataset.
- Variance: quantifies the variability of predicted class probabilities across MC simulations. Higher variance signals inconsistent predictions across runs.
- Variance_P: the percentile rank of a polygon’s variance relative to the dataset.
- Confidence: a binary label (1/0) indicating prediction certainty, derived using entropy and variance thresholds established from the test subset of our reference dataset. A value of 1 represents high confidence in the prediction. A value of 0 indicates that the uncertainty exceeds the defined thresholds: such predictions are not necessarily of low quality, but they fall outside the range deemed confidently predictable under our criteria.
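The uncertainty metrics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact CropSight-US procedure (see Liu et al. (2024) for the actual method): each field is assumed to have T class-probability vectors from Monte Carlo forward passes (for instance, with dropout kept active at inference), Entropy is taken as the predictive entropy of the MC-averaged distribution, Variance as the variance across runs of the probability assigned to the predicted class, and Confidence as 1 only when both metrics are at or below hypothetical thresholds. The function names and threshold values are illustrative.

```python
import math
from bisect import bisect_right
from statistics import pvariance


def mc_uncertainty(probs):
    """probs: list of T lists, each holding C class probabilities from one
    Monte Carlo forward pass for a single field.
    Returns (predicted class index, entropy, variance)."""
    T, C = len(probs), len(probs[0])
    # Average the class probabilities over all MC runs.
    mean_p = [sum(run[c] for run in probs) / T for c in range(C)]
    label = max(range(C), key=lambda c: mean_p[c])
    # Predictive entropy of the averaged distribution (natural log; 0*log0 = 0).
    entropy = -sum(p * math.log(p) for p in mean_p if p > 0)
    # Assumed variance definition: spread of the predicted class's probability across runs.
    variance = pvariance([run[label] for run in probs])
    return label, entropy, variance


def percentile_ranks(values):
    """Percentile rank of each value: % of values in the dataset that are <= it."""
    s = sorted(values)
    n = len(values)
    return [100.0 * bisect_right(s, v) / n for v in values]


def confidence_flag(entropy, variance, entropy_thr, variance_thr):
    """Hypothetical rule: 1 when both metrics are at or below their thresholds, else 0."""
    return 1 if entropy <= entropy_thr and variance <= variance_thr else 0


# Two hypothetical fields: MC runs that agree vs. runs that disagree.
agree = [[0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9]]
for runs in (agree, disagree):
    label, h, v = mc_uncertainty(runs)
    print(label, round(h, 3), round(v, 3), confidence_flag(h, v, 0.5, 0.05))
```

Because Entropy_P and Variance_P are ranks against the whole dataset, they let users compare a field's uncertainty with all other records without depending on the absolute scale of either metric.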
The code repository and more examples are available at https://github.com/rssiuiuc/CropSight. We also created an interactive application hosted on Google Earth Engine (GEE): https://ee-azzhou249.projects.earthengine.app/view/cropsight-us.
For technical details about the methods used to create this dataset, please refer to Liu et al. (2024) and the accompanying data description paper (preprint: 10.5194/essd-2025-527).
This research was supported in part by the Illinois Computes project which is supported by the University of Illinois Urbana-Champaign and the University of Illinois System.
Files (300.8 MB total)

| Name | Size | MD5 checksum |
|---|---|---|
| cropsight-us_app_dat_v1.0.0.zip | 300.8 MB | e4bcf5d7474ca068bdfec19b889345b3 |
| … | 2.9 kB | 7eadeed395046a079e9fdbfaf565c7cd |
Additional details
Related works
- Cites
- Journal article: 10.1016/j.isprsjprs.2024.07.025 (DOI)
- Is described by
- Preprint: 10.5194/essd-2025-527 (DOI)
Funding
- U.S. National Science Foundation
- CAREER: Scalable Remote Sensing Computational Framework for Near-real-time Crop Characterization 2048068
- National Aeronautics and Space Administration
- Scalable Crop Phenological Retrieval through Integration of Remote Sensing and Crop Modeling 80NSSC21K0946
- United States Department of Agriculture
- III: Small: DATAg: Scalable Real-time Satellite-based Crop Yield Forecasting Framework via Deep Learning 2021-67021-33446
Software
- Repository URL
- https://github.com/rssiuiuc/CropSight
- Programming language
- Python