Published June 20, 2025 | Version 1.0.0
Dataset Open

CROPSIGHT-US: An Object-Based Crop Type Ground Truth Dataset Using Street View and Sentinel-2 Satellite Imagery across the Contiguous United States

  • 1. University of Illinois Urbana-Champaign

Description

CropSight-US is a national-scale, object-based crop type ground truth dataset for the contiguous United States (CONUS) covering 2013 to 2023. It contains records for 124,419 cropland fields, each with uncertainty information, spanning 17 major crop types and 294 Agricultural Statistics Districts. The most current version is `cropsight-us_app_dat_v1.0.0.zip`.

Our crop type ground truth dataset includes the following 17 crop types: alfalfa, almond, canola, cereal, corn, cotton, grape, orange, peanut, pistachio, potato, sorghum, soybean, sugarbeet, sugarcane, sunflower, walnut.

Each ground truth record is represented by a polygon and carries the following attributes:

  • Label: the predicted crop type label
  • Lat: latitude of the original street view image location
  • Lon: longitude of the original street view image location
  • Heading: direction of view of the original street view image towards the target field
  • Month: month when the original street view image was captured
  • Year: year when the original street view image was captured
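The Lat, Lon, and Heading attributes together describe where the camera stood and which way it looked. As a minimal sketch, the snippet below projects a point a given distance along the viewing direction, which can help check that a polygon lies on the expected side of the road. It assumes Heading is given in degrees clockwise from north (the record does not state the unit or convention) and uses a spherical Earth approximation. The `Record` class, the `point_along_heading` helper, and the sample values are hypothetical and not part of the dataset or the CropSight code.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


@dataclass
class Record:
    """Hypothetical container mirroring the attributes listed above."""
    label: str
    lat: float      # degrees
    lon: float      # degrees
    heading: float  # ASSUMPTION: degrees clockwise from north
    month: int
    year: int


def point_along_heading(rec: Record, distance_m: float) -> tuple[float, float]:
    """Return (lat, lon) reached by moving distance_m from the camera
    position along rec.heading, using the great-circle destination formula."""
    lat1 = math.radians(rec.lat)
    lon1 = math.radians(rec.lon)
    theta = math.radians(rec.heading)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


# Made-up example record: camera looking due east in July 2021.
rec = Record(label="corn", lat=40.0, lon=-88.0, heading=90.0, month=7, year=2021)
target_lat, target_lon = point_along_heading(rec, 50.0)  # 50 m into the field
print(target_lat, target_lon)
```

If the actual Heading convention differs (for example, counterclockwise from east), only the conversion to `theta` needs to change.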

Each entry in the CropSight-US ground truth dataset includes the predicted crop type, the associated confidence metrics (entropy, variance, and confidence level), the delineated cropland field boundary, and the year and month when the original Google Street View (GSV) image was captured. The confidence metrics are defined as follows:

  • Entropy: measures the randomness in class probabilities across Monte Carlo (MC) simulations. Higher entropy indicates greater uncertainty in crop type prediction.
  • Entropy_P: the percentile rank of a polygon’s entropy relative to the entire dataset.
  • Variance: quantifies the variability of predicted class probabilities across MC simulations. Higher variance signals inconsistent predictions across runs.
  • Variance_P: the percentile rank of a polygon’s variance relative to the dataset.
  • Confidence: a binary label (1/0) indicating prediction certainty, derived using entropy and variance thresholds established from the test subset of our reference dataset. A value of 1 represents high confidence in the prediction results. A value of 0 indicates uncertainty beyond our defined threshold: these predictions are not necessarily low-quality, but fall outside the range deemed confidently predictable based on our criteria.
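To make these metrics concrete, the sketch below computes an entropy, a variance, a percentile rank, and a binary confidence flag from a set of Monte Carlo class-probability vectors. It shows one common formulation only: the exact definitions and thresholds used for CropSight-US are given in Liu et al. (2024) and are not reproduced here. The function names, the choice to take the variance over the predicted class, and the threshold values (0.5 and 0.01) are all assumptions made for illustration.

```python
import math
from statistics import pvariance


def mean_probs(mc_probs):
    """Average class probabilities over MC runs (rows = runs, cols = classes)."""
    n_runs = len(mc_probs)
    n_classes = len(mc_probs[0])
    return [sum(run[c] for run in mc_probs) / n_runs for c in range(n_classes)]


def predictive_entropy(mc_probs):
    """Shannon entropy (in nats) of the run-averaged class probabilities."""
    return -sum(p * math.log(p) for p in mean_probs(mc_probs) if p > 0)


def predictive_variance(mc_probs):
    """ASSUMPTION: population variance, across runs, of the probability
    assigned to the class with the highest run-averaged probability."""
    avg = mean_probs(mc_probs)
    top = max(range(len(avg)), key=avg.__getitem__)
    return pvariance([run[top] for run in mc_probs])


def percentile_rank(value, population):
    """Percentage of values in population that are <= value."""
    return 100.0 * sum(v <= value for v in population) / len(population)


def confidence_flag(entropy, variance, entropy_max, variance_max):
    """Return 1 when both metrics are within their thresholds, else 0."""
    return 1 if entropy <= entropy_max and variance <= variance_max else 0


# Made-up MC outputs for two fields and three classes.
consistent = [[0.90, 0.05, 0.05]] * 3
inconsistent = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]

for name, runs in [("consistent", consistent), ("inconsistent", inconsistent)]:
    h = predictive_entropy(runs)
    v = predictive_variance(runs)
    # Hypothetical thresholds; the dataset derives its own from a test subset.
    print(name, round(h, 3), round(v, 3), confidence_flag(h, v, 0.5, 0.01))
```

In this formulation, runs that agree on one class yield low entropy and near-zero variance, while runs that disagree raise both, which matches the interpretation given in the list above.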

The code repository and more examples are available at https://github.com/rssiuiuc/CropSight. We also created an interactive application hosted on Google Earth Engine (GEE): https://ee-azzhou249.projects.earthengine.app/view/cropsight-us.

For technical details about the methods used to create this dataset, please refer to Liu et al. (2024) and the forthcoming data description paper.

This research was supported in part by the Illinois Computes project, which is supported by the University of Illinois Urbana-Champaign and the University of Illinois System.

Files

cropsight-us_app_dat_v1.0.0.zip

Files (300.8 MB)

md5:e4bcf5d7474ca068bdfec19b889345b3 (300.8 MB)
md5:7eadeed395046a079e9fdbfaf565c7cd (2.9 kB)

Additional details

Related works

Cites
Journal article: 10.1016/j.isprsjprs.2024.07.025 (DOI)
Is described by
Preprint: 10.5194/essd-2025-527 (DOI)

Funding

U.S. National Science Foundation
CAREER: Scalable Remote Sensing Computational Framework for Near-real-time Crop Characterization 2048068
National Aeronautics and Space Administration
Scalable Crop Phenological Retrieval through Integration of Remote Sensing and Crop Modeling 80NSSC21K0946
United States Department of Agriculture
III: Small: DATAg: Scalable Real-time Satellite-based Crop Yield Forecasting Framework via Deep Learning 2021-67021-33446

Software

Repository URL
https://github.com/rssiuiuc/CropSight
Programming language
Python