Dataset Open Access

Crop classification dataset for testing domain adaptation or distributional shift methods

Kluger, Dan M.; Wang, Sherrie; Lobell, David B.

In this upload we share processed crop type datasets from both France and Kenya. These datasets can be helpful for testing and comparing various domain adaptation methods. The datasets are processed, used, and described in this paper: https://doi.org/10.1016/j.rse.2021.112488 (arXiv version: https://arxiv.org/pdf/2109.01246.pdf). 

In summary, each point in the uploaded datasets corresponds to a particular location. The label is the crop type grown at that location in 2017, and the 70 processed features are derived from Sentinel-2 satellite measurements taken at that location in 2017. The points in the France dataset come from 11 different departments (administrative regions) in Occitanie, France, and the points in the Kenya dataset come from 3 different regions in Western Province, Kenya. Within each dataset there are notable shifts between regions, both in the distribution of the labels and in the distribution of the features. These datasets can therefore be helpful for testing and comparing methods that are designed to address such distributional shifts.
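As a rough illustration of what a shift in the label distribution between two regions means, the Python sketch below compares the crop-label proportions of two regions using the total variation distance (0 means identical proportions, 1 means no crop type in common). The region contents and crop names are toy values, not data from this upload, and total variation is only one simple choice of distance; it is not necessarily how shift is quantified in Kluger et al. (2021).

```python
from collections import Counter


def label_distribution(labels):
    """Return the proportion of each crop label in a sequence of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


# Toy labels for two hypothetical regions (not real data from this upload).
region_a = ["wheat", "wheat", "maize", "sunflower"]
region_b = ["maize", "maize", "maize", "wheat"]

p = label_distribution(region_a)
q = label_distribution(region_b)
print(round(total_variation(p, q), 3))  # prints 0.5
```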

More details on the dataset and processing steps can be found in Kluger et al. (2021). Many of the processing steps were taken to deal with Sentinel-2 measurements that were corrupted by cloud cover. For users interested in the raw multi-spectral time series data who want to deal with cloud cover themselves (rather than use the 70 processed features provided here), the raw dataset from Kenya can be found in Yeh et al. (2021), and the raw dataset from France can be made available upon request from the authors of this Zenodo upload.

All of the data uploaded here can be found in "CropTypeDatasetProcessed.RData". We also post the dataframes and tables within that .RData file as separate .csv files for users who do not have R. The contents of each R object (or .csv file) are described in the file "Metadata.rtf".
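A common way to use region-labelled data like this for testing domain adaptation is leave-one-region-out evaluation: train on all regions except one, then test on the held-out region. The Python sketch below (standard library only) groups the rows of a CSV file by region and yields such splits. The column names ("region", "crop", "feature_1") are hypothetical placeholders; the actual column names of France_crop_type_data.csv and Kenya_crop_type_data.csv are documented in Metadata.rtf.

```python
import csv
import io
from collections import defaultdict


def rows_by_region(csv_file, region_column):
    """Group the rows of a CSV file (as dicts) by the value in `region_column`."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row[region_column]].append(row)
    return dict(groups)


def leave_one_region_out(groups):
    """Yield (held_out_region, train_rows, test_rows), one split per region."""
    for held_out in sorted(groups):
        train = [row for region, rows in groups.items()
                 if region != held_out for row in rows]
        yield held_out, train, groups[held_out]


# Hypothetical CSV snippet; the real column names are documented in Metadata.rtf.
sample = io.StringIO(
    "region,crop,feature_1\n"
    "A,wheat,0.1\n"
    "A,maize,0.4\n"
    "B,maize,0.3\n"
    "C,sunflower,0.9\n"
)
groups = rows_by_region(sample, "region")
for region, train, test in leave_one_region_out(groups):
    print(region, len(train), len(test))
```

To use it on a downloaded file, open the file with `open("France_crop_type_data.csv", newline="")` and pass the region column name given in Metadata.rtf in place of "region".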

Preferred Citation:

-Kluger, D.M., Wang, S., Lobell, D.B., 2021. Two shifts for crop mapping: Leveraging aggregate crop statistics to improve satellite-based maps in new regions. Remote Sens. Environ. 262, 112488. https://doi.org/10.1016/j.rse.2021.112488.

-URL to this Zenodo post https://zenodo.org/record/6376160

Files (132.8 MB)

Name                                                  Size     MD5
CropTypeDatasetProcessed.RData                        42.9 MB  131c94af45b6d7d703c7115f764ae271
feature_names.csv                                     1.2 kB   c4c034b18eb79a1f5891f44077350d9f
France_crop_type_data.csv                             40.3 MB  82f1bc4ef1c28681df2775ef95b42dda
Kenya_crop_type_data.csv                              49.6 MB  b110182a6954e31d721baad82898c340
Metadata.rtf                                          5.9 kB   0408fb40e1d3e4af99a2d1ad4fdd47b8
OccitanieFrance_cropType_dist_fromGovStatistics.csv   1.8 kB   f3917eaac4c565c9dfce48a90b8c5805
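The MD5 checksums listed for each file can be used to confirm that a download is complete and uncorrupted. The Python sketch below (standard library only) computes each file's MD5 digest in chunks and compares it with the listed value. The function names and the assumption that all downloaded files sit in one local directory are illustrative choices, not part of this upload.

```python
import hashlib
from pathlib import Path

# Checksums as listed for the files in this record.
EXPECTED_MD5 = {
    "CropTypeDatasetProcessed.RData": "131c94af45b6d7d703c7115f764ae271",
    "feature_names.csv": "c4c034b18eb79a1f5891f44077350d9f",
    "France_crop_type_data.csv": "82f1bc4ef1c28681df2775ef95b42dda",
    "Kenya_crop_type_data.csv": "b110182a6954e31d721baad82898c340",
    "Metadata.rtf": "0408fb40e1d3e4af99a2d1ad4fdd47b8",
    "OccitanieFrance_cropType_dist_fromGovStatistics.csv": "f3917eaac4c565c9dfce48a90b8c5805",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory):
    """Return {filename: checksum_matches} for each listed file found in `directory`."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.exists():
            results[name] = md5_of_file(path) == expected
    return results
```

For example, `check_downloads(".")` run in the download folder should map every downloaded file name to `True`; a `False` entry suggests the file should be downloaded again.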
  • Kluger, D.M., Wang, S., and Lobell, D.B. (2021). Two shifts for crop mapping: Leveraging aggregate crop statistics to improve satellite-based maps in new regions. Remote Sensing of Environment, 262, 112488.

  • Yeh, C., Meng, C., Wang, S., Driscoll, A., Rozi, E., Liu, P., Lee, J., Burke, M., Lobell, D., and Ermon, S. (2021). SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning. In Thirty-fifth Conference on Neural Information Processing Systems, Datasets and Benchmarks Track (Round 2), Dec. 2021.
