Dataset Open Access

Crop classification dataset for testing domain adaptation or distributional shift methods

Kluger, Dan M.; Wang, Sherrie; Lobell, David B.

In this upload we share processed crop type datasets from both France and Kenya. These datasets can be helpful for testing and comparing various domain adaptation methods. The datasets are processed, used, and described in this paper: (arXiv version: 

In summary, each point in the uploaded datasets corresponds to a particular location. The label is the crop type grown at that location in 2017. The 70 processed features are based on Sentinel-2 satellite measurements at that location in 2017. The points in the France dataset come from 11 different departments (regions) in Occitanie, France, and the points in the Kenya dataset come from 3 different regions in Western Province, Kenya. Within each dataset there are notable shifts in the distribution of the labels and in the distribution of the features between regions. Therefore, these datasets can be helpful for testing for testing and comparing methods that are designed to address such distributional shifts.

More details on the dataset and processing steps can be found in Kluger et. al. (2021). Much of the processing steps were taken to deal with Sentinel-2 measurements that were corrupted by cloud cover. For users interested in the raw multi-spectral time series data and dealing with cloud cover issues on their own (rather than using the 70 processed features provided here), the raw dataset from Kenya can be found in Yeh et. al. (2021), and the raw dataset from France can be made available upon request from the authors of this Zenodo upload.

All of the data uploaded here can be found in "CropTypeDatasetProcessed.RData". We also post the dataframes and tables within that .RData file as separate .csv files for users who do not have R. The contents of each R object (or .csv file) is described in the file "Metadata.rtf".

Preferred Citation:

-Kluger, D.M., Wang, S., Lobell, D.B., 2021. Two shifts for crop mapping: Leveraging aggregate crop statistics to improve satellite-based maps in new regions. Remote Sens. Environ. 262, 112488.

-URL to this Zenodo post

Files (132.8 MB)
Name Size
42.9 MB Download
1.2 kB Download
40.3 MB Download
49.6 MB Download
5.9 kB Download
1.8 kB Download
  • Kluger D.M., Wang S., and Lobell, D.B. (2021). Two shifts for crop mapping: leveraging aggregate crop statistics to improve satellite-based maps in new regions. Remote Sensing of Environment. 262, 112488

  • C. Yeh, C. Meng, S. Wang, A. Driscoll, E. Rozi, P. Liu, J. Lee, M. Burke, D. Lobell, and S. Ermon, "SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning," in Thirty-fifth Conference on Neural Information Processing Systems, Datasets and Benchmarks Track (Round 2), Dec. 2021.

All versions This version
Views 207207
Downloads 191191
Data volume 7.4 GB7.4 GB
Unique views 181181
Unique downloads 156156


Cite as