Published April 16, 2024 | Version 1.4
Dataset Open

Caravan - A global community dataset for large-sample hydrology

  • 1. Google Research, Vienna, Austria
  • 2. Google Research, Mountain View, CA, United States
  • 3. Geography, College of Life and Environmental Sciences, University of Exeter, Exeter, UK
  • 4. Google, Mountain View, CA, USA
  • 5. Institute for Machine Learning, Johannes Kepler University, Linz, Austria
  • 6. Google Research, Tel Aviv, Israel
  • 7. Institute for Atmospheric and Climate Science, ETH Zurich, Zurich, Switzerland

Description

This is the accompanying dataset to the following paper: https://www.nature.com/articles/s41597-023-01975-w

Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.

If you use Caravan in your research, please cite not only Caravan itself but also the source datasets, in recognition of the work that went into creating them and that made Caravan possible in the first place.

All current development and additional community extensions can be found at https://github.com/kratzert/Caravan

Changelog:

  • 23 May 2022: Version 0.2 - Resolved a bug when renaming the LamaH gauge ids from the LamaH ids to the official gauge ids provided as "govnr" in the LamaH dataset attribute files.
  • 24 May 2022: Version 0.3 - Fixed gaps in forcing data in some "camels" (US) basins.
  • 15 June 2022: Version 0.4 - Fixed replacing negative CAMELS US values with NaN (-999 in CAMELS indicates missing observation).
  • 1 December 2022: Version 0.4 - Added 4298 basins in the US, Canada and Mexico (part of HYSETS), bringing the total to 6830 basins. Fixed a bug in the computation of catchment attributes that are defined as pour point properties, where sometimes the wrong HydroATLAS polygon was picked. Restructured the attribute files and added more metadata (station name and country).
  • 16 January 2023: Version 1.0 - Version of the official paper release. No changes in the data but added a static copy of the accompanying code of the paper. For the most up to date version, please check https://github.com/kratzert/Caravan
  • 10 May 2023: Version 1.1 - No data change; only the data description was updated.
  • 17 May 2023: Version 1.2 - Updated a handful of attribute values that were affected by a bug in their derivation. See https://github.com/kratzert/Caravan/issues/22 for details.
  • 16 April 2024: Version 1.3 - Added 9130 gauges from the original source dataset that were initially not included because of the area thresholds (i.e. basins smaller than 100 km² or larger than 2000 km²). Also extended the forcing period for all gauges (including the original ones) to 1950-2023. Added two different download options that include timeseries data only, as either CSV files (Caravan-csv.tar.xz) or netCDF files (Caravan-nc.tar.xz). Including the large basins also required an update to the Earth Engine code.
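
The 15 June 2022 entry above notes that -999 marks a missing observation in CAMELS and that negative values are replaced with NaN. The following is a minimal sketch of that kind of masking, not the code used to build Caravan. The function name `mask_missing` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def mask_missing(values):
    """Return a copy of `values` with negative entries replaced by NaN.

    Assumption for this sketch: any negative value (for example the
    -999 marker) is treated as a missing observation.
    """
    return [math.nan if v < 0 else v for v in values]


# Hypothetical discharge series containing one missing-value marker.
discharge = [1.2, -999.0, 0.0, 3.4]
cleaned = mask_missing(discharge)
```

Zero is kept as a valid observation in this sketch; only values strictly below zero are masked.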

Files

Caravan-code-snapshot-16Apr2024.zip

Files (40.6 GB)

Size       Checksum
3.8 MB     md5:ebbc08bacc9163cd386bfc7b39a23a15
23.4 GB    md5:abc66156d33f807bf48808fb64cfd14f
17.1 GB    md5:f5de14d4181a7dd2418eca6aa65badd6
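
Because the archives are large, it can be worth checking a download against the md5 values listed above. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and the file path in the usage note is a placeholder for wherever the file was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so multi-gigabyte archives do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("path/to/downloaded-file", "md5:ebbc08bacc9163cd386bfc7b39a23a15")` returns True only if the local file's digest equals the listed one.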

Additional details

Related works

Is described by
Journal article: 10.31223/X50S70 (DOI)