There is a newer version of the record available.

Published May 17, 2023 | Version 1.2
Dataset Open

Caravan - A global community dataset for large-sample hydrology

  • 1. Google Research, Vienna, Austria
  • 2. Google Research, Mountain View, CA, United States
  • 3. Geography, College of Life and Environmental Sciences, University of Exeter, Exeter, UK
  • 4. Google, Mountain View, CA, USA
  • 5. Institute for Machine Learning, Johannes Kepler University, Linz, Austria
  • 6. Google Research, Tel Aviv, Israel
  • 7. Institute for Atmospheric and Climate Science, ETH Zurich, Zurich, Switzerland


This is the accompanying dataset to the following paper

Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.

If you use Caravan in your research, please cite not only Caravan itself but also the source datasets, in recognition of the work that went into creating them and that made Caravan possible in the first place.

Changelog:

  • 23 May 2022: Version 0.2 - Resolved a bug when renaming the LamaH gauge ids from the LamaH ids to the official gauge ids provided as "govnr" in the LamaH dataset attribute files.
  • 24 May 2022: Version 0.3 - Fixed gaps in forcing data in some "camels" (US) basins.
  • 15 June 2022: Version 0.4 - Fixed the replacement of negative CAMELS US values with NaN (-999 in CAMELS indicates a missing observation).
  • 1 December 2022: Version 0.4 - Added 4298 basins in the US, Canada and Mexico (part of HYSETS), bringing the total to 6830 basins. Fixed a bug in the computation of catchment attributes that are defined as pour point properties, where the wrong HydroATLAS polygon was sometimes picked. Restructured the attribute files and added more metadata (station name and country).
  • 16 January 2023: Version 1.0 - Version of the official paper release. No changes to the data, but a static copy of the paper's accompanying code was added. For the most up-to-date version, please check
  • 10 May 2023: Version 1.1 - No change to the data; only the data description was updated.
  • 17 May 2023: Version 1.2 - Updated a handful of attribute values that were affected by a bug in their derivation. See for details.
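
The 15 June 2022 entry above notes that -999 marks a missing observation in CAMELS. As an illustration of that kind of cleaning step, here is a minimal sketch, assuming the discharge values are available as a plain list of floats. The function name `mask_negative_values` and the sample values are hypothetical and are not taken from the Caravan code, which may handle this differently.

```python
import math


def mask_negative_values(values):
    """Return a copy of `values` with negative entries replaced by NaN.

    Hypothetical helper for illustration only: discharge cannot be
    negative, so negative entries (including the -999 missing-value
    sentinel) are treated as missing observations.
    """
    return [math.nan if v < 0 else v for v in values]


# Made-up sample series, not real CAMELS data.
streamflow = [1.2, -999.0, 0.0, 3.4]
cleaned = mask_negative_values(streamflow)
```

Masking every negative value, rather than only exact matches of -999, also catches other physically impossible readings. Whether that is appropriate depends on the variable; it would be wrong for quantities such as temperature that can legitimately be negative.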



Files (12.5 GB)

Size
1.6 MB
12.5 GB
3.7 kB

Additional details

Related works

Is described by
Journal article: 10.31223/X50S70 (DOI)