Caravan - A global community dataset for large-sample hydrology
Creators
- 1. Google Research, Vienna, Austria
- 2. Google Research, Mountain View, CA, United States
- 3. Geography, College of Life and Environmental Sciences, University of Exeter, Exeter, UK
- 4. Google, Mountain View, CA, USA
- 5. Institute for Machine Learning, Johannes Kepler University, Linz, Austria
- 6. Google Research, Tel Aviv, Israel
- 7. Institute for Atmospheric and Climate Science, ETH Zurich, Zurich, Switzerland
Description
This is the accompanying dataset to the following paper: https://www.nature.com/articles/s41597-023-01975-w
Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.
If you use Caravan in your research, please cite not only Caravan itself but also the source datasets, in recognition of the work that went into creating them and that made Caravan possible in the first place.
All current development and additional community extensions can be found at https://github.com/kratzert/Caravan
IMPORTANT: Due to size limitations for individual repositories, the netCDF version and the CSV version of Caravan have been split into two different repositories since Version 1.6. You can find the CSV version at https://zenodo.org/records/15530021
Change Log:
- 23 May 2022: Version 0.2 - Resolved a bug when renaming the LamaH gauge ids from the LamaH ids to the official gauge ids provided as "govnr" in the LamaH dataset attribute files.
- 24 May 2022: Version 0.3 - Fixed gaps in forcing data in some "camels" (US) basins.
- 15 June 2022: Version 0.4 - Fixed replacing negative CAMELS US values with NaN (-999 in CAMELS indicates missing observation).
- 1 December 2022: Version 0.4 - Added 4298 basins in the US, Canada and Mexico (part of HYSETS), bringing the total to 6830 basins. Fixed a bug in the computation of catchment attributes that are defined as pour point properties, where sometimes the wrong HydroATLAS polygon was picked. Restructured the attribute files and added some more metadata (station name and country).
- 16 January 2023: Version 1.0 - Version of the official paper release. No changes in the data, but added a static copy of the accompanying code of the paper. For the most up-to-date version, please check https://github.com/kratzert/Caravan
- 10 May 2023: Version 1.1 - No data change, only an updated data description.
- 17 May 2023: Version 1.2 - Updated a handful of attribute values that were affected by a bug in their derivation. See https://github.com/kratzert/Caravan/issues/22 for details.
- 16 April 2024: Version 1.4 - Added 9130 gauges from the original source dataset that were initially not included because of the area thresholds (i.e. basins smaller than 100 sqkm or larger than 2000 sqkm). Also extended the forcing period for all gauges (including the original ones) to 1950-2023. Added two different download options that include timeseries data only as either CSV files (Caravan-csv.tar.xz) or netCDF files (Caravan-nc.tar.xz). Including the large basins also required an update to the Earth Engine code.
- 16 Jan 2025: Version 1.5 - Added FAO Penman-Monteith PET (potential_evaporation_sum_FAO_PENMAN_MONTEITH) and renamed the ERA5-LAND potential_evaporation band to potential_evaporation_sum_ERA5_LAND. Also added all PET-related climate indices derived with the Penman-Monteith PET band (suffix "_FAO_PM") and renamed the old PET-related indices accordingly (suffix "_ERA5_LAND").
- 27 May 2025: Version 1.6
  - Updated the CAMELS-AUS data to source from CAMELS-AUS v2. This means more basins (561 compared to 222) and more recent streamflow data (2022 compared to 2014). Note that the gauge id for four basins changed between the original CAMELS-AUS version and v2. The affected gauges are ['camelsaus_224213A', 'camelsaus_224214A', 'camelsaus_227225A', 'camelsaus_403213A'], all of which lost their trailing "A". To stay synced with CAMELS-AUS (v2), we also adopted the new naming.
  - Added a VERSION file to the root directory that contains the current version number.
  - Updated the code to the most recent GitHub snapshot (commit 6eab036).
  - Due to the 50GB repository limit, we had to split the netCDF version and the CSV version into two separate repositories. The CSV version can be found under https://zenodo.org/records/15530021
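
Scripts written against an older Caravan release may still refer to the four CAMELS-AUS gauge ids that changed in Version 1.6. The following is a minimal sketch of how such ids could be mapped to the new naming. The function name `update_gauge_id` and the example id list are hypothetical and not part of the Caravan code; only the four renamed ids are taken from the change log.

```python
# Gauge ids named in the Version 1.6 change-log entry; each lost its trailing "A".
RENAMED_IN_V1_6 = (
    "camelsaus_224213A",
    "camelsaus_224214A",
    "camelsaus_227225A",
    "camelsaus_403213A",
)


def update_gauge_id(gauge_id: str) -> str:
    """Map a gauge id from an older Caravan release to the Version 1.6 naming.

    Hypothetical helper: only the four ids listed above are changed,
    every other id is returned untouched.
    """
    if gauge_id in RENAMED_IN_V1_6:
        return gauge_id[:-1]  # drop the trailing "A"
    return gauge_id


# Illustrative input: one renamed id and one made-up id that must stay as it is.
old_ids = ["camelsaus_224213A", "example_0001A"]
new_ids = [update_gauge_id(g) for g in old_ids]
print(new_ids)
```

An explicit list is used instead of stripping every trailing "A", since the change log names only these four gauges as affected.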
 
Files
Total size: 24.8 GB. The file preview on the record page shows Caravan-code.zip.

| MD5 checksum | Size |
|---|---|
| md5:33df29a80c56d74e94da44f9feeaa357 | 4.2 MB |
| md5:ad3a7767db059f218380e9cce5f72452 | 24.8 GB |
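
The file listing gives an MD5 checksum for each file, which can be used to check a download for corruption. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local path in the usage comment is a placeholder, since the listing does not say which file name belongs to which checksum.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Usage (placeholder path; substitute the location of the downloaded archive):
# ok = matches_checksum("path/to/downloaded_file", "md5:ad3a7767db059f218380e9cce5f72452")
```

Reading in chunks matters here because the larger archive is 24.8 GB and would not fit into memory on most machines.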
Additional details
Related works
- Is described by: Journal article, 10.31223/X50S70 (DOI)