Published August 14, 2025 | Version v7
Dataset Open

CAMELSH: A Large-Sample Hourly Hydrometeorological Dataset and Attributes at Watershed-Scale for Contiguous United States

  • 1. University of Michigan–Ann Arbor

Description

CAMELSH is a large-sample hydrometeorological dataset at the hourly scale for the contiguous United States. CAMELSH integrates hourly meteorological time series, catchment attributes, and boundaries from GAGES-II and HydroATLAS for 9,008 catchments across diverse climatic, hydrological, and anthropogenic conditions. In addition, hourly streamflow time series are provided for 5,767 catchments in the latest version. The dataset spans 45 years (1980–2024) with 11 meteorological variables from the NLDAS-2 (and ERA5-Land) forcing datasets, from which we compute nine climate indices related to precipitation, evapotranspiration, seasonality, and snow fraction. CAMELSH also includes two sets of catchment attributes: 439 from GAGES-II and 195 derived from HydroATLAS. These attributes cover climate, geology, hydrology, river/stream morphology, landscape, nutrients, soil, topography, and anthropogenic influences.

Additionally, the dataset provides simulated (retrospective) hourly streamflow data from the National Water Model versions 3.0 and 2.1 for 8,664 locations that correspond to USGS stations. It also includes archived real-time forecast water stages from the California-Nevada River Forecast Center (CNRFC) for 68 locations across California and Nevada, covering the period from 2011 to 2023. These simulated datasets serve as valuable resources for model development and comparison studies, functioning as benchmark models.

 

Reference: Vinh Ngoc Tran, Donghui Xu, Tam Van Nguyen, Taeho Kim, Valeriy Ivanov, CAMELSH: A Large-Sample Hourly Hydrometeorological Dataset and Attributes at Watershed-Scale for CONUS, Scientific Data (2025), https://doi.org/10.1038/s41597-025-05612-6

 

****** UPDATE 09/08/2025 (BENCHMARK MODELS)

We have added benchmarking simulation data for model comparison studies.

This includes retrospective simulations from the National Water Model version 2.1 (NWM21) and version 3.0 (NWM30) at 8,664 locations (corresponding to USGS stations) across the CONUS. The data are provided at an hourly temporal resolution for the periods 1980–2020 (NWM21) and 1980–2023 (NWM30).

In addition, we have included actual operational forecast data (water stage) from the California-Nevada River Forecast Center (CNRFC) for 68 locations in California and Nevada. These forecasts have a lead time of up to 120 hours and were collected once per day. The data covers the period from 2011 to 2023.

Data can be downloaded here: https://doi.org/10.5281/zenodo.16763144

 

****** UPDATE 08/02/2025 (LATEST VERSION OF CAMELSH)

The total number of gauges is 5,767.

Gauges were selected if their observed streamflow record exceeds 365 days (365 × 24 hourly values), counting only hours with streamflow > 0 cms.
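
The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `count_valid_hours` and `has_enough_record` and the plain-list input are hypothetical, and the threshold is read as "more than 365 × 24 hourly values with streamflow > 0". The filtering actually used to build the dataset may differ in detail.

```python
import math

MIN_HOURS = 365 * 24  # threshold as read from the text: 365 days of hourly values


def count_valid_hours(flows):
    """Count hourly values that are present (not NaN) and strictly positive."""
    return sum(1 for q in flows if not math.isnan(q) and q > 0)


def has_enough_record(flows, min_hours=MIN_HOURS):
    """Return True if the series has more than `min_hours` positive values."""
    return count_valid_hours(flows) > min_hours


# Synthetic example: 9,000 positive hours padded with zeros and NaNs.
series = [1.5] * 9000 + [0.0] * 100 + [float("nan")] * 100
print(has_enough_record(series))
```

Zero flows and NaN values are both excluded from the count, so a gauge with a long record of dry or missing hours would not pass on record length alone.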

The hourly streamflow and water level data for a total of 5,767 USGS gauges are stored in individual NetCDF files and packaged together in the Hourly2.zip archive. The dataset covers the period from 1980-01-01 00:00:00 to 2024-12-31 23:00:00. Missing values are indicated by NaN.

  • Fixed errors at several stations due to inconsistent data periods.
  • Corrected the "time" variable across all NetCDF files.
  • Performed data quality checks and removed significant negative values.
  • Optimized the size of NetCDF files.
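
As a quick consistency check on the stated coverage, the number of hourly steps between 1980-01-01 00:00:00 and 2024-12-31 23:00:00 can be computed with the standard library. The sketch below assumes each file stores one value per hour over the full period with no gaps in the time axis; the helper names are hypothetical and are not part of the dataset.

```python
from datetime import datetime, timedelta

START = datetime(1980, 1, 1, 0, 0, 0)    # first timestamp stated in the text (UTC)
END = datetime(2024, 12, 31, 23, 0, 0)   # last timestamp stated in the text (UTC)


def expected_hourly_steps(start=START, end=END):
    """Number of hourly timestamps from start to end, both inclusive."""
    return (end - start) // timedelta(hours=1) + 1


def timestamp_at(index, start=START):
    """Timestamp of the hourly record at a zero-based index."""
    return start + timedelta(hours=index)


n_steps = expected_hourly_steps()
print(n_steps)
print(timestamp_at(n_steps - 1))
```

Comparing this count with the length of the "time" variable in a downloaded file is one way to confirm that the file covers the full period.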

Observed streamflow and water level can be downloaded here: https://doi.org/10.5281/zenodo.16729675

NLDAS-2 forcings can be downloaded here: https://doi.org/10.5281/zenodo.15066778 (folder 1); https://doi.org/10.5281/zenodo.14889025 (folder 2)

ERA5-Land forcings can be downloaded here: https://doi.org/10.5281/zenodo.15264813

Google Drive link (backup only): https://drive.google.com/drive/folders/15dk6qlU38LqsUkf9hiIZHXGtzzrDfl-q?usp=sharing

 

****** UPDATE 05/15/2025 (Restricted)

We have increased the number of basins with observational data from 3,166 to 5,188.

In addition, we have added water level data alongside streamflow measurements.

The hourly streamflow and water level data for a total of 5,188 USGS gauges are stored in individual NetCDF files and packaged together in the Hourly.7z archive. The dataset covers the period from 1980-01-01 00:00:00 to 2024-12-31 23:00:00. Missing values are indicated by NaN.

 

****** UPDATE 05/01/2025

We updated the ERA5-Land forcings. The variables are: dewpoint temperature, 2-meter air temperature, soil moisture (four layers), net solar and thermal radiation, 10-meter wind components, surface pressure, total precipitation, snow water equivalent, and potential evaporation.

ERA5-Land forcings can be downloaded here: https://doi.org/10.5281/zenodo.15264813

 

****** ORIGINAL VERSION

This version of the CAMELSH dataset contains data for 9,008 basins. Because the total data volume in the repository is approximately 57 GB, which exceeds Zenodo's size limit, we split it across two links. The first link (https://doi.org/10.5281/zenodo.15066778) contains attributes, shapefiles, and time series data for the first set of basins. The second link (https://doi.org/10.5281/zenodo.14889025) contains forcing (time series) data for the remaining basins. All data are compressed in 7zip format. After extraction, the dataset is organized into the following subfolders:


•    The attributes folder contains 28 CSV (comma-separated values) files that store basin attributes, all with names beginning with "attributes_", and one Excel file. Of these, the 'attributes_nldas2_climate.csv' file contains nine climate attributes (Table 2) derived from NLDAS-2 data. The 'attributes_hydroATLAS.csv' file includes 195 basin attributes derived from the HydroATLAS dataset. The 26 files with names starting with 'attributes_gageii_' contain a total of 439 basin attributes extracted from the GAGES-II dataset. The name of each file represents a distinct group of attributes, as described in Table S.1. The remaining file, named 'Var_description_gageii.xlsx', explains the variable names used in the 26 CSV files, with information similar to that presented in Table S.1. The first column in all CSV files, labeled 'STAID', contains the identification (ID) names of the stream gauges. These IDs are assigned by the USGS and are sourced from the original GAGES-II dataset.
•    The shapefiles folder contains two sets of shapefiles for the catchment boundary. The first set, CAMELSH_shapefile.shp, is derived from the original GAGES-II dataset and is used to obtain the corresponding climate forcing data for each catchment. The second set, CAMELSH_shapefile_hydroATLAS.shp, includes catchment boundaries derived from the HydroATLAS dataset. Each polygon in both shapefiles contains a field named GAGE_ID, which represents the ID of the stream gauges.
•    The timeseries (7zip) file is a compressed archive containing time series data for the 3,166 basins with observed streamflow data. It holds 3,166 NetCDF files, one per basin, and each file is named after its stream gauge ID. Each file contains an hourly time series from 1980-01-01 00:00:00 to 2024-12-31 23:00:00 for streamflow (denoted as "Streamflow" in the NetCDF file) and 11 climate variables (see Table 1). The streamflow series includes missing values, which are represented as "NaN". All meteorological forcing data and streamflow records have been standardized to UTC+0.
•    The timeseries_nonobs (7zip) file contains time series data for the remaining 5,842 basins. The structure of each NetCDF file is similar to the one described above.
•    The info.csv file, located in the main directory of the dataset, contains basic information for 9,008 stream stations. This includes the stream gauge ID, the total number of observed hourly data points over 45 years (from 1980 to 2024), and the number of observed hourly data points for each year from 1980 to 2024. Stations with and without observed data are distinguished by the value in the second column, where stations without observed streamflow data have a corresponding value of 0.
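
The layout of info.csv described above lends itself to a simple split of stations with and without observations. The following is a minimal sketch, not the authors' code: the sample rows and header names are synthetic stand-ins, and it assumes the file has a header row, the gauge ID in the first column, and the total observation count in the second column.

```python
import csv
import io

# Synthetic stand-in for info.csv; header names and values are hypothetical.
SAMPLE_INFO = """STAID,total_obs,1980,1981
01011000,17000,8784,8216
01013500,0,0,0
"""


def split_by_observation(rows):
    """Split gauge IDs by whether the second column (total count) is 0."""
    with_obs, without_obs = [], []
    for row in rows:
        gauge_id = row[0]              # kept as text to preserve leading zeros
        total = int(float(row[1]))     # tolerate counts written as "0" or "0.0"
        (with_obs if total > 0 else without_obs).append(gauge_id)
    return with_obs, without_obs


reader = csv.reader(io.StringIO(SAMPLE_INFO))
next(reader)  # skip the assumed header row
observed, unobserved = split_by_observation(reader)
print(observed, unobserved)
```

To use this on the real file, replace the `io.StringIO(SAMPLE_INFO)` stand-in with an opened file handle for info.csv. Reading gauge IDs as text rather than integers matters because USGS station IDs often begin with a zero, and the same IDs name the NetCDF files and fill the 'STAID' column of the attribute tables.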

Download link: https://doi.org/10.5281/zenodo.15066778 (folder 1)

https://doi.org/10.5281/zenodo.14889025 (folder 2)

 

Note: All timestamps in the data have been converted to UTC+0.

Files

Files (3.5 GB)

md5:0d7c11fbd3a29f5ab604ebb2a9319f17 (67.4 MB)
md5:b0e0f57048471fb6c910f193333404ca (3.4 GB)

Additional details

Dates

Available
2025-03-23