Published October 27, 2025 | Version 2
Dataset Open

Validated temperature and salinity data, and reconstructed nutrient concentrations in the North Pacific (1895–2024)

Description

The original hydrographic and nutrient data were compiled from the CLIVAR and Carbon Hydrographic Data Office (CCHDO; providing both hydrographic and nutrient measurements) and the World Ocean Database (WOD; supplying hydrographic data only) across the North Pacific. Rigorous quality control procedures (including range, spike, gradient, inversion, and outlier checks, among others) were applied to remove low-quality temperature, salinity, and nutrient records (NO₃⁻, NO₂⁻, DIP, and Si(OH)₄) from both databases. Three machine learning models (Random Forest (RF), Light Gradient Boosting Machine (LightGBM), and Gaussian Process Regression (GPR)) were trained on the quality-controlled CCHDO data to reconstruct nutrient concentrations, using spatial, temporal, and water mass predictors derived from the validated WOD hydrographic dataset. This process generated approximately 435 million reconstructed nutrient data points across 1.9 million stations for each nutrient species within the WOD, covering the period from 1895 to 2024 in the North Pacific (118.6°E to 280.3°E; 2.0°S to 60.6°N). The final dataset offers validated temperature and salinity values along with reconstructed nutrient concentrations, providing a resource for studying ocean biogeochemistry and climate-related changes in the North Pacific. In this updated version, we removed erroneous CCHDO records to provide a sounder basis for model training.
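The record does not state the thresholds or exact formulas behind its quality control, so the following is only a minimal sketch of what a range check and a spike check can look like for a single temperature profile. The function names, the sample values, and the limits (-2.5 to 40 °C, spike threshold of 6 °C) are illustrative assumptions, and the spike formula is one widely used form rather than the authors' documented procedure.

```python
from typing import List, Sequence


def range_check(values: Sequence[float], lo: float, hi: float) -> List[bool]:
    """Return True for each value inside the plausible range [lo, hi]."""
    return [lo <= v <= hi for v in values]


def spike_check(values: Sequence[float], threshold: float) -> List[bool]:
    """Return False for interior points that stand out from both neighbours.

    Test value: |v2 - (v1 + v3) / 2| - |(v3 - v1) / 2|.
    End points have only one neighbour and are left unflagged here.
    """
    ok = [True] * len(values)
    for i in range(1, len(values) - 1):
        v1, v2, v3 = values[i - 1], values[i], values[i + 1]
        test_value = abs(v2 - (v1 + v3) / 2) - abs((v3 - v1) / 2)
        if test_value > threshold:
            ok[i] = False
    return ok


# Hypothetical temperature profile (°C), ordered from surface to depth.
temps = [20.1, 19.8, 35.0, 19.2, 18.9, -9.0]

in_range = range_check(temps, lo=-2.5, hi=40.0)   # assumed limits
no_spike = spike_check(temps, threshold=6.0)      # assumed threshold

# A record is kept only if it passes every check.
keep = [a and b for a, b in zip(in_range, no_spike)]
clean = [t for t, k in zip(temps, keep) if k]
print(clean)
```

In this toy profile the value 35.0 passes the range check but fails the spike check, while -9.0 fails the range check, which is why several complementary checks are typically combined.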

This product provides reconstructed nutrient concentrations that are precisely linked to the location, depth, and time of each original hydrographic observation from the WOD. The underlying hydrographic data integrate observations from multiple platform types: Autonomous Pinniped Bathythermographs (APB), Conductivity Temperature Depth profilers (CTD), Drifting Buoys (DRB), Gliders (GLD), Mechanical Bathythermographs (MBT), Moored Buoys (MRB), Ocean Station Data (OSD), Profiling Floats (PFL), and Undulating Oceanographic Recorders (UOR). Nutrient concentrations are reported in µmol kg⁻¹. The compressed files are named according to the reconstruction method used (RF, GPR, or LightGBM), and each contains data organized by WOD platform type.

Notes

Three compressed archives contain the nutrient concentrations reconstructed using the RF, LightGBM, and GPR methods, respectively. After decompression, each archive contains 20 additional compressed files. The name of each inner file indicates the WOD platform type whose hydrographic data and reconstructed nutrient concentrations it holds.
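Because each archive holds further compressed files, unpacking takes two passes. The sketch below shows one way to do this with Python's standard `zipfile` module. It assumes the inner files are also ZIP archives, which the record does not confirm, and `extract_nested` and the output folder layout are hypothetical choices made for illustration.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_nested(archive: Path, dest: Path) -> List[Path]:
    """Extract `archive` into `dest`, then unpack each inner .zip it contained.

    Every inner archive is unpacked into a folder named after it
    (e.g. dest/CTD.zip -> dest/CTD/). Returns the folders created.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as outer:
        outer.extractall(dest)

    unpacked = []
    # sorted() materialises the list first, so files created while
    # unpacking are not visited again.
    for inner in sorted(dest.rglob("*.zip")):
        target = inner.with_suffix("")
        with zipfile.ZipFile(inner) as zf:
            zf.extractall(target)
        unpacked.append(target)
    return unpacked


# Example call (requires the downloaded archive to be present):
# folders = extract_nested(Path("export_GPR.zip"), Path("export_GPR"))
```

Using a fresh, empty destination folder avoids picking up unrelated `.zip` files that happen to sit in the same directory.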

Files

export_GPR.zip

Files (14.0 GB)

Checksum (MD5)  Size
md5:be56422c05a330a7d02483012e925b0d  5.2 GB
md5:68d44cc566abb379f3ff08f242221db8  3.7 GB
md5:d3ef06874decedf3bc92d804ffea352b  5.1 GB
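The MD5 checksums listed above can be used to confirm that a multi-gigabyte download arrived intact. The sketch below streams a file through `hashlib` in chunks so the whole archive never has to fit in memory. Since this listing does not pair each checksum with a file name, the helper only tests whether the computed digest is one of the three published values; `md5_of_file` and `is_listed` are hypothetical helper names.

```python
import hashlib

# Checksums as published in the file listing of this record.
LISTED_CHECKSUMS = {
    "be56422c05a330a7d02483012e925b0d",
    "68d44cc566abb379f3ff08f242221db8",
    "d3ef06874decedf3bc92d804ffea352b",
}


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path) -> bool:
    """True if the file's MD5 digest matches one of the published checksums."""
    return md5_of_file(path) in LISTED_CHECKSUMS


# Example call (requires the downloaded archive to be present):
# print(is_listed("export_GPR.zip"))
```

A mismatch usually points to an interrupted or corrupted download, in which case fetching the file again is the simplest remedy.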

Additional details

Funding

National Natural Science Foundation of China: 42494885
National Natural Science Foundation of China: 42494881
National Natural Science Foundation of China: 42576215