Published October 20, 2025 | Version 2.1.0
Dataset Open

CNR Ozone Sounding Merged (COSM) Dataset

  • 1. National Research Council - Institute of Methodologies for Environmental Analysis
  • 2. University of Salerno

Description

The unified database of ozonesounding profiles was built by merging three existing ozonesounding datasets, provided by the Southern Hemisphere Additional OZonesondes (SHADOZ), the Network for the Detection of Atmospheric Composition Change (NDACC), and the World Ozone and Ultraviolet Radiation Data Centre (WOUDC).

Because the networks use heterogeneous formats and provide varying levels of detail, even for measurements shared across different initiatives, only a selected set of variables of interest (both data and metadata) was retained in the unified dataset. These variables are listed in the following table.

 

| Standard name | Description | Unit |
| --- | --- | --- |
| idstation | Name of the station. | N.A. |
| location_latitude | Latitude of the station. | deg |
| location_longitude | Longitude of the station. | deg |
| location_height | Altitude, elevation, or height of the platform plus instrument above sea level. | m |
| date_of_observation | Launch date and time of the ozonesonde (format yyyy-mm-dd hh:mm:ss, with time zone). | N.A. |
| time | Elapsed flight time since release. | s |
| pressure | Atmospheric pressure at each level. | Pa |
| geop_alt | Geopotential height. | m |
| temperature | Air temperature. | K |
| relative_humidity | Relative humidity, expressed as a dimensionless fraction. | 1 |
| wind_speed | Wind speed. | m/s |
| wind_direction | Wind direction. | deg |
| latitude | Latitude of the sonde during the flight. | deg |
| longitude | Longitude of the sonde during the flight. | deg |
| altitude | Height of the sensor above local ground or sea surface: positive above the surface (e.g., sondes), negative below it (e.g., XBT). For visual observations, the height of the visual observing platform. | m (a.s.l.) |
| sample_temperature | Temperature at which the ozone sample is measured. | K |
| o3_partial_pressure | Ozone partial pressure at each level. | Pa |
| ozone_concentration | Ozone volume mixing ratio at each level. | ppmv |
| ozone_partial_pressure_total_uncertainty | Total uncertainty of the ozone partial pressure, obtained by combining the individual uncertainty contributions. Uncertainties due to systematic biases are treated as random and assumed to follow a normal distribution. The calculation also accounts for the additional uncertainty introduced by homogenizing the data record. | Pa |
| network | Source network of the profile. | N.A. |
| type | Station classification flag. | N.A. |
| vertical_coverage_flag | Boolean flag indicating whether the ozone profile reaches the 10 hPa (1000 Pa) pressure level, i.e., pressures of 10 hPa or lower. Set to 't' if it does, 'f' otherwise. | N.A. |
| vertical_completeness_flag | Boolean flag indicating whether the ozone profile contains at least one data point every 100 m throughout its vertical extent. Set to 't' if the profile is vertically complete (no gaps larger than 100 m), 'f' otherwise. | N.A. |
| outliers_flag | Boolean flag indicating whether the ozone partial pressure profile (o3_partial_pressure) contains strong outliers according to the ±3·IQR method. Set to 't' if no strong outliers are found, 'f' otherwise. | N.A. |
| time_series_completeness_flag | Boolean flag indicating whether the time series of the station includes at least three ozone profiles per month, allowing up to 5% of months without coverage. Set to 't' if this criterion is met, 'f' otherwise. | N.A. |
| filter_check | Profile quality control flag. | N.A. |

 

The dataset is organized into two main tables:

  • unified_header, which contains metadata associated with each ozonesounding profile (idstation, date_of_observation, location_latitude, location_longitude, location_height, network, type, filter_check, vertical_coverage_flag, vertical_completeness_flag, outliers_flag, time_series_completeness_flag);
  • unified_value, which includes the actual measurement data (idstation, date_of_observation, time, pressure, geop_alt, temperature, relative_humidity, wind_speed, wind_direction, latitude, longitude, altitude, sample_temperature, o3_partial_pressure, ozone_concentration, ozone_partial_pressure_total_uncertainty).
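Several unified_value fields are physically linked. By Dalton's law, the ozone volume mixing ratio equals the ratio of the ozone partial pressure to the total air pressure, so ozone_concentration (ppmv) can be cross-checked against o3_partial_pressure and pressure (both in Pa). The sketch below is a user-side sanity check, not necessarily how the dataset authors derived ozone_concentration; representing a level as a plain dict keyed by the standard names and the 1% tolerance are assumptions chosen for illustration.

```python
import math


def mixing_ratio_ppmv(o3_partial_pressure_pa: float, pressure_pa: float) -> float:
    """Ozone volume mixing ratio in ppmv from ozone partial pressure and air pressure (both in Pa)."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return o3_partial_pressure_pa / pressure_pa * 1e6


def is_consistent(level: dict, rel_tol: float = 0.01) -> bool:
    """Compare the stored ozone_concentration with the value implied by the two pressures.

    rel_tol (1 %) is a hypothetical tolerance, not a value taken from the dataset documentation.
    """
    expected = mixing_ratio_ppmv(level["o3_partial_pressure"], level["pressure"])
    return math.isclose(level["ozone_concentration"], expected, rel_tol=rel_tol)


# Example: 8 mPa of ozone at the 10 hPa (1000 Pa) level corresponds to about 8 ppmv.
level = {"pressure": 1000.0, "o3_partial_pressure": 0.008, "ozone_concentration": 8.0}
print(is_consistent(level))  # True
```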

To improve accessibility and performance, both tables are further subdivided into year-specific subtables, allowing for more efficient querying and data management across temporal ranges.
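The year-based layout can be mimicked in memory to see why it helps: once rows are bucketed by launch year, a query for one year only touches that year's header and value rows, which are then joined on the shared keys idstation and date_of_observation. This is a minimal sketch assuming rows are dicts keyed by the standard names and that date_of_observation is a datetime; the actual names and storage format of the year-specific subtables are not described here, so the helper functions are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def split_by_year(rows: list[dict]) -> dict[int, list[dict]]:
    """Bucket header or value rows by launch year (stand-in for the per-year subtables)."""
    buckets: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        launch: datetime = row["date_of_observation"]
        buckets[launch.year].append(row)
    return dict(buckets)


def profiles_for_year(header_by_year: dict, value_by_year: dict, year: int,
                      min_filter_check: int = 4) -> dict:
    """Join one year's header and value rows on (idstation, date_of_observation).

    Only the buckets for `year` are read; profiles below min_filter_check are skipped.
    """
    levels = defaultdict(list)
    for v in value_by_year.get(year, []):
        levels[(v["idstation"], v["date_of_observation"])].append(v)
    profiles = {}
    for h in header_by_year.get(year, []):
        if h["filter_check"] >= min_filter_check:
            key = (h["idstation"], h["date_of_observation"])
            # Order the levels of each profile by elapsed flight time.
            profiles[key] = sorted(levels.get(key, []), key=lambda lv: lv["time"])
    return profiles
```

In use, the split is done once, e.g. `profiles_for_year(split_by_year(headers), split_by_year(values), 2010)`, and later queries for other years reuse the same buckets.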

Among the metadata variables included in the unified_header table, type and filter_check play a key role in characterizing the quality and coverage of the ozonesounding profiles. The type variable classifies each station by the continuity of its time series: a station is assigned to Long Coverage (G), Medium Coverage (Y), or Short Coverage (R) if it provides at least one profile per month for at least 95% of the months in its time series, and the time series spans:

  • ≥20 years for Long Coverage,
  • ≥10 and <20 years for Medium Coverage,
  • <10 years for Short Coverage.
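Both the type classification and the time_series_completeness_flag defined in the table above can be derived from the number of launches in each calendar month. The sketch below assumes that a station's time series runs from its first to its last launch month inclusive, that the span in years is the number of months divided by 12, and that "months without coverage" for the completeness flag means months with fewer than three profiles. The text does not say how stations failing the 95% criterion are labelled, so the sketch returns None for them; all function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(launch_times: list[datetime]) -> list[int]:
    """Profiles per calendar month, from the first to the last launch month inclusive."""
    counts = Counter((t.year, t.month) for t in launch_times)
    year, month = min(counts)
    last = max(counts)
    months = []
    while (year, month) <= last:
        months.append(counts[(year, month)])  # Counter returns 0 for empty months
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def coverage_type(launch_times: list[datetime], min_fraction: float = 0.95):
    """G/Y/R class: at least one profile in >= 95 % of months, then split by span in years."""
    months = monthly_counts(launch_times)
    covered = sum(1 for c in months if c >= 1) / len(months)
    if covered < min_fraction:
        return None  # assumption: labelling of such stations is not specified
    span_years = len(months) / 12
    if span_years >= 20:
        return "G"
    if span_years >= 10:
        return "Y"
    return "R"


def time_series_completeness_flag(launch_times: list[datetime], min_per_month: int = 3,
                                  max_short_fraction: float = 0.05) -> str:
    """'t' if at most 5 % of months have fewer than three profiles, 'f' otherwise."""
    months = monthly_counts(launch_times)
    short = sum(1 for c in months if c < min_per_month) / len(months)
    return "t" if short <= max_short_fraction else "f"
```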

The filter_check variable is a quality control flag ranging from 0 to 4, summarizing the results of four structural checks applied to each profile: completeness of monthly coverage (at least three ascents per month), vertical coverage (reaching at least 10 hPa), vertical resolution (minimum one data point every 100 meters), and detection of strong outliers (values in ozone profiles beyond ±3·IQR). A higher filter_check value indicates better compliance with these criteria and, consequently, higher data reliability. The individual flags corresponding to each control are also provided in the dataset, allowing users to apply custom quality filters based on their specific research needs.
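The per-profile structural checks and their aggregation into filter_check can be sketched as follows. Reading filter_check as the count of passed checks (the number of 't' flags) is an assumption consistent with its 0 to 4 range; the sketch also assumes that vertical completeness is evaluated on geop_alt and that 10 hPa corresponds to 1000 Pa in the pressure variable. The function names are hypothetical.

```python
def vertical_coverage_flag(pressures_pa: list[float], top_pa: float = 1000.0) -> str:
    """'t' if the profile reaches the 10 hPa level, i.e. a pressure of 1000 Pa or lower."""
    return "t" if min(pressures_pa) <= top_pa else "f"


def vertical_completeness_flag(altitudes_m: list[float], max_gap_m: float = 100.0) -> str:
    """'t' if consecutive levels (assumed: geop_alt) are never more than 100 m apart."""
    alts = sorted(altitudes_m)
    return "t" if all(b - a <= max_gap_m for a, b in zip(alts, alts[1:])) else "f"


def filter_check(*flags: str) -> int:
    """Assumed here to be the number of passed checks, i.e. the count of 't' flags (0 to 4)."""
    return sum(1 for flag in flags if flag == "t")
```

For a profile passing every check except the station-level time series criterion, `filter_check("t", "t", "t", "f")` would give 3 under this reading.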

In addition to the dataset, two log files are provided to make the quality control process fully transparent, allowing users to trace every data removal and to better understand the filtering criteria applied during dataset construction:

  • delete_outliers.log: lists all strong outlier values removed from the dataset. Each entry includes the station identifier, the profile date, the pressure level, and the corresponding outlier value of o3_partial_pressure.

  • delete_wrong_profile.log: lists all ozone profiles that were removed entirely because they were considered erroneous. These profiles typically exhibit values consistently close to zero or deviate significantly from the station’s seasonal climatology. Each entry is catalogued by station and launch date.
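The ±3·IQR criterion behind outliers_flag and delete_outliers.log can be illustrated with Tukey's "far-out" fences: a value is a strong outlier if it lies below Q1 − 3·IQR or above Q3 + 3·IQR. This sketch assumes the fences are computed over all o3_partial_pressure values of a single profile, using the default ("exclusive") quartile method of Python's statistics.quantiles; the original procedure may use a different quantile definition or a vertical window. The returned tuples only mirror the fields listed for the log (station, profile date, pressure level, outlier value); the actual log line format is not shown here.

```python
import statistics


def strong_outliers(idstation: str, date_of_observation: str,
                    levels: list[dict], k: float = 3.0) -> list[tuple]:
    """Return (station, date, pressure, o3_partial_pressure) for values outside the k·IQR fences."""
    values = [lv["o3_partial_pressure"] for lv in levels]
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [
        (idstation, date_of_observation, lv["pressure"], lv["o3_partial_pressure"])
        for lv in levels
        if not low <= lv["o3_partial_pressure"] <= high
    ]
```

An empty result would correspond to outliers_flag = 't' for that profile.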

Furthermore, a merging algorithm was implemented to reconcile the different features of the source datasets and to handle duplicated profiles, i.e., profiles from different networks recorded within a 2-hour time window. In such cases, the profile that passes the greatest number of quality control tests (filter_check) is retained in the unified dataset. If several profiles pass the same number of tests, the selection is refined using additional indicators of dataset maturity, such as the availability of metadata, documentation, and peer-reviewed publications, and especially the presence of measurement uncertainties associated with the ozone profiles. This last criterion is given priority because uncertainties are routinely provided by SHADOZ, provided by NDACC only for a limited number of profiles, and generally absent from WOUDC.
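The duplicate handling described above can be sketched as a sort-and-group pass followed by a ranked choice. This is a simplified illustration under several assumptions: duplicates are matched through a shared idstation (in practice, names may differ across networks), a group is anchored at its earliest launch, profiles inside the window are assumed to come from different networks, and only two ranking keys are modelled, filter_check and a hypothetical boolean has_uncertainty field. The other maturity indicators (metadata, documentation, publications) are not represented; remaining ties keep the earliest profile.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=2)


def rank(profile: dict) -> tuple:
    """Higher is better: passed checks first, then availability of uncertainties."""
    return (profile["filter_check"], profile["has_uncertainty"])


def merge_duplicates(profiles: list[dict]) -> list[dict]:
    """Keep one profile per station for launches falling within 2 hours of the group's first launch."""
    ordered = sorted(profiles, key=lambda p: (p["idstation"], p["date_of_observation"]))
    kept, group = [], []
    for p in ordered:
        starts_new_group = bool(group) and (
            p["idstation"] != group[0]["idstation"]
            or p["date_of_observation"] - group[0]["date_of_observation"] > WINDOW
        )
        if starts_new_group:
            kept.append(max(group, key=rank))  # max keeps the first of equally ranked profiles
            group = []
        group.append(p)
    if group:
        kept.append(max(group, key=rank))
    return kept
```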

Files

ozone_unified.zip (2.9 GB)
md5:372eddb82c8301aaef3233eeaf79131a
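After downloading, the archive's integrity can be checked against the published MD5 checksum. A minimal sketch using Python's standard hashlib module; reading in 1 MiB chunks avoids loading the 2.9 GB file into memory, and the default path (the archive in the current working directory) is an assumption.

```python
import hashlib

EXPECTED_MD5 = "372eddb82c8301aaef3233eeaf79131a"  # checksum published for ozone_unified.zip


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str = "ozone_unified.zip") -> bool:
    """True if the file at `path` matches the published checksum."""
    return md5_of(path) == EXPECTED_MD5
```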

Additional details

Additional titles

Alternative title
Unified database of ozonesounding profiles from existing global archives

Dates

Updated
2025