Published March 12, 2026 | Version 0.1
Dataset Open

Outdoor Deployed SCD41 Co-Located Sensors Dataset

  • 1. Estonian University of Life Sciences
  • 2. University of Tartu

Contributors

Project member:

  • 1. University of Tartu
  • 2. Estonian University of Life Sciences
  • 3. Universidade Federal do Paraná

Description

Summary Description

Fourteen-month deployment of four Sensirion SCD41 low-cost CO₂ sensors at the SMEAR Estonia station, Järvselja, Estonia. Three sensors (A, B, C) operated at 30 m on the atmospheric sensing mast; the fourth (Sensor D; SCT1) operated at 2 m. One LGR reference instrument provided ground-truth measurements at 30 m. The sensors operated under adverse environmental conditions, outside the manufacturer's specifications. The dataset includes 10-minute aggregated readings as per-sensor and merged outputs, quality-control flags, and validation receipts.

Intended as an adverse-conditions benchmark for evaluating robust calibration methods for low-cost environmental sensors.

The preview image is a Taylor diagram summarising SCD41 sensor performance relative to the LGR reference. Radial distance encodes standard deviation (ppm); angular position encodes Pearson correlation; dashed arcs show centred RMSD. Arrows mark sensors whose standard deviation exceeds the plot radius.
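The three quantities on a Taylor diagram are linked by the law of cosines: the squared centred RMSD equals σ_test² + σ_ref² − 2·σ_test·σ_ref·R. The sketch below computes them for one sensor against the reference. It assumes paired, NaN-free window means and uses population (1/n) statistics so the identity holds exactly; the function name is hypothetical and this is not the authors' plotting code.

```python
import math

def taylor_stats(test: list[float], ref: list[float]) -> tuple[float, float, float, float]:
    """Return (sigma_test, sigma_ref, Pearson R, centred RMSD) for paired samples.

    Population (1/n) statistics are used throughout so that
    crmsd**2 == s_t**2 + s_r**2 - 2 * s_t * s_r * R holds exactly.
    Inputs must be equal-length, NaN-free, and non-constant.
    """
    n = len(test)
    mean_t, mean_r = sum(test) / n, sum(ref) / n
    dt = [x - mean_t for x in test]  # anomalies of the sensor series
    dr = [y - mean_r for y in ref]   # anomalies of the reference series
    s_t = math.sqrt(sum(d * d for d in dt) / n)
    s_r = math.sqrt(sum(d * d for d in dr) / n)
    r = sum(a * b for a, b in zip(dt, dr)) / (n * s_t * s_r)
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(dt, dr)) / n)
    return s_t, s_r, r, crmsd
```

A sensor that tracks the reference perfectly sits on the reference arc at R = 1 with zero centred RMSD; a constant offset (bias) does not move it, because both series are centred first.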

Data Dictionary

Timestamps

All timestamps are in Coordinated Universal Time (UTC), formatted as ISO 8601 strings in CSV files (YYYY-MM-DD HH:MM:SS) and as datetime64[ns] in Parquet files. Each timestamp marks the start of a non-overlapping 10-minute aggregation window aligned to the Unix epoch.
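Because windows are aligned to the Unix epoch, the window a raw reading belongs to can be recovered by flooring its epoch seconds to a multiple of 600. A minimal standard-library sketch; the helper names are hypothetical and not part of the dataset's software.

```python
import math
from datetime import datetime, timezone

WINDOW_S = 600  # 10-minute aggregation window, in seconds

def parse_csv_timestamp(text: str) -> datetime:
    """Parse a CSV timestamp ('YYYY-MM-DD HH:MM:SS') as a timezone-aware UTC datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)

def window_start(t: datetime) -> datetime:
    """Start of the epoch-aligned 10-minute window that contains t (t must be tz-aware)."""
    seconds = math.floor(t.timestamp())
    return datetime.fromtimestamp(seconds - seconds % WINDOW_S, tz=timezone.utc)

def is_window_start(t: datetime) -> bool:
    """True if t lies exactly on a 10-minute epoch boundary."""
    return t.timestamp() % WINDOW_S == 0
```

For example, a reading at 2025-03-01 12:07:33 UTC falls in the window labelled 2025-03-01 12:00:00.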

Missing Data

Missing values appear as empty cells in CSV and as NaN in Parquet. No interpolation or gap-filling is applied. The merged file uses an outer join: if a sensor has no data at a given timestamp, all its columns are NaN for that row.
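When reading the CSV files without a dataframe library, empty cells need to be mapped to NaN explicitly. A minimal sketch using the standard `csv` module; the helper names and the two-row excerpt are made up for illustration and do not come from the dataset.

```python
import csv
import io
import math

def parse_float(cell: str) -> float:
    """Empty CSV cells denote missing values; map them to NaN."""
    return math.nan if cell.strip() == "" else float(cell)

def read_numeric_column(csv_text: str, column: str) -> list[float]:
    """Return one numeric column from a per-sensor or merged CSV, with gaps as NaN."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [parse_float(row[column]) for row in reader]

# Illustrative excerpt with made-up values (not taken from the dataset).
sample = (
    "timestamp,co2_30m_a_co2_mean,co2_30m_b_co2_mean\n"
    "2025-03-01 12:00:00,421.5,\n"
    "2025-03-01 12:10:00,,418.2\n"
)
a = read_numeric_column(sample, "co2_30m_a_co2_mean")
valid = [x for x in a if not math.isnan(x)]  # no gap-filling: gaps are simply dropped
```

Since the dataset applies no interpolation, any gap-filling is left to the analyst and should be reported explicitly.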

Per-Sensor Files: SCD41 (Sensors A, B, C, D)

File pattern: {SENSOR}_10min.csv, {SENSOR}_{YYYY}_{MM}.csv

Sensors: CO2_SCT1_2M (Sensor D, 2 m), CO2_30M_A, CO2_30M_B, CO2_30M_C (30 m)

Column Type Unit Description
timestamp datetime UTC Start of 10-minute window
co2_mean float64 ppm Mean CO₂ concentration
co2_std float64 ppm Standard deviation of CO₂ within window
co2_count int64 – Number of deduplicated raw samples in window
temp_mean float64 °C Mean on-chip temperature (SHT4x sensor)
temp_std float64 °C Standard deviation of temperature
temp_count int64 – Number of temperature samples
humidity_mean float64 % RH Mean relative humidity (SHT4x sensor)
humidity_std float64 % RH Standard deviation of humidity
humidity_count int64 – Number of humidity samples
co2_qc string – Quality-control flag for CO₂
temp_qc string – Quality-control flag for temperature
humidity_qc string – Quality-control flag for humidity

13 columns per file. At the SCD41 native sampling rate of 0.2 Hz, a full 10-minute window contains up to 120 deduplicated samples.
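The sample counts make it possible to screen out sparsely populated windows. A sketch of a completeness ratio, assuming a 5 s sampling period (0.2 Hz); the function name is hypothetical, and the same helper applies to the LGR by passing its per-window maximum.

```python
import math

WINDOW_S = 600                                  # 10-minute window
SCD41_PERIOD_S = 5                              # 0.2 Hz native sampling rate
SCD41_MAX_SAMPLES = WINDOW_S // SCD41_PERIOD_S  # 120 samples per full window

def window_completeness(count: float, max_samples: int = SCD41_MAX_SAMPLES) -> float:
    """Fraction of the expected samples present in a window.

    Returns 0.0 for a missing window (count is None or NaN, as after an
    outer join in the merged file) and caps the ratio at 1.0.
    """
    if count is None or math.isnan(count):
        return 0.0
    return min(count / max_samples, 1.0)
```

For example, a `co2_count` of 90 gives 0.75. Any cut-off applied to this ratio is an analysis choice, not a property of the dataset.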

Per-Sensor Files: LGR Reference

File pattern: LGR_30M_10min.csv, LGR_30M_{YYYY}_{MM}.csv

Each measured variable produces up to four columns: {variable}_mean, {variable}_std, {variable}_count and, where the instrument reports an uncertainty, {variable}_sd, the pooled standard deviation of that instrument-reported uncertainty.
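The pooling formula is not stated here. One common choice when every reported SD describes a sample of the same size is the root mean square of the SDs, i.e. the square root of the mean variance. The sketch below assumes that definition; it is illustrative and not necessarily what the pipeline computes.

```python
import math

def pooled_sd(sds: list[float]) -> float:
    """Pool per-sample standard deviations as sqrt(mean(sd**2)), ignoring NaNs.

    Assumes each instrument-reported SD carries equal weight (equal
    underlying sample sizes). Returns NaN if no valid SD is present.
    """
    valid = [s for s in sds if not math.isnan(s)]
    if not valid:
        return math.nan
    return math.sqrt(sum(s * s for s in valid) / len(valid))
```

Pooling variances rather than averaging SDs directly keeps the result consistent with how independent uncertainties combine.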

Column pattern Type Unit Description
CO2_dry_* float64 ppm Dry-air CO₂ mole fraction (primary reference)
CO2_ppm_* float64 ppm Wet-air CO₂ concentration
CH4_dry_* float64 ppm Dry-air CH₄ mole fraction
CH4_ppm_* float64 ppm Wet-air CH₄ concentration
H2O_ppm_* float64 ppm Water vapour concentration
GasP_torr_* float64 torr Internal gas cell pressure
GasT_C_* float64 °C Internal gas cell temperature
AmbT_C_* float64 °C Enclosure ambient temperature
RD0_us_* float64 µs Ring-down time (channel 0)
RD1_us_* float64 µs Ring-down time (channel 1)
quality_* float64 – Instrument quality indicator (0–100 scale)
Fit_Flag_* float64 – Spectral fit quality (3 = good)
co2_qc string – QC flag for CO₂ range
quality_qc string – QC flag for instrument quality and fit

At the configured logging interval of approximately 120 s, each 10-minute window contains at most 5 LGR samples.

Merged File

File pattern: SMEAR_EE_CO2_merged.csv, SMEAR_EE_CO2_{YYYY}_{MM}_merged.csv

100 columns, including the timestamp column. All sensors are aligned to a common 10-minute timestamp index via an outer join.
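An outer join keeps every timestamp at which at least one instrument reported and fills the other instruments with NaN. A minimal sketch over `{timestamp: value}` maps; the function name and sample values are hypothetical, and the real pipeline joins full per-sensor tables rather than single columns.

```python
import math

def outer_join(series: dict[str, dict[str, float]]) -> list[dict[str, object]]:
    """Align per-sensor {timestamp: value} maps on the union of their timestamps.

    A sensor with no entry at a timestamp gets NaN in that row (no gap-filling).
    ISO 8601 strings of fixed width sort chronologically as plain strings.
    """
    all_times = sorted(set().union(*(s.keys() for s in series.values())))
    return [
        {"timestamp": t, **{name: s.get(t, math.nan) for name, s in series.items()}}
        for t in all_times
    ]

# Made-up values for illustration only.
rows = outer_join({
    "co2_30m_a": {"2025-03-01 12:00:00": 421.5, "2025-03-01 12:10:00": 422.0},
    "lgr_30m": {"2025-03-01 12:10:00": 419.8},
})
```

Here the 12:00 row survives with NaN in the `lgr_30m` field, which is why merged rows can be partially empty.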

Column Naming Convention

Merged columns are prefixed with the lowercase sensor name:

{sensor}_{variable}_{statistic}

Examples:

  • co2_sct1_2m_co2_mean → Sensor D (2 m), CO₂, mean
  • co2_30m_a_temp_std → Sensor A (30 m), temperature, standard deviation
  • lgr_30m_CO2_dry_mean → LGR reference, dry CO₂, mean
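Because both sensor names and variable names contain underscores, the three parts cannot be recovered by splitting on "_" alone; matching the known sensor prefixes first avoids ambiguity. In this sketch the function name is hypothetical, and the pooled-SD column form (e.g. lgr_30m_CO2_dry_sd__pooled_sd) is inferred from the `_sd__pooled_sd` suffix described for the LGR columns, so check it against the actual file header.

```python
SENSOR_PREFIXES = (
    "co2_sct1_2m", "co2_30m_a", "co2_30m_b", "co2_30m_c",
    "lgr_30m", "scd41_30m_cluster",
)
POOLED_SUFFIX = "_sd__pooled_sd"  # assumed form of LGR pooled-SD columns

def split_merged_column(name: str) -> tuple[str, str, str]:
    """Split '{sensor}_{variable}_{statistic}' into (sensor, variable, statistic).

    Raises ValueError for columns without a sensor prefix (e.g. 'timestamp').
    """
    for prefix in SENSOR_PREFIXES:
        if name.startswith(prefix + "_"):
            rest = name[len(prefix) + 1:]
            break
    else:
        raise ValueError(f"unrecognised sensor prefix in {name!r}")
    if rest.endswith(POOLED_SUFFIX):
        return prefix, rest[: -len(POOLED_SUFFIX)], "pooled_sd"
    variable, statistic = rest.rsplit("_", 1)  # statistic never contains "_"
    return prefix, variable, statistic
```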

SCD41 Sensor Columns (×4 sensors, 12 columns each)

Each SCD41 sensor contributes 12 columns: co2_mean, co2_std, co2_count, temp_mean, temp_std, temp_count, humidity_mean, humidity_std, humidity_count, co2_qc, temp_qc, humidity_qc — all prefixed with the sensor name.

LGR Columns (48 columns)

The LGR contributes 48 columns. Ten measurement variables (CO₂, CH₄, H₂O, pressure, temperatures, ring-down times) each produce four columns (mean, std, count, and pooled SD of the instrument-reported uncertainty); quality and Fit_Flag produce three columns each (mean, std, count — no instrument-reported SD exists for these). Two QC flag columns (co2_qc, quality_qc) complete the set. Pooled SD columns carry the suffix _sd__pooled_sd (double underscore).
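The 100-column total of the merged file can be reproduced as 1 timestamp + 4 × 12 SCD41 + 48 LGR + 3 cluster columns. The sketch below enumerates plausible column names to check that arithmetic. The column order and the exact pooled-SD names are assumptions, so compare against the real header before relying on them.

```python
SCD41_SENSORS = ["co2_sct1_2m", "co2_30m_a", "co2_30m_b", "co2_30m_c"]
SCD41_FIELDS = [f"{v}_{s}" for v in ("co2", "temp", "humidity")
                for s in ("mean", "std", "count")] + ["co2_qc", "temp_qc", "humidity_qc"]

# Ten variables with mean/std/count/pooled SD; two with mean/std/count only.
LGR_FOUR = ["CO2_dry", "CO2_ppm", "CH4_dry", "CH4_ppm", "H2O_ppm",
            "GasP_torr", "GasT_C", "AmbT_C", "RD0_us", "RD1_us"]
LGR_THREE = ["quality", "Fit_Flag"]

def lgr_fields() -> list[str]:
    fields = []
    for v in LGR_FOUR:
        fields += [f"{v}_mean", f"{v}_std", f"{v}_count", f"{v}_sd__pooled_sd"]
    for v in LGR_THREE:
        fields += [f"{v}_mean", f"{v}_std", f"{v}_count"]
    return fields + ["co2_qc", "quality_qc"]  # two QC flag columns

def merged_columns() -> list[str]:
    cols = ["timestamp"]
    for sensor in SCD41_SENSORS:
        cols += [f"{sensor}_{f}" for f in SCD41_FIELDS]
    cols += [f"lgr_30m_{f}" for f in lgr_fields()]
    cols += [f"scd41_30m_cluster_co2_{s}" for s in ("mean", "std", "count")]
    return cols
```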

Cluster Statistics (3 columns)

Computed from the three co-located 30 m SCD41 sensors (A, B, C):

Column Type Unit Description
scd41_30m_cluster_co2_mean float64 ppm Mean CO₂ across reporting 30 m sensors
scd41_30m_cluster_co2_std float64 ppm Standard deviation across reporting 30 m sensors
scd41_30m_cluster_co2_count int64 – Number of 30 m sensors reporting (0–3)

Cluster statistics use only sensors with valid data at each timestamp. If only one sensor reports, the standard deviation is NaN.
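A NaN standard deviation for a single reporting sensor is what a sample (n − 1) standard deviation gives, so the sketch below assumes that convention. It treats any non-missing `co2_mean` as valid; whether the pipeline also requires `co2_qc == 'OK'` is not stated. The function name is hypothetical.

```python
import math
import statistics

def cluster_stats(values: list[float]) -> tuple[float, float, int]:
    """Mean, sample SD and count across the 30 m sensors at one timestamp.

    NaN inputs (sensor not reporting) are dropped first. The SD uses the
    n - 1 denominator, so it is NaN when fewer than two sensors report.
    """
    valid = [v for v in values if not math.isnan(v)]
    n = len(valid)
    mean = statistics.fmean(valid) if n >= 1 else math.nan
    std = statistics.stdev(valid) if n >= 2 else math.nan
    return mean, std, n
```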

Quality-Control Flags

Flag Applies to Meaning
OK All sensors Value within expected range
BELOW_RANGE All sensors Below minimum threshold (CO₂ < 300 ppm, T < −40 °C, RH < 0 %)
ABOVE_RANGE All sensors Above maximum threshold (CO₂ > 1000 ppm, T > 60 °C, RH > 100 %)
SPIKE SCD41 only Absolute change > 200 ppm between consecutive raw samples
NO_DATA All sensors No valid samples in aggregation window
LOW_QUALITY LGR only Quality indicator < 95
BAD_FIT LGR only Fit flag ≠ 3
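The range and spike rules in the table can be expressed as small functions. This is a sketch under assumptions: values exactly on a threshold count as OK (the table uses strict inequalities), the precedence between LGR flags is a guess, and whether flags are evaluated on raw samples or window means is not stated. Function names are hypothetical.

```python
import math

# (lower, upper) thresholds from the QC table; boundary values count as OK.
RANGES = {
    "co2": (300.0, 1000.0),    # ppm
    "temp": (-40.0, 60.0),     # °C
    "humidity": (0.0, 100.0),  # % RH
}
SPIKE_THRESHOLD_PPM = 200.0

def range_flag(variable: str, value) -> str:
    """NO_DATA, BELOW_RANGE, ABOVE_RANGE or OK for one value."""
    lo, hi = RANGES[variable]
    if value is None or math.isnan(value):
        return "NO_DATA"
    if value < lo:
        return "BELOW_RANGE"
    if value > hi:
        return "ABOVE_RANGE"
    return "OK"

def spike_indices(raw_co2: list[float]) -> list[int]:
    """Indices of raw SCD41 samples whose jump from the previous sample exceeds 200 ppm."""
    return [i for i in range(1, len(raw_co2))
            if abs(raw_co2[i] - raw_co2[i - 1]) > SPIKE_THRESHOLD_PPM]

def lgr_quality_flag(quality: float, fit_flag: float) -> str:
    """LGR flag; checking BAD_FIT before LOW_QUALITY is an assumed precedence."""
    if math.isnan(quality) or math.isnan(fit_flag):
        return "NO_DATA"
    if fit_flag != 3:
        return "BAD_FIT"
    if quality < 95:
        return "LOW_QUALITY"
    return "OK"
```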

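For initial analyses, we recommend keeping only rows whose relevant QC flag column equals 'OK' (for example co2_qc = 'OK' in a per-sensor file, or the sensor-prefixed equivalent such as co2_30m_a_co2_qc in the merged file).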

In this deployment, only OK, BELOW_RANGE, ABOVE_RANGE, and NO_DATA appear in the output. The remaining flags (SPIKE, LOW_QUALITY, BAD_FIT) are defined by the pipeline but were not triggered by any record in the current dataset.
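With the standard `csv` module, this filter is a single comparison on the chosen QC column. Rows where a sensor is absent from the merged file have an empty QC cell and are dropped as well. The helper name and sample rows are made up for illustration.

```python
import csv
import io

def ok_rows(csv_text: str, qc_column: str = "co2_qc") -> list[dict[str, str]]:
    """Keep only rows whose flag in qc_column is exactly 'OK'.

    For the merged file, pass the prefixed column, e.g. 'co2_30m_a_co2_qc'.
    Empty flags (sensor missing after the outer join) are excluded.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[qc_column] == "OK"]

# Made-up rows for illustration only.
sample = (
    "timestamp,co2_mean,co2_qc\n"
    "2025-03-01 12:00:00,421.5,OK\n"
    "2025-03-01 12:10:00,1204.0,ABOVE_RANGE\n"
    "2025-03-01 12:20:00,,NO_DATA\n"
)
kept = ok_rows(sample)
```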

File Formats

Format Extension Compression Notes
CSV .csv None Human-readable; timestamps as ISO 8601 strings
Parquet .parquet Snappy Columnar binary; preserves native types; recommended for large-scale analysis

CSV and Parquet files for the same time period contain identical data. Parquet files are typically 5–10× smaller.

File Naming

Pattern Example Contents
{SENSOR}_10min.csv CO2_30M_A_10min.csv Full concatenated time series
{SENSOR}_{YYYY}_{MM}.csv CO2_30M_A_2025_03.csv Single-month extract
{SENSOR}_{YYYY}_{MM}.parquet CO2_30M_A_2025_03.parquet Monthly Parquet
{SENSOR}_uptime.csv CO2_30M_A_uptime.csv Daily uptime statistics
SMEAR_EE_CO2_merged.csv SMEAR_EE_CO2_merged.csv All sensors, full period
SMEAR_EE_CO2_{YYYY}_{MM}_merged.* SMEAR_EE_CO2_2025_03_merged.parquet Monthly merged
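Monthly file names follow directly from the patterns above, which makes it easy to list the files covering a date range. A sketch with hypothetical helper names; not every month in a range is guaranteed to have a file, so check existence before reading.

```python
from datetime import date

def monthly_files(sensor: str, start: date, end: date, ext: str = "csv") -> list[str]:
    """'{SENSOR}_{YYYY}_{MM}.{ext}' names from start's month through end's month, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{sensor}_{year:04d}_{month:02d}.{ext}")
        month += 1
        if month == 13:  # roll over into the next year
            year, month = year + 1, 1
    return names

def monthly_merged_file(year: int, month: int, ext: str = "parquet") -> str:
    """'SMEAR_EE_CO2_{YYYY}_{MM}_merged.{ext}' for one month."""
    return f"SMEAR_EE_CO2_{year:04d}_{month:02d}_merged.{ext}"
```

The same helper covers the LGR monthly files by passing `"LGR_30M"` as the sensor.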

Files

taylor_diagram.png

Files (1.5 GB)

Checksum Size
md5:88be3ddb1a6a73c7db50418137f03849 1.9 MB
md5:0139acf8b373f7f3d83ed29a39298575 169.4 kB
md5:72795d4cb9cc90c218f2d5d46f23563f 254.5 MB
md5:3264b2e019d4857a0b909c3dc6f92d46 125.7 kB
md5:b5880ff543e48bb3797bb94f09da7678 1.1 GB
md5:a3b039f3e8390aff231a4ca1608222cc 4.1 MB
md5:f0b4e79da8ba0417b1326d40685ba784 1.4 MB
md5:896d468bdf6c0bcb63ab2ec2c378d1b8 1.5 MB
md5:fc177d402ca6ab68cf91b1427d5f1d32 57.9 MB
md5:9e30fd67c10eae81d0fb7f4c3ca3391e 26.8 MB
md5:3755ff91e7f2fa43fec7473ecf8554af 20.7 kB
md5:ebe5fae3f3aba7cfe7725c1e318a81bb 725.3 kB
md5:5384f3634a8afaca5935107f26409105 2.7 MB
md5:588e45836617a26c40d458ff646a42c9 3.9 MB
md5:4b11b6dea71c21fff2df880efad9c312 2.7 MB
md5:74067a7d4d519a36df40180fc3a2861f 183.5 kB
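Downloads can be checked against the listed MD5 checksums. A minimal sketch with Python's `hashlib`, reading in 1 MiB chunks so the 1.1 GB file does not need to fit in memory; the function names are hypothetical.

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: str, listed: str) -> bool:
    """Compare a file against a listed checksum, accepting the 'md5:' prefix."""
    return md5_of_file(path) == listed.removeprefix("md5:").lower()
```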

Additional details

Funding

Estonian Research Council
Modelling of forest growth related carbon capture capability for application of climate smart forestry PRG1674
Estonian Research Council
Estonian Environmental Observatory (Eesti Keskkonnaobservatoorium) TARISTU24-TK11

Dates

Collected: 2024-10 (experiment deployed)
Submitted: 2026-03 (processed data uploaded to Zenodo)

Software

Repository URL
https://github.com/SMEAR-EE/SCD41_DATA
Programming language
Python
Development Status
Active