Outdoor Deployed SCD41 Co-Located Sensors Dataset
Description
Summary Description
Fourteen-month deployment of four Sensirion SCD41 low-cost CO₂ sensors at the SMEAR Estonia station, Järvselja, Estonia. Three sensors (A, B, C) operated at 30 m on the atmospheric sensing mast; the fourth (Sensor D, SCT1) operated at 2 m. One LGR reference instrument provided ground-truth measurements at 30 m. The sensors operated under adverse environmental conditions outside manufacturer specifications. The dataset includes 10-minute aggregated readings, per-sensor and merged outputs, quality-control flags, and validation receipts.
Intended as an adverse-conditions benchmark for evaluating robust calibration methods for low-cost environmental sensors.
Preview figure: Taylor diagram summarising SCD41 sensor performance relative to the LGR reference. Radial distance encodes standard deviation (ppm); angular position encodes Pearson correlation; dashed arcs show centred RMSD. Arrows indicate sensors whose standard deviation exceeds the plot radius.
Data Dictionary
Timestamps
All timestamps are in Coordinated Universal Time (UTC), formatted as ISO 8601 strings in CSV files (YYYY-MM-DD HH:MM:SS) and as datetime64[ns] in Parquet files. Each timestamp marks the start of a non-overlapping 10-minute aggregation window aligned to the Unix epoch.
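The window alignment described above can be sketched with the standard library. This is a minimal illustration, not the project's pipeline code: the function names `window_start` and `format_csv_timestamp` are hypothetical, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 600  # one 10-minute aggregation window


def window_start(ts: datetime) -> datetime:
    """Return the start of the epoch-aligned 10-minute window containing ts."""
    if ts.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    epoch_seconds = int(ts.timestamp())
    floored = epoch_seconds - epoch_seconds % WINDOW_SECONDS
    return datetime.fromtimestamp(floored, tz=timezone.utc)


def format_csv_timestamp(ts: datetime) -> str:
    """Format a timestamp in the CSV layout YYYY-MM-DD HH:MM:SS."""
    return ts.strftime("%Y-%m-%d %H:%M:%S")


raw = datetime(2025, 3, 14, 12, 37, 45, tzinfo=timezone.utc)
print(format_csv_timestamp(window_start(raw)))
```

Because 600 s divides evenly into an hour, epoch-aligned windows always start at minutes 00, 10, 20, 30, 40, or 50.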
Missing Data
Missing values appear as empty cells in CSV and as NaN in Parquet. No interpolation or gap-filling is applied. The merged file uses an outer join: if a sensor has no data at a given timestamp, all its columns are NaN for that row.
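As a hedged sketch of how empty CSV cells might be handled when reading the files without a dataframe library, the snippet below maps empty cells to NaN. The helper `read_rows` is hypothetical and the two sample rows are synthetic, not taken from the dataset.

```python
import csv
import io
import math


def read_rows(text: str, numeric_columns: list[str]) -> list[dict]:
    """Parse CSV text, converting the named columns to float (empty cell -> NaN)."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = dict(raw)
        for col in numeric_columns:
            cell = raw.get(col)
            row[col] = float(cell) if cell not in ("", None) else math.nan
        rows.append(row)
    return rows


# Synthetic example rows (illustrative values only).
SAMPLE = (
    "timestamp,co2_mean,co2_std\n"
    "2025-03-14 12:30:00,421.5,0.8\n"
    "2025-03-14 12:40:00,,\n"
)

rows = read_rows(SAMPLE, ["co2_mean", "co2_std"])
missing = [r["timestamp"] for r in rows if math.isnan(r["co2_mean"])]
print(missing)
```

Keeping gaps as NaN rather than filling them matches the dataset's policy of no interpolation.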
Per-Sensor Files: SCD41 (Sensors A, B, C, D)
File pattern: {SENSOR}_10min.csv, {SENSOR}_{YYYY}_{MM}.csv
Sensors: CO2_SCT1_2M (Sensor D, 2 m), CO2_30M_A, CO2_30M_B, CO2_30M_C (30 m)
| Column | Type | Unit | Description |
|---|---|---|---|
| timestamp | datetime | UTC | Start of 10-minute window |
| co2_mean | float64 | ppm | Mean CO₂ concentration |
| co2_std | float64 | ppm | Standard deviation of CO₂ within window |
| co2_count | int64 | — | Number of deduplicated raw samples in window |
| temp_mean | float64 | °C | Mean on-chip temperature (SHT4x sensor) |
| temp_std | float64 | °C | Standard deviation of temperature |
| temp_count | int64 | — | Number of temperature samples |
| humidity_mean | float64 | % RH | Mean relative humidity (SHT4x sensor) |
| humidity_std | float64 | % RH | Standard deviation of humidity |
| humidity_count | int64 | — | Number of humidity samples |
| co2_qc | string | — | Quality-control flag for CO₂ |
| temp_qc | string | — | Quality-control flag for temperature |
| humidity_qc | string | — | Quality-control flag for humidity |
13 columns per file. At the SCD41 native sampling rate of 0.2 Hz, a full 10-minute window contains up to 120 deduplicated samples.
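The sample-count arithmetic above (0.2 Hz × 600 s = 120) suggests a simple completeness measure based on co2_count. The sketch below is illustrative; `max_samples` and `completeness` are hypothetical helpers, not part of the published pipeline.

```python
def max_samples(rate_hz: float, window_s: int = 600) -> int:
    """Largest number of samples a window can hold at the given rate."""
    return round(rate_hz * window_s)


def completeness(count: int, rate_hz: float = 0.2, window_s: int = 600) -> float:
    """Fraction of the expected samples actually present in a window."""
    return count / max_samples(rate_hz, window_s)


print(max_samples(0.2))   # SCD41 at its native rate
print(completeness(90))   # a window holding 90 of the possible samples
```

A threshold on this fraction could be used to discard sparsely populated windows before comparing sensors.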
Per-Sensor Files: LGR Reference
File pattern: LGR_30M_10min.csv, LGR_30M_{YYYY}_{MM}.csv
Each measured variable produces up to four columns: {variable}_mean, {variable}_std, {variable}_count, and (where applicable) {variable}_sd, the pooled standard deviation of the instrument-reported uncertainty.
| Column pattern | Type | Unit | Description |
|---|---|---|---|
| CO2_dry_* | float64 | ppm | Dry-air CO₂ mole fraction (primary reference) |
| CO2_ppm_* | float64 | ppm | Wet-air CO₂ concentration |
| CH4_dry_* | float64 | ppm | Dry-air CH₄ mole fraction |
| CH4_ppm_* | float64 | ppm | Wet-air CH₄ concentration |
| H2O_ppm_* | float64 | ppm | Water vapour concentration |
| GasP_torr_* | float64 | torr | Internal gas cell pressure |
| GasT_C_* | float64 | °C | Internal gas cell temperature |
| AmbT_C_* | float64 | °C | Enclosure ambient temperature |
| RD0_us_* | float64 | µs | Ring-down time (channel 0) |
| RD1_us_* | float64 | µs | Ring-down time (channel 1) |
| quality_* | float64 | 0–100 | Instrument quality indicator |
| Fit_Flag_* | float64 | — | Spectral fit quality (3 = good) |
| co2_qc | string | — | QC flag for CO₂ range |
| quality_qc | string | — | QC flag for instrument quality and fit |
At the configured logging interval of approximately 120 s, each 10-minute window contains at most 5 LGR samples.
Merged File
File pattern: SMEAR_EE_CO2_merged.csv, SMEAR_EE_CO2_{YYYY}_{MM}_merged.csv
100 columns. All sensors are aligned to a common 10-minute timestamp index via outer join.
Column Naming Convention
Merged columns are prefixed with the lowercase sensor name:
{sensor}_{variable}_{statistic}
Examples:
- co2_sct1_2m_co2_mean → Sensor D (2 m), CO₂, mean
- co2_30m_a_temp_std → Sensor A (30 m), temperature, standard deviation
- lgr_30m_CO2_dry_mean → LGR reference, dry CO₂, mean
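The naming convention can be expressed as a small helper. This is a sketch; `merged_column` is a hypothetical name. Note that, judging from the examples, only the sensor prefix is lowercased while the variable keeps its original case.

```python
def merged_column(sensor: str, variable: str, statistic: str) -> str:
    """Build a merged-file column name as {sensor}_{variable}_{statistic}."""
    # Only the sensor name is lowercased; the variable case is preserved,
    # as in lgr_30m_CO2_dry_mean.
    return f"{sensor.lower()}_{variable}_{statistic}"


print(merged_column("CO2_SCT1_2M", "co2", "mean"))
print(merged_column("CO2_30M_A", "temp", "std"))
print(merged_column("LGR_30M", "CO2_dry", "mean"))
```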
SCD41 Sensor Columns (×4 sensors, 12 columns each)
Each SCD41 sensor contributes 12 columns: co2_mean, co2_std, co2_count, temp_mean, temp_std, temp_count, humidity_mean, humidity_std, humidity_count, co2_qc, temp_qc, humidity_qc — all prefixed with the sensor name.
LGR Columns (48 columns)
The LGR contributes 48 columns. Ten measurement variables (CO₂, CH₄, H₂O, pressure, temperatures, ring-down times) each produce four columns (mean, std, count, and pooled SD of the instrument-reported uncertainty); quality and Fit_Flag produce three columns each (mean, std, count — no instrument-reported SD exists for these). Two QC flag columns (co2_qc, quality_qc) complete the set. Pooled SD columns carry the suffix _sd__pooled_sd (double underscore).
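The column count can be checked by enumerating the expected names. The sketch below assumes the variable names from the LGR data dictionary and assumes that the pooled SD suffix is appended directly to the prefixed variable name; the exact column order in the files is not specified here and may differ.

```python
# Variable names taken from the LGR per-sensor table.
MEASUREMENT_VARIABLES = [
    "CO2_dry", "CO2_ppm", "CH4_dry", "CH4_ppm", "H2O_ppm",
    "GasP_torr", "GasT_C", "AmbT_C", "RD0_us", "RD1_us",
]
INDICATOR_VARIABLES = ["quality", "Fit_Flag"]  # no instrument-reported SD
QC_COLUMNS = ["co2_qc", "quality_qc"]


def lgr_merged_columns(prefix: str = "lgr_30m") -> list[str]:
    """Enumerate the expected LGR column names in the merged file."""
    columns = []
    for var in MEASUREMENT_VARIABLES:
        # Assumption: the pooled SD column is {prefix}_{var}_sd__pooled_sd.
        for suffix in ("mean", "std", "count", "sd__pooled_sd"):
            columns.append(f"{prefix}_{var}_{suffix}")
    for var in INDICATOR_VARIABLES:
        for suffix in ("mean", "std", "count"):
            columns.append(f"{prefix}_{var}_{suffix}")
    columns.extend(f"{prefix}_{qc}" for qc in QC_COLUMNS)
    return columns


print(len(lgr_merged_columns()))  # 10*4 + 2*3 + 2
```

Together with the timestamp, 4 × 12 SCD41 columns, and 3 cluster columns, this gives the 100 columns of the merged file.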
Cluster Statistics (3 columns)
Computed from the three co-located 30 m SCD41 sensors (A, B, C):
| Column | Type | Unit | Description |
|---|---|---|---|
| scd41_30m_cluster_co2_mean | float64 | ppm | Mean CO₂ across reporting 30 m sensors |
| scd41_30m_cluster_co2_std | float64 | ppm | Standard deviation across reporting 30 m sensors |
| scd41_30m_cluster_co2_count | int64 | — | Number of 30 m sensors reporting (0–3) |
Cluster statistics use only sensors with valid data at each timestamp. If only one sensor reports, std is NaN.
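A minimal sketch of the cluster calculation for a single timestamp follows. It assumes a sample standard deviation (n − 1 denominator), which is consistent with std being NaN when a single sensor reports but is not confirmed by the description; `cluster_stats` is a hypothetical name.

```python
import math
from statistics import mean, stdev


def cluster_stats(values: list[float]) -> tuple[float, float, int]:
    """Return (mean, std, count) over the sensors with valid readings."""
    valid = [v for v in values if v is not None and not math.isnan(v)]
    n = len(valid)
    if n == 0:
        return math.nan, math.nan, 0
    # Assumption: sample standard deviation, undefined for a single value.
    spread = stdev(valid) if n > 1 else math.nan
    return mean(valid), spread, n


# Illustrative values: sensors A and B report, sensor C is missing.
print(cluster_stats([420.0, 422.0, math.nan]))
```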
Quality-Control Flags
| Flag | Applies to | Meaning |
|---|---|---|
| OK | All sensors | Value within expected range |
| BELOW_RANGE | All sensors | Below minimum threshold (CO₂ < 300 ppm, T < −40 °C, RH < 0 %) |
| ABOVE_RANGE | All sensors | Above maximum threshold (CO₂ > 1000 ppm, T > 60 °C, RH > 100 %) |
| SPIKE | SCD41 only | Absolute change > 200 ppm between consecutive raw samples |
| NO_DATA | All sensors | No valid samples in aggregation window |
| LOW_QUALITY | LGR only | Quality indicator < 95 |
| BAD_FIT | LGR only | Fit flag ≠ 3 |
We recommend keeping only rows whose relevant QC column equals 'OK' (for example, co2_qc = 'OK') for initial analyses.
In this deployment, only OK, BELOW_RANGE, ABOVE_RANGE, and NO_DATA appear in the output. The remaining flags (SPIKE, LOW_QUALITY, BAD_FIT) are defined by the pipeline but were not triggered by any record in the current dataset.
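The range flags that appear in this deployment can be illustrated for CO₂ as follows. This is a sketch under stated assumptions: the thresholds come from the table above, but applying them to the window mean and checking NO_DATA first are assumptions, and `co2_range_flag` is a hypothetical name rather than the pipeline's function.

```python
import math

CO2_MIN_PPM = 300.0   # BELOW_RANGE threshold from the flag table
CO2_MAX_PPM = 1000.0  # ABOVE_RANGE threshold from the flag table


def co2_range_flag(co2_mean: float, count: int) -> str:
    """Assign a CO2 QC flag to one aggregation window."""
    # Assumption: NO_DATA takes precedence over the range checks.
    if count == 0 or co2_mean is None or math.isnan(co2_mean):
        return "NO_DATA"
    if co2_mean < CO2_MIN_PPM:
        return "BELOW_RANGE"
    if co2_mean > CO2_MAX_PPM:
        return "ABOVE_RANGE"
    return "OK"


# Illustrative values only.
for value, n in [(421.5, 120), (250.0, 118), (1500.0, 120), (math.nan, 0)]:
    print(value, n, co2_range_flag(value, n))
```

Values exactly at 300 ppm or 1000 ppm are flagged OK here because the table uses strict inequalities.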
File Formats
| Format | Extension | Compression | Notes |
|---|---|---|---|
| CSV | .csv | None | Human-readable; timestamps as ISO 8601 strings |
| Parquet | .parquet | Snappy | Columnar binary; preserves native types; recommended for large-scale analysis |
CSV and Parquet files for the same time period contain identical data. Parquet files are typically 5–10× smaller.
File Naming
| Pattern | Example | Contents |
|---|---|---|
| {SENSOR}_10min.csv | CO2_30M_A_10min.csv | Full concatenated time series |
| {SENSOR}_{YYYY}_{MM}.csv | CO2_30M_A_2025_03.csv | Single-month extract |
| {SENSOR}_{YYYY}_{MM}.parquet | CO2_30M_A_2025_03.parquet | Monthly Parquet |
| {SENSOR}_uptime.csv | CO2_30M_A_uptime.csv | Daily uptime statistics |
| SMEAR_EE_CO2_merged.csv | — | All sensors, full period |
| SMEAR_EE_CO2_{YYYY}_{MM}_merged.* | SMEAR_EE_CO2_2025_03_merged.parquet | Monthly merged |
Files
Preview: taylor_diagram.png
Total size: 1.5 GB
| Checksum | Size |
|---|---|
| md5:88be3ddb1a6a73c7db50418137f03849 | 1.9 MB |
| md5:0139acf8b373f7f3d83ed29a39298575 | 169.4 kB |
| md5:72795d4cb9cc90c218f2d5d46f23563f | 254.5 MB |
| md5:3264b2e019d4857a0b909c3dc6f92d46 | 125.7 kB |
| md5:b5880ff543e48bb3797bb94f09da7678 | 1.1 GB |
| md5:a3b039f3e8390aff231a4ca1608222cc | 4.1 MB |
| md5:f0b4e79da8ba0417b1326d40685ba784 | 1.4 MB |
| md5:896d468bdf6c0bcb63ab2ec2c378d1b8 | 1.5 MB |
| md5:fc177d402ca6ab68cf91b1427d5f1d32 | 57.9 MB |
| md5:9e30fd67c10eae81d0fb7f4c3ca3391e | 26.8 MB |
| md5:3755ff91e7f2fa43fec7473ecf8554af | 20.7 kB |
| md5:ebe5fae3f3aba7cfe7725c1e318a81bb | 725.3 kB |
| md5:5384f3634a8afaca5935107f26409105 | 2.7 MB |
| md5:588e45836617a26c40d458ff646a42c9 | 3.9 MB |
| md5:4b11b6dea71c21fff2df880efad9c312 | 2.7 MB |
| md5:74067a7d4d519a36df40180fc3a2861f | 183.5 kB |
Additional details
Funding
Dates
- Collected: 2024-10 (Experiment Deployed)
- Submitted: 2026-03 (Processed Data Uploaded to Zenodo)
Software
- Repository URL: https://github.com/SMEAR-EE/SCD41_DATA
- Programming language: Python
- Development Status: Active