Descriptive analysis

Author

Johannes Zauner

Preface

This is a work-in-progress descriptive analysis of the GuidolinEtAl2025 dataset.

Overview

Data import: wearable data

The first step is the import of wearable data from the head position (mounted on glasses).

#time zone of madrid
tz <-  "Europe/Berlin"
#coordinates for Tübingen
coordinates <- c(48.5216, 9.0576)
#regex to extract participant Id and wearing position
# pattern <- "[A-Z]+_S[0-9]{3}_[hcw]"
#regex to extract participant Id
pattern <- "^([0-9]{3})_"

country_colors <- c(
  Sweden     = "#88CCEE",  # Sky blue
  Spain      = "#CC6677",  # Coral red
  Germany    = "#DDCC77",  # Mustard yellow
  Netherlands= "#117733",  # Dark green
  Turkey     = "#332288",  # Indigo
  Ghana      = "#AA4499",  # Purple-pink
  Costa_Rica = "#44AA99"   # Teal
)
#path to participants
path_part1 <- "../data/raw/group/actlumus"
#path to actlumus data sans wearing position
# path_part2 <- "/continuous/actlumus_"
#wearing position
# wearing_position <- "head"
#getting all subfolders
# folders <- dir(path_part1)
#creating complete folder names
# paths <- glue("{path_part1}/{folders}{path_part2}{wearing_position}")
#collecting file names
files <- list.files(path_part1, full.names = TRUE)
files <- files[str_detect(files, "Report", negate = TRUE)]
#there remain some early data from a pilot collection. these will be removed
data <- import$ActLumus(files, tz, auto.id = pattern, dst_adjustment = TRUE)

Successfully read in 1'579'496 observations across 26 Ids from 26 ActLumus-file(s).
Timezone set is Europe/Berlin.
Observations in the following 2 file(s) and 2 Id(s) cross to or from daylight savings time (DST): 
File: 221_actlumus_Log_1607_20231030121531432, Group:221
File: 222_actlumus_Log_1020_20231030140039534, Group:222
The Datetime column was adjusted in these files. For more info on what that entails see `?dst_change_handler`.

First Observation: 2023-08-14 10:55:21
Last Observation: 2023-11-13 11:24:55
Timespan: 91 days

Observation intervals: 
   Id    interval.time             n pct  
 1 201   10s                   60042 100% 
 2 202   10s                   59957 100% 
 3 204   10s                   61980 100% 
 4 205   10s                   61015 100% 
 5 206   10s                   60691 100% 
 6 206   23s                       1 0%   
 7 206   59575s (~16.55 hours)     1 0%   
 8 208   10s                   59853 100% 
 9 209   10s                   60084 100% 
10 210   10s                   60701 100% 
# ℹ 33 more rows

Regularizing data

In the first step, we will trim the data by the study time.

#import table with study times
Study_dates <- read_excel("../data/Study_dates_MeLiDos_MPI.xlsx")
#gather the important information
Study_dates <-
  Study_dates |>
    rename(Id = subjectID_device, start = datetime_trial_start, end = datetime_trial_end) |>
    select(Id, start, end) |>
    mutate(across(c(start, end), \(x) force_tz(x, tz)),
           trial = TRUE) |>
    drop_na() |> 
    # filter(str_detect(Id, "_h$")) |>
    mutate(
      # Id = str_remove(Id, "_h$"),
      Id = factor(Id)) |>
  group_by(Id)

#add the trim information to the dataset and filter by it
data <-
  data |>
  add_states(Study_dates) |>
  dplyr::filter(trial) |>
  select(-trial)

data |> gg_overview()

# data |> summarize(min = min(Datetime), max = max(Datetime))
data |> has_gaps()
[1] FALSE
data |> has_irregulars()
[1] FALSE
data |> gg_gaps(group.by.days = TRUE, show.irregulars = TRUE, full.days = FALSE)
No gaps nor irregular values were found. Plot creation skipped
data_cleaned <- data |> gap_handler(full.days = TRUE)
data_cleaned |> gap_table(MEDI) |> cols_hide(ends_with("_n"))
Summary of available and missing data
Variable: melanopic EDI
Data
Missing
Regular
Irregular
Range
Interval
Gaps
Implicit
Explicit
Time % n1,2 Time Time N ø Time % Time % Time %
Overall 25w 5d 11h 14m 50s 86.7%3 0 29w 5d 2h 46m 40s 10 52 1w 6d 19h 45m 55s 3w 6d 15h 31m 50s 13.3%3 0s 0.0%3 3w 6d 15h 31m 50s 13.3%3
201
6d 21h 42m 30s 86.3% 0 1w 1d 10s 2 13h 8m 45s 1d 2h 17m 30s 13.7% 0s 0.0% 1d 2h 17m 30s 13.7%
202
6d 21h 30m 10s 86.2% 0 1w 1d 10s 2 13h 14m 55s 1d 2h 29m 50s 13.8% 0s 0.0% 1d 2h 29m 50s 13.8%
204
1w 87.5% 0 1w 1d 10s 2 12h 1d 12.5% 0s 0.0% 1d 12.5%
205
6d 22h 17m 10s 86.6% 0 1w 1d 10s 2 12h 51m 25s 1d 1h 42m 50s 13.4% 0s 0.0% 1d 1h 42m 50s 13.4%
206
6d 21h 38m 50s 86.3% 0 1w 1d 10s 2 13h 10m 35s 1d 2h 21m 10s 13.7% 0s 0.0% 1d 2h 21m 10s 13.7%
208
6d 21h 17m 20s 86.1% 0 1w 1d 10s 2 13h 21m 20s 1d 2h 42m 40s 13.9% 0s 0.0% 1d 2h 42m 40s 13.9%
209
6d 21h 57m 30s 86.4% 0 1w 1d 10s 2 13h 1m 15s 1d 2h 2m 30s 13.6% 0s 0.0% 1d 2h 2m 30s 13.6%
210
6d 23h 44m 87.4% 0 1w 1d 10s 2 12h 8m 1d 16m 12.6% 0s 0.0% 1d 16m 12.6%
212
6d 20h 26m 40s 85.6% 0 1w 1d 10s 2 13h 46m 40s 1d 3h 33m 20s 14.4% 0s 0.0% 1d 3h 33m 20s 14.4%
213
6d 21h 8m 30s 86.0% 0 1w 1d 10s 2 13h 25m 45s 1d 2h 51m 30s 14.0% 0s 0.0% 1d 2h 51m 30s 14.0%
214
1w 87.5% 0 1w 1d 10s 2 12h 1d 12.5% 0s 0.0% 1d 12.5%
215
6d 22h 26m 10s 86.7% 0 1w 1d 10s 2 12h 46m 55s 1d 1h 33m 50s 13.3% 0s 0.0% 1d 1h 33m 50s 13.3%
216
1w 87.5% 0 1w 1d 10s 2 12h 1d 12.5% 0s 0.0% 1d 12.5%
218
6d 21h 58m 50s 86.4% 0 1w 1d 10s 2 13h 35s 1d 2h 1m 10s 13.6% 0s 0.0% 1d 2h 1m 10s 13.6%
219
6d 23h 29m 20s 87.2% 0 1w 1d 10s 2 12h 15m 20s 1d 30m 40s 12.8% 0s 0.0% 1d 30m 40s 12.8%
221
1w 1h 87.6% 0 1w 1d 1h 10s 2 12h 1d 12.4% 0s 0.0% 1d 12.4%
222
1w 1h 87.6% 0 1w 1d 1h 10s 2 12h 1d 12.4% 0s 0.0% 1d 12.4%
223
6d 21h 44m 10s 86.3% 0 1w 1d 10s 2 13h 7m 55s 1d 2h 15m 50s 13.7% 0s 0.0% 1d 2h 15m 50s 13.7%
224
6d 22h 54m 40s 86.9% 0 1w 1d 10s 2 12h 32m 40s 1d 1h 5m 20s 13.1% 0s 0.0% 1d 1h 5m 20s 13.1%
225
1w 46m 40s 87.6% 0 1w 1d 46m 40s 10s 2 12h 1d 12.4% 0s 0.0% 1d 12.4%
226
1w 87.5% 0 1w 1d 10s 2 12h 1d 12.5% 0s 0.0% 1d 12.5%
227
6d 20h 32m 20s 85.7% 0 1w 1d 10s 2 13h 43m 50s 1d 3h 27m 40s 14.3% 0s 0.0% 1d 3h 27m 40s 14.3%
228
6d 22h 3m 30s 86.5% 0 1w 1d 10s 2 12h 58m 15s 1d 1h 56m 30s 13.5% 0s 0.0% 1d 1h 56m 30s 13.5%
229
6d 22h 7m 30s 86.5% 0 1w 1d 10s 2 12h 56m 15s 1d 1h 52m 30s 13.5% 0s 0.0% 1d 1h 52m 30s 13.5%
230
6d 21h 56m 20s 86.4% 0 1w 1d 10s 2 13h 1m 50s 1d 2h 3m 40s 13.6% 0s 0.0% 1d 2h 3m 40s 13.6%
231
6d 21h 32m 40s 86.2% 0 1w 1d 10s 2 13h 13m 40s 1d 2h 27m 20s 13.8% 0s 0.0% 1d 2h 27m 20s 13.8%
1 If n > 0: it is possible that the other summary statistics are affected, as they are calculated based on the most prominent interval.
2 Number of (missing or actual) observations
3 Based on times, not necessarily number of observations

Visualization

data_cleaned |> 
grand_overview(coordinates, "Tübingen", "Germany", country_colors, photoperiod_sequence = 1)

ggsave("../output/figures/Figure_1.png", width = 17, height = 10, scale = 2, units = "cm")
ggsave("../output/figures/Figure_1.pdf", width = 17, height = 10, scale = 2, units = "cm")
table_summary <-
light_summary_table(
  data_cleaned, coordinates, "Tübingen", "Germany", country_colors["Germany"],
  histograms = TRUE
)

table_summary
Summary table
Tübingen, Germany (48.5°N, 9.1°E), TZ: Europe/Berlin
Overview
Participants
26
Participant-days
208
Days ≥80% complete
156
Missing/irregular
13.28% (12.44% - 14.35%)
Photoperiod
12h 32m (10h 27m - 15h 35m) 1 
Metrics2
Dose D (lx·h) 7,475 ±9,743 (173 - 68,218)
Duration above 250 lx TAT250 2h 44m ±2h 32m (0s - 10h 38m)
Duration within 1-10 lx TWT1–10 3h 45m ±2h 32m (33m - 16h)
Duration below 1 lx TBT1 9h 17m ±2h 24m (12m 50s - 17h 46m)
Period above 250 lx PAT250 35m 19s ±40m 46s (0s - 4h 14m)
Duration above 1000 lx TAT1000 1h 3m ±1h 18m (0s - 6h 52m)
First timing above 250 lx FLiT250 09:09 ±02:33 (00:07 - 15:01) 1 
Mean timing above 250 lx MLiT250 13:26 ±01:41 (08:03 - 17:48) 1 
Last timing above 250 lx LLiT250 18:30 ±02:43 (11:09 - 23:59) 1 
Brightest 10h midpoint M10midpoint 13:49 ±01:34 (09:32 - 18:59) 1 
Darkest 5h midpoint L5midpoint 03:21 ±01:42 (02:00 - 20:54) 1 
Brightest 10h mean3 M10mean (lx) 149.7 ±208.2 (2.1 - 1,161.4)
Darkest 5h mean3 L5mean (lx) 0.0 ±0.2 (0.0 - 2.1)
Interdaily stability IS 0.329 ±0.101 (0.170 - 0.508)
Intradaily variability IV 1.158 ±0.377 (0.518 - 1.871)
values show: mean ±sd (min - max) and are all based on measurements of melanopic EDI (lx)
1 Histogram limits are set from 00:00 to 24:00
2 Metrics are calculated on a by-participant-day basis (n=156) with the exception of IV and IS, which are calculated on a by-participant basis (n=26).
3 Values were log 10 transformed prior to averaging, with an offset of 0.1, and backtransformed afterwards
gtsave(table_summary, here("output/tables/table_summary.png"), vwidth = 820)
gtsave(table_summary, here("output/tables/table_summary.pdf"))
gtsave(table_summary |> cols_hide(c(plot)), here("output/tables/table_summary.docx"))