ScientISST MOVE: Annotated Multimodal Naturalistic Dataset Recorded During Everyday Life Activities Using Wearable Devices

Areias Saraiva, João; Abreu, Mariana; Carmo, Ana Sofia; Plácido da Silva, Hugo; Fred, Ana

doi:10.5281/zenodo.8300972

Published May 15, 2023 | Version 1.1

Dataset Open

1. Instituto Superior Técnico
2. Instituto de Telecomunicações

A multi-modality, multi-activity, and multi-subject dataset of wearable biosignals.

Modalities: ECG, EMG, EDA, PPG, ACC, TEMP

Main Activities: Lift object, Greet people, Gesticulate while talking, Jumping, Walking, and Running

Cohort: 17 subjects (10 male, 7 female); median age: 24

Devices: 2x ScientISST Core + 1x Empatica E4

Body Locations: Chest, Abdomen, Left bicep, wrist and index finger

No filter has been applied to the signals, but the correct transfer functions were applied, so the data is given in the relevant units (mV, µS, g, °C).

========

There are three formats available:

a) LTBio's Biosignal files. They can be opened with:

x = Biosignal.load(path)

LTBio Package: https://pypi.org/project/LongTermBiosignals/

Under the directory biosignal, the following tree structure is found: subject/x.biosignal, where subject is the subject’s code, and x is any of the following { acc_chest, acc_wrist, ecg, eda, emg, ppg, temp }. Each file includes the signals recorded from every sensor that acquires the modality after which the file is named, independently of the device.

Channels, activities and time intervals can be easily indexed with the index operator []: https://ltbio.readthedocs.io/en/latest/learn/basic/ltbio101.html

A sneak peek of the signals can also be quickly plotted with: x.preview.plot()

Any Biosignal can be easily converted to NumPy arrays or DataFrames, if needed.

b) CSV files. They can be opened with:

x = pandas.read_csv(path)

Pandas Package: https://pypi.org/project/pandas/

These files can be found under the directory csv, named as subject.csv, where subject is the subject’s code. There is only one file per subject, containing their full session and all biosignal modalities. When read as tables, the time axis is in the first column, each sensor is in one of the middle columns, and the activity labels are in the last column. Each row holds the samples of each sensor, if any, at that timestamp. If there is no sample for a sensor at a given timestamp, the acquisition was interrupted for that sensor, which happens between activities and sometimes for short periods during the running activity. The last column of each row holds one or more activity labels, if an activity was taking place at that timestamp. If there are multiple annotations, the labels are separated by commas (e.g., 'run,sprint'). If there are no annotations, the column is empty for that timestamp.
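The table layout described above can be handled along the following lines. This is only a minimal sketch on a made-up excerpt: the column names (time, ecg, eda, activities), the timestamp format, and the quoting of multi-label cells are assumptions for illustration, not taken from the actual files. The helper read_session is hypothetical as well.

```python
import io

import pandas as pd

# Made-up excerpt mimicking the described layout: time in the first column,
# sensors in the middle, activity labels last. All names/values are assumptions.
SAMPLE = '''time,ecg,eda,activities
2023-01-01 10:00:00.000,0.12,1.5,walk
2023-01-01 10:00:00.002,0.15,1.5,walk
2023-01-01 10:00:00.004,,,
2023-01-01 10:00:00.006,0.40,,"run,sprint"
'''


def read_session(source):
    """Read one subject table and split the last column into lists of labels."""
    df = pd.read_csv(source, index_col=0, parse_dates=True)
    label_col = df.columns[-1]
    # Empty cells are read as missing values; map them to an empty list.
    df[label_col] = df[label_col].apply(
        lambda cell: [lab.strip() for lab in cell.split(",") if lab.strip()]
        if isinstance(cell, str)
        else []
    )
    return df


df = read_session(io.StringIO(SAMPLE))
labels = df.columns[-1]

# Rows annotated with 'run' (possibly together with other labels).
running = df[df[labels].apply(lambda labs: "run" in labs)]

# Rows where the (hypothetical) ecg sensor was not acquiring.
missing_ecg = df["ecg"].isna()
```

With a real file, io.StringIO(SAMPLE) would be replaced by the path to subject.csv, and the sensor column names should be read from df.columns rather than assumed.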

In order to provide a tabular format for sensors with different sampling frequencies, the sensors with a sampling frequency lower than 500 Hz were upsampled to 500 Hz. This way, the tables are regularly sampled, i.e., there is a row every 2 ms. If a sensor was not acquiring at a given timestamp, the corresponding cell will be empty. So, not only are the segments with samples regularly sampled, but the interruptions are also discretised. This means that if, after an interruption, a sensor starts acquiring at a non-regular timestamp, the first sample will be written on the previous or the following timestamp, by half-up rounding. Naturally, this process cumulatively introduces lags in the table, some of which cancel out. Each individual lag is no longer than half the sampling period (1 ms), hence negligible. The cumulative lags are no longer than 200 ms for all subjects, which is also negligible. Nevertheless, only LTBio's Biosignal format preserves the exact original timestamps (10E-6 precision) of all samples and the original sampling frequencies.
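The half-up rounding onto the 2 ms grid can be illustrated with a small worked example. This is a sketch of the arithmetic only, not the code used to export the tables; representing timestamps as integer microseconds and the names snap_half_up and lag_us are assumptions made for the illustration.

```python
PERIOD_US = 2_000  # 500 Hz grid: one row every 2 ms, expressed in microseconds


def snap_half_up(t_us: int, period_us: int = PERIOD_US) -> int:
    """Snap a timestamp to the nearest grid point; exact ties are rounded up."""
    return ((t_us + period_us // 2) // period_us) * period_us


def lag_us(t_us: int, period_us: int = PERIOD_US) -> int:
    """Signed lag introduced by snapping (positive means written later)."""
    return snap_half_up(t_us, period_us) - t_us


# A sample at 999 us is written on the previous row (0 us), while a sample
# exactly halfway (1000 us) is written on the following row (2000 us).
for t in (0, 999, 1_000, 1_001, 3_500, 4_000):
    print(t, "->", snap_half_up(t), "lag:", lag_us(t))
```

In every case the absolute lag stays within half the sampling period (1000 µs), which matches the 1 ms bound stated above.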

================

Both the Biosignal and CSV formats include annotations of the activities; however, LTBio Biosignal files have better time resolution and also include clinical and demographic data.

c) EDF+ files. They can be opened with:

x = mne.io.read_raw_edf(path)

MNE Package: https://mne.tools/stable/index.html

Under the directory edf, the following tree structure is found: subject/x.edf, where subject is the subject’s code, and x is any of the following { empathic, scientisst_chest, scientisst_forearm }. Each file includes the signals recorded from every device after which the file is named, independently of the modality.

Notes:

Original sampling frequencies are maintained.
Original units are maintained.
Signal is NaN during recording interruptions.
Events are in EDF annotations.
Biosignal and patient notes are not maintained.

The signals can be quickly plotted with: x.plot(). Make sure you have interactive Matplotlib activated. At first, you might have to decrease the scaling in order to inspect them correctly. Tip: press the minus (-) key on your keyboard as many times as necessary to reduce the scaling.
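Since the notes above state that the signal is NaN during recording interruptions, the continuous segments of a channel can be recovered by locating the runs of non-NaN samples. The sketch below does this on a synthetic NumPy array. With real data, a channel could presumably be taken from the array returned by x.get_data(). The helper valid_segments is a hypothetical name introduced here.

```python
import numpy as np


def valid_segments(signal):
    """Return (start, stop) index pairs of contiguous non-NaN runs (stop exclusive)."""
    valid = ~np.isnan(np.asarray(signal, dtype=float))
    # Pad with False on both sides so runs touching either end are detected.
    padded = np.concatenate(([False], valid, [False]))
    # Positions where validity flips: even entries are starts, odd entries are stops.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(a), int(b)) for a, b in zip(edges[::2], edges[1::2])]


# Synthetic channel with two interruptions (values are made up).
demo = np.array([0.1, 0.2, np.nan, np.nan, 0.3, 0.4, 0.5, np.nan])
segments = valid_segments(demo)
```

Dividing the returned indices by the channel's sampling frequency gives the segment boundaries in seconds from the start of the recording.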

Files (2.7 GB)

Name	MD5	Size
biosignal.zip	7a74b451bf19a44f9abf3448526179cb	263.1 MB
csv.zip	6e18d7913d5a06919f60035f4e9dfa96	2.2 GB
edf.zip	fe0412ff2f475e1ab7fffcbff71672f6	217.3 MB
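The MD5 checksums listed above can be used to check that a download completed correctly. A minimal sketch using only the Python standard library follows. The helper names md5_of_file and verify are hypothetical, and the local paths passed to them are up to the user.

```python
import hashlib

# Checksums as listed for this version of the record.
EXPECTED_MD5 = {
    "biosignal.zip": "7a74b451bf19a44f9abf3448526179cb",
    "csv.zip": "6e18d7913d5a06919f60035f4e9dfa96",
    "edf.zip": "fe0412ff2f475e1ab7fffcbff71672f6",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Return True if the file at `path` matches the listed checksum for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, verify("csv.zip", "csv.zip") would return True for an intact download stored in the current directory. Reading in chunks keeps memory use low for the 2.2 GB archive.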
