ScientISST MOVE: Annotated Multimodal Naturalistic Dataset Recorded During Everyday Life Activities Using Wearable Devices
Authors/Creators
- 1. Instituto Superior Técnico
- 2. Instituto de Telecomunicações
Description
A multi-modality, multi-activity, and multi-subject dataset of wearable biosignals.
Modalities: ECG, EMG, EDA, PPG, ACC, TEMP
Main Activities: Lift object, Greet people, Gesticulate while talking, Jumping, Walking, and Running
Cohort: 17 subjects (10 male, 7 female); median age: 24
Devices: 2x ScientISST Core + 1x Empatica E4
Body Locations: Chest, Abdomen, Left bicep, wrist and index finger
No filter has been applied to the signals, but the correct transfer functions were applied, so the data is given in the relevant units (mV, µS, g, °C).
========
There are three formats available:
a) LTBio's Biosignal files. They can be opened with:
x = Biosignal.load(path)
LTBio Package: https://pypi.org/project/LongTermBiosignals/
Under the directory biosignal, the following tree structure is found: subject/x.biosignal, where subject is the subject’s code, and x is any of the following { acc_chest, acc_wrist, ecg, eda, emg, ppg, temp }. Each file includes the signals recorded from every sensor that acquires the modality after which the file is named, independently of the device.
Channels, activities and time intervals can be easily indexed with the index operator []: https://ltbio.readthedocs.io/en/latest/learn/basic/ltbio101.html
A sneak peek of the signals can also be quickly plotted with: x.preview.plot()
Any Biosignal can be easily converted to NumPy arrays or DataFrames, if needed.
b) CSV files. They can be opened with:
x = pandas.read_csv(path)
Pandas Package: https://pypi.org/project/pandas/
These files can be found under the directory csv, named as subject.csv, where subject is the subject’s code. There is only one file per subject, containing their full session and all biosignal modalities. When read as tables, the time axis is in the first column, each sensor is in one of the middle columns, and the activity labels are in the last column. Each row holds the samples of each sensor, if any, at that timestamp. If there is no sample for a sensor at a given timestamp, the acquisition was interrupted for that sensor, which happens between activities and sometimes for short periods during the running activity. The last column of each row holds one or more activity labels, if an activity was taking place at that timestamp. If there are multiple annotations, the labels are separated by commas (e.g., 'run,sprint'). If there are no annotations, the column is empty for that timestamp.
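The table layout described above can be handled by position rather than by column name. The following is a minimal sketch, not the authors' own loading code: the column names, the numeric time format, and the sample values in SAMPLE are hypothetical stand-ins for a real csv/subject.csv file, and the helper names load_session and rows_with_activity are made up for illustration.

```python
import io

import pandas as pd

# Tiny synthetic table standing in for csv/<subject>.csv.
# Column names and values are hypothetical; only the layout follows the
# description: time first, sensors in the middle, activity labels last.
SAMPLE = '''time,ecg,eda,activities
0.000,0.12,1.5,
0.002,0.15,,"run"
0.004,,,"run,sprint"
0.006,0.11,1.6,"run,sprint"
'''


def load_session(source):
    """Read one subject table and turn the label column into lists of labels."""
    table = pd.read_csv(source)
    label_col = table.columns[-1]  # activity labels are in the last column
    table[label_col] = table[label_col].apply(
        lambda cell: []
        if pd.isna(cell)
        else [label.strip() for label in str(cell).split(",") if label.strip()]
    )
    return table


def rows_with_activity(table, activity):
    """Return the rows whose label list contains the given activity."""
    label_col = table.columns[-1]
    mask = table[label_col].apply(lambda labels: activity in labels)
    return table[mask]


table = load_session(io.StringIO(SAMPLE))
sprint = rows_with_activity(table, "sprint")
```

Empty sensor cells are read by pandas as NaN, so interruptions can be located per sensor with table[column].isna(). For a real file, pass its path to load_session instead of the in-memory sample.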
To provide a tabular format for sensors with different sampling frequencies, the sensors with a sampling frequency lower than 500 Hz were upsampled to 500 Hz. This way, the tables are regularly sampled, i.e., there is a row every 2 ms. If a sensor was not acquiring at a given timestamp, the corresponding cell will be empty. So, not only are the segments with samples regularly sampled, but the interruptions are also discretised. This means that if, after an interruption, a sensor starts acquiring at a non-regular timestamp, the first sample will be written on the previous or the following timestamp, by half-up rounding. Naturally, this process cumulatively introduces lags in the table, some of which cancel out. Each individual lag is no longer than half the sampling period (1 ms), hence negligible. The cumulative lags are no longer than 200 ms for all subjects, which is also negligible. Nevertheless, only LTBio's Biosignal format preserves the exact original timestamps (10E-6 precision) of all samples and the original sampling frequencies.
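The half-up rounding onto the 2 ms grid can be illustrated with a few lines of integer arithmetic. This is only a sketch of the rounding rule as described, not the code used to build the tables; it assumes non-negative timestamps expressed as integer microseconds, and the names snap_to_grid and lag_us are hypothetical.

```python
PERIOD_US = 2000  # one row every 2 ms (500 Hz), expressed in microseconds


def snap_to_grid(timestamp_us, period_us=PERIOD_US):
    """Return the nearest grid timestamp; exact ties are rounded up (half-up)."""
    return ((timestamp_us + period_us // 2) // period_us) * period_us


def lag_us(timestamp_us, period_us=PERIOD_US):
    """Signed lag introduced by snapping one timestamp onto the grid."""
    return snap_to_grid(timestamp_us, period_us) - timestamp_us


# A sensor resuming 2.999 ms into the session is written on the 2 ms row,
# while one resuming at exactly 3.000 ms is written on the 4 ms row.
examples = {t: snap_to_grid(t) for t in (2999, 3000, 3001, 4000)}
```

Because each snapped timestamp is at most half a period away from the original, every individual lag stays within 1 ms, which matches the bound stated above.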
================
Both the Biosignal and CSV formats include annotations of the activities; however, LTBio Biosignal files have better time resolution and also include clinical and demographic data.
c) EDF+ files. They can be opened with:
x = mne.io.read_raw_edf(path)
MNE Package: https://mne.tools/stable/index.html
Under the directory edf, the following tree structure is found: subject/x.edf, where subject is the subject’s code, and x is any of the following { empathic, scientisst_chest, scientisst_forearm }. Each file includes all the signals recorded by the device after which the file is named, independently of the modality.
Notes:
- Original sampling frequencies are maintained.
- Original units are maintained.
- Signal is NaN during recording interruptions.
- Events are in EDF annotations.
- Biosignal and patient notes are not maintained.
The signals can be quickly plotted with: x.plot(). Make sure you have interactive Matplotlib activated. At first, you might have to decrease the scaling in order to inspect them correctly. Tip: use the minus (-) key on your keyboard as many times as necessary to reduce the scaling.
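Since the notes state that the signal is NaN during recording interruptions, it can help to split a channel into its contiguous recorded segments before further processing. The sketch below works on a plain one-dimensional NumPy array, such as one row of the (channels x samples) array returned by MNE's x.get_data(); the helper name valid_segments and the sample values are hypothetical.

```python
import numpy as np


def valid_segments(signal):
    """Return (start, stop) index pairs of contiguous non-NaN runs.

    The stop index is exclusive, so signal[start:stop] holds one segment.
    """
    valid = ~np.isnan(np.asarray(signal, dtype=float))
    # Pad with False on both sides so runs touching the edges are detected.
    padded = np.concatenate(([False], valid, [False]))
    # Indices where validity flips: even entries are starts, odd are stops.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(a), int(b)) for a, b in zip(edges[::2], edges[1::2])]


# Synthetic channel with two interruptions (values are made up).
channel = np.array([np.nan, 1.0, 2.0, np.nan, np.nan, 3.0])
segments = valid_segments(channel)
```

Each returned pair can then be used to slice the channel and process every uninterrupted segment on its own.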