Published February 5, 2021 | Version v1
Dataset Open

Wrist-mounted IMU data towards the investigation of free-living smoking behavior - the Smoking Event Detection (SED) and Free-living Smoking Event Detection (SED-FL) datasets

  • 1. Aristotle University of Thessaloniki

Description

Introduction

The Smoking Event Detection (SED) and the Free-living Smoking Event Detection (SED-FL) datasets were created by the Multimedia Understanding Group towards the investigation of smoking behavior, both during smoking sessions and in-the-wild. Both datasets contain the triaxial acceleration and angular velocity signals (\(6\) DoF) that originate from a commercial smartwatch (Mobvoi TicWatch E™). The SED dataset consists of \(20\) smoking sessions provided by \(11\) unique subjects, while the SED-FL dataset contains \(10\) all-day recordings provided by \(7\) unique subjects.

In addition, the start and end moments of each puff cycle are annotated throughout the SED dataset.

Description 

SED

A total of \(11\) subjects were recorded while smoking a cigarette in interior or exterior areas. The total duration of the \(20\) sessions sums up to \(161\) minutes, with a mean duration of \(8.08\) minutes. Each participant was free to smoke naturally, with the only limitation being to not swap the cigarette between hands during the smoking session. Prior to the recording, the participant was asked to wear the smartwatch on the hand that they typically use to smoke in their everyday life. A camera was set up facing the participant, including at least the whole length of the arms in its field of view. The purpose of the video recording was to obtain ground truth information for each of the puff cycles that occur during the smoking session. Participants were also asked to perform a clapping hand movement both at the start and end of the smoking session, for synchronization purposes (as this movement is distinctive in the accelerometer signal). No other instructions were given to the participants. It should be noted that the SED dataset does not contain instances of electronic cigarettes (also known as vaping devices) or heated tobacco products.

SED-FL

SED-FL includes \(10\) in-the-wild sessions that belong to \(7\) unique subjects. This is achieved by recording the subjects’ smoking sessions as a small part of their everyday, unscripted activities. Participants were instructed to wear the smartwatch on the hand of their preference well before any smoking session and to continue wearing it throughout the day until the battery was depleted. In addition, we followed a self-report labeling model, meaning that the ground truth is provided by the participants, who documented the start and end moments of their smoking sessions to the best of their abilities, as well as the hand on which they wore the smartwatch. The total duration of the recordings sums up to \(78.3\) hours, with a mean duration of \(7.83\) hours.

For both datasets, the accompanying Python script read_dataset.py will visualize the IMU signals and ground truth for each of the recordings. Information on how to execute the Python scripts can be found below.

# The script and the dataset's pickle file must be located in the same directory.
# Tested with Python 3.6.4
# Requirements: Pandas, Pickle and Matplotlib

# Visualize signals and ground truth
python read_datasets.py

Annotation

For all SED recordings, we annotated the start and end points of each puff cycle (i.e., smoking gesture). The annotation process was performed in such a way that the annotated smoking gestures do not overlap each other in time.

Technical details

SED

We provide the SED dataset as a pickle. The file can be loaded using Python in the following way:

import pickle as pkl
import pandas as pd

with open('./SED.pkl','rb') as fh:
    dataset = pkl.load(fh)

The dataset variable in the snippet above is a dictionary with \(11\) keys, each corresponding to a unique subject and given as a numeric id stored as a string (e.g., '8'). It should be mentioned that the subject identifiers in SED are in line with the subject identifiers in the SED-FL dataset; i.e., a SED subject and a SED-FL subject that share the same id are the same person.

The content of a dataset's subject is a list with length equal to the corresponding subject’s number of recorded smoking sessions. For example, assuming that subject \(8\) has recorded \(K\) smoking sessions, the command:

sessions = dataset['8']

would yield a list of length equal to \(K\). Each member of the list is a Pandas DataFrame with dimensions \(M \times 8\), where \(M\) is the length of the recording.

The columns of a session’s DataFrame are:

  • 'T':                  The timestamps in seconds
  • 'AccX':            The accelerometer measurements for the \(x\) axis in \(m/s^2\)
  • 'AccY':            The accelerometer measurements for the \(y\) axis in \(m/s^2\)
  • 'AccZ':            The accelerometer measurements for the \(z\) axis in \(m/s^2\)
  • 'GyrX':            The gyroscope measurements for the \(x\) axis in \(rad/s\)
  • 'GyrY':            The gyroscope measurements for the \(y\) axis in \(rad/s\)
  • 'GyrZ':            The gyroscope measurements for the \(z\) axis in \(rad/s\)
  • 'GT':                The manually annotated ground truth for puff cycles
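As a quick sanity check after loading, the dictionary-of-lists structure described above can be traversed to report the size and duration of every session. The following is a minimal sketch: the helper names (make_fake_session, summarize) and the synthetic stand-in data are assumptions made for illustration, not part of the dataset or its accompanying script.

```python
import numpy as np
import pandas as pd

COLUMNS = ["T", "AccX", "AccY", "AccZ", "GyrX", "GyrY", "GyrZ", "GT"]

def make_fake_session(n_samples, fs=50.0):
    """Build a synthetic session with the documented 8-column layout (not real data)."""
    data = np.zeros((n_samples, len(COLUMNS)))
    data[:, 0] = np.arange(n_samples) / fs  # 'T' in seconds
    data[:, 7] = -1.0                       # 'GT': assumed "no event" value for this stand-in
    return pd.DataFrame(data, columns=COLUMNS)

def summarize(dataset):
    """Return one row per session: subject id, session index, samples, duration (s)."""
    rows = []
    for subject_id, sessions in dataset.items():
        for idx, session in enumerate(sessions):
            duration = session["T"].iloc[-1] - session["T"].iloc[0]
            rows.append({"subject": subject_id, "session": idx,
                         "samples": len(session), "duration_s": duration})
    return pd.DataFrame(rows)

# Hypothetical stand-in for the dictionary returned by pkl.load(fh)
fake_dataset = {"1": [make_fake_session(500)],
                "8": [make_fake_session(1000), make_fake_session(250)]}
summary = summarize(fake_dataset)
print(summary)
```

With the real file, passing the loaded dataset variable to summarize in place of fake_dataset should give one row per recorded session.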

The contents of this DataFrame are essentially the accelerometer and gyroscope sensor streams, resampled at a constant sampling rate of \(50\) Hz and aligned with each other and with their puff cycle ground truth. All sensor streams are transformed so that they reflect all participants wearing the smartwatch on the same hand with the same orientation, thus achieving data uniformity. This transformation is consistent with the one applied to the signals in the SED-FL dataset. The ground truth is a signal with value \(+1\) during puff cycles, and \(-1\) elsewhere.
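Since the ground truth is stored as a per-sample signal, it is often convenient to convert it back into a list of start and end times. The sketch below shows one way to do this, assuming only that positive 'GT' values mark the annotated events; the function name gt_to_intervals and the synthetic data are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def gt_to_intervals(session):
    """Return (start_s, end_s) pairs for each run of positive 'GT' samples."""
    active = (session["GT"].to_numpy() > 0).astype(int)
    # Pad with zeros so runs touching either end of the recording are closed.
    edges = np.diff(np.concatenate(([0], active, [0])))
    starts = np.flatnonzero(edges == 1)       # first sample of each run
    ends = np.flatnonzero(edges == -1) - 1    # last sample of each run
    t = session["T"].to_numpy()
    return [(float(t[s]), float(t[e])) for s, e in zip(starts, ends)]

# Synthetic example (not real data): 10 s at 50 Hz with two labeled runs.
t = np.arange(500) / 50.0
gt = -np.ones(500)
gt[100:150] = 1.0   # samples 100..149
gt[300:400] = 1.0   # samples 300..399
demo = pd.DataFrame({"T": t, "GT": gt})
intervals = gt_to_intervals(demo)
print(intervals)
```

The end time of each pair is the timestamp of the last labeled sample, so the duration of an event spans one sample period more than the difference between the two values.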

No other preprocessing is performed on the data; e.g., the acceleration component due to the Earth's gravitational field is present in the processed acceleration measurements. Interested researchers can consult the article "Modeling Wrist Micromovements to Measure In-Meal Eating Behavior from Inertial Sensor Data" by Kyritsis et al. on how to further preprocess the IMU signals (i.e., smooth them and remove the gravitational component).
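To illustrate what removing the gravitational component can look like, the sketch below subtracts a slowly varying estimate of gravity, obtained with a centered moving average, from each accelerometer axis. This is a simple stand-in technique and not necessarily the procedure of the cited article; the function name remove_gravity, the one-second window, and the synthetic signal are all assumptions.

```python
import numpy as np
import pandas as pd

FS = 50  # Hz; sampling rate stated in the dataset description

def remove_gravity(session, window_s=1.0, fs=FS):
    """Subtract a centered moving-average gravity estimate from AccX/AccY/AccZ."""
    out = session.copy()
    win = max(1, int(round(window_s * fs)))  # window length in samples
    for col in ("AccX", "AccY", "AccZ"):
        gravity = session[col].rolling(win, center=True, min_periods=1).mean()
        out[col] = session[col] - gravity
    return out

# Synthetic example (not real data): constant gravity on z plus a 2 Hz wrist motion.
t = np.arange(500) / FS
demo = pd.DataFrame({"T": t,
                     "AccX": np.zeros(500),
                     "AccY": np.zeros(500),
                     "AccZ": 9.81 + 0.5 * np.sin(2 * np.pi * 2.0 * t)})
linear = remove_gravity(demo)
print(linear["AccZ"].iloc[100:400].mean())
```

The window length trades off how quickly the gravity estimate follows changes in wrist orientation against how much of the actual motion it absorbs, so it should be tuned to the application.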

SED-FL

Similar to SED, we provide the SED-FL dataset as a pickle. The file can be loaded using Python in the following way:

import pickle as pkl
import pandas as pd

with open('./SED-FL.pkl','rb') as fh:
    dataset = pkl.load(fh)

The dataset variable in the snippet above is a dictionary with \(7\) keys, each corresponding to a unique subject. It should be mentioned that the subject identifiers in SED-FL are in line with the subject identifiers in the SED dataset; i.e., a SED-FL subject and a SED subject that share the same id are the same person.

The content of a dataset's subject is a list with length equal to the corresponding subject’s number of recorded daily sessions. For example, assuming that subject \(8\) has recorded \(2\) daily sessions, the command:

sessions = dataset['8']

would yield a list of length equal to \(2\). Each member of the list is a Pandas DataFrame with dimensions \(M \times 8\), where \(M\) is the length of the recording.

The columns of a session’s DataFrame are exactly the same as the ones in the SED dataset. However, the 'GT' column contains ground truth that relates to the smoking sessions during the day (instead of the puff cycles in SED).

The contents of this DataFrame are essentially the accelerometer and gyroscope sensor streams, resampled at a constant sampling rate of \(50\) Hz and aligned with each other and with their smoking session ground truth. All sensor streams are transformed so that they reflect all participants wearing the smartwatch on the same hand with the same orientation, thus achieving data uniformity. This transformation is consistent with the one applied to the signals in the SED dataset. The ground truth is a signal with value \(+1\) during smoking sessions, and \(-1\) elsewhere.
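Because the sampling rate is constant, sample counts translate directly into time. The sketch below uses this to report how long an all-day recording is and how much of it is labeled as smoking; the function name session_stats and the synthetic one-hour recording are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

FS = 50  # Hz; sampling rate stated in the dataset description

def session_stats(session, fs=FS):
    """Return recording length (hours) and time labeled as smoking (minutes)."""
    n_total = len(session)
    n_smoking = int((session["GT"] > 0).sum())  # samples with GT = +1
    return {"recorded_hours": n_total / fs / 3600.0,
            "smoking_minutes": n_smoking / fs / 60.0,
            "smoking_fraction": n_smoking / n_total if n_total else 0.0}

# Synthetic example (not real data): 1 hour at 50 Hz with one 10-minute smoking session.
n = 3600 * FS
gt = -np.ones(n)
gt[60000:90000] = 1.0  # 30000 samples = 10 minutes
demo = pd.DataFrame({"T": np.arange(n) / FS, "GT": gt})
stats = session_stats(demo)
print(stats)
```

The small smoking fraction in a typical all-day recording is worth keeping in mind when training or evaluating detectors, since the two classes are heavily imbalanced.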

No other preprocessing is performed on the data; e.g., the acceleration component due to the Earth's gravitational field is present in the processed acceleration measurements. Interested researchers can consult the article "Modeling Wrist Micromovements to Measure In-Meal Eating Behavior from Inertial Sensor Data" by Kyritsis et al. on how to further preprocess the IMU signals (i.e., smooth them and remove the gravitational component).

Ethics and funding

Informed consent, including permission for third-party access to anonymized data, was obtained from all subjects prior to their engagement in the study. The work leading to these results has received funding from the EU Commission under Grant Agreement No. 965231, the REBECCA project (H2020).

Contact

Any inquiries regarding the SED and SED-FL datasets should be addressed to:

Mr. Konstantinos KYRITSIS (Electrical & Computer Engineer, PhD candidate)

Multimedia Understanding Group (MUG)
Department of Electrical & Computer Engineering
Aristotle University of Thessaloniki
University Campus, Building C, 3rd floor
Thessaloniki, Greece, GR54124

Tel: +30 2310 996359, 996365 
Fax: +30 2310 996398
E-mail: kokirits [at] mug [dot] ee [dot] auth [dot] gr

 

Files

Files (933.2 MB)

md5:4150be21d5389507588e103bd3bdd701 (1.6 kB)
md5:53031cdaa1ef63d2e115e0ee579acfc9 (902.5 MB)
md5:c95b95f6936c0a64e820e6b85c5451bc (30.7 MB)

Additional details

Funding

European Commission
REBECCA - REsearch on BrEast Cancer induced chronic conditions supported by Causal Analysis of multi-source data 965231