Published July 31, 2025
| Version V1.0
Dataset
Open
Health Indices Dataset from Clinical Study A - TOLIFE Project
Creators
- Tognetti, Alessandro (Project leader)¹
- Di Rienzo, Francesco (Data manager)¹
- Torres, Manuel
- Segura, Víctor
- Rey, Víctor
- Rubio, David
- González, Sandra
- Colla, Eugenio
- Romano, Domenico
- Zanoletti, Michele
- Melissa, Eleonora
- Bufano, Pasquale
- Rho, Gianluca²
- Bossi, Francesco
- Greco, Alberto²
- Marinai, Carlotta
- Carbonaro, Nicola²
- Vallati, Carlo
- Garcia-Aymerich, Judith
- Vásquez, Roger
- Alcaraz, Victoria
- Buekers, Joren
- Wats, Henrik
- Abdo, Mustafa
- Velez, Oswaldo Antonio Caguana
- Guiral, Joaquin Gea
- Guardia, Sergi Pascual
- Jimenez, Patricia Abril
- Laurino, Marco (Project manager)³
Description
This dataset has been developed in the context of the European HORIZON project TOLIFE (Combining Artificial Intelligence and smart sensing TOward better management and improved quality of LIFE in COPD, Grant Agreement No. 101057103). The project aims to enable continuous and unobtrusive health monitoring of patients affected by Chronic Obstructive Pulmonary Disease (COPD) by leveraging a multi-modal sensing infrastructure and AI-based data analytics.
Two clinical studies are planned in TOLIFE; the first is called "Clinical Study A" (CSA). In CSA, a group of patients is followed for 12 months using the TOLIFE sensor kit, and periodic clinical examinations are collected to provide the clinical references for the AI tools.
This dataset contains the sensor-derived health indices from the first year of CSA, extracted from heterogeneous signals collected using the TOLIFE sensor kit. The kit integrates both commercial devices (smartphone, smartwatch, spirometer) and custom-made IoT devices (smart mattress cover, smart shoes, environmental sensing unit), designed to collect physiological, behavioral, and environmental data from patients during their everyday life. The dataset also contains the periodic clinical examinations of CSA and the exacerbations register.
To obtain the sensor-derived health indices of CSA, data are processed through a hierarchical and modular analytics pipeline, which transforms raw signals from the TOLIFE sensor kit into clinically meaningful health-related indicators. The first stage extracts primary features (e.g., gait speed, sleep duration, heart rate), while subsequent layers aggregate and interpret these features to produce higher-level indices for the assessment of the following health domains:
Mobility and Gait Analysis
Mobility-related indices are computed by combining sensor data from the smartphone, smartwatch, and smart shoes. Walking episodes are detected using lightweight machine learning models, and gait speed is estimated via a modular deep learning model capable of adapting to the number of available devices. From these models, several metrics are computed, including mean gait speed, step length, estimated six-minute walking distance, and total walked time and distance, aggregated at both 10-minute and daily levels.
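As an illustration of the 10-minute and daily aggregation described above, the following is a minimal sketch, not the project's actual pipeline. It assumes that each detected walking episode is available as a tuple of start time, duration in seconds, and estimated distance in meters, and that an episode is assigned to the bin containing its start time. The function names and output keys are hypothetical and do not reflect the dataset's column names.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def floor_to_bin(ts, minutes=10):
    """Round a timestamp down to the start of its N-minute bin."""
    return ts - timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def aggregate_walking(bouts, key):
    """Sum walked time and distance per group and derive mean gait speed.

    bouts: iterable of (start: datetime, duration_s: float, distance_m: float)
    key:   function mapping a start time to a group label
    Assumption: a bout belongs entirely to the group of its start time.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for start, duration_s, distance_m in bouts:
        acc = totals[key(start)]
        acc[0] += duration_s
        acc[1] += distance_m
    return {
        label: {
            "walked_time_s": t,
            "walked_distance_m": d,
            # Mean speed is total distance over total walked time.
            "mean_gait_speed_m_s": d / t if t > 0 else None,
        }
        for label, (t, d) in totals.items()
    }


# Hypothetical walking episodes, for illustration only.
bouts = [
    (datetime(2025, 1, 1, 9, 3, 0), 60.0, 66.0),
    (datetime(2025, 1, 1, 9, 8, 30), 30.0, 30.0),
    (datetime(2025, 1, 1, 9, 12, 0), 120.0, 150.0),
]
ten_min = aggregate_walking(bouts, floor_to_bin)
daily = aggregate_walking(bouts, lambda ts: ts.date())
```

The same function serves both aggregation levels because only the grouping key changes; a bout that spans a bin boundary would need to be split if exact per-bin totals were required.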
Sleep and Environmental Indices
Night-time parameters are derived from the smart mattress cover and environmental unit. A dedicated algorithm classifies segments of the night into "off-bed", "on-bed with movement", and "on-bed still" phases. Based on these labels, the system estimates total sleep time (TST), wake after sleep onset (WASO), sleep efficiency, and the number of movement episodes. In parallel, heart rate and breathing rate during sleep are computed from pressure and inertial signals, using time-domain and spectral analysis methods. Environmental metrics, such as temperature, humidity, air quality, light and sound intensity, are recorded continuously and summarized through daily and hourly statistics.
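To make the relation between the three night-segment labels and the sleep indices concrete, here is a minimal sketch under stated assumptions; the exact definitions used in TOLIFE are not given in this description. It assumes fixed-length epochs, treats "on-bed still" epochs as sleep, measures time in bed from the first to the last on-bed epoch, and counts WASO as the non-still time between the first and last still epoch. The function name, epoch length, and output keys are hypothetical.

```python
from itertools import groupby

OFF, MOVE, STILL = "off-bed", "on-bed with movement", "on-bed still"


def sleep_summary(labels, epoch_s=30):
    """Derive sleep indices from a per-epoch label sequence.

    Assumptions (not taken from the dataset documentation):
    - "on-bed still" epochs count as sleep;
    - time in bed runs from the first to the last on-bed epoch;
    - WASO is the non-still time between the first and last still epoch.
    """
    on_bed = [i for i, lab in enumerate(labels) if lab != OFF]
    still = [i for i, lab in enumerate(labels) if lab == STILL]
    if not on_bed or not still:
        return None  # No usable night recording.

    tst_s = len(still) * epoch_s
    time_in_bed_s = (on_bed[-1] - on_bed[0] + 1) * epoch_s
    waso_s = (still[-1] - still[0] + 1 - len(still)) * epoch_s
    # Each run of consecutive movement epochs is one movement episode.
    movement_episodes = sum(1 for lab, _ in groupby(labels) if lab == MOVE)

    return {
        "tst_s": tst_s,
        "waso_s": waso_s,
        "sleep_efficiency": tst_s / time_in_bed_s,
        "movement_episodes": movement_episodes,
    }


# Hypothetical label sequence, for illustration only.
labels = (
    [OFF] * 2 + [MOVE] * 2 + [STILL] * 10 + [MOVE] + [STILL] * 4
    + [OFF] + [STILL] * 5 + [MOVE] + [OFF] * 2
)
summary = sleep_summary(labels)
```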
Cardiac Function (PPG-based)
The smartwatch provides photoplethysmographic (PPG) signals used to estimate heart rate variability (HRV) markers (specifically, pulse rate variability, PRV). A deep-learning-based denoising pipeline, based on convolutional autoencoders (CNN-DAEs), is applied to remove motion artifacts. Peaks are detected from the cleaned signals to derive time-domain features such as stdRR and RMSSD, which reflect autonomic nervous system regulation.
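The two time-domain features named above can be computed from detected peak times as in the following minimal sketch. It covers only the final feature step, not the denoising or peak detection. It assumes peak times are given in seconds, and it uses the sample standard deviation for stdRR, which is an assumption since the description does not specify the estimator. The function name is hypothetical.

```python
import math
import statistics


def prv_time_domain(peak_times_s):
    """Compute stdRR and RMSSD (both in ms) from pulse peak times in seconds."""
    # Inter-beat intervals between consecutive peaks, in milliseconds.
    ibi_ms = [(b - a) * 1000.0 for a, b in zip(peak_times_s, peak_times_s[1:])]
    if len(ibi_ms) < 2:
        raise ValueError("at least three peaks are required")

    # Successive differences between neighbouring intervals.
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]

    std_rr = statistics.stdev(ibi_ms)  # Assumption: sample standard deviation.
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"stdRR_ms": std_rr, "RMSSD_ms": rmssd}


# Hypothetical peak times, for illustration only.
features = prv_time_domain([0.0, 0.8, 1.7, 2.5, 3.4])
```

stdRR reflects overall variability of the intervals, whereas RMSSD is sensitive to beat-to-beat changes, which is why the two are usually reported together.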
Pulmonary Function and Clinical Scores
Respiratory function is assessed through a portable spirometer, used by patients at home. Key metrics such as FEV1 and PEF are extracted, and their daily mean and range are reported. Additionally, higher-level clinical scores—including the COPD Assessment Test (CAT), the Clinical COPD Questionnaire (CCQ), and the modified Medical Research Council dyspnea scale (mMRC)—are estimated via both data-driven and literature-informed models, using the previously computed indices as inputs.
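The daily mean and range of a spirometry metric can be sketched as follows. This is a minimal illustration that assumes each measurement is a pair of calendar date and value, and that "range" means the difference between the daily maximum and minimum; the function name, output keys, and example values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_mean_and_range(measurements):
    """Group (day, value) pairs by day and report mean and max-min range."""
    by_day = defaultdict(list)
    for day, value in measurements:
        by_day[day].append(value)
    return {
        day: {"mean": mean(values), "range": max(values) - min(values)}
        for day, values in sorted(by_day.items())
    }


# Hypothetical FEV1 readings in liters, for illustration only.
fev1 = [
    (date(2025, 1, 1), 1.8),
    (date(2025, 1, 1), 2.0),
    (date(2025, 1, 2), 1.9),
]
fev1_daily = daily_mean_and_range(fev1)
```

The same function applies unchanged to PEF readings; a day with a single measurement yields a range of zero.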