There is a newer version of the record available.

Published August 14, 2025 | Version v2
Dataset Open

DiaData: An Integrated Large Dataset for Type 1 Diabetes and Hypoglycemia Research

  • 1. Helmut Schmidt University

Contributors

Researcher:

  • 1. Helmut Schmidt University

Description

Type 1 diabetes (T1D) is an autoimmune disorder that destroys insulin-producing cells, which is why affected individuals depend on external insulin injections. However, insulin can cause low blood glucose levels, known as hypoglycemia (≤70 mg/dL), a severe event with dangerous side effects. Data analysis can significantly enhance diabetes care by identifying personal patterns and trends leading to adverse events. However, diabetes and hypoglycemia research is limited by the unavailability of large datasets. Thus, we present DiaData. DiaData integrates 13 different datasets into a large continuous glucose monitoring (CGM) dataset comprising data from individuals with T1D across various age groups. CGM data are reported every 5 minutes. The Maindatabase (MDB) contains CGM measurements of all 1720 subjects. From this, two subsets are extracted: Subdatabase I (SDBI) includes CGM data and the demographics age and sex for 1685 subjects, while Subdatabase II (SDBII) includes CGM and heart rate data for a subset of 51 subjects.
 
DiaData is provided in .csv format, where each row represents a single CGM measurement. The Maindatabase includes CGM data for all subjects, with the following columns: timestamp (ts), patient identifier (PtID), glucose value (GlucoseCGM), and the source database name (Database). Subdatabase I adds demographic information, including Age, AgeGroup, and Sex. Subdatabase II contains CGM data combined with heart rate (HR) measurements. 
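As a minimal sketch of how rows in this layout could be read, the following Python snippet parses a few made-up rows and counts, per patient, the readings at or below the 70 mg/dL hypoglycemia threshold mentioned in the description. The column names follow the description above; the timestamp format, the sample values, the database name "ExampleDB", and the helper name `count_hypo_readings` are assumptions for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows. The column names follow the description above, but the
# timestamp format and all values are assumptions for illustration only.
SAMPLE_CSV = """ts,PtID,GlucoseCGM,Database
2020-01-01 00:00:00,1,110,ExampleDB
2020-01-01 00:05:00,1,68,ExampleDB
2020-01-01 00:10:00,1,70,ExampleDB
2020-01-01 00:00:00,2,150,ExampleDB
"""

HYPO_THRESHOLD_MG_DL = 70.0  # threshold stated in the dataset description


def count_hypo_readings(csv_file):
    """Count readings at or below the threshold for each patient."""
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        value = row["GlucoseCGM"].strip()
        if not value:
            continue  # skip missing measurements
        if float(value) <= HYPO_THRESHOLD_MG_DL:
            counts[row["PtID"]] += 1
    return dict(counts)


hypo_counts = count_hypo_readings(io.StringIO(SAMPLE_CSV))
print(hypo_counts)
```

For the real files, an opened file handle (for example `open(path, newline="")`) could be passed in place of the in-memory sample, provided the header names match those listed above.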

This release presents the raw and preprocessed versions of DiaData. The raw dataset has not undergone any cleaning or imputation procedures. For subjects using CGM devices with a sampling frequency higher than 5 minutes, the data were undersampled to 5-minute intervals. Missing values introduced by this undersampling were not imputed. In contrast, the preprocessed dataset incorporates quality enhancement steps. Outliers in the CGM and HR signals were removed using the interquartile range (IQR) method. Missing values were imputed with linear interpolation for gaps shorter than 30 minutes, and with Stineman interpolation for gaps of 30 to 120 minutes.
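The two simpler preprocessing steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code (which is available in the repository linked below): the 1.5 × IQR fences, the default quartile method of Python's `statistics.quantiles`, and the definition of gap length as the number of missing samples times 5 minutes are all assumptions, and the function names are hypothetical. Stineman interpolation for longer gaps is not shown.

```python
import statistics

SAMPLE_MINUTES = 5           # reporting interval stated in the description
MAX_LINEAR_GAP_MINUTES = 30  # gaps shorter than this are filled linearly


def remove_iqr_outliers(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with None.

    The multiplier k=1.5 is an assumed convention, not stated in the source.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v if v is not None and low <= v <= high else None for v in values]


def interpolate_short_gaps(values):
    """Linearly fill runs of None that are short and bounded on both sides."""
    result = list(values)
    n = len(result)
    i = 0
    while i < n:
        if result[i] is not None:
            i += 1
            continue
        start = i
        while i < n and result[i] is None:
            i += 1
        gap_len = i - start  # number of consecutive missing samples
        bounded = start > 0 and i < n
        # Assumption: gap length = missing samples * sampling interval.
        if bounded and gap_len * SAMPLE_MINUTES < MAX_LINEAR_GAP_MINUTES:
            left, right = result[start - 1], result[i]
            step = (right - left) / (gap_len + 1)
            for j in range(gap_len):
                result[start + j] = left + step * (j + 1)
    return result


# Made-up glucose series (mg/dL) with a short gap and one implausible spike.
series = [100, 105, None, None, 120, 500, 118, 115, 112, 110]
cleaned = remove_iqr_outliers(series)
filled = interpolate_short_gaps(cleaned)
print(filled)
```

In this sketch the spike is first turned into a missing value and then filled like any other short gap, while runs of six or more missing samples are left untouched.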


The datasets used in this study were obtained from a variety of third-party sources. The code for data preprocessing and exploration is available at https://github.com/Beyza-Cinar/DiaData.

The sources of the data are:
- the D1NAMO dataset (https://doi.org/10.5281/zenodo.5651217),
- the HUPA-UCM Diabetes Dataset (doi: 10.17632/3hbcscwz44.1),
- the Diabetes Adolescents Time Series with Heart Rate dataset (https://github.com/ictinnovaties-zorg/dataset-diabetes-adolescents-time-series-with-heart-rate/tree/main/data-csv),
- the ShanghaiT1DM dataset (https://doi.org/10.6084/m9.figshare.20444397.v3),
- the T1GDUJA dataset (https://doi.org/10.5281/zenodo.11284018),
- the CITY dataset (https://public.jaeb.org/dataset/565),
- the ReplaceBG dataset (https://public.jaeb.org/dataset/546),
- the RT-CGM dataset (https://public.jaeb.org/dataset/563),
- the DLCP3 dataset (https://public.jaeb.org/dataset/573),
- the SENCE dataset (https://public.jaeb.org/dataset/537),
- the Severe Hypoglycemia in Older Adults with Type 1 Diabetes dataset (https://public.jaeb.org/dataset/537),
- the WISDM dataset (https://public.jaeb.org/dataset/564),
- the PEDAP dataset (https://public.jaeb.org/dataset/599).

The sources of subsets of the data are the Barbara Davis Center, Jaeb Center for Health Research, Joslin Diabetes Center, T1D Exchange, University of Colorado, and University of Virginia. The analyses, content, and conclusions presented herein are solely the responsibility of the authors and have not been reviewed or approved by the aforementioned institutions.

Files

Preprocessed.zip

Files (1.5 GB)

Name Size
md5:56154bfcb36b7eca93e105bbe40a68d6 773.0 MB
md5:579d6bceb0b90e1557d67d592d2475f4 759.4 MB
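The md5 values listed above can be used to check a downloaded archive for corruption. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path is whatever the download was saved as.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as `matches_checksum(local_path, "md5:56154bfcb36b7eca93e105bbe40a68d6")` would then return `True` only if the local file matches the first listed checksum. Reading in chunks keeps memory use small for archives of this size.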

Additional details

Related works

Is referenced by
Preprint: arXiv:2511.02849 (arXiv)
Journal: 10.1109/JBHI.2025.3620603 (DOI)

References