DiaData: An Integrated Large Dataset for Type 1 Diabetes and Hypoglycemia Research
Authors/Creators
Description
This release presents the raw and preprocessed versions of DiaData. The raw dataset has not undergone any cleaning or imputation. For subjects whose continuous glucose monitoring (CGM) devices record more often than once every 5 minutes, the data were downsampled to 5-minute intervals, and missing values introduced by this downsampling were not imputed. In contrast, the preprocessed dataset incorporates quality-enhancement steps. Outliers in the CGM and heart rate (HR) signals were removed using the interquartile range (IQR) method. Missing values were imputed with linear interpolation for gaps shorter than 30 minutes and with Stineman interpolation for gaps of 30 to 120 minutes.
The code for data preprocessing and exploration is available at https://github.com/Beyza-Cinar/DiaData. The datasets used in this study were obtained from the following third-party sources:
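As an illustration of the preprocessing described above, the following is a minimal sketch, not the authors' implementation. It assumes a signal already placed on a 5-minute grid, with `None` marking missing readings. The function names, the IQR multiplier of 1.5, and the definition of gap length as the number of missing samples times 5 minutes are all assumptions made for the example. Stineman interpolation for the 30 to 120 minute gaps is not part of the Python standard library and is left out here.

```python
from statistics import quantiles

STEP_MIN = 5  # grid spacing in minutes, as stated in the description


def iqr_mask(values, k=1.5):
    """Set readings outside [Q1 - k*IQR, Q3 + k*IQR] to None.

    k=1.5 is the conventional multiplier and an assumption here;
    the dataset description does not state the value that was used.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v if v is not None and low <= v <= high else None for v in values]


def fill_short_gaps(values, max_gap_min=30):
    """Linearly interpolate interior gaps shorter than max_gap_min minutes.

    Gap length is taken as (number of missing samples) * STEP_MIN,
    which is an assumption of this sketch. Gaps at the start or end
    of the series and longer gaps are left as None.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # index of the first reading after the gap
        gap_min = (end - start) * STEP_MIN
        if start > 0 and end < n and gap_min < max_gap_min:
            left, right = out[start - 1], out[end]
            span = end - start + 1
            for j in range(start, end):
                out[j] = left + (right - left) * (j - start + 1) / span
    return out


# Hypothetical glucose readings in mg/dL with one spike and one short gap.
raw = [100, 102, None, None, 108, 500, 110, 112, 111, 109, 107, 105]
cleaned = fill_short_gaps(iqr_mask(raw))
print(cleaned)
```

Removing outliers before interpolating keeps a spike from being used as an anchor point for the filled values. The preprocessed release may order or parameterize these steps differently.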
- the D1NAMO dataset (https://doi.org/10.5281/zenodo.5651217),
- the HUPA-UCM Diabetes Dataset (https://doi.org/10.17632/3hbcscwz44.1),
- the Diabetes Adolescents Time Series with Heart Rate dataset (https://github.com/ictinnovaties-zorg/dataset-diabetes-adolescents-time-series-with-heart-rate/tree/main/data-csv),
- the ShanghaiT1DM dataset (https://doi.org/10.6084/m9.figshare.20444397.v3),
- the T1GDUJA dataset (https://doi.org/10.5281/zenodo.11284018),
- the CITY dataset (https://public.jaeb.org/dataset/565),
- the ReplaceBG dataset (https://public.jaeb.org/dataset/546),
- the RT-CGM dataset (https://public.jaeb.org/dataset/563),
- the DLCP3 dataset (https://public.jaeb.org/dataset/573),
- the SENCE dataset (https://public.jaeb.org/dataset/537),
- the Severe Hypoglycemia in Older Adults with Type 1 Diabetes dataset (https://public.jaeb.org/dataset/537),
- the WISDM dataset (https://public.jaeb.org/dataset/564),
- the PEDAP dataset (https://public.jaeb.org/dataset/599).
The sources of subsets of the data are the Barbara Davis Center, Jaeb Center for Health Research, Joslin Diabetes Center, T1D Exchange, University of Colorado, and University of Virginia. The analyses, content, and conclusions presented herein are solely the responsibility of the authors and have not been reviewed or approved by the aforementioned institutions.
Files
Preprocessed.zip
Total size: 1.5 GB
| Checksum | Size |
|---|---|
| md5:56154bfcb36b7eca93e105bbe40a68d6 | 773.0 MB |
| md5:579d6bceb0b90e1557d67d592d2475f4 | 759.4 MB |
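The md5 values listed above can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib` module. The helper names and the file path in the usage note are hypothetical, and the listing does not show which checksum belongs to which archive.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("path/to/archive.zip", "md5:56154bfcb36b7eca93e105bbe40a68d6")` returns `True` only if the downloaded file is byte-for-byte identical to the published one with that checksum. Reading in chunks keeps memory use low for archives of this size.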
Additional details
Related works
- Is referenced by
- Preprint: arXiv:2511.02849 (arXiv)
- Journal: 10.1109/JBHI.2025.3620603 (DOI)
Software
- Repository URL
- https://github.com/Beyza-Cinar/Preprocessing-DiaData
- Programming language
- Python, Jupyter Notebook
References
- Hidalgo, J. I., Alvarado, J., Botella, M., Aramendi, A., Velasco, J. M., & Garnica, O. (2024). HUPA-UCM diabetes dataset (Version 1) [Data set]. Mendeley Data. https://doi.org/10.17632/3hbcscwz44.1
- Dubosson, F., Ranvier, J.-E., Bromuri, S., Calbimonte, J.-P., Ruiz, J., & Schumacher, M. (2018). The open D1NAMO dataset: A multi-modal dataset for research on non-invasive type 1 diabetes management (Version 1.2.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5651217
- Gaitán Guerrero, J. F., López Ruiz, J. L., Martínez Cruz, C., & Espinilla Estévez, M. (2024). T1GDUJA: Glucose dataset of a patient with type 1 diabetes mellitus [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11284018
- Jaeb Center for Health Research. (2025). Diabetes datasets – public data archive [Data set]. https://public.jaeb.org/datasets/diabetes (Accessed April 23, 2025)
- Zhu, J. (2022). Diabetes Datasets – ShanghaiT1DM and ShanghaiT2DM [Data set]. Figshare. https://doi.org/10.6084/m9.figshare.20444397.v3