
There is a newer version of the record available.

Published November 4, 2022 | Version v1.0
Dataset Open

MedalCare-XL

  • 1. Division of Biophysics, Medical University of Graz, Graz, Austria
  • 2. Institute of Biomedical Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
  • 3. Physikalisch-Technische Bundesanstalt, National Metrology Institute, Berlin, Germany
  • 4. University of Edinburgh, Edinburgh, United Kingdom

Description

Mechanistic cardiac electrophysiology models allow for personalized simulations of the electrical activity in the heart and the ensuing electrocardiogram (ECG) on the body surface. Such synthetic signals possess precisely known ground truth labels of the underlying disease (the model parameterization) and can be employed alongside clinical signals to validate machine learning ECG analysis tools. Recently, synthetic ECG signals were used to enrich sparse clinical data for machine learning, or even to replace them completely during training, leading to good performance on real-world clinical test data.
    
We thus generated a large synthetic database comprising a total of 16,900 12-lead ECGs based on multi-scale electrophysiological simulations, equally distributed into 1 normal healthy control and 7 pathology classes. The pathological case of myocardial infarction had 6 sub-classes. A comparison of extracted timing and amplitude features between the virtual cohort and a large publicly available clinical ECG database demonstrated that the synthetic signals represent clinical ECGs for healthy and pathological subpopulations with high fidelity. The novel dataset of simulated ECG signals is split into training, validation and test data folds for development of novel machine learning algorithms and their objective assessment.

The folder WP2_largeDataset_Noise contains the 12-lead ECGs, each 10 seconds long. Each ECG is stored in a separate CSV file with one row per lead (lead order: I, II, III, aVR, aVL, aVF, V1-V6) and one sample per column (sampling rate: 500 Hz). Data are split by pathology (avblock = AV block, lbbb = left bundle branch block, rbbb = right bundle branch block, sinus = normal sinus rhythm, lae = left atrial enlargement, fam = fibrotic atrial cardiomyopathy, iab = interatrial conduction block, mi = myocardial infarction). MI data are further split into subclasses depending on the occlusion site (LAD, LCX, RCA) and transmurality (0.3 or 1.0). Each pathology subclass contains training, validation and testing data (approximately a 70/15/15 split). The training, validation and testing datasets were defined according to the model with which the QRST complexes were simulated, i.e., ECGs calculated with the same anatomical model but different electrophysiological parameters appear in exactly one of the training, validation and testing datasets, never in more than one. Each subfolder also contains a "siginfo.csv" file specifying the respective simulation runs for the P wave and the QRST segment that were used to synthesize the 10 second ECG segment. Each signal is available in three variations:
run_*_raw.csv contains the synthesized ECG without added noise and without filtering
run_*_noise.csv contains the synthesized ECG (unfiltered) with superimposed noise
run_*_filtered.csv contains the filtered synthesized ECG (filter settings: highpass cutoff frequency 0.5 Hz, lowpass cutoff frequency 150 Hz, Butterworth filters of order 3).
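
The file layout described above (one row per lead, one sample per column, 500 Hz) could be read along the following lines. This is only a minimal sketch using the Python standard library: the function names load_ecg and duration_seconds are hypothetical, and the code assumes comma-separated values with no header row, which the description does not state explicitly.

```python
import csv

# Lead order as stated in the dataset description.
LEAD_NAMES = ["I", "II", "III", "aVR", "aVL", "aVF",
              "V1", "V2", "V3", "V4", "V5", "V6"]
SAMPLING_RATE_HZ = 500  # sampling rate given in the description


def load_ecg(path):
    """Read one ECG file into a dict mapping lead name -> list of samples.

    Assumption (not confirmed by the description): comma-separated values,
    no header row, exactly one row per lead.
    """
    with open(path, newline="") as f:
        # Skip empty rows, convert every field to float.
        rows = [[float(v) for v in row] for row in csv.reader(f) if row]
    if len(rows) != len(LEAD_NAMES):
        raise ValueError(
            f"expected {len(LEAD_NAMES)} rows, found {len(rows)}"
        )
    return dict(zip(LEAD_NAMES, rows))


def duration_seconds(ecg):
    """Signal length in seconds, derived from the sample count of lead I."""
    return len(ecg["I"]) / SAMPLING_RATE_HZ
```

With this layout, a 10 second recording should yield 5,000 samples per lead. If the files turn out to use a different delimiter or include a header row, the reader arguments would need to be adjusted accordingly.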

The folder WP2_largeDataset_ParameterFiles contains the parameter files used to simulate the 12-lead ECGs. Parameters are split for atrial and ventricular simulations, which were run independently from one another. 
See Gillette*, Gsell*, Nagel* et al. "MedalCare-XL: 16,900 healthy and pathological electrocardiograms obtained through multi-scale electrophysiological models" for a description of the model parameters.

Notes

This work was supported by the EMPIR programme co-financed by the participating states and from the European Union's Horizon 2020 research and innovation programme under grant MedalCare 18HLT07. The authors also acknowledge the support of the British Heart Foundation Centre for Research Excellence Award III (RE/18/5/34216). SEW is supported by the British Heart Foundation (FS/20/26/34952).

Files

MedalCare-XL.zip

Files (9.3 GB)

MedalCare-XL.zip | 9.3 GB | md5:b88cc018995dc3aee5de895f2656ba27
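
Since the record lists an MD5 checksum for the archive, a downloaded copy can be checked against it before unpacking. The following is a hedged sketch using Python's hashlib; the helper names md5_of_file and verify_download are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# Checksum as listed on the record for MedalCare-XL.zip.
EXPECTED_MD5 = "b88cc018995dc3aee5de895f2656ba27"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte archive is never held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected
```

A call such as verify_download("MedalCare-XL.zip") returning False would suggest an incomplete or corrupted download.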

Additional details

Related works

Is documented by
Preprint: 10.48550/arXiv.2211.15997 (DOI)