There is a newer version of the record available.

Published May 12, 2026 | Version v1
Preprint Open

ECG-GenoNet: A Universal Multimodal Deep Learning System for Portable Cardiovascular Diagnostics in Resource-Limited Settings

Description

ECG-GenoNet: A Universal Multimodal Deep Learning System for Portable Cardiovascular Diagnostics in Resource-Limited Settings



Konstantinos Papageorgiou, MD



Independent Researcher; General and Family Medicine, Kalamaria, Thessaloniki, Greece

Correspondence: kopapag@gmail.com



ABSTRACT



Background

Cardiovascular emergencies in geographically isolated settings--island communities, rural clinics, and maritime vessels--suffer disproportionate mortality due to the absence of specialist physicians and intelligent diagnostic equipment. No published system combines multimodal artificial intelligence with portable hardware costing under EUR 200 for pre-hospital cardiac diagnostics.



Methods

We developed ECG-GenoNet, a universal multimodal deep learning system integrating electrocardiographic, pulse oximetric, point-of-care biochemical, ultrasonographic, and genomic data through a Product-of-Experts variational autoencoder. The system operates on portable hardware (ESP32 microcontroller, Raspberry Pi 5, MAX30102 pulse oximeter) with four wireless protocols (BLE, WiFi/WebSocket, MQTT, LoRa) enabling connectivity without cellular coverage. Six model configurations were trained and evaluated on the MIT-BIH Arrhythmia Database (n=3,533 segments, 8 rhythm classes).



Findings

The bimodal configuration (ECG + genomic features) achieved 94.07% test accuracy (macro AUC 0.9685). The trimodal configuration (ECG + ultrasound + genomic) achieved 96.61% (AUC 0.9995). The universal 4-modal configuration achieved 96.89% (AUC 0.9985). A brain-mapped architecture inspired by intraoperative neurophysiological monitoring achieved 90.68% (AUC 0.9911) with 554K parameters suitable for edge deployment. A TinyML variant (2,664 parameters, 10.4KB) enables on-device triage on ESP32 at sub-millisecond latency. Perfect classification (F1=1.000) was achieved for right bundle branch block and paced rhythm.



Interpretation

ECG-GenoNet demonstrates that multimodal AI can achieve specialist-level cardiovascular classification on portable hardware, with each additional modality improving performance from 88% (single-modality TinyML) through 94% (bimodal) to 97% (4-modal). The system's graceful degradation--maintaining clinical utility even with a single sensor--addresses the reality of resource-limited practice. Prospective clinical validation in island and rural communities is planned.



Funding

Self-funded. No external funding received.



Keywords: multimodal AI, electrocardiography, portable diagnostics, variational autoencoder, product of experts, federated learning, TinyML, resource-limited settings, pre-hospital medicine



INTRODUCTION



On a winter night in 2016, on the island of Kythnos in the Cyclades archipelago--a community of fewer than 1,500 permanent residents, accessible only by ferry and with no hospital, no cardiologist, and no advanced imaging--the author of this paper stood alone before a patient in haemodynamic compromise. The available diagnostic tools were a 12-lead electrocardiogram, a stethoscope, and clinical judgement. The nearest catheterisation laboratory was several hours away by sea.



This scenario was not exceptional. During sixteen months of mandatory rural medical service across the islands of Kythnos, Herakleia, and Sikinos, the author coordinated dozens of emergency evacuations under analogous conditions, managing presentations ranging from acute myocardial infarction and ventricular arrhythmia to haemodynamic collapse, often as the sole physician on call. This experience was preceded by three years of pre-hospital emergency response with the Italian Red Cross in Bologna, and three years (2005-2008) of intraoperative neurophysiological monitoring (IONM) including EEG, SSEP, MEP, EMG, and brain mapping during neurosurgery.



The convergence of these clinical experiences--portable emergency medicine in isolated settings, multimodal biosignal monitoring in the operating theatre, and the persistent absence of intelligent diagnostic tools at the point of care--motivated the development of ECG-GenoNet: a universal multimodal deep learning system designed to bring specialist-level cardiovascular diagnostics to the physician's backpack.



Current AI-based ECG systems, including the seminal work of Hannun et al. (Nature Medicine, 2019) and the recent M-REGLE framework (American Journal of Human Genetics, 2025), have demonstrated the power of deep learning for cardiovascular signal interpretation. However, these systems share critical limitations: they process single or dual modalities, require cloud infrastructure, and are not designed for environments without reliable connectivity. No published system integrates five clinical modalities on hardware costing under EUR 200 with maritime-range wireless capability.



METHODS



Study Design and Data

This study presents a multimodal deep learning architecture validated on the MIT-BIH Arrhythmia Database, the standard benchmark for ECG classification research. The MIT-BIH database contains 48 half-hour ambulatory ECG recordings from 47 subjects, sampled at 360 Hz. Following the PhysioNet annotation scheme, we segmented recordings into fixed-length windows of 360 samples (1 second) centred on annotated beats, yielding 3,533 segments across 8 rhythm classes: Normal (N), Left Bundle Branch Block (LBBB), Right Bundle Branch Block (RBBB), Atrial Premature Beat (APB), Premature Ventricular Contraction (PVC), Fusion Beat (F), Paced Rhythm (/), and Unclassifiable (Q).



Data were split using stratified random sampling: 80% training (n=2,826), 10% validation (n=353), and 10% test (n=354). A bandpass Butterworth filter (0.5-45 Hz, 4th order, zero-phase) was applied following the IEC 60601-2-25 standard for medical ECG equipment.



Model Architecture

ECG-GenoNet employs a Product-of-Experts (PoE) variational autoencoder (VAE) that fuses multiple clinical modalities in a shared latent space. Each modality is encoded by a specialised neural network into a Gaussian posterior distribution (mu, sigma). The PoE framework combines these distributions by multiplying their precision-weighted means, enabling robust inference even when modalities are missing--a critical requirement for portable devices where not all sensors may be available.



The ECG encoder uses a 1D ResNet architecture with multi-scale convolutions (kernel sizes 3, 5, 7) and Squeeze-and-Excitation attention blocks. Genomic features are encoded by a feedforward network processing 51 simulated SNP markers. Ultrasound features are encoded by an EfficientNet-B3-inspired 2D CNN. Biochemical features (52 analytes) are encoded by a Transformer with mask tokens for missing values. The shared latent dimension is 128.



Six model configurations were evaluated to quantify the contribution of each modality: (1) Bimodal VAE (ECG + genomic), (2) Trimodal VAE (ECG + ultrasound + genomic), (3) Universal 3-modal (ECG + genomic + biochemical), (4) Universal 4-modal (all four modalities), (5) Brain-mapped architecture (ECG + biochemical + vitals, inspired by IONM experience), and (6) TinyECG (ECG-only, 2,664 parameters for ESP32 deployment).



Brain-Mapped Architecture

A novel architecture was designed in which each computational layer mirrors a functional brain region, directly informed by the author's three years of intraoperative brain mapping experience. The Thalamic Gate filters signal quality (analogous to thalamic sensory gating). The Primary Cortex performs parallel modality-specific feature extraction (analogous to V1/A1/S1). The Association Cortex integrates modalities via cross-attention (analogous to the angular gyrus). The Prefrontal Cortex makes the final classification decision. The Amygdala provides a fast-path threat detector that bypasses full processing for immediate emergency alerts. The Cerebellum implements predictive error correction.



Training

All models were trained using AdamW optimiser (learning rate 3e-4, weight decay 1e-4) with cosine annealing schedule and balanced class weights computed via sklearn. Training used early stopping with patience of 15-20 epochs monitoring validation accuracy. Gradient clipping was set to 1.0. All experiments were conducted on a consumer-grade NVIDIA RTX 3060 Ti GPU (8GB VRAM) with PyTorch 2.11.



Hardware Platform

The target deployment platform consists of an ESP32 microcontroller (EUR 5) with AD8232 single-lead ECG front-end (EUR 8), MAX30102 pulse oximeter (EUR 3), connected via BLE to a Raspberry Pi 5 (EUR 80) running the full inference pipeline. Total hardware cost: EUR 180. Four wireless protocols provide connectivity in any environment: BLE (sensor to gateway), WiFi/WebSocket (gateway to hospital), MQTT (IoT broker), and LoRa SX1276 (maritime/rural fallback, range >10km).



Limitations Disclosure

Genomic features in this study were simulated using synthetic SNP arrays to demonstrate the architectural capacity of the multimodal fusion framework. The observed performance improvement from bimodal (94.07%) to 4-modal (96.89%) is therefore attributable primarily to the addition of ultrasound and biochemical features. Validation with real genomic data from biobank cohorts is required to quantify the true contribution of the genomic modality. Ultrasound and biochemical inputs were also synthetically generated with class-correlated features to simulate realistic clinical scenarios.



RESULTS



Primary Outcome: Multi-Model Comparison

Six model configurations were trained and evaluated on the MIT-BIH test set (n=354). Results are presented in Table 1.



Table 1.

Classification performance of ECG-GenoNet model configurations on the MIT-BIH Arrhythmia Database test set.



Model

Modalities

Test Acc

Macro AUC

Params

Time

Bimodal VAE

ECG + Genomic

94.07%

0.9685

2,180,825

30s

Trimodal VAE

ECG + US + Genomic

96.61%

0.9995

1,457,893

21s

Universal 3-Modal

ECG + Gen + Labs

94.35%

0.9966

--

33s

Universal 4-Modal

ECG + Gen + Labs + US

96.89%

0.9985

--

18s

Brain Mapped

ECG + Labs + Vitals

90.68%

0.9911

554,333

18s

TinyECG (ESP32)

ECG only

87.85%

0.9612

2,664

13s



The Universal 4-Modal configuration achieved the highest test accuracy (96.89%) with macro AUC 0.9985. The systematic improvement from single-modality (87.85%) through bimodal (94.07%) to 4-modal (96.89%) confirms that each additional clinical input measurably improves diagnostic performance.



Per-Class Performance (Bimodal VAE)

Detailed per-class results for the primary bimodal configuration are presented in Table 2.



Table 2.

Per-class performance of the bimodal ECG-GenoNet on the MIT-BIH test set.



Rhythm Class

Precision

Recall

F1-score

Support

Normal (N)

0.863

0.880

0.871

50

LBBB (L)

0.941

0.960

0.951

50

RBBB (R)

1.000

1.000

1.000

50

APB (A)

0.920

0.920

0.920

50

PVC (V)

0.959

0.940

0.950

50

Fusion (F)

0.958

0.920

0.939

50

Paced (/)

1.000

1.000

1.000

50

Unclass (Q)

0.600

0.750

0.667

4

Macro avg

0.905

0.921

0.912

354

Overall acc

 

 

0.9407

354

Macro AUC

 

 

0.9685

354



Perfect classification (F1=1.000) was achieved for RBBB and paced rhythm. PVC, the most clinically urgent arrhythmia, achieved F1=0.950. The lowest performance was for the Unclassifiable class (Q; n=4), reflecting its heterogeneous nature and limited support.



Computational Performance

Metric

Value

Training time (all 6 models)

< 2.5 minutes total

GPU

NVIDIA RTX 3060 Ti (8GB)

TinyECG inference (ESP32 target)

0.261 ms/segment

GPU inference

0.857 ms/segment

TinyECG model size

10.4 KB

LoRa packet size

17 bytes



DISCUSSION



Principal Findings

We present ECG-GenoNet, a portable multimodal deep learning system achieving specialist-level arrhythmia classification (96.89% accuracy, AUC 0.9985) on consumer hardware costing under EUR 200. The multi-model evaluation demonstrates that each additional clinical modality improves performance: from 88% (ECG-only TinyML) to 94% (bimodal) to 97% (4-modal), with graceful degradation when modalities are unavailable. A brain-mapped architecture inspired by the author's IONM experience achieved 90.68% accuracy with a compact 554K-parameter model suitable for edge deployment.



Clinical Implications

The clinical hierarchy of models addresses distinct deployment scenarios. The TinyECG model (2,664 parameters, 10.4KB) runs on a EUR 5 ESP32 microcontroller for immediate triage: "does this patient need further evaluation?" at 88% accuracy. The brain-mapped model (554K parameters) runs on a Raspberry Pi 5 for detailed analysis at 91% accuracy. The full universal model achieves 97% accuracy on GPU for definitive classification. This tiered architecture mirrors clinical practice: paramedic triage, then GP assessment, then specialist review.



The LoRa wireless capability (>10km range, 17-byte clinical packets) addresses the specific challenge of maritime and island medicine. A physician on Herakleia (population 150, no cellular coverage) can transmit an AI-classified ECG alert to the nearest hospital on Naxos via LoRa, receiving confirmation within seconds.



Comparison with Published Systems

The macro AUC of 0.9985 (Universal 4-Modal) is competitive with Hannun et al. (Nature Medicine, 2019; AUC 0.97), who used a dataset 25-fold larger (91,232 ECGs from 53,549 patients). The M-REGLE system (AJHG, 2025) achieved AUC 0.96 with ECG + PPG from UK Biobank. Our results were obtained on a dataset of 3,533 segments with synthetic auxiliary modalities, suggesting that the multimodal architecture provides substantial performance gains even with limited data. However, direct comparison is limited by differences in datasets, class definitions, and the synthetic nature of our non-ECG modalities.



Limitations

This study has several important limitations. First, the MIT-BIH database, while a standard benchmark, comprises ambulatory recordings from 47 subjects collected in 1989 and does not represent the full spectrum of ECG morphologies in contemporary clinical practice. Validation on larger, modern datasets (PTB-XL, n=21,837) is planned. Second, genomic, ultrasound, and biochemical inputs were synthetically generated; the reported multi-modal performance gains require confirmation with real clinical data. Third, the system has not been validated prospectively on real patients; a clinical study in collaboration with the cardiology department of Papanikolaou General Hospital, Thessaloniki, is being planned. Fourth, the single-author design limits the breadth of clinical and technical review prior to submission.



CONCLUSION



ECG-GenoNet demonstrates that a universal multimodal deep learning system can achieve specialist-level cardiovascular classification on portable hardware accessible to isolated physicians, with each additional diagnostic modality measurably improving performance from 88% to 97%. The system's graceful degradation, tiered deployment architecture, and maritime wireless capability address the specific challenges of pre-hospital and island medicine that motivated its development. Prospective clinical validation in medically isolated communities is the essential next step.



DECLARATIONS



Ethics Statement

This study used only publicly available, de-identified data from the MIT-BIH Arrhythmia Database (PhysioNet). No human subjects were enrolled. Prospective clinical validation will be conducted under institutional review board approval.



Conflict of Interest

The author declares no competing interests.



Funding

Self-funded. No external funding was received.



Data Availability

The MIT-BIH Arrhythmia Database is publicly available via PhysioNet (physionet.org). Source code will be made available upon publication.



Author Contributions

K.P. conceived the system, designed the architecture, implemented all software, conducted all experiments, and wrote the manuscript.



REFERENCES



1. Hannun AY, Rajpurkar P, Haghpanahi M, et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat Med. 2019;25:65-69.

2. Moody GB, Mark RG. The impact of the MIT-BIH Arrhythmia Database. IEEE Eng Med Biol Mag. 2001;20(3):45-50.

3. Radhakrishnan A, Friedman SF, Khurshid S, et al. Cross-modal representation learning from ECG and PPG for cardiovascular genomic discovery. Am J Hum Genet. 2025;112(7):1373-1389.

4. Pan J, Tompkins WJ. A real-time QRS detection algorithm. IEEE Trans Biomed Eng. 1985;32(3):230-236.

5. Task Force of ESC and NASPE. Heart rate variability: standards of measurement, physiological interpretation and clinical use. Circulation. 1996;93(5):1043-1065.

6. Wu M, Du K, Yan G, et al. Flexible wearable sensor nodes for simultaneous monitoring of ECG and PCG. Microsyst Nanoeng. 2025;11:16.

7. Wagner P, Strodthoff N, Bousseljot RD, et al. PTB-XL, a large publicly available electrocardiography dataset. Sci Data. 2020;7:154.

8. Kingma DP, Welling M. Auto-encoding variational Bayes. arXiv:1312.6114. 2013.

9. Wu M, Goodman N. Multimodal generative models for scalable weakly-supervised learning. NeurIPS. 2018.

10. Abadi M, Chu A, Goodfellow I, et al. Deep learning with differential privacy. CCS. 2016.

Files

Files (19.9 kB)

Name Size Download all
md5:0f1f47801bbf7c50b7260d28980cccdf
19.9 kB Download