Published November 17, 2025 | Version v1
Publication Open

Text-based approach for detecting cases of ADEs from EHRs of participants of the Estonian Biobank

Description

This study develops methods for detecting Adverse Drug Events (ADEs) in the Electronic Health Records (EHRs) of Estonian Biobank participants to support pharmacogenetics research. It focuses on creating manually annotated datasets and on making ADE detection more efficient by combining rule-based and machine learning (ML) approaches, applied specifically to antidepressants and antipsychotics.
To detect potential ADE mentions within the free-text fields of EHRs, we employed a lexicon-based approach to extract text snippets containing both a drug name and a symptom. We developed a rule-based system and an ML-based system to prefilter the extracted snippets, aiming to reduce the number of non-ADE snippets passed on to manual annotation. We then applied both systems before manual annotation and assessed their impact on the annotation process.
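The lexicon-based extraction step described above can be illustrated with a minimal sketch. It is not the study's implementation: the lexicons, the function name `extract_candidate_snippets`, the naive sentence splitter, and the `window` parameter are all assumptions made for illustration, and English terms stand in for the original clinical text.

```python
import re

# Hypothetical toy lexicons; the study's actual lexicons are not reproduced here.
DRUG_LEXICON = {"escitalopram", "sertraline", "quetiapine"}
SYMPTOM_LEXICON = {"nausea", "dizziness", "insomnia"}


def extract_candidate_snippets(text, window=1):
    """Return snippets in which a drug name and a symptom co-occur.

    A snippet is a run of `window` consecutive sentences (an assumption;
    the source does not say how snippet boundaries were chosen).
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    snippets = []
    for i in range(len(sentences)):
        chunk = " ".join(sentences[i:i + window])
        tokens = set(re.findall(r"\w+", chunk.lower()))
        drugs = tokens & DRUG_LEXICON
        symptoms = tokens & SYMPTOM_LEXICON
        # Keep the snippet only if both a drug and a symptom are mentioned.
        if drugs and symptoms:
            snippets.append(
                {"snippet": chunk, "drugs": sorted(drugs), "symptoms": sorted(symptoms)}
            )
    return snippets
```

Co-occurrence alone says nothing about causality, which is why snippets found this way would still need prefiltering and manual annotation.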
We produced annotated datasets for antidepressants (520 patient–drug pairs) and antipsychotics (1,329 pairs). Our prefiltering method reduced the annotation workload up to 24-fold compared to no filtering, and ML-based filtering outperformed rule-based filtering, requiring only 1.3–1.5 snippets per positive ADE case. Pharmacogenetic validation revealed significant genotype–ADE associations for Escitalopram, Sertraline, and Quetiapine.
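As a rough illustration of how the workload figures above might be computed, the sketch below assumes that "snippets per positive ADE case" is the number of annotated snippets divided by the number of confirmed ADE cases, and that the fold reduction is the ratio of that figure without and with prefiltering. Both definitions are assumptions, the function names are hypothetical, and the counts are made-up toy values, not results from the study.

```python
def snippets_per_positive(n_annotated, n_positive):
    """Average number of snippets an annotator reads per confirmed ADE case."""
    if n_positive <= 0:
        raise ValueError("n_positive must be greater than zero")
    return n_annotated / n_positive


def workload_reduction(baseline_ratio, filtered_ratio):
    """Fold reduction in annotation effort relative to no filtering."""
    return baseline_ratio / filtered_ratio


# Toy counts for illustration only (not taken from the study).
baseline = snippets_per_positive(n_annotated=3000, n_positive=100)
filtered = snippets_per_positive(n_annotated=150, n_positive=100)
fold = workload_reduction(baseline, filtered)
print(f"{baseline:.1f} vs {filtered:.1f} snippets per positive, {fold:.0f}-fold reduction")
```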
Prefiltering significantly enhanced the efficiency of manual annotation for dataset creation, and pharmacogenetic validation confirmed the datasets’ utility and usability. We thus show that ADE extraction from free text adds value by expanding the scope and diversity of analyzable cases for discoveries in the genetics of drug response.

Files

ADEfromEHRdatabasecopy.pdf (1.7 MB)
md5:9f1fbf183bf45834c454e82fc2917cfc

Additional details

Funding

European Commission
SafePolyMed - Improve Safety in Polymedication by Managing Drug-Drug-Gene Interactions (grant 101057639)
Estonian Research Council
Next-generation pharmacogenomics: systematic integration of genetics, physiology, and drug-drug interactions (grant PRG2625)
Estonian Research Council
Discovery and Analysis of Clinical Pathways in Health Data (grant PRG1844)
Estonian Research Council
Neural text analysis models enhanced with external linguistic resources (grant PSG721)
Swedish Research Council
Genetic Precision Medicine (grant 2021-02732)