Adverse Drug Reaction (ADR) Text Dataset
Description
This repository contains text data and code related to the identification and clustering of Adverse Drug Reactions (ADR) using Sentence-BERT (S-BERT) embeddings and the SS-DBSCAN clustering algorithm. The dataset includes both labeled and unlabeled patient reports extracted from the publicly available MIMIC-III database.
The labeled data has been manually annotated to distinguish ADR from non-ADR cases. The unlabeled data is used for unsupervised clustering experiments, in particular to assess clustering performance on high-dimensional data.
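To make the clustering step concrete, the following is a minimal sketch of plain DBSCAN with cosine distance, written against the standard library only. It is not the SS-DBSCAN variant used in this repository, and the toy vectors merely stand in for S-BERT sentence embeddings; the function names, `eps`, and `min_samples` values are illustrative assumptions, not settings taken from the notebook.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(points, eps, min_samples, distance=cosine_distance):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    # Each neighbourhood includes the point itself, as in standard DBSCAN.
    neighbours = [
        [j for j in range(n) if distance(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # core point: keep expanding
    return labels


# Toy 3-dimensional "embeddings" (hypothetical values, not real S-BERT output).
embeddings = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1],  # first dense group
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.0],  # second dense group
    [0.0, 0.0, 1.0],                                    # isolated point
]
labels = dbscan(embeddings, eps=0.1, min_samples=3)
print(labels)
```

The neighbourhood search here is quadratic in the number of points, which is acceptable for a toy example but not for thousands of reports; a library implementation with an index structure would be the practical choice at that scale.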
New in This Version:
- Added Jupyter Notebook: `mimic-5k_PCA_tSNE_clustering.ipynb`
- Included a detailed `README_ADR_Clustering_Task.txt` with step-by-step instructions for reproducing the clustering results
- Explained how to scale experiments from 1,000 reports to the full dataset
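One way to run the same experiment at increasing sizes is to draw a fixed-seed random subset before clustering. The sketch below assumes this approach; it is not taken from `README_ADR_Clustering_Task.txt`, and the helper name `sample_rows`, the seed, and the placeholder rows are hypothetical.

```python
import random


def sample_rows(rows, n, seed=0):
    """Return n rows drawn without replacement; all rows if n is None or too large."""
    if n is None or n >= len(rows):
        return list(rows)
    # A dedicated Random instance keeps the subset reproducible across runs.
    return random.Random(seed).sample(rows, n)


# Placeholder rows standing in for parsed lines of the CSV file.
rows = [{"id": i} for i in range(5000)]

# Hypothetical size schedule: two subsets, then the full dataset (None).
for size in (1000, 2500, None):
    subset = sample_rows(rows, size)
    print(len(subset))
```

Fixing the seed means each run clusters the same subset, so differences between runs reflect parameter changes rather than sampling noise.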
Files
adr_filtered.csv
Additional details
Related works
- Is part of: Dataset: https://physionet.org/content/mimiciii-demo/1.4/
Dates
- Submitted: 2025-04-11