Published April 11, 2025 | Version v2
Dataset Open

Adverse Drug Reaction (ADR) Text Dataset

  • 1. University of Dodoma
  • 2. Shibaura Institute of Technology

Description

This repository contains text data and code related to the identification and clustering of Adverse Drug Reactions (ADR) using Sentence-BERT (S-BERT) embeddings and the SS-DBSCAN clustering algorithm. The dataset includes both labeled and unlabeled patient reports extracted from the publicly available MIMIC-III database.

The labeled data has been manually annotated to distinguish ADR from non-ADR cases. The unlabeled data is used for unsupervised clustering experiments, in particular to assess clustering performance on high-dimensional data.
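To make the clustering step concrete, here is a minimal sketch of plain density-based clustering (classic DBSCAN, not the SS-DBSCAN variant used in this repository). The toy 2-D points stand in for S-BERT embeddings, and the function names, `eps`, and `min_samples` values are illustrative assumptions rather than the settings used by the authors.

```python
import math

NOISE = -1  # label given to points that belong to no cluster


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including idx itself)."""
    return [j for j, p in enumerate(points) if euclidean(points[idx], p) <= eps]


def dbscan(points, eps, min_samples):
    """Classic DBSCAN: returns one integer label per point, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Toy vectors standing in for sentence embeddings (hypothetical data).
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
toy_labels = dbscan(toy_points, eps=1.5, min_samples=3)
```

On real embeddings the vectors would have hundreds of dimensions, which is where the choice of `eps` becomes difficult and where parameter-selection variants of DBSCAN are typically motivated.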

New in This Version:
- Added Jupyter Notebook: `mimic-5k_PCA_tSNE_clustering.ipynb`
- Included detailed `README_ADR_Clustering_Task.txt` with step-by-step instructions to reproduce clustering results
- Explained how to scale experiments from 1,000 reports to the full dataset
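One simple way to run on a subset first and on the full data later is to cap the number of rows read from the CSV. The sketch below is an assumption about how that might look, not the procedure from the README: `load_rows` is a hypothetical helper, and no column names are assumed because the schema of the files is not described on this page.

```python
import csv
from itertools import islice


def load_rows(path, limit=None):
    """Read up to `limit` rows from a CSV file as dictionaries.

    With limit=None every row is read, so the same call scales from a
    small pilot run to the full dataset.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return list(islice(reader, limit))
```

For example, `load_rows("adr_filtered.csv", limit=1000)` would read the first 1,000 rows, and `load_rows("adr_filtered.csv")` would read the whole file, assuming the file has been downloaded to the working directory.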

Files

adr_filtered.csv

Files (23.1 MB)

Checksum                              Size
md5:c80ef52b48bd9879aa173252cabed7cb  20.2 MB
md5:d0aede0ba8c0a9b0639bc24306600a33  3.0 MB
md5:b78c5cac4bc48e5b9e46dc78c0cf552d  2.5 kB
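The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; `md5_of_file` and `matches_listed_checksum` are hypothetical helper names, and since the listing does not show which file name belongs to which checksum, that pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listed_checksum("adr_filtered.csv", "md5:c80ef52b48bd9879aa173252cabed7cb")` returns `True` only if the local file is byte-for-byte identical to the published one; whether that particular checksum belongs to `adr_filtered.csv` is an assumption here.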

Additional details

Dates

Submitted
2025-04-11

Software

Programming language
Python