AI-Based Pharmacovigilance Risk Detection Using FDA FAERS Data

Sanjay, Nenavath

doi:10.5281/zenodo.19521242

Published April 12, 2026 | Version v1

Software | Open access

Sanjay, Nenavath (Other)¹

1. Saint Louis University

Contributors

Supervisor: Melinda B Chu, M.D., M.B.A.

ABSTRACT

This project presents an AI-driven pharmacovigilance workflow built on data from the FDA Adverse Event Reporting System (FAERS). The objective is to identify and analyze drug–adverse event relationships and to classify their risk levels using data-driven techniques.

The dataset was preprocessed by merging the drug, reaction, and demographic tables, filtering for primary suspect drugs, and removing non-clinical reporting terms. Drug–reaction pairs were then aggregated and assigned risk levels using percentile-based thresholds to address class imbalance.

A Random Forest classifier was trained to predict risk levels from the drug, the reaction, and the reporting frequency. The model was evaluated on a train-test split using standard classification metrics.

An interactive Streamlit dashboard was developed to visualize the top drugs, adverse reactions, and high-risk signals and to enable real-time risk prediction.

This project demonstrates the practical application of healthcare data science, pharmacovigilance analytics, and machine learning to drug safety monitoring.

OBJECTIVES

• To analyze FAERS data and identify drug–adverse event patterns
• To classify risk levels using data-driven percentile methods
• To build a machine learning model for risk prediction
• To develop an interactive dashboard for data exploration
• To demonstrate pharmacovigilance analytics using real-world data

METHODOLOGY

The workflow consists of several stages. First, the FAERS DEMO, DRUG, and REAC tables were merged on their shared primary report identifier. A sampled subset of the merged data was then created to keep processing efficient.
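The merge step might look roughly like the pandas sketch below. It assumes each table is already loaded as a DataFrame and that all three share the FAERS report identifier column `primaryid`; the function name `merge_faers_tables`, the optional `sample_n` argument, and the seed are illustrative assumptions rather than the project's actual code.

```python
from typing import Optional

import pandas as pd


def merge_faers_tables(
    demo: pd.DataFrame,
    drug: pd.DataFrame,
    reac: pd.DataFrame,
    sample_n: Optional[int] = None,
    seed: int = 42,
) -> pd.DataFrame:
    """Join DRUG, REAC and DEMO on the shared report identifier (assumed: primaryid)."""
    # Inner joins keep only reports that appear in all three tables.
    merged = drug.merge(reac, on="primaryid", how="inner")
    merged = merged.merge(demo, on="primaryid", how="inner")
    # Optionally draw a reproducible random sample to speed up later steps.
    if sample_n is not None and sample_n < len(merged):
        merged = merged.sample(n=sample_n, random_state=seed)
    return merged.reset_index(drop=True)
```

Because a report can list several drugs and several reactions, joining DRUG and REAC on `primaryid` pairs every drug in a report with every reaction in that report, which is the usual starting point for drug–reaction counting.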

Primary suspect drugs were selected by filtering on the role_cod field. Non-clinical reporting terms such as “off label use” were removed so that the analysis focuses on clinical adverse reactions.
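A minimal sketch of the filtering step, assuming the merged DataFrame keeps the FAERS `role_cod` column (where `PS` marks the primary suspect drug) and the MedDRA preferred term in `pt`. The exclusion set is hypothetical: only "off label use" is named in the text, and other non-clinical terms would be added to it as needed.

```python
import pandas as pd

# Hypothetical exclusion list; only "off label use" comes from the project text.
NON_CLINICAL_TERMS = {"off label use"}


def filter_primary_suspect(df: pd.DataFrame, excluded_terms=NON_CLINICAL_TERMS) -> pd.DataFrame:
    """Keep primary suspect drug rows and drop non-clinical reaction terms."""
    # role_cod "PS" = primary suspect; normalise case in case the source varies.
    ps = df[df["role_cod"].str.upper() == "PS"]
    # Compare preferred terms case-insensitively against the exclusion set.
    keep = ~ps["pt"].str.strip().str.lower().isin(excluded_terms)
    return ps[keep].reset_index(drop=True)
```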

Drug–reaction pairs were grouped and the number of reports for each pair was counted. Risk levels (LOW, MEDIUM, HIGH) were then assigned from percentile thresholds on these counts so that the three classes are roughly balanced.
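The aggregation and percentile labelling could be sketched as follows. The 33rd and 66th percentile cut-offs (roughly tertiles) are an assumption, since the exact thresholds are not stated, and the helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def aggregate_pairs(df: pd.DataFrame) -> pd.DataFrame:
    # One row per drug-reaction pair with the number of reports mentioning it.
    return df.groupby(["drugname", "pt"]).size().reset_index(name="count")


def assign_risk_levels(pairs: pd.DataFrame, low_q: float = 0.33, high_q: float = 0.66) -> pd.DataFrame:
    # Cut-offs are percentiles of the observed count distribution (assumed tertiles).
    low_cut, high_cut = pairs["count"].quantile([low_q, high_q])
    out = pairs.copy()
    # Counts up to low_cut are LOW, up to high_cut MEDIUM, everything above HIGH.
    out["risk_level"] = np.select(
        [out["count"] <= low_cut, out["count"] <= high_cut],
        ["LOW", "MEDIUM"],
        default="HIGH",
    )
    return out
```

Counts tied at a cut-off all fall into the lower class here, so with heavily tied FAERS counts the classes are only approximately balanced. In this sketch the label also depends on `count` alone, so any model that receives `count` as a feature can learn these thresholds directly.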

Categorical variables (drug names and reactions) were converted to integer codes with label encoding. A Random Forest classifier was then trained on three features: the encoded drug, the encoded reaction, and the report count.
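A sketch of the encoding and training step with scikit-learn's `LabelEncoder` and `RandomForestClassifier`, assuming the aggregated table has `drugname`, `pt`, `count`, and `risk_level` columns. The feature column names, `n_estimators=200`, the seed, and the helper `predict_risk` (a guess at how a prediction module could reuse the fitted encoders) are all illustrative.

```python
from typing import Optional

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder


def build_features(pairs: pd.DataFrame):
    """Encode drug and reaction names as integers and assemble the feature matrix."""
    # One encoder per column so each can map user input back to its code later.
    drug_enc = LabelEncoder().fit(pairs["drugname"])
    pt_enc = LabelEncoder().fit(pairs["pt"])
    X = pd.DataFrame({
        "drug_code": drug_enc.transform(pairs["drugname"]),
        "pt_code": pt_enc.transform(pairs["pt"]),
        "count": pairs["count"].to_numpy(),
    })
    return X, pairs["risk_level"].to_numpy(), drug_enc, pt_enc


def train_model(X: pd.DataFrame, y, seed: int = 42) -> RandomForestClassifier:
    # n_estimators=200 is an illustrative choice, not a documented setting.
    return RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)


def predict_risk(model, drug_enc, pt_enc, drug: str, pt: str, count: int) -> Optional[str]:
    """Hypothetical helper for predicting the risk level of a single pair."""
    # LabelEncoder.transform raises ValueError on unseen labels, so check first.
    if drug not in drug_enc.classes_ or pt not in pt_enc.classes_:
        return None
    row = pd.DataFrame({
        "drug_code": drug_enc.transform([drug]),
        "pt_code": pt_enc.transform([pt]),
        "count": [count],
    })
    return str(model.predict(row)[0])
```

Returning `None` for an unknown drug or reaction lets a caller show a message instead of crashing on labels the encoders never saw.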

The model was evaluated using train-test split and standard classification metrics. Finally, a Streamlit dashboard was built to visualize insights and enable real-time prediction.

DATA SOURCE

Data Source: FDA Adverse Event Reporting System (FAERS)

Files used:
• DEMO – patient demographic data
• DRUG – drug information and role
• REAC – reported adverse reactions
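Loading the raw tables could be as simple as the sketch below, assuming the files come from the FAERS quarterly ASCII extracts, which use `$` as the field separator. The Latin-1 encoding, reading every column as text, and lower-casing the column names are defensive assumptions, not documented project choices.

```python
import io

import pandas as pd


def load_faers_table(source, encoding: str = "latin-1") -> pd.DataFrame:
    """Read one FAERS ASCII table (DEMO, DRUG or REAC) from a path or buffer."""
    # "$"-delimited text; dtype=str keeps identifiers and codes exactly as written.
    df = pd.read_csv(source, sep="$", dtype=str, encoding=encoding)
    # Normalise header names so later joins can rely on e.g. "primaryid".
    df.columns = df.columns.str.strip().str.lower()
    return df


# Tiny in-memory stand-in for a real DRUG file.
example = load_faers_table(io.StringIO("primaryid$caseid$drugname$role_cod\n1001$1$ASPIRIN$PS\n"))
```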

The dataset contains real-world adverse event reports used for pharmacovigilance and drug safety monitoring.

After aggregation, the modeling dataset contains the following fields:

• drugname – name of the drug
• pt – preferred term (adverse reaction)
• count – frequency of reports
• risk_level – derived classification (LOW, MEDIUM, HIGH)
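These four fields can be checked before modeling with a small validation helper. The sketch below is hypothetical; it encodes only the column names and the LOW/MEDIUM/HIGH label set listed above, plus the assumption that every count is a positive integer.

```python
import pandas as pd

# Column names and labels as listed in the data description above.
EXPECTED_COLUMNS = {"drugname", "pt", "count", "risk_level"}
RISK_LEVELS = {"LOW", "MEDIUM", "HIGH"}


def validate_ml_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Raise ValueError if the aggregated dataset does not match the expected schema."""
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    unknown = set(df["risk_level"].unique()) - RISK_LEVELS
    if unknown:
        raise ValueError(f"unexpected risk levels: {sorted(unknown)}")
    # Assumption: each row is an aggregated pair, so its count is a positive integer.
    if not pd.api.types.is_integer_dtype(df["count"]) or (df["count"] < 1).any():
        raise ValueError("count must contain positive integers")
    return df
```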

MACHINE LEARNING MODEL

Model: Random Forest Classifier

Features:
• Encoded drug name
• Encoded reaction (pt)
• Count (frequency)

Target:
• Risk level (LOW, MEDIUM, HIGH)

Evaluation:
• Train-test split (80/20)
• Accuracy, classification report, confusion matrix
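The evaluation protocol above might be reproduced with scikit-learn roughly as follows. `test_size=0.2` mirrors the 80/20 split; stratifying the split on the label and fixing `random_state` are assumptions added for reproducibility, not settings confirmed by the project.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

LABELS = ["LOW", "MEDIUM", "HIGH"]


def evaluate_model(X, y, seed: int = 42) -> dict:
    """80/20 hold-out evaluation of a Random Forest on encoded features."""
    # Stratifying keeps the class proportions equal in both splits (assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(random_state=seed).fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        # Fixed label order so report rows and matrix rows/columns line up.
        "report": classification_report(y_test, y_pred, labels=LABELS, zero_division=0),
        "confusion": confusion_matrix(y_test, y_pred, labels=LABELS),
    }
```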

Note: The initial model showed misleadingly high accuracy because of class imbalance; this was addressed by switching to percentile-based risk classification.
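The effect described in the note can be checked directly: a model that always predicts the majority class scores an accuracy equal to that class's share, so comparing the reported accuracy against this baseline shows whether it is meaningful. The helper below is an illustrative sketch in plain Python.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share and the accuracy of always predicting the majority class."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    # A constant majority-class predictor is right exactly this often.
    majority_baseline = max(shares.values())
    return shares, majority_baseline
```

For example, if 80% of pairs were labelled LOW, an accuracy of 0.8 would say nothing about the model; with three classes of about one third each, the baseline drops to roughly 0.33.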

DASHBOARD DESCRIPTION

An interactive Streamlit dashboard was developed to visualize pharmacovigilance insights.

Key features:
• Top drugs visualization
• Top adverse reactions visualization
• Risk-based filtering
• Drug search functionality
• High-risk drug–reaction pairs
• CSV download option
• AI-based risk prediction module

The dashboard enables users to explore safety signals and perform real-time risk prediction.
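The data side of these dashboard features reduces to a few pandas operations, sketched here under the assumption that the dashboard reads the aggregated `drugname`/`pt`/`count`/`risk_level` table. The helper names are hypothetical, and the Streamlit wiring is left out so the functions can be tested on their own.

```python
from typing import Iterable, Optional

import pandas as pd


def top_drugs(pairs: pd.DataFrame, n: int = 10) -> pd.Series:
    # Total report count per drug across all of its reactions, largest first.
    return pairs.groupby("drugname")["count"].sum().nlargest(n)


def top_reactions(pairs: pd.DataFrame, n: int = 10) -> pd.Series:
    # Same aggregation keyed by the preferred term instead of the drug.
    return pairs.groupby("pt")["count"].sum().nlargest(n)


def filter_pairs(
    pairs: pd.DataFrame,
    risk_levels: Optional[Iterable[str]] = None,
    drug_query: str = "",
) -> pd.DataFrame:
    out = pairs
    if risk_levels:
        # Risk-based filtering, e.g. ["HIGH"] for the high-risk pairs view.
        out = out[out["risk_level"].isin(list(risk_levels))]
    if drug_query:
        # Case-insensitive literal substring match for the drug search box.
        out = out[out["drugname"].str.contains(drug_query, case=False, regex=False, na=False)]
    return out


def to_csv_bytes(df: pd.DataFrame) -> bytes:
    # UTF-8 encoded CSV without the index, suitable for a download button.
    return df.to_csv(index=False).encode("utf-8")
```

In a Streamlit app these could feed, for example, `st.bar_chart(top_drugs(pairs))` for the top-drugs view and `st.download_button("Download CSV", to_csv_bytes(filtered), file_name="filtered_pairs.csv", mime="text/csv")` for the export; the file name is chosen purely for illustration.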

LIMITATIONS

• FAERS data is based on voluntary reporting and may contain bias
• Risk classification is based on frequency, not clinical severity
• Model performance depends on data distribution
• External validation was not performed

FUTURE WORK

• Incorporate patient-level features (age, gender)
• Use temporal trends for risk prediction
• Apply deep learning models
• Integrate real-time FAERS updates
• Deploy dashboard as a web application

Files (36.4 MB)

Name	Size	MD5
app.py	6.5 kB	476cb8c10e96864b544af47e16d7073d
faers_ml_dataset.csv	1.2 MB	31daa19506d141bf41a239c06633d2ae
faers_risk_scores.csv	908.0 kB	7e5060a5a18e36878b4b4bd3f66fbf8f
faers_sample.csv	33.0 MB	857ea84e40acd911ec9301ffb60389fb
FEARS_PHARMACO.ipynb	33.0 kB	3d4b96054ab067f48fb62469a39a2b50
Screenshot 2026-04-11 184724.png	304.6 kB	a9c331956f6b20c2aaafc5bffcb81961
Screenshot 2026-04-11 184818.png	297.9 kB	b17632cefdde8ca9452451f502dc6b93
Screenshot 2026-04-11 184833.png	234.9 kB	7b34abd5fab868ca87ffb4b93e6d51ed
Screenshot 2026-04-11 184853.png	179.0 kB	ec3291c6ff8f6a2332689484b1cfc571
Screenshot 2026-04-11 184938.png	193.3 kB	96ea7114d9d369fd88a833762a0936c3

Additional details

Programming language: Python
