AI-Based Pharmacovigilance Risk Detection Using FDA FAERS Data
Description
ABSTRACT
This project presents an AI-driven pharmacovigilance workflow built on data from the FDA Adverse Event Reporting System (FAERS). The objective is to identify and analyze drug–adverse event relationships and to classify their risk levels using data-driven techniques.
The dataset was preprocessed by merging drug, reaction, and demographic tables, followed by filtering for primary suspect drugs and removing non-clinical reporting terms. Drug–reaction pairs were aggregated and assigned risk levels using percentile-based classification to address class imbalance.
A machine learning model (Random Forest) was trained to predict risk levels based on drug, reaction, and reporting frequency. The model was evaluated using train-test split validation and classification metrics.
An interactive Streamlit dashboard was developed to visualize top drugs, adverse reactions, and high-risk signals, and to enable real-time risk prediction.
This project demonstrates a practical application of healthcare data science, pharmacovigilance analytics, and machine learning to drug safety monitoring.
OBJECTIVES
• To analyze FAERS data and identify drug–adverse event patterns
• To classify risk levels using data-driven percentile methods
• To build a machine learning model for risk prediction
• To develop an interactive dashboard for data exploration
• To demonstrate pharmacovigilance analytics using real-world data
METHODOLOGY
The workflow proceeds in stages, from data preparation through modeling to visualization. First, the FAERS datasets (DEMO, DRUG, REAC) were merged on their shared primary identifier. A sample of the merged data was then drawn to keep processing efficient.
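The merge step could look roughly like the sketch below. It uses small in-memory tables rather than the real quarterly files, and it assumes the join key is the `primaryid` column found in the public FAERS files; the join types, the sample size, and the helper name `merge_faers` are illustrative choices, not details taken from the project.

```python
import pandas as pd

# Toy stand-ins for the three FAERS tables (values are made up for illustration).
demo = pd.DataFrame({"primaryid": [1, 2, 3], "age": [54, 67, 31], "sex": ["F", "M", "F"]})
drug = pd.DataFrame({
    "primaryid": [1, 1, 2, 3],
    "drugname": ["ASPIRIN", "METFORMIN", "WARFARIN", "ASPIRIN"],
    "role_cod": ["PS", "C", "PS", "PS"],
})
reac = pd.DataFrame({
    "primaryid": [1, 2, 2, 3],
    "pt": ["Nausea", "Haemorrhage", "Dizziness", "Nausea"],
})

def merge_faers(demo, drug, reac):
    """Join DRUG to REAC (one row per drug-reaction pair per report), then attach DEMO."""
    merged = drug.merge(reac, on="primaryid", how="inner")
    # Left join keeps drug-reaction rows even if demographics are missing (an assumption).
    return merged.merge(demo, on="primaryid", how="left")

merged = merge_faers(demo, drug, reac)

# Draw a reproducible sample; the size here is arbitrary.
sample = merged.sample(n=3, random_state=42)
print(merged)
```

Note that joining DRUG to REAC on the report identifier pairs every drug in a report with every reaction in that report, so row counts grow multiplicatively for reports with several drugs and reactions.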
Only primary suspect drugs were retained, identified through the role_cod field. Non-clinical reporting terms such as “off label use” were removed so that the analysis reflects actual adverse reactions.
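A minimal sketch of this filtering step follows. It assumes primary suspect drugs carry the code `PS` in `role_cod`, as in the public FAERS files. The source names only "off label use" as a removed term; the other entries in `NON_CLINICAL_TERMS` and the function name are hypothetical examples.

```python
import pandas as pd

# Only "off label use" comes from the project description; the rest are illustrative.
NON_CLINICAL_TERMS = {"off label use", "product use issue", "no adverse event"}

def filter_primary_suspect(df):
    """Keep primary suspect drugs and drop non-clinical reporting terms."""
    is_primary = df["role_cod"].str.strip().str.upper() == "PS"
    is_clinical = ~df["pt"].str.strip().str.lower().isin(NON_CLINICAL_TERMS)
    return df[is_primary & is_clinical].reset_index(drop=True)

reports = pd.DataFrame({
    "drugname": ["ASPIRIN", "METFORMIN", "WARFARIN", "WARFARIN"],
    "role_cod": ["PS", "C", "PS", "PS"],
    "pt": ["Nausea", "Nausea", "Off label use", "Haemorrhage"],
})

kept = filter_primary_suspect(reports)
print(kept)
```

Normalizing case and whitespace before comparing is a defensive choice, since free-text fields in spontaneous reports are not always consistently formatted.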
Drug–reaction pairs were grouped and frequency counts were computed. Risk levels were assigned using percentile-based thresholds to ensure balanced classification across LOW, MEDIUM, and HIGH categories.
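The aggregation and percentile-based labeling might be implemented along the lines below. The source does not state which percentiles were used, so the tertile cut points (33rd and 67th percentiles) and the function name `assign_risk_levels` are assumptions made for illustration.

```python
import pandas as pd

def assign_risk_levels(pairs, low_q=1 / 3, high_q=2 / 3):
    """Count reports per drug-reaction pair and label each pair by count percentile."""
    counts = pairs.groupby(["drugname", "pt"]).size().reset_index(name="count")
    low_cut = counts["count"].quantile(low_q)
    high_cut = counts["count"].quantile(high_q)

    def label(count):
        if count <= low_cut:
            return "LOW"
        if count <= high_cut:
            return "MEDIUM"
        return "HIGH"

    counts["risk_level"] = counts["count"].apply(label)
    return counts

# Toy input: DRUG1 appears once, DRUG2 twice, ... DRUG6 six times.
pairs = pd.DataFrame(
    [(f"DRUG{i}", "Nausea") for i in range(1, 7) for _ in range(i)],
    columns=["drugname", "pt"],
)

risk_table = assign_risk_levels(pairs)
print(risk_table)
```

On real FAERS data many pairs have the same small count (often 1), so the two cut points can coincide and the classes may be less balanced than in this toy example.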
Categorical variables such as drug names and reactions were encoded using Label Encoding. A Random Forest classifier was trained using drug, reaction, and frequency as features.
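As a rough illustration of the encoding and training step, the sketch below fits scikit-learn's `LabelEncoder` and `RandomForestClassifier` on a handful of made-up rows. The hyperparameters, the toy data, and the helper `predict_risk` are assumptions, not the project's actual settings.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Made-up aggregated pairs with the four fields described in the data section.
data = pd.DataFrame({
    "drugname": ["ASPIRIN", "ASPIRIN", "WARFARIN", "WARFARIN", "METFORMIN", "METFORMIN"],
    "pt": ["Nausea", "Dizziness", "Haemorrhage", "Nausea", "Diarrhoea", "Nausea"],
    "count": [2, 40, 120, 5, 35, 150],
    "risk_level": ["LOW", "MEDIUM", "HIGH", "LOW", "MEDIUM", "HIGH"],
})

# One encoder per categorical column so each can be reused at prediction time.
drug_enc = LabelEncoder()
pt_enc = LabelEncoder()
X = pd.DataFrame({
    "drug_code": drug_enc.fit_transform(data["drugname"]),
    "pt_code": pt_enc.fit_transform(data["pt"]),
    "count": data["count"],
})
y = data["risk_level"]

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

def predict_risk(drugname, pt, count):
    """Encode one drug-reaction pair with the fitted encoders and predict its risk level."""
    row = pd.DataFrame({
        "drug_code": drug_enc.transform([drugname]),
        "pt_code": pt_enc.transform([pt]),
        "count": [count],
    })
    return model.predict(row)[0]

print(predict_risk("WARFARIN", "Nausea", 90))
```

`LabelEncoder.transform` raises a `ValueError` for a drug or reaction it did not see during fitting, so a prediction interface built on this approach needs to handle unknown inputs explicitly.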
The model was evaluated using train-test split and standard classification metrics. Finally, a Streamlit dashboard was built to visualize insights and enable real-time prediction.
DATA SOURCE
Data Source: FDA FAERS (Adverse Event Reporting System)
Files used:
• DEMO – patient demographic data
• DRUG – drug information and role
• REAC – reported adverse reactions
The dataset contains real-world adverse event reports used for pharmacovigilance and drug safety monitoring. The processed dataset uses the following fields:
• drugname – name of the drug
• pt – preferred term (adverse reaction)
• count – frequency of reports
• risk_level – derived classification (LOW, MEDIUM, HIGH)
MACHINE LEARNING MODEL
Model: Random Forest Classifier
Features:
• Encoded drug name
• Encoded reaction (pt)
• Count (frequency)
Target:
• Risk level (LOW, MEDIUM, HIGH)
Evaluation:
• Train-test split (80/20)
• Accuracy, classification report, confusion matrix
Note: The initial model reported misleadingly high accuracy because of class imbalance. This was resolved by switching to percentile-based risk classification, which balances the classes.
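The evaluation described above can be sketched as follows, using synthetic data in place of the real feature table. The 80/20 split and the three metrics come from the description; the stratified split, the random seeds, and the synthetic thresholds are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded feature table (values are random, not FAERS data).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "drug_code": rng.integers(0, 20, n),
    "pt_code": rng.integers(0, 30, n),
    "count": rng.integers(1, 200, n),
})
# Risk level derived from count, mirroring the frequency-based labeling.
y = pd.cut(X["count"], bins=[0, 66, 133, 200], labels=["LOW", "MEDIUM", "HIGH"]).astype(str)

# 80/20 split; stratifying keeps class proportions equal in both parts (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

labels = ["LOW", "MEDIUM", "HIGH"]
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=labels)

print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred, labels=labels))
print(cm)
```

Because the target is derived from `count` and `count` is also a feature, high accuracy in a setup like this mostly shows that the model recovered the count thresholds, which is worth keeping in mind when reading the metrics.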
DASHBOARD DESCRIPTION
An interactive Streamlit dashboard was developed to visualize pharmacovigilance insights.
Key features:
• Top drugs visualization
• Top adverse reactions visualization
• Risk-based filtering
• Drug search functionality
• High-risk drug–reaction pairs
• CSV download option
• AI-based risk prediction module
The dashboard enables users to explore safety signals and perform real-time risk prediction.
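A few of the listed features could be wired together as in the sketch below. It is not the project's dashboard code: the helper names, widget labels, and layout are hypothetical, and the prediction module is omitted. The data helpers are plain pandas, and Streamlit is imported only inside `render`.

```python
import pandas as pd

def top_n(df, column, n=10):
    """Total report count per value of `column`, largest first."""
    return df.groupby(column)["count"].sum().sort_values(ascending=False).head(n)

def filter_pairs(df, risk_levels, drug_query=""):
    """Keep pairs in the selected risk levels, optionally matching a drug-name search."""
    mask = df["risk_level"].isin(risk_levels)
    if drug_query:
        mask &= df["drugname"].str.contains(drug_query, case=False, regex=False, na=False)
    return df[mask].sort_values("count", ascending=False)

def render(df):
    """Draw the dashboard; expects columns drugname, pt, count, risk_level."""
    import streamlit as st  # imported here so the helpers above work without Streamlit

    st.title("FAERS Pharmacovigilance Dashboard")
    st.subheader("Top drugs")
    st.bar_chart(top_n(df, "drugname"))
    st.subheader("Top adverse reactions")
    st.bar_chart(top_n(df, "pt"))

    st.subheader("Drug-reaction pairs")
    risk = st.multiselect("Risk level", ["LOW", "MEDIUM", "HIGH"], default=["HIGH"])
    query = st.text_input("Search drug name")
    view = filter_pairs(df, risk, query)
    st.dataframe(view)
    st.download_button(
        "Download CSV",
        view.to_csv(index=False),
        file_name="filtered_pairs.csv",
        mime="text/csv",
    )
```

To try it, save the code as a script (for example `app.py`, a hypothetical name), add a line such as `render(pd.read_csv("faers_ml_dataset.csv"))` at the end, and start it with `streamlit run app.py`.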
LIMITATIONS
• FAERS relies on voluntary (spontaneous) reporting and is subject to reporting bias, including under-reporting and duplicate reports
• Risk classification is based on frequency, not clinical severity
• Model performance depends on data distribution
• External validation was not performed
FUTURE WORK
• Incorporate patient-level features (age, gender)
• Use temporal trends for risk prediction
• Apply deep learning models
• Integrate real-time FAERS updates
• Deploy dashboard as a web application
Files
faers_ml_dataset.csv
Total size: 36.4 MB
| MD5 checksum | Size |
|---|---|
| 476cb8c10e96864b544af47e16d7073d | 6.5 kB |
| 31daa19506d141bf41a239c06633d2ae | 1.2 MB |
| 7e5060a5a18e36878b4b4bd3f66fbf8f | 908.0 kB |
| 857ea84e40acd911ec9301ffb60389fb | 33.0 MB |
| 3d4b96054ab067f48fb62469a39a2b50 | 33.0 kB |
| a9c331956f6b20c2aaafc5bffcb81961 | 304.6 kB |
| b17632cefdde8ca9452451f502dc6b93 | 297.9 kB |
| 7b34abd5fab868ca87ffb4b93e6d51ed | 234.9 kB |
| ec3291c6ff8f6a2332689484b1cfc571 | 179.0 kB |
| 96ea7114d9d369fd88a833762a0936c3 | 193.3 kB |
Additional details
Software
- Programming language: Python
References
- European Medicines Agency. (n.d.). Pharmacovigilance: Overview. Retrieved from https://www.ema.europa.eu
- U.S. Food and Drug Administration. (n.d.). FDA Adverse Event Reporting System (FAERS) public dashboard. Retrieved from https://fis.fda.gov/sense/app/777e9f4d-0cf8-448e-8068-5f0c7c52e1f8/sheet/7a47a261-d58b-4203-a8aa-6d3021737452/state/analysis
- Streamlit Inc. (n.d.). Streamlit documentation. Retrieved from https://docs.streamlit.io