Phishing validation emails dataset

Miltchev, Radoslav; Rangelov, Dimitar; Genchev, Evgeni

doi:10.5281/zenodo.13474746

Published August 29, 2024 | Version v1

Dataset Open

1. Technical University of Sofia
2. University of Twente
3. Saxion University of Applied Sciences

Description:
This dataset contains 2,000 emails curated for validating machine learning models that distinguish safe emails from phishing attempts. It combines real-world email samples with artificially generated emails so that the validation set covers a range of realistic email scenarios.

Each entry includes the full text of an email and a label that categorizes it as either 'Safe Email' or 'Phishing Email'. The dataset is intended for validating models after they have been trained, as a check on accuracy and reliability before deployment.
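
As an illustration of the validation step described above, the following is a minimal sketch of how predictions could be scored against the dataset's labels. The function name `validation_metrics` and the choice of accuracy, precision, and recall are assumptions made for this example; the record does not prescribe any particular metric, and the predictions would come from whatever model is being validated.

```python
def validation_metrics(true_labels, predicted_labels, positive="Phishing Email"):
    """Score predictions against reference labels.

    Hypothetical helper: treats 'Phishing Email' as the positive class.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0,
        "recall": true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0,
    }


# Made-up labels and predictions, only to show the call.
truth = ["Phishing Email", "Safe Email", "Phishing Email", "Safe Email"]
guess = ["Phishing Email", "Safe Email", "Safe Email", "Safe Email"]
metrics = validation_metrics(truth, guess)
print(metrics)
```

Reporting precision and recall alongside accuracy is a design choice for this sketch: a missed phishing email and a wrongly flagged safe email usually carry different costs, and accuracy alone hides that difference.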

Dataset Structure:

Total Emails: 2,000
Email Types:
- Safe Emails
- Phishing Emails
Attributes:
- Full text of the email
- Label indicating whether the email is safe or phishing

Example Entries:

- Email Text: "Dear Jordan, your subscription has been succes..."
  Email Type: Safe Email
- Email Text: "Congratulations! You've won a $3000 gift card...."
  Email Type: Phishing Email
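
The structure above could be read with Python's standard `csv` module, roughly as sketched below. The column names `Email Text` and `Email Type` are an assumption inferred from the example entries and should be checked against the actual CSV header; the helper `load_emails` and the two sample rows are hypothetical and stand in for the real file.

```python
import csv
import io
from collections import Counter

# Assumed column names, inferred from the example entries; verify against the file.
TEXT_COLUMN = "Email Text"
LABEL_COLUMN = "Email Type"
VALID_LABELS = {"Safe Email", "Phishing Email"}


def load_emails(handle):
    """Read (text, label) pairs from an open CSV file handle."""
    rows = []
    for record in csv.DictReader(handle):
        label = record[LABEL_COLUMN].strip()
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        rows.append((record[TEXT_COLUMN], label))
    return rows


# In-memory stand-in for the real file, with made-up rows.
sample = io.StringIO(
    "Email Text,Email Type\n"
    '"Hello, the meeting notes are attached.",Safe Email\n'
    '"Verify your account now or it will be closed.",Phishing Email\n'
)
emails = load_emails(sample)
label_counts = Counter(label for _, label in emails)
print(label_counts)
```

For the downloaded file, the same function could be called on `open("Phishing_validation_emails.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is also an assumption. Counting the labels first is a quick way to confirm the total of 2,000 and to see the class balance, which the record does not state.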

Acknowledgments

The authors would like to thank Sofia Tech Park and the Artificial intelligence and CAD systems laboratory for their assistance and support in conducting this research.

Files

Name	Size
Phishing_validation_emails.csv md5:1bf8ec0fe3f67e12dd275ce5b2b91b69	203.1 kB
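
The MD5 checksum listed for the file can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib`; the helper names `md5_hex` and `matches_published` are hypothetical, and the in-memory bytes only stand in for the real file.

```python
import hashlib
import io

# Checksum as published in the file listing.
PUBLISHED_MD5 = "1bf8ec0fe3f67e12dd275ce5b2b91b69"


def md5_hex(handle, chunk_size=65536):
    """Return the hex MD5 digest of a binary file handle, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published(handle):
    """True if the handle's contents hash to the published checksum."""
    return md5_hex(handle) == PUBLISHED_MD5


# Stand-in bytes; these are not the dataset, so the check is expected to fail.
print(matches_published(io.BytesIO(b"not the real file")))
```

With the real download, the handle would come from `open("Phishing_validation_emails.csv", "rb")`. MD5 is suitable here only as an integrity check against accidental corruption, not as protection against deliberate tampering.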

	All versions	This version
Views	6,031	6,031
Downloads	5,509	5,509
Data volume	1.8 GB	1.8 GB
