Published August 29, 2024 | Version v1
Dataset Open

Phishing validation emails dataset

  • 1. Technical University of Sofia
  • 2. University of Twente
  • 3. Saxion University of Applied Sciences

Description

This dataset contains 2,000 emails curated for validating machine learning models that distinguish safe emails from phishing attempts. It combines real-world email samples with artificially generated emails so that it reflects a realistic range of email scenarios.

Each entry includes the full text of an email and a label that categorizes it as either 'Safe Email' or 'Phishing Email'. The dataset is intended for validating models after they have been trained, as a check on accuracy and reliability before deployment.

Dataset Structure:

  • Total Emails: 2,000
  • Email Types:
    • Safe Emails
    • Phishing Emails
  • Attributes:
    • Full text of the email
    • Label indicating whether the email is safe or phishing
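The structure above can be checked with a short script. The following is a minimal sketch using only the Python standard library. The column names "Email Text" and "Email Type" are assumptions taken from the example entries, not confirmed headers of the CSV file, and the two sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Assumed column names (taken from the example entries; the real header may differ).
TEXT_COLUMN = "Email Text"
LABEL_COLUMN = "Email Type"
EXPECTED_LABELS = {"Safe Email", "Phishing Email"}


def load_emails(handle, text_column=TEXT_COLUMN, label_column=LABEL_COLUMN):
    """Read (text, label) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return [(record[text_column], record[label_column].strip()) for record in reader]


def summarize(rows):
    """Count emails per label and collect any labels outside the expected two."""
    counts = Counter(label for _, label in rows)
    unexpected = set(counts) - EXPECTED_LABELS
    return counts, unexpected


# Made-up in-memory sample standing in for the real file.
sample = io.StringIO(
    "Email Text,Email Type\n"
    '"Hello, your invoice is attached.",Safe Email\n'
    '"You are a winner, click this link.",Phishing Email\n'
)
rows = load_emails(sample)
counts, unexpected = summarize(rows)
print(len(rows), dict(counts), unexpected)
```

For the real file, the same functions could be called on `open("Phishing_validation_emails.csv", newline="", encoding="utf-8")`, where the UTF-8 encoding is also an assumption. Given the description, `len(rows)` would be expected to equal 2,000 and `unexpected` to be empty.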

Example Entries:

  1. Email Text: "Dear Jordan, your subscription has been succes..."
    • Email Type: Safe Email
  2. Email Text: "Congratulations! You've won a $3000 gift card...."
    • Email Type: Phishing Email
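Since the dataset is meant for validating an already trained model, one possible way to score predictions against the two labels is sketched below. The `keyword_baseline` function is a hypothetical stand-in for a real model, the sample rows are made up, and treating 'Phishing Email' as the positive class is a choice made here for illustration, not something the dataset prescribes.

```python
PHISHING = "Phishing Email"
SAFE = "Safe Email"


def keyword_baseline(text):
    """Hypothetical stand-in for a trained model: flags a few suspicious phrases."""
    suspicious = ("won", "gift card", "verify your account")
    lowered = text.lower()
    return PHISHING if any(phrase in lowered for phrase in suspicious) else SAFE


def validation_metrics(rows, predict):
    """Compute accuracy, precision and recall with phishing as the positive class."""
    tp = fp = fn = tn = 0
    for text, label in rows:
        predicted = predict(text)
        if predicted == PHISHING and label == PHISHING:
            tp += 1
        elif predicted == PHISHING:
            fp += 1
        elif label == PHISHING:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up (text, label) pairs in the same shape as the dataset entries.
rows = [
    ("Your invoice for March is attached.", SAFE),
    ("You have won a gift card, click here.", PHISHING),
    ("Please verify your account immediately.", PHISHING),
    ("Lunch at noon tomorrow?", SAFE),
    ("Urgent: confirm your password now.", PHISHING),
]
metrics = validation_metrics(rows, keyword_baseline)
print(metrics)
```

Reporting recall alongside accuracy is useful here because a missed phishing email (a false negative) is usually the more costly error, and accuracy alone can hide it.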

Acknowledgments

The authors would like to thank Sofia Tech Park and the Artificial Intelligence and CAD Systems Laboratory for their assistance and support in conducting this research.

Files

Phishing_validation_emails.csv (203.1 kB)
md5:1bf8ec0fe3f67e12dd275ce5b2b91b69