Phishing and Benign Websites Dataset

Mowar, Peya; Jain, Mini

doi:10.5281/zenodo.5807622

Published December 28, 2021 | Version v1

Conference paper | Open access

1. Delhi Technological University

Contributors

Contact person (2):

1. Delhi Technological University

This dataset was compiled by Peya Mowar and Mini Jain. We are releasing this dataset for the research community.

Reference Paper:

P. Mowar and M. Jain, "Fishing out the Phishing Websites," 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2021, pp. 1-6, doi: 10.1109/CyberSA52016.2021.9478237.

Abstract:

Phishing is a cybercrime in which deceitful websites lure naive users and trick them into disclosing confidential information, such as social media passwords or financial data. Phishing websites are crafted such that they superficially appear similar to popular legitimate websites. This paper aims to detect such phishing websites by proposing a novel classifier that takes lexical-based, script-based, rule-based, and address-based features extracted from a website into account. A large-scale balanced dataset of 38,800 active phishing and legitimate websites is created, on which tree-based ensemble classifiers are trained, out of which the XGBoost (eXtreme Gradient Boosting) model performs the best with a testing accuracy of 99.6%. The classifier can detect zero-day phishing attacks without requiring any third-party features such as page rank. Several other benefits of using this model over the state-of-the-art techniques are discussed.
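The abstract describes the dataset as balanced between phishing and legitimate websites. A minimal sketch of how a reader might check that claim on the downloaded CSV is given below. The column name `label` and the helper names `label_counts` and `is_balanced` are assumptions made for illustration; this record does not document the file's schema, so the column name should be adjusted after inspecting the header row.

```python
import csv
from collections import Counter


def label_counts(csv_path, label_column="label"):
    """Count how many rows carry each class label.

    `label_column` is a hypothetical name; the record does not list the
    CSV's columns, so check the header row and pass the real name.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or label_column not in reader.fieldnames:
            raise KeyError(
                f"column {label_column!r} not found; header is {reader.fieldnames}"
            )
        return Counter(row[label_column] for row in reader)


def is_balanced(counts, tolerance=0.01):
    """Return True if the largest and smallest classes differ by at most
    `tolerance` as a fraction of all rows."""
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return False
    spread = max(counts.values()) - min(counts.values())
    return spread <= tolerance * total
```

Calling `label_counts("phishing_and_benign_websites.csv", label_column=...)` with the real column name, and passing the result to `is_balanced`, would indicate whether the two classes are roughly equal in size.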

Files

Files (3.2 MB)

Name	Size	MD5
phishing_and_benign_websites.csv	3.2 MB	20c217346de248aab25b5b8def454525

Additional details

Compiles: Conference paper: 10.1109/CyberSA52016.2021.9478237 (DOI)

	All versions	This version
Views	1,260	1,253
Downloads	847	840
Data volume	3.6 GB	3.6 GB
