Published February 17, 2025 | Version v1
Dataset Open

What's in Phishers: A Longitudinal Study of Security Configurations in Phishing Websites and Kits

  • 1. University of Tennessee Knoxville
  • 2. University System of Maryland
  • 3. Sungkyunkwan University
  • 4. University of Tennessee System

Description

Phishing attacks continue to be a major threat to internet users, causing data breaches, financial losses, and identity theft. This study provides an in-depth analysis of the lifespan and evolution of phishing websites, focusing on their survival strategies and evasion techniques. We analyze 286,237 unique phishing URLs over five months using a custom web crawler based on Puppeteer and Chromium. Our crawler runs on a 30-minute cycle, systematically checking the operational status of phishing websites by collecting their HTTP status codes, screenshots, HTML, and HTTP data. We use temporal and survival analyses, along with statistical tests, to examine phishing website lifecycles, evolution, and evasion tactics. Our findings show that the average lifespan of phishing websites is 54 hours (2.25 days) with a median of 5.46 hours, indicating that many sites are taken down quickly while a subset remains active much longer. Interestingly, logistics-themed phishing websites (e.g., USPS) operate within a compressed timeframe (1.76 hours) compared to those imitating other brands (e.g., Facebook). We further analyze detection effectiveness using Google Safe Browsing (GSB). We find that GSB detects only 18.4% of phishing websites, taking an average of 4.5 days. Notably, 83.93% of phishing sites are already taken down before GSB detection, which suggests that GSB detection needs to be faster. Moreover, 16.07% of phishing sites persist beyond this point, surviving for an additional 7.2 days on average, resulting in an average total lifespan of approximately 12 days. We reveal that DNS resolution errors are the main cause (67%) of phishing website takedowns. Finally, we find that phishing sites with extensive visual changes (more than 100 changes) exhibit a median lifespan of 17 days, compared to 1.93 hours for those with minimal modifications.
These results highlight the dynamic nature of phishing attacks, the challenges in detection and prevention, and the need for more rapid and comprehensive countermeasures against evolving phishing tactics.
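The gap between the reported mean (54 hours) and median (5.46 hours) lifespans is what one would expect from a heavily right-skewed distribution. The following is a minimal Python sketch, not the authors' pipeline, of how a lifespan could be derived from periodic status checks and why a few long-lived sites pull the mean far above the median. The observation format `(timestamp, is_alive)`, the helper names, and the definition of lifespan as the time between the first and last successful check are all assumptions made for illustration; only the 30-minute interval comes from the description.

```python
from datetime import datetime, timedelta
from statistics import mean, median

# The 30-minute crawl cycle is stated in the description.
CRAWL_INTERVAL = timedelta(minutes=30)


def lifespan_hours(observations):
    """Return hours between the first and last 'alive' observation.

    `observations` is a list of (timestamp, is_alive) tuples. This record
    format and this lifespan definition are assumptions for illustration.
    """
    alive_times = [ts for ts, alive in observations if alive]
    if not alive_times:
        return 0.0
    return (max(alive_times) - min(alive_times)).total_seconds() / 3600.0


def make_site(start, n_alive_checks):
    """Build a synthetic log: n alive checks, then one failed check."""
    log = [(start + i * CRAWL_INTERVAL, True) for i in range(n_alive_checks)]
    log.append((start + n_alive_checks * CRAWL_INTERVAL, False))
    return log


# Synthetic data only: three short-lived sites and one long-lived site.
start = datetime(2024, 1, 1)
sites = [make_site(start, n) for n in (3, 5, 11, 201)]
lifespans = [lifespan_hours(site) for site in sites]

mean_lifespan = mean(lifespans)
median_lifespan = median(lifespans)
print(f"lifespans (h): {lifespans}")
print(f"mean: {mean_lifespan} h, median: {median_lifespan} h")
```

With a 30-minute cycle, any lifespan measured this way is only resolved to the nearest half hour, so very short-lived sites may be underestimated.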

 

Notes

This upload contains 3 months of the full dataset.

Due to dataset size limitations, please request access to the full dataset at: https://moa-lab.net/security-configurations-measurement/

Files

Files (40.2 GB)

Name Size
md5:b0e36806a6b8aab4a3e49600f89fc22b 40.2 GB
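Because the archive is large, it may be worth checking the download against the MD5 checksum listed for the file. The sketch below uses Python's standard `hashlib` and reads the file in chunks so that the 40.2 GB archive is never loaded into memory at once. The checksum value is taken from the listing; the function names and the file path passed by the caller are hypothetical, since the listing does not show a file name.

```python
import hashlib

# Checksum as shown in the file listing.
EXPECTED_MD5 = "b0e36806a6b8aab4a3e49600f89fc22b"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

A call such as `verify_download("path/to/downloaded_archive")` returns `False` if the download is truncated or corrupted. MD5 is suitable here as an integrity check against transfer errors, not as protection against deliberate tampering.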

Additional details

Dates

Collected
2021-07-15
Data collection start date
Collected
2024-01-31
Data collection end date