Published August 14, 2025 | Version v2
Dataset Open

A Decade-long Landscape of Advanced Persistent Threats: Longitudinal Analysis and Global Trends

  • 1. Sungkyunkwan University
  • 2. University of Tennessee Knoxville
  • 3. Stony Brook University

Description

This repository accompanies the paper "A Decade-long Landscape of Advanced Persistent Threats: Longitudinal Analysis and Global Trends", published in the Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security (CCS '25).

It provides:
  • Curated datasets from the longitudinal study of Advanced Persistent Threat (APT) campaigns across the last decade
  • Visual representations of APT campaigns, including an interactive map and a flow diagram showing relationships between threat actors and target countries  
  • Python code to generate the figures included in the paper for full reproducibility  
_______________________________________________________________________________________________________________

Dataset Overview

The repository contains the following collections:

Threat Actor Collection

Aggregates APT threat actor (TA) information from three curated open-source repositories (TA#1–TA#3).

Each record provides:
  • "Threat Actor": Unique identifier
  • "Other Names": Known aliases
  • "Country": Attributed country of origin
  • "Sponsor": Sponsoring entity
  • Motivation
  • "First seen": First recorded year of activity
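Because each record carries both a unique identifier and a list of aliases, a common first step is to build a lookup from any known name to the canonical threat actor. The following is a minimal sketch under stated assumptions: the rows are made-up placeholders (not taken from the dataset), the collection is assumed to be CSV-like with the column names listed above, and aliases in "Other Names" are assumed to be comma-separated. The helper name build_alias_index is hypothetical.

```python
import csv
import io

# Placeholder rows for illustration only; they are NOT taken from the dataset.
SAMPLE_CSV = """Threat Actor,Other Names,Country,Sponsor,Motivation,First seen
ExampleActor-A,"Alias-1, Alias-2",Country-X,State-sponsored,Espionage,2014
ExampleActor-B,Alias-3,Country-Y,Unknown,Financial gain,2017
"""


def build_alias_index(rows):
    """Map every known name (canonical or alias, lowercased) to the canonical name."""
    index = {}
    for row in rows:
        canonical = row["Threat Actor"].strip()
        index[canonical.lower()] = canonical
        # Assumption: aliases are comma-separated inside "Other Names".
        for alias in row["Other Names"].split(","):
            alias = alias.strip()
            if alias:
                index[alias.lower()] = canonical
    return index


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
alias_index = build_alias_index(rows)
print(alias_index["alias-2"])
```

Lowercasing the keys makes the lookup case-insensitive; if the actual files use a different alias separator, only the split call needs to change.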

Technical Report Collection

Consolidates metadata on APT technical reports from three open-source repositories (TR#1–TR#3).

Metadata fields include:
  • "Date": Publication date
  • Filename
  • Title
  • "Download Url": Source download link
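When working with metadata drawn from several sources, the same report may appear more than once. The sketch below shows one possible way to merge such records; it is not the procedure used to build this collection. It assumes that "Download Url" identifies a report and that "Date" is in ISO format (YYYY-MM-DD), neither of which is confirmed here. The records and the function name merge_report_metadata are hypothetical.

```python
from datetime import datetime


def merge_report_metadata(*sources):
    """Merge report records from several sources, keeping one record per URL."""
    merged = {}
    for records in sources:
        for record in records:
            url = record["Download Url"].strip()
            # Assumption: the download URL identifies a report; first occurrence wins.
            merged.setdefault(url, record)
    # Assumption: "Date" is an ISO date string (YYYY-MM-DD).
    return sorted(
        merged.values(),
        key=lambda r: datetime.strptime(r["Date"], "%Y-%m-%d"),
    )


# Placeholder records for illustration only.
source_a = [
    {"Date": "2020-05-01", "Filename": "report-b.pdf", "Title": "Report B",
     "Download Url": "https://example.org/report-b.pdf"},
]
source_b = [
    {"Date": "2019-03-15", "Filename": "report-a.pdf", "Title": "Report A",
     "Download Url": "https://example.org/report-a.pdf"},
    {"Date": "2020-05-01", "Filename": "report-b-copy.pdf", "Title": "Report B",
     "Download Url": "https://example.org/report-b.pdf"},
]

reports = merge_report_metadata(source_a, source_b)
print([r["Filename"] for r in reports])
```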

Information Retrieved Collection

Refined dataset resulting from the extraction and validation of structured information from the reports.
Information was obtained through a combination of rule-based (IoCParser), LLM-based (GPT-4-Turbo), and manual retrieval, as described below.

Rule-based Retrieval
IoCParser, a tool for extracting indicators of compromise (IoCs) from various data sources, was used for rule-based retrieval.
It extracts:
  • CVE identifiers  
  • MITRE ATT&CK technique IDs  
  • YARA rules
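To illustrate what rule-based retrieval of this kind involves, the sketch below pulls CVE identifiers and MITRE ATT&CK technique IDs out of free text with regular expressions. It is not IoCParser's implementation or API; the patterns, the function name extract_indicators, and the sample sentence are assumptions made for illustration, and YARA rule extraction is left out.

```python
import re

# CVE identifiers: "CVE-", a four-digit year, then four or more digits.
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
# ATT&CK technique IDs: T1059, or T1059.001 when a sub-technique is given.
TECHNIQUE_RE = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_indicators(text):
    """Return sorted, de-duplicated CVE and ATT&CK technique IDs found in text."""
    cves = sorted({match.upper() for match in CVE_RE.findall(text)})
    techniques = sorted(set(TECHNIQUE_RE.findall(text)))
    return {"cve": cves, "attack_technique": techniques}


# Made-up sentence for illustration; it does not come from any report.
sample = (
    "The actor exploited CVE-2017-11882 and cve-2021-44228, "
    "then used T1059.001 and T1566."
)
result = extract_indicators(sample)
print(result)
```

Simple patterns like these can also match strings that merely look like an identifier, which is one reason extracted values are usually validated afterwards.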
LLM-based Retrieval
After a comparative evaluation, GPT-4-Turbo was selected because it achieved the highest precision, recall, and F1 score. The model was used to extract:
  • Threat actor attribution  
  • Victim country  
  • Use of zero-day exploits  
  • Initial attack vectors  
  • Malware names  
  • Targeted sectors  
  • Campaign duration
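The precision, recall, and F1 metrics mentioned above can be computed for a single field by comparing the values a model extracted against a manually labelled reference set. This is a minimal sketch of the standard set-based definitions; the exact evaluation protocol of the paper is not described here, and the function name and the placeholder values are assumptions.

```python
def precision_recall_f1(predicted, expected):
    """Set-based precision, recall, and F1 for one extracted field."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Placeholder values for illustration only.
model_output = {"Malware-A", "Malware-B", "Malware-C", "Malware-D"}
reference = {"Malware-A", "Malware-B"}

precision, recall, f1 = precision_recall_f1(model_output, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this example the model finds every reference value (recall 1.0) but half of its answers are spurious (precision 0.5), which F1 summarizes as roughly 0.67.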
Manual Verification
Due to the persistent and stealthy nature of APT campaigns, LLM-derived information on attack durations was manually reviewed and validated for accuracy.

Visual Representations

This repository provides interactive visualizations that complement the findings in the paper:
  • Interactive APT Map
    • A map enabling exploration of APT campaigns by selecting either an attacking or victim country. It presents decade-long historical data including threat actor(s), CVEs, attack vector(s), malware, target sector(s), and estimated duration. Data is dynamically updated using LLM-based retrieval from TR#1. It also integrates a timeline chart linking campaigns to relevant news articles for additional context.
  • Threat Actor - Victim Country Flow Diagram
    • An interactive Sankey-style diagram visualizing the relationships between the top 10 threat actors and the 30 most frequently targeted countries over the past decade.
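The data behind a flow diagram of this kind is a table of (threat actor, victim country) link weights restricted to the most frequent actors and countries. The sketch below shows one way to derive it; it is not the code behind the published diagram. It assumes each pair represents one campaign, and the function name top_flows and the sample pairs are hypothetical placeholders.

```python
from collections import Counter


def top_flows(pairs, top_actors=10, top_countries=30):
    """Count actor-to-country links among the most frequent actors and countries."""
    pairs = list(pairs)
    actor_counts = Counter(actor for actor, _ in pairs)
    country_counts = Counter(country for _, country in pairs)
    keep_actors = {a for a, _ in actor_counts.most_common(top_actors)}
    keep_countries = {c for c, _ in country_counts.most_common(top_countries)}
    # Each remaining (actor, country) count becomes the width of one Sankey link.
    return Counter(
        (a, c) for a, c in pairs if a in keep_actors and c in keep_countries
    )


# Placeholder pairs for illustration only; one pair is assumed to be one campaign.
sample_pairs = [
    ("Actor-A", "Country-1"),
    ("Actor-A", "Country-1"),
    ("Actor-A", "Country-2"),
    ("Actor-B", "Country-1"),
    ("Actor-B", "Country-2"),
    ("Actor-C", "Country-3"),
]

flows = top_flows(sample_pairs, top_actors=2, top_countries=2)
print(flows.most_common())
```

With the defaults (10 actors, 30 countries) the same function yields the link table for a diagram of the size described above.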

Global Trends

The Global Trends directory contains Python scripts for generating the figures presented in the paper.  
Each script reads from the curated datasets in this repository and outputs a figure in PDF format using the same visual style and parameters as in the published paper.

Usage

All figure-drawing scripts require Python 3.8+ and the following Python packages:
pip install pandas numpy altair vl-convert-python seaborn matplotlib
After installing the dependencies, run a script as follows, replacing {desiredCode} with the name of the script:
cd "Global Trends"
python {desiredCode}.py
Running a script will produce a PDF file in the current directory.

Font Configuration

The figures in the paper use specific fonts.  
If they are missing, Matplotlib will show "findfont" warnings and fall back to its default fonts. The scripts will still run correctly, and figures will be generated.
To suppress the warnings and use the intended fonts on Debian- or Ubuntu-based Linux systems, install the fonts and clear the Matplotlib font cache:
sudo apt install msttcorefonts -qq
rm -rf ~/.cache/matplotlib
On Windows and macOS, installing these fonts can be more complex, and is optional.

Files

Global Trends.zip

Files (1.5 MB)

  • md5:076d202974bf328d0edade3b0772e0f0 (12.4 kB)
  • md5:8bf1ee7d029739421d5ad345e3dedc0e (1.1 MB)
  • md5:a1f4eae4d178e590af2e4768f38a10f3 (331.6 kB)
  • md5:08201784b70e3043ecfba11edcd49d51 (44.8 kB)