Published December 9, 2025 | Version v1
Dataset · Open Access

PoPoC

Authors/Creators

  • KTH Royal Institute of Technology

Description

Smart contract security is a critical concern in the blockchain ecosystem, as vulnerabilities have resulted in billions of dollars in financial losses. This urgency has driven the development of numerous automated security tools; however, their effectiveness is tightly linked to the data on which they are trained and evaluated. In current research practice, datasets vary widely in structure, provenance, and quality, as they are often manually assembled from various sources to satisfy the specific needs of individual studies. Because obtaining verified, real-world vulnerabilities and exploits is challenging, many researchers supplement or replace real data with artificially injected or otherwise synthetic examples. These practices, collectively, lead to evaluation settings that do not fully capture the complexity, diversity, and exploitability of vulnerabilities found in contracts intended for real use. As a result, tool performance is frequently overestimated in academic benchmarks, contributing to a persistent gap between reported results and the practical needs of auditors and developers.

This thesis addresses this gap by introducing PoPoC, a novel benchmark dataset built from real-world, verified Proof-of-Concept (PoC) exploits. We present a reproducible workflow for creating this dataset, which begins by scraping 4,770 audit reports from the Solodit platform and filtering them down to the 1,053 reports that contain dedicated PoC sections. These candidates are then automatically enriched with quality metrics using a Large Language Model (LLM) and ranked via a custom priority-scoring heuristic.
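A priority-scoring heuristic of this kind can be sketched as a weighted combination of the LLM-derived quality metrics. The metric names and weights below are hypothetical illustrations, not the actual ones used in this work:

```python
# Hypothetical sketch of a priority-scoring heuristic for ranking PoC
# candidates. Metric names and weights are illustrative only; the real
# PoPoC workflow may use different fields and scoring.

def priority_score(report: dict) -> float:
    """Combine LLM-derived quality metrics into a single ranking score."""
    weights = {
        "has_runnable_code": 3.0,  # PoC section contains executable code (0 or 1)
        "severity": 2.0,           # normalized severity in [0, 1]
        "completeness": 1.5,       # how self-contained the PoC is, in [0, 1]
    }
    return sum(w * float(report.get(metric, 0.0)) for metric, w in weights.items())

# Rank candidates highest-score first, then take the top N for manual review.
candidates = [
    {"id": "audit-A", "has_runnable_code": 1, "severity": 0.9, "completeness": 0.8},
    {"id": "audit-B", "has_runnable_code": 0, "severity": 1.0, "completeness": 0.5},
]
ranked = sorted(candidates, key=priority_score, reverse=True)
```

In this sketch, a report with runnable PoC code outranks a more severe finding that lacks code, matching the goal of selecting candidates whose exploits can actually be reproduced.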

The core of this work involved a rigorous manual validation of the top 100 ranked audits. This process resulted in a curated dataset of 58 fully reproduced, executable exploits, with each reproduction packaged within a containerized environment to ensure reliability. Our analysis confirms that 100% of the entries in the PoPoC dataset are technically correct. However, we also found that the original PoCs often have inconsistent test oracle coverage (only 28 of 58 have complete assertions) and that the dataset shows limited platform diversity, being sourced primarily from Code4rena.
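The oracle-coverage finding can be expressed as a simple statistic over the dataset's catalog. The entry format below is a hypothetical illustration, not PoPoC's actual schema:

```python
# Hypothetical catalog entries mirroring the reported counts (28 of 58
# reproductions carry complete assertions); the real schema may differ.
entries = (
    [{"reproduced": True, "complete_assertions": True}] * 28
    + [{"reproduced": True, "complete_assertions": False}] * 30
)

reproduced = sum(e["reproduced"] for e in entries)
with_oracles = sum(e["complete_assertions"] for e in entries)

print(f"{with_oracles}/{reproduced} reproductions have complete assertions")
```

A PoC without a complete oracle still demonstrates the exploit manually, but it cannot serve as an automated pass/fail check, which is why this distinction matters for benchmarking.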

The primary contributions of this thesis are a reproducible method for extracting and validating exploit PoCs from raw audit reports, the curated PoPoC dataset, and its accompanying codebase containing the PoC reproductions and the vulnerable source code. This work provides the first benchmark to systematically connect formal vulnerability descriptions and vulnerable source code with manually verified, proven, and runnable PoC exploits. PoPoC serves as a high-quality, reproducible foundation for benchmarking security tools, training auditors, and advancing future research in automated vulnerability detection.

Files

PoPoC.zip (5.5 GB)
md5:4999de682c1a74728a7d00bb21d2e80a