Published December 29, 2025 | Version v1
Dataset Open

Social Graph Inference Reddit

  • 1. Jožef Stefan Institute

Description

We construct a large-scale, empirically grounded dataset from Reddit to support the development and evaluation of agent-based social simulations. The dataset includes 33 technology-focused, 14 climate-focused, and 7 COVID-related agents, with each domain encompassing over one million posts and comments. Using publicly available posts and comments, we define agent categories based on content and interaction patterns, derive inter-agent relationships from temporal commenting behaviors, and build a directed, weighted network that reflects empirically observed user connections. The resulting dataset enables researchers to calibrate and benchmark agent behavior, network structure, and information diffusion processes against real social dynamics. Quantitative and qualitative analyses reveal distinctive patterns in user connectivity, engagement life cycles, and triadic closure growth, illustrating the potential of Reddit-derived interaction networks for realistic social simulation.
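As a rough illustration of how a directed, weighted network can be derived from commenting behavior, the sketch below counts replies between users and records when each pair first interacted. It is not the authors' pipeline: the record fields (`author`, `parent_author`, `created_utc`) and the function name `build_reply_graph` are assumptions made for this example, and the actual schema of the JSON files may differ.

```python
from collections import defaultdict


def build_reply_graph(comments):
    """Build a directed, weighted edge map from reply records.

    Each record is assumed (hypothetically) to carry the commenter's name,
    the name of the user being replied to, and a Unix timestamp.
    An edge (source, target) means "source replied to target".
    """
    edges = defaultdict(lambda: {"weight": 0, "first_seen": None})
    for record in comments:
        source = record["author"]
        target = record["parent_author"]
        # Skip top-level posts and self-replies: they define no relationship.
        if target is None or source == target:
            continue
        edge = edges[(source, target)]
        edge["weight"] += 1
        timestamp = record["created_utc"]
        if edge["first_seen"] is None or timestamp < edge["first_seen"]:
            edge["first_seen"] = timestamp
    return dict(edges)


# Toy input, invented purely for illustration.
comments = [
    {"author": "alice", "parent_author": None, "created_utc": 100},
    {"author": "bob", "parent_author": "alice", "created_utc": 160},
    {"author": "alice", "parent_author": "bob", "created_utc": 200},
    {"author": "bob", "parent_author": "alice", "created_utc": 260},
]
graph = build_reply_graph(comments)
```

Keeping the edge direction (replier to recipient) and the first-interaction time separate from the weight makes it possible to study both connection strength and the order in which ties form, for example when looking at triadic closure.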

Files

climate_14_agents.json

Files (1.4 GB)

Checksum Size
md5:6b116a3d62f121ead559679fe138532f 443.1 MB
md5:daa3b5271e27ece9206cda543c4e6c87 105.8 kB
md5:f602692d48f933d0f71078757bdc887f 490.7 MB
md5:3da83f5a45e493725e4c1a5125e39258 12.3 kB
md5:cb6cb769d2479bdc3207896959f27410 67.0 kB
md5:5304041f3302208c11009a7b9581cb2f 502.0 MB
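The MD5 checksums listed above can be used to confirm that a large download arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are invented for this example, and which checksum belongs to which file must be read from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that files of several hundred MB fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum("climate_14_agents.json", "md5:<value from the record>")` would return `True` only if the local file matches the published digest.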

Additional details

Funding

European Commission
TWON - TWin of Online Social Networks 101095095

Software

Repository URL
https://github.com/abdulsittar/Social-Graph-Inference-Reddit
Programming language
Python
Development Status
Active