Published April 24, 2026 | Version 1.0
Dataset Open

Tor Darkmarket Ecosystem Network Analysis

Description


This repository contains the data and code used in the studies cited below. The studies use network analysis to explore the structure, connectivity, and vulnerabilities of the Tor darkmarket ecosystem, focusing on the interplay of topics, communication channels, and languages.

Repository Contents

Data:
bipartite_tor_network.graphml: A GraphML file representing the bipartite graph of 82,285 Tor onion services and 57,071 identification forms (IDs) with 248,971 edges. Nodes include onion services (categorized by topic) and IDs (e.g., email, Telegram, cryptocurrency wallets). Edges represent connections where an ID is referenced by an onion service.


Node Attributes:
type: Indicates whether the node is an onion_service or ID.
topic: For onion services, one of hacking, finance-crypto, search-engine-index, other, drugs-narcotics, electronics, or finance. For IDs, the topic of the connected onion service(s).
id_type: For ID nodes, one of email, Telegram, Pastebin, PGP, phone, Bitcoin Wallet, Monero Wallet, Discord, Skype, DASH Wallet, Zcash Wallet, or Binance Coin Wallet.
language: Primary language of the onion service (e.g., Russian, English, Portuguese, etc.) or Unknown if not determinable.

Edge Attributes:
Each edge connects an onion service to an ID that is referenced in that service.
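As a quick sanity check on the file structure described above, the following minimal sketch counts nodes per type value and counts edges using only the Python standard library. The attribute name type and its values onion_service and ID come from the description above; the key ids, the inline SAMPLE document, and the helper name summarize_graphml are illustrative assumptions, not part of the repository code. Graph libraries such as NetworkX (networkx.read_graphml) can load the same file directly.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Standard GraphML XML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Tiny stand-in document (assumption); the real file is bipartite_tor_network.graphml.
SAMPLE = b"""<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="type" attr.type="string"/>
  <graph edgedefault="undirected">
    <node id="n0"><data key="d0">onion_service</data></node>
    <node id="n1"><data key="d0">ID</data></node>
    <node id="n2"><data key="d0">ID</data></node>
    <edge source="n0" target="n1"/>
    <edge source="n0" target="n2"/>
  </graph>
</graphml>"""


def summarize_graphml(source):
    """Return (Counter of node `type` values, number of edges).

    `source` is a file path or a binary file-like object.
    """
    root = ET.parse(source).getroot()
    # Map key ids (e.g. "d0") to attribute names (e.g. "type") for node data.
    node_keys = {
        key.get("id"): key.get("attr.name")
        for key in root.findall("g:key", GRAPHML_NS)
        if key.get("for", "all") in ("node", "all")
    }
    type_counts = Counter()
    for node in root.iterfind(".//g:node", GRAPHML_NS):
        attrs = {
            node_keys.get(data.get("key")): (data.text or "")
            for data in node.findall("g:data", GRAPHML_NS)
        }
        type_counts[attrs.get("type", "<missing>")] += 1
    n_edges = sum(1 for _ in root.iterfind(".//g:edge", GRAPHML_NS))
    return type_counts, n_edges


type_counts, n_edges = summarize_graphml(io.BytesIO(SAMPLE))
print(dict(type_counts), n_edges)
```

Passing the path of the downloaded bipartite_tor_network.graphml instead of the sample should, if the file matches the description, report 82,285 onion_service nodes, 57,071 ID nodes, and 248,971 edges.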


Data Collection:
The dataset was collected from Tor darkmarket domains over a 20-week period (July 2024 to November 2024) using web scraping tools provided by ByronLabs. Onion services were categorized into topics using the MISP Dark Web taxonomy, and IDs were extracted using custom regular expressions and the Restalker package. Languages were identified using the langdetect package. Read the papers cited below for further details.
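To illustrate the general idea of regex-based ID extraction, here is a minimal sketch using Python's standard re module. The two patterns, the PATTERNS table, and the extract_ids helper are hypothetical examples written for this note; they are not the custom expressions used in the studies and do not reproduce Restalker's behavior.

```python
import re

# Illustrative patterns only (assumption): NOT the authors' expressions
# and NOT what the Restalker package does internally.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "Telegram": re.compile(r"(?:https?://)?t\.me/([A-Za-z0-9_]{5,32})"),
}


def extract_ids(text):
    """Return a set of (id_type, value) pairs found in `text`."""
    found = set()
    for id_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the capture group when the pattern defines one,
            # otherwise the whole match.
            value = match.group(1) if pattern.groups else match.group(0)
            found.add((id_type, value))
    return found


SAMPLE_TEXT = "Contact vendor@example.com or https://t.me/example_shop for orders."
print(sorted(extract_ids(SAMPLE_TEXT)))
```

Each (onion service, ID) pair found this way would correspond to one edge of the bipartite graph; production-grade extraction needs stricter validation (for example, wallet checksum verification) than these simple patterns provide.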


Notes:
(1) The dataset has been anonymized to comply with ethical guidelines, with specific ID names (e.g., email addresses, Telegram usernames) replaced with placeholders (e.g., ID1, ID2).
(2) The bipartite_tor_network.graphml file is available at https://gitlab.com/luis.demarcos/bipartitetordomainsids.
(3) Due to the sensitive nature of the data, raw scraped content (e.g., original onion service text) is not included to protect privacy and prevent misuse.
(4) The code assumes familiarity with Python and network analysis concepts. Refer to the manuscript for detailed methodology and interpretation of results.
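The placeholder scheme mentioned in note (1) can be pictured with a minimal sketch like the one below. How the placeholders were actually generated is not described here, so the sequential numbering and the make_pseudonymizer helper are assumptions for illustration only.

```python
def make_pseudonymizer(prefix="ID"):
    """Return (pseudonymize, mapping); equal inputs get the same placeholder."""
    mapping = {}

    def pseudonymize(value):
        # Assign the next sequential placeholder on first sight of a value.
        if value not in mapping:
            mapping[value] = f"{prefix}{len(mapping) + 1}"
        return mapping[value]

    return pseudonymize, mapping


pseudonymize, mapping = make_pseudonymizer()
print(pseudonymize("vendor@example.com"))  # first value seen
print(pseudonymize("example_shop"))        # second value seen
print(pseudonymize("vendor@example.com"))  # repeated value, same placeholder
```

Because repeated values map to the same placeholder, the graph structure is preserved; the mapping itself would have to be kept private or discarded for the anonymization to hold.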


Citation: If you use this data or code in your research, please cite the following papers:
(1) de-Marcos, L., Domínguez-Díaz, A., Junquera-Sánchez, J., Cilleruelo, C., & Martínez-Herráiz, J.-J. (2025). Unveiling Dark Web Identity Patterns: A Network-Based Analysis of Identification Types and Communication Channels in Illicit Activities. Information, 16(11), 924. https://doi.org/10.3390/info16110924
(2) de-Marcos, L., Domínguez-Díaz, A., & Stapic, Z. (2026). Mapping the Tor Darkmarket Ecosystem: A Network Analysis of Topics, Communication Channels, and Languages. Forensic Science International: Digital Investigation, 56. https://doi.org/10.1016/j.fsidi.2025.302032


Contact
For questions or issues, contact:
Luis de-Marcos: luis.demarcos@uah.es


License: This repository is licensed under the MIT License. See the LICENSE file for details.

Files

bipartitetordomainsids-main.zip (41.9 MB)
md5:186039fb047f2f577d74f9015ad59dc0
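After downloading, the archive can be checked against the MD5 checksum listed above. The sketch below uses Python's standard hashlib module; the helper names file_md5 and matches_expected are illustrative, and the script assumes the archive sits in the current working directory under its listed name.

```python
import hashlib
from pathlib import Path

# Checksum and file name as listed on this record.
EXPECTED_MD5 = "186039fb047f2f577d74f9015ad59dc0"
ARCHIVE = Path("bipartitetordomainsids-main.zip")  # assumed download location


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 equals `expected` (case-insensitive)."""
    return file_md5(path) == expected.lower()


if ARCHIVE.exists():
    print("checksum OK" if matches_expected(ARCHIVE) else "checksum MISMATCH")
else:
    print(f"{ARCHIVE} not found; download it first")
```

MD5 is adequate here for detecting a corrupted or incomplete download, though it is not a safeguard against deliberate tampering.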

Additional details

Related works

Is described by
Journal article: 10.3390/info16110924 (DOI)

Funding

Ministerio de Ciencia, Innovación y Universidades
PID2021-125645OB-I00

Software