Published April 24, 2026 | Version 1.0
Dataset
Open
Tor Darkmarket Ecosystem Network Analysis
Description
This repository contains the data and code used in the studies cited below. The studies employ network analysis to explore the structure, connectivity, and vulnerabilities of the Tor darkmarket ecosystem, focusing on the interplay of topics, communication channels, and languages.
Repository Contents
Data:
bipartite_tor_network.graphml: A GraphML file representing the bipartite graph of 82,285 Tor onion services and 57,071 identification forms (IDs) with 248,971 edges. Nodes include onion services (categorized by topic) and IDs (e.g., email, Telegram, cryptocurrency wallets). Edges represent connections where an ID is referenced by an onion service.
Node Attributes:
type: Indicates whether the node is an onion_service or ID.
topic: For onion services, one of hacking, finance-crypto, search-engine-index, other, drugs-narcotics, electronics, or finance. For IDs, the topic of the connected onion service(s).
id_type: For ID nodes, one of email, Telegram, Pastebin, PGP, phone, Bitcoin Wallet, Monero Wallet, Discord, Skype, DASH Wallet, Zcash Wallet, or Binance Coin Wallet.
language: Primary language of the onion service (e.g., Russian, English, Portuguese, etc.) or Unknown if not determinable.
Edge Attributes:
Connects an onion service to an ID when the ID is referenced in the service.
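The node and edge layout described above can be inspected with a few lines of Python. The following is a minimal sketch, not the authors' code: it parses GraphML with the standard library only, and it assumes the attribute names in the file match the ones listed above (type, topic, id_type, language). The toy document, the node names S1, S2, ID1, ID2, and the helper load_graphml are illustrative and hypothetical; point load_graphml at the real bipartite_tor_network.graphml to work with the dataset.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Standard GraphML namespace.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Toy stand-in for bipartite_tor_network.graphml (hypothetical content).
TOY_GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="type" attr.type="string"/>
  <key id="d1" for="node" attr.name="topic" attr.type="string"/>
  <key id="d2" for="node" attr.name="id_type" attr.type="string"/>
  <graph edgedefault="undirected">
    <node id="S1"><data key="d0">onion_service</data><data key="d1">hacking</data></node>
    <node id="S2"><data key="d0">onion_service</data><data key="d1">finance</data></node>
    <node id="ID1"><data key="d0">ID</data><data key="d2">email</data></node>
    <node id="ID2"><data key="d0">ID</data><data key="d2">Telegram</data></node>
    <edge source="S1" target="ID1"/>
    <edge source="S2" target="ID1"/>
    <edge source="S2" target="ID2"/>
  </graph>
</graphml>
"""


def load_graphml(source):
    """Return (nodes, edges) from a GraphML path or binary file object.

    nodes maps node id -> {attribute name: value}; edges is a list of
    (source, target) pairs.
    """
    root = ET.parse(source).getroot()
    # Map GraphML key ids (d0, d1, ...) to their readable attribute names.
    key_names = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}
    nodes = {}
    for node in root.iterfind(".//g:node", NS):
        nodes[node.get("id")] = {
            key_names.get(d.get("key"), d.get("key")): d.text
            for d in node.findall("g:data", NS)
        }
    edges = [(e.get("source"), e.get("target")) for e in root.iterfind(".//g:edge", NS)]
    return nodes, edges


nodes, edges = load_graphml(io.BytesIO(TOY_GRAPHML.encode("utf-8")))

# Count nodes on each side of the bipartite graph and IDs per channel.
type_counts = Counter(attrs.get("type") for attrs in nodes.values())
id_type_counts = Counter(
    attrs.get("id_type") for attrs in nodes.values() if attrs.get("type") == "ID"
)

# Degree of every node: how many services reference an ID, and vice versa.
degree = Counter()
for source_id, target_id in edges:
    degree[source_id] += 1
    degree[target_id] += 1

print(type_counts, id_type_counts, degree.most_common(1))
```

For larger analyses, a graph library such as NetworkX (its read_graphml function loads the same format) is likely more convenient than hand-parsing; the standard-library version is used here only to keep the sketch dependency-free.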
Data Collection:
The dataset was collected from Tor darkmarket domains over a 20-week period (July 2024 to November 2024) using web scraping tools provided by ByronLabs. Onion services were categorized into topics using the MISP Dark Web taxonomy, and IDs were extracted using custom regular expressions and the Restalker package. Languages were identified using the langdetect package. See the papers cited below for further details.
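To illustrate how ID extraction yields bipartite edges, here is a rough sketch under stated assumptions. The authors' custom regular expressions are not published in this description, so the two patterns below (a simple email matcher and a t.me link matcher) are hypothetical stand-ins, as are the service names and page texts. Restalker and langdetect are not used here.

```python
import re

# Hypothetical, simplified patterns; NOT the expressions used in the study.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "Telegram": re.compile(r"t\.me/([A-Za-z][A-Za-z0-9_]{4,31})"),
}


def extract_ids(text):
    """Return the set of (id_type, value) pairs found in a page's text."""
    found = set()
    for id_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the capture group when the pattern has one, else the full match.
            value = match.group(match.lastindex or 0)
            found.add((id_type, value))
    return found


# Made-up page texts keyed by made-up service names.
pages = {
    "serviceA.onion": "Contact admin@example.com or t.me/example_shop",
    "serviceB.onion": "Support: admin@example.com",
}

# One edge per (service, ID) reference, which is the bipartite structure.
edge_list = sorted(
    (service, id_type, value)
    for service, text in pages.items()
    for id_type, value in extract_ids(text)
)

for edge in edge_list:
    print(edge)
```

In this toy input, the same email address appears on both services, so the corresponding ID node would link them in the bipartite graph.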
Notes:
(1) The dataset has been anonymized to comply with ethical guidelines, with specific ID names (e.g., email addresses, Telegram usernames) replaced with placeholders (e.g., ID1, ID2).
(2) The bipartite_tor_network.graphml file is available at https://gitlab.com/luis.demarcos/bipartitetordomainsids.
(3) Due to the sensitive nature of the data, raw scraped content (e.g., original onion service text) is not included to protect privacy and prevent misuse.
(4) The code assumes familiarity with Python and network analysis concepts. Refer to the manuscript for detailed methodology and interpretation of results.
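Note (1) describes replacing specific ID names with placeholders such as ID1 and ID2. The exact procedure is not described here; the following is one plausible sketch, in which each distinct raw identifier receives a sequential placeholder in order of first appearance. The function name anonymize and the sample edges are hypothetical.

```python
def anonymize(raw_edges):
    """Replace raw identifiers with sequential placeholders (ID1, ID2, ...).

    raw_edges is a list of (service, raw_id) pairs. Returns the anonymized
    edge list and the raw-to-placeholder mapping.
    """
    placeholders = {}
    anonymized = []
    for service, raw_id in raw_edges:
        if raw_id not in placeholders:
            # First time this identifier is seen: assign the next placeholder.
            placeholders[raw_id] = f"ID{len(placeholders) + 1}"
        anonymized.append((service, placeholders[raw_id]))
    return anonymized, placeholders


# Made-up example edges.
raw_edges = [
    ("S1", "admin@example.com"),
    ("S2", "admin@example.com"),
    ("S2", "t.me/example_shop"),
]

anonymized_edges, mapping = anonymize(raw_edges)
print(anonymized_edges)
```

Because the same raw identifier always maps to the same placeholder, the graph structure is preserved. The mapping itself would need to be kept private or discarded for the anonymization to hold.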
Citation: If you use this data or code in your research, please cite the following papers:
(1) de-Marcos, L., Domínguez-Díaz, A., Junquera-Sánchez, J., Cilleruelo, C., & Martínez-Herráiz, J.-J. (2025). Unveiling Dark Web Identity Patterns: A Network-Based Analysis of Identification Types and Communication Channels in Illicit Activities. Information, 16(11), 924. https://doi.org/10.3390/info16110924
(2) de-Marcos, L., Domínguez-Díaz, A., & Stapic, Z. (2026). Mapping the Tor Darkmarket Ecosystem: A Network Analysis of Topics, Communication Channels, and Languages. Forensic Science International: Digital Investigation, 56. https://doi.org/10.1016/j.fsidi.2025.302032
Contact
For questions or issues, contact:
Luis de-Marcos: luis.demarcos@uah.es
License: This repository is licensed under the MIT License. See the LICENSE file for details.
Files
| Name | Size | Checksum |
|---|---|---|
| bipartitetordomainsids-main.zip | 41.9 MB | md5:186039fb047f2f577d74f9015ad59dc0 |
Additional details
Related works
- Is described by: Journal article, 10.3390/info16110924 (DOI)
Funding
- Ministerio de Ciencia, Innovación y Universidades: PID2021-125645OB-I00
Software
- Repository URL: https://gitlab.com/luis.demarcos/bipartitetordomainsids
- Development Status: Inactive