Published April 29, 2025 | Version v1
Dataset Open

Drug trade messages on Tor in Finnish, English and Polish

  • 1. University of Helsinki
  • 2. University of Eastern Finland
  • 3. Kazimierz Wielki University in Bydgoszcz
  • 4. Seinäjoki University of Applied Sciences
  • 5. University of Edinburgh
  • 6. Tampere University

Description

We compared the Finnish, English and Polish drug trade within the Tor network. To do this, we selected three active onion websites and compiled one dataset from each. We publish these three datasets freely for the academic community.

1. A Finnish website within the Tor network offers an anonymous chat messenger and chat rooms for the biggest towns in Finland, where most of the messages are associated with the illicit drug trade. Finnish sample / Finnish.zip: 1,500 messages from the Finnish onion website “Tsatti”. We collected these in November and December of 2022. Anonymous chat Tsatti—Finnish discussions on Tor: http://tsattickdplsh2i2xqzlybvreiuppgoqsicmzkrotuudnk7h665ukgid.onion/

2. The Polish forum website “Cebulka” operates anonymously within Tor and facilitates drug trafficking. Polish sample / Polish.zip: 965 advertisement pages from the “Cebulka” onion website in Polish. We collected these pages in January 2023. Anonymous Forum Cebulka: http://cebulka7uxchnbpvmqapg5pfos4ngaxglsktzvha7a5rigndghvadeyd.onion/

3. “Nemesis”, established in 2021, combines a darknet market and a forum. Its products are displayed publicly, and registration is not necessary. English sample / English.zip: 1,334 pages from the “Nemesis” Market onion website, which offers illegal products. We collected these pages in January 2023. Anonymous Marketplace Nemesis Market: http://nemesis555nchzn2dogee6mlc7xxgeeshqirmh3yzn4lo5cnd4s5a4yd.onion/

Language and sellers

Language   Unique sellers in the sample   Selling advertisements
English    223                            1334
Polish     >200                           965
Finnish    >200                           >500

Language and contact methods

Language   Email   Wickr   Session   Telegram
English    20      2       0         13
Polish     1864    698     15        91
Finnish    0       588     35        9
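
Counts like those in the contact-methods table could be approximated from the text versions of the pages. The following is a minimal sketch only: the matching rules behind the published numbers are not stated here, so the patterns in `PATTERNS` and the function name `count_contact_mentions` are hypothetical, and the results should not be expected to reproduce the table.

```python
import re
from collections import Counter

# Hypothetical patterns: the rules behind the published counts are not stated.
# A bare word match for "session" will also catch unrelated uses of the word.
PATTERNS = {
    "Email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "Wickr": re.compile(r"\bwickr\b", re.IGNORECASE),
    "Session": re.compile(r"\bsession\b", re.IGNORECASE),
    "Telegram": re.compile(r"\btelegram\b", re.IGNORECASE),
}


def count_contact_mentions(texts):
    """Count pattern matches per contact method across an iterable of strings."""
    counts = Counter({name: 0 for name in PATTERNS})
    for text in texts:
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(text))
    return dict(counts)
```

The function counts every occurrence, so a single advertisement that names Wickr twice contributes two to the total. Counting advertisements instead would require testing each text once per pattern.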

The datasets are released under the CC BY 4.0 license: You are free to copy, share, redistribute, remix, transform, and build upon the material for any purpose, even commercially. You must give attribution and appropriate credit. The uncompressed data contains a “README.md” file, all of the web pages in a machine-readable JSON file, screenshots from the websites, and a “textpages” folder containing readable text versions of the web pages.
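
An extracted archive could be read along the following lines. This is a sketch under assumptions: the name of the JSON file, its internal structure, and the file extensions inside “textpages” are not given in this description, so the code searches for any `*.json` file and reads every file under “textpages” as UTF-8 text. The function name `load_sample` is hypothetical.

```python
import json
from pathlib import Path


def load_sample(root):
    """Load JSON files and text pages found under an extracted archive folder."""
    root = Path(root)

    # The JSON file name is not stated in the description, so collect any *.json.
    records = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records[path.name] = json.load(handle)

    # "textpages" is the folder name given in the description. Its file
    # extensions are an unknown, so every regular file inside is read as text.
    pages = {}
    for folder in sorted(root.rglob("textpages")):
        if not folder.is_dir():
            continue
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                pages[path.name] = path.read_text(encoding="utf-8", errors="replace")

    return records, pages
```

For example, after extracting English.zip into a folder named `English` (the folder name is an assumption), `records, pages = load_sample("English")` returns two dictionaries keyed by file name. The “README.md” file in each archive is the place to check the actual JSON layout.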

Data collection and curation: Juha Nurmi

Funding: This work was supported by the European Commission under the Horizon Europe funding programme, as part of the project SafeHorizon (Grant Agreement 101168562). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them. The authors declare no competing interests.

Files

English.zip

Files (30.6 MB)

Checksum                               Size
md5:94cfa0a7d2ccd2ffd6af6b3f213e134d   9.7 MB
md5:7fe92ea69b1500bbd6a9c757ebc036a1   1.6 MB
md5:b40a9248a39f65ce001e0ab8dae74aa8   19.2 MB
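
A downloaded archive can be checked against the MD5 checksums listed above. The sketch below uses Python's standard `hashlib`. The helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and because the listing does not show which checksum belongs to which file, the usage example compares a file against all three.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or plain hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Checksums copied from the file listing; the file-to-checksum pairing is unknown.
LISTED = [
    "md5:94cfa0a7d2ccd2ffd6af6b3f213e134d",
    "md5:7fe92ea69b1500bbd6a9c757ebc036a1",
    "md5:b40a9248a39f65ce001e0ab8dae74aa8",
]
```

With a download saved as `English.zip` in the working directory (the local path is an assumption), `any(matches_listed_checksum("English.zip", c) for c in LISTED)` is `True` when the file matches one of the published checksums.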

Additional details

Funding

European Commission
SafeHorizon - Innovations in Detecting and Disrupting Crime-as-a-Service Operations 101168562
National Science Centre
Rhizomatic networks, circulation of meanings and contents, and offline contexts of online drug trade 2021/43/B/HS6/00710

Dates

Collected: 2022-12-01 (data collection started)
Collected: 2023-01-31 (data collection ended)