Zero-Shot CoDA Darkweb Documents Classification with OpenAI LLMs Results
Description
Overview:
Please cite this paper: Domínguez-Díaz, A., de-Marcos, L., Prado-Sánchez, V.-P., Rodriguez, D., & Martínez-Herráiz, J.-J. (2026). Classifying illicit dark web content through zero-shot prompting: An empirical study with GPT models. Information Processing & Management, 63(2, Part B), 104476. https://doi.org/10.1016/j.ipm.2025.104476
This repository contains the prompt used to classify documents from the CoDA (Comprehensive Darkweb Annotations) dataset with OpenAI models. It also includes XLSX files with the results of the classification and stability tests described in the paper, one set per model (GPT-4.1, GPT-4.1 mini, GPT-4.1 nano, and o4-mini). The CoDA dataset is a publicly available collection of 10,000 multilingual Dark Web documents introduced by Jin et al. (2022).
References:
Jin, Y., Jang, E., Lee, Y., Shin, S., & Chung, J.-W. (2022). Shedding new light on the language of the dark web. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 5621–5637). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.naacl-main.412
Files
openai-zeroshot-coda-classification-results.zip (997.4 kB)
md5:69a3ffbf45624334223c481552c06578
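After downloading, the archive's integrity can be checked against the MD5 checksum listed for it. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_download` are hypothetical, and the script assumes the archive was saved in the current working directory under its original file name.

```python
import hashlib
from pathlib import Path

# MD5 checksum published on this record for the results archive.
EXPECTED_MD5 = "69a3ffbf45624334223c481552c06578"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # Read until fh.read() returns b"" (end of file) to avoid loading it all at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with the expected value, ignoring case."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the archive sits in the working directory with its original name.
    archive = Path("openai-zeroshot-coda-classification-results.zip")
    if archive.is_file():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually means an incomplete or corrupted download, so fetching the file again is the first thing to try.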
Additional details
Related works
- Is described by
- Journal article: 10.1016/j.ipm.2025.104476 (DOI)
Funding
- Ministerio de Ciencia, Innovación y Universidades
- Proyecto para el análisis y recuperación de evidencias criminales asociadas a redes ocultas (PARCHE) PID2021-125645OB-I00