Overview:

Please cite this paper: Domínguez-Díaz, A., de-Marcos, L., Prado-Sánchez, V.-P., Rodriguez, D., & Martínez-Herráiz, J.-J. (2026). Classifying illicit dark web content through zero-shot prompting: An empirical study with GPT models. Information Processing & Management, 63(2, Part B), 104476. https://doi.org/10.1016/j.ipm.2025.104476

This repository contains the prompt used to classify documents from the CoDA (Comprehensive Darkweb Annotations) dataset with OpenAI models, along with xlsx files containing the results of the classification and stability tests described in the paper for each model (GPT-4.1, GPT-4.1 mini, GPT-4.1 nano, and o4-mini). The CoDA dataset is a publicly available collection of 10,000 multilingual Dark Web documents introduced by Jin et al. (2022).

References

Jin, Y., Jang, E., Lee, Y., Shin, S., & Chung, J.-W. (2022). Shedding new light on the language of the dark web. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 5621–5637). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.naacl-main.412