Published April 24, 2026 | Version 1.0
Dataset Open

Zero-Shot CoDA Darkweb Documents Classification with OpenAI LLMs Results

Description

Overview:

Please cite this paper: Domínguez-Díaz, A., de-Marcos, L., Prado-Sánchez, V.-P., Rodriguez, D., & Martínez-Herráiz, J.-J. (2026). Classifying illicit dark web content through zero-shot prompting: An empirical study with GPT models. Information Processing & Management, 63(2, Part B), 104476. https://doi.org/10.1016/j.ipm.2025.104476

This repository contains the prompt used to classify documents from the CoDA (Comprehensive Darkweb Annotations) dataset with OpenAI models, together with xlsx files holding the results of the classification and stability tests described in the paper for each model (GPT-4.1, GPT-4.1 mini, GPT-4.1 nano and o4-mini). The CoDA dataset is a publicly available collection of 10,000 multilingual Dark Web documents introduced by Jin et al. (2022).
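To see which result spreadsheets the archive holds before opening any of them, a short standard-library script may be enough. The following is a minimal sketch: the internal layout of the zip is not documented on this page, so the function simply filters member names by extension, and `list_members` is a hypothetical helper name, not part of the dataset.

```python
import zipfile

# File name as listed under "Files" on this page.
ARCHIVE_NAME = "openai-zeroshot-coda-classification-results.zip"


def list_members(archive_path, suffix=".xlsx"):
    """Return the sorted member names in the zip that end with `suffix`.

    Assumption: the result files use the .xlsx extension, as the
    description states; the folder layout inside the zip is unknown.
    """
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(suffix)
        )


if __name__ == "__main__":
    # Assumption: the archive was saved to the current working directory.
    for name in list_members(ARCHIVE_NAME):
        print(name)
```

Passing a different `suffix` (for example `".txt"`) would list other members, which may help locate the prompt file, whose name and format are not stated here.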

References:

Jin, Y., Jang, E., Lee, Y., Shin, S., & Chung, J.-W. (2022). Shedding new light on the language of the dark web. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 5621–5637). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.naacl-main.412

Files

openai-zeroshot-coda-classification-results.zip (997.4 kB)
md5:69a3ffbf45624334223c481552c06578
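The MD5 checksum listed above can be used to confirm that a downloaded copy of the archive is intact. The following is a minimal sketch using only Python's standard library; the helper names `md5_of_file` and `verify_archive` are hypothetical, and the script assumes the zip was saved under its original name in the current working directory.

```python
import hashlib
from pathlib import Path

# Checksum and file name as listed on this page.
EXPECTED_MD5 = "69a3ffbf45624334223c481552c06578"
ARCHIVE_NAME = "openai-zeroshot-coda-classification-results.zip"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the archive sits in the current working directory.
    archive = Path(ARCHIVE_NAME)
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

MD5 is suitable here only as an integrity check against accidental corruption during download, which is the purpose the listed checksum serves.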

Additional details

Related works

Is described by
Journal article: 10.1016/j.ipm.2025.104476 (DOI)

Funding

Ministerio de Ciencia, Innovación y Universidades
Proyecto para el análisis y recuperación de evidencias criminales asociadas a redes ocultas (PARCHE) PID2021-125645OB-I00
