Zero-Shot CoDA Darkweb Documents Classification with OpenAI LLMs Results

Domínguez-Díaz, Adrián; de-Marcos, Luis; Prado-Sanchez, Víctor-Pablo; Martínez-Herráiz, José-Javier; Rodriguez, Daniel

doi:10.5281/zenodo.19728036

Published April 24, 2026 | Version 1.0

Dataset Open

1. Universidad de Alcalá

Overview:

Please cite this paper: Domínguez-Díaz, A., de-Marcos, L., Prado-Sánchez, V.-P., Rodriguez, D., & Martínez-Herráiz, J.-J. (2026). Classifying illicit dark web content through zero-shot prompting: An empirical study with GPT models. Information Processing & Management, 63(2, Part B), 104476. https://doi.org/10.1016/j.ipm.2025.104476

This repository contains the prompt used to classify documents from the CoDA (Comprehensive Darkweb Annotations) dataset with OpenAI models, along with .xlsx files containing the results of the classification and stability tests described in the paper for each model (GPT-4.1, GPT-4.1 mini, GPT-4.1 nano, and o4-mini). The CoDA dataset is a publicly available collection of 10,000 multilingual Dark Web documents introduced by Jin et al. (2022).

References:

Jin, Y., Jang, E., Lee, Y., Shin, S., & Chung, J.-W. (2022). Shedding new light on the language of the dark web. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 5621–5637). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.naacl-main.412

Files

Name	Size	MD5
openai-zeroshot-coda-classification-results.zip	997.4 kB	69a3ffbf45624334223c481552c06578
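The MD5 digest listed for the archive can be used to check that a download is intact. The following is a minimal sketch, not part of the deposited material: the helper names `md5_of_file` and `verify_download` are hypothetical, and the script assumes the archive was saved under its original name in the current working directory.

```python
import hashlib
from pathlib import Path

# MD5 digest published in the record's file listing.
EXPECTED_MD5 = "69a3ffbf45624334223c481552c06578"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected digest."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path: the archive sits in the current directory.
    archive = Path("openai-zeroshot-coda-classification-results.zip")
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually points to an incomplete or corrupted download, so fetching the archive again is the first thing to try.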

Additional details

Related works

Is described by: Journal article: 10.1016/j.ipm.2025.104476 (DOI)

Funding

Ministerio de Ciencia, Innovación y Universidades
Proyecto para el análisis y recuperación de evidencias criminales asociadas a redes ocultas (PARCHE), grant PID2021-125645OB-I00
