PAN'25 Generative AI Detection (Task 2): Human-AI Collaborative Text Classification
Creators
- 1. Mohamed bin Zayed University of Artificial Intelligence, UAE
- 2. Nebius AI, Netherlands
- 3. New York University Abu Dhabi, UAE
- 4. Toloka AI, Netherlands
- 5. Cornell University, USA
- 6. Technical University of Darmstadt, Germany
Description
Dataset for the Generative AI Detection Task (Subtask 2) @ PAN 2025.
As large language models (LLMs) like GPT-4o, Claude 3.5, and Gemini 1.5 Pro become increasingly accessible, machine-generated content is proliferating across diverse domains, including news, social media, education, and academia. These models produce highly fluent and coherent text, making them valuable for automating various writing tasks. However, their widespread use also raises concerns about misinformation, academic integrity, and content authenticity. Identifying the degree of human and machine involvement in text creation is crucial for addressing these challenges.
In this shared task, we focus on Human-AI Collaborative Text Classification, where the goal is to categorize documents that have been co-authored by humans and LLMs. Specifically, we aim to classify texts into six distinct categories based on the nature of human and machine contributions:
- Fully human-written: The document is entirely authored by a human without any AI assistance.
- Human-initiated, then machine-continued: A human starts writing, and an AI model completes the text.
- Human-written, then machine-polished: The text is initially written by a human but later refined or edited by an AI model.
- Machine-written, then machine-humanized (obfuscated): An AI generates the text, which is later modified to obscure its machine origin.
- Machine-written, then human-edited: The content is generated by an AI but subsequently edited or refined by a human.
- Deeply-mixed text: The document contains interwoven sections written by both humans and AI, without a clear separation.
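The six categories above can be held in code as a small enumeration. The following is a minimal sketch, not the dataset's own schema: the class name `CollaborationLabel`, the member names, and the helper `parse_label` are hypothetical, and no integer label IDs are assumed because this description does not state them.

```python
from enum import Enum


class CollaborationLabel(Enum):
    """The six categories from the task description (names here are hypothetical)."""

    FULLY_HUMAN = "Fully human-written"
    HUMAN_INITIATED_MACHINE_CONTINUED = "Human-initiated, then machine-continued"
    HUMAN_WRITTEN_MACHINE_POLISHED = "Human-written, then machine-polished"
    MACHINE_WRITTEN_MACHINE_HUMANIZED = "Machine-written, then machine-humanized (obfuscated)"
    MACHINE_WRITTEN_HUMAN_EDITED = "Machine-written, then human-edited"
    DEEPLY_MIXED = "Deeply-mixed text"


def parse_label(name: str) -> CollaborationLabel:
    """Look up a category by its description, ignoring case and surrounding whitespace."""
    wanted = name.strip().lower()
    for label in CollaborationLabel:
        if label.value.lower() == wanted:
            return label
    raise ValueError(f"unknown category: {name!r}")
```

How these categories are encoded in the released files should be checked against the data itself before mapping them to model outputs.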
Label Distribution:
Label Category | Train | Dev |
---|---|---|
Machine-written, then machine-humanized | 91,232 | 10,137 |
Human-written, then machine-polished | 95,398 | 12,289 |
Fully human-written | 75,270 | 12,330 |
Human-initiated, then machine-continued | 10,740 | 37,170 |
Deeply-mixed text (human + machine parts) | 14,910 | 225 |
Machine-written, then human-edited | 1,368 | 510 |
Total | 288,918 | 72,661 |
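The counts in the table are heavily imbalanced, and the train and dev splits are distributed differently. The sketch below copies the counts from the table, computes each label's share per split, and derives inverse-frequency class weights for the training split. The weighting formula (`total / (n_classes * count)`) is one common heuristic chosen here for illustration, not something the task prescribes, and the function names are hypothetical.

```python
# Counts copied from the label distribution table.
TRAIN = {
    "Machine-written, then machine-humanized": 91_232,
    "Human-written, then machine-polished": 95_398,
    "Fully human-written": 75_270,
    "Human-initiated, then machine-continued": 10_740,
    "Deeply-mixed text (human + machine parts)": 14_910,
    "Machine-written, then human-edited": 1_368,
}
DEV = {
    "Machine-written, then machine-humanized": 10_137,
    "Human-written, then machine-polished": 12_289,
    "Fully human-written": 12_330,
    "Human-initiated, then machine-continued": 37_170,
    "Deeply-mixed text (human + machine parts)": 225,
    "Machine-written, then human-edited": 510,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each label's fraction of the split."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each label by total / (n_classes * count); rarer labels get larger weights."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    train_share, dev_share = shares(TRAIN), shares(DEV)
    weights = inverse_frequency_weights(TRAIN)
    for label in TRAIN:
        print(
            f"{label:<45}"
            f"{train_share[label]:>9.2%}{dev_share[label]:>9.2%}"
            f"{weights[label]:>9.2f}"
        )
```

For instance, "Human-initiated, then machine-continued" makes up roughly 4% of the training split but about half of the dev split, so a model tuned only to the training distribution may be miscalibrated on dev.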
Files
Name | Size |
---|---|
pan25-generative-ai-detection-task2-train.zip (md5:969665003148bdaae956f9e1cc9165ae) | 282.3 MB |
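The MD5 value listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch that assumes the archive was saved under its listed name in the current working directory; the constant names and the helper `md5_of` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksum as listed in the Files section.
EXPECTED_MD5 = "969665003148bdaae956f9e1cc9165ae"
# Assumed download location: the current working directory.
ARCHIVE = Path("pan25-generative-ai-detection-task2-train.zip")


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    if not ARCHIVE.exists():
        print(f"{ARCHIVE} not found; download it first")
    elif md5_of(ARCHIVE) == EXPECTED_MD5:
        print("checksum OK")
    else:
        print("checksum mismatch; the download may be incomplete or corrupted")
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a defense against deliberate tampering.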