Published February 27, 2024 | Version v1
Dataset | Open access

PAN24 Voight-Kampff Generative AI Authorship Verification

  • 1. Bauhaus-Universität Weimar
  • 2. University of the Aegean

Description

This is the dataset for the Voight-Kampff Generative AI Authorship Verification shared task at PAN@CLEF2024. Please consult the task's page for further details on the data format, how the dataset was created, and links to baselines and utility code.

Task

With Large Language Models (LLMs) improving at breakneck speed and seeing wider adoption every day, it is getting increasingly hard to tell whether a given text was written by a human or a machine. Many classification approaches have been devised to help humans distinguish human- from machine-authored text, though often without questioning whether the task is feasible in the first place.

Drawing on years of experience in authorship verification, a related but much broader field, we set out to determine whether this task can be solved at all. We start with the simplest suitable task setup: given two texts, one written by a human and one by a machine, pick out the human-authored one.
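The pairwise setup can be illustrated with a minimal Python sketch. It assumes a hypothetical detector `machine_score` that returns higher values for text it considers more likely machine-generated; the toy scorer used here (lexical repetitiveness) is purely illustrative and is not one of the task's baselines. The `(text1, text2, human_index)` pair layout is likewise an assumption; the actual data format is described on the task's page.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical pair layout: (text1, text2, index of the human-authored text).
Pair = Tuple[str, str, int]


def toy_score(text: str) -> float:
    """Illustrative stand-in for a real detector: share of repeated tokens.

    Higher means 'more machine-like' under this toy assumption only.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    return 1.0 - len(set(tokens)) / len(tokens)


def pick_human(text1: str, text2: str, machine_score: Callable[[str], float]) -> int:
    """Return 0 if text1 is judged human-authored, 1 if text2 is.

    Only the relative order of the two scores matters; ties go to text1.
    """
    return 0 if machine_score(text1) <= machine_score(text2) else 1


def pairwise_accuracy(pairs: Iterable[Pair], machine_score: Callable[[str], float]) -> float:
    """Fraction of pairs in which the human-authored text was picked correctly."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(pick_human(t1, t2, machine_score) == human for t1, t2, human in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    demo = [("the cat sat", "the the the cat", 0), ("a a a b", "x y z", 1)]
    print(pairwise_accuracy(demo, toy_score))
```

Because the decision compares two scores rather than thresholding one, any detector that produces a per-text score can be plugged in without calibrating a decision threshold.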

The Generative AI Authorship Verification Task @ PAN is organized in collaboration with the Voight-Kampff Task @ ELOQUENT Lab in a builder-breaker style. PAN participants will build systems to tell human and machine apart, while ELOQUENT participants will investigate novel text generation and obfuscation methods for avoiding detection.

Files (12.4 MB)

  • pan24-generative-authorship-news.zip (12.4 MB), md5:47e17f58fd3509a4c649119ada3ae78e
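After downloading, the archive can be checked against its published MD5 checksum (47e17f58fd3509a4c649119ada3ae78e). The following is a minimal sketch using Python's standard `hashlib`, assuming the archive was saved under its original name in the current working directory.

```python
import hashlib
from pathlib import Path

# Checksum published for pan24-generative-authorship-news.zip.
EXPECTED_MD5 = "47e17f58fd3509a4c649119ada3ae78e"


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5sum(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("pan24-generative-authorship-news.zip")  # assumed download location
    if archive.exists():
        print("checksum OK" if verify(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use constant regardless of file size; MD5 here only guards against corrupted or incomplete downloads, not deliberate tampering.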

Additional details