Published September 11, 2017 | Version v1
Dataset Open

PAN17 Author Identification: Clustering

  • 1. Universität Leipzig
  • 2. Bauhaus-Universität Weimar


Given a collection of up to 50 short documents (paragraphs extracted from larger documents), the task is to identify authorship links and groups of documents by the same author. All documents are single-authored, written in the same language, and belong to the same genre. However, the topic and text length of the documents may vary. The number of distinct authors whose documents are included in the collection is not given.
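The description names two outputs: groups of documents by the same author, and authorship links between documents. The following is a minimal sketch of how the two might relate, assuming that an authorship link is simply an unordered pair of documents placed in the same group. The function name `links_from_clusters` and the document names are hypothetical and are not taken from the dataset or its file format.

```python
from itertools import combinations


def links_from_clusters(clusters):
    """Return every unordered pair of documents that share a cluster.

    `clusters` is a list of lists of document names (an assumed,
    illustrative representation, not the dataset's own format).
    """
    links = set()
    for cluster in clusters:
        # Sort so each pair has one canonical order, e.g. ("doc1", "doc2").
        for a, b in combinations(sorted(cluster), 2):
            links.add((a, b))
    return links


# Hypothetical document names, used only for illustration.
example_clusters = [["doc1", "doc2", "doc3"], ["doc4"], ["doc5", "doc6"]]
example_links = links_from_clusters(example_clusters)
```

Under this assumption, a document that sits alone in its group (such as `doc4` above) contributes no links, and a group of n documents contributes n(n-1)/2 links.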



Files (961.4 kB)


Additional details


  • Martin Potthast, Francisco Rangel, Michael Tschuggnall, Efstathios Stamatatos, Paolo Rosso, and Benno Stein. Overview of PAN 2017: Author Identification, Author Profiling, and Author Obfuscation. In Gareth J. F. Jones et al., editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction. 8th International Conference of the CLEF Initiative (CLEF 2017), Berlin Heidelberg New York, September 2017. Springer.