PAN17 Author Identification: Clustering

Potthast, Martin; Rangel, Francisco; Tschuggnall, Michael; Stamatatos, Efstathios; Rosso, Paolo; Stein, Benno

Given a collection of up to 50 short documents (paragraphs extracted from larger documents), identify authorship links and groups of documents by the same author. All documents are single-authored, in the same language, and belong to the same genre. However, the topic or text length of the documents may vary. The number of distinct authors whose documents are included in the collection is not given.
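To make the two expected outputs concrete (ranked authorship links and author groups), here is a minimal baseline sketch. It is not the organizers' reference method or any participant's system. The features (lowercased character trigrams), the similarity (cosine), the 0.5 linking threshold, and the function name `cluster_by_author` are all illustrative assumptions. Because the number of authors is not given, the groups are derived from the links themselves (connected components above the threshold) rather than from a fixed cluster count.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (assumed stylometric feature)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_by_author(docs: dict[str, str], threshold: float = 0.5):
    """Hypothetical helper returning (ranked_links, clusters).

    ranked_links: every document pair with its similarity score, highest first.
    clusters: groups of document ids joined by links scoring >= threshold,
    found as connected components with a small union-find.
    """
    ids = list(docs)
    profiles = {d: char_ngrams(docs[d]) for d in ids}
    links = sorted(
        ((a, b, cosine(profiles[a], profiles[b])) for a, b in combinations(ids, 2)),
        key=lambda link: link[2],
        reverse=True,
    )

    parent = {d: d for d in ids}

    def find(d: str) -> str:
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving
            d = parent[d]
        return d

    # Merge the groups of any two documents whose link passes the threshold.
    for a, b, score in links:
        if score >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for d in ids:
        groups.setdefault(find(d), []).append(d)
    return links, list(groups.values())


# Toy usage with made-up texts (not taken from the dataset).
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the cat sat on the hat",
    "d3": "quantum chromodynamics lecture notes",
}
links, clusters = cluster_by_author(docs)
print(clusters)  # [['d1', 'd2'], ['d3']]
```

Thresholding pairwise links keeps the number of groups open-ended, which matches the task's premise that the author count is unknown; a real submission would tune the features and threshold on training data.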

More information: Link

Restricted Access

You may request access to the files in this upload, provided that you fulfil the conditions below. The decision to grant or deny access rests solely with the record owner.

Please request access to the data with a short statement on how you want to use it. Thanks!
We would like to point out that you can register on to be part of the PAN community.

  • Martin Potthast, Francisco Rangel, Michael Tschuggnall, Efstathios Stamatatos, Paolo Rosso, and Benno Stein. Overview of PAN 2017: Author Identification, Author Profiling, and Author Obfuscation. In Gareth J. F. Jones et al., editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction. 8th International Conference of the CLEF Initiative (CLEF 2017), Berlin Heidelberg New York, September 2017. Springer.

                    All versions    This version
Views               254             254
Downloads           18              18
Data volume         17.3 MB         17.3 MB
Unique views        208             208
Unique downloads    17              17
