Published May 5, 2016 | Version v1
Dataset Open

PAN16 Author Identification: Clustering


We provide a collection of up to 100 documents in which to identify authorship links and groups of documents written by the same author. All documents are single-authored, written in the same language, and belong to the same genre. However, the topic and text length of the documents may vary. The number of distinct authors whose documents are included in the collection is not given.
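
The relation between the two outputs named above (groups of documents by the same author, and pairwise authorship links) can be illustrated with a minimal sketch. It assumes that a grouping is represented as a list of lists of document names and that a link is an unordered pair of documents placed in the same group. The file names and the function `links_from_clusters` are hypothetical and are not part of the dataset or its official format.

```python
from itertools import combinations


def links_from_clusters(clusters):
    """Return the set of unordered document pairs that share a cluster."""
    links = set()
    for group in clusters:
        # Sort so each pair is stored in one canonical order.
        for a, b in combinations(sorted(group), 2):
            links.add((a, b))
    return links


# Hypothetical document names; the real collection may be named differently.
clusters = [
    ["doc01.txt", "doc04.txt", "doc07.txt"],  # three documents, one author
    ["doc02.txt"],                            # a single-document author
    ["doc03.txt", "doc05.txt"],
]

links = links_from_clusters(clusters)
for pair in sorted(links):
    print(pair)
```

A group of n documents yields n(n-1)/2 links, so an author with a single document contributes no links at all.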

More information: Link


Files (5.3 MB)


Additional details


  • Efstathios Stamatatos, Michael Tschuggnall, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast. Clustering by Authorship Within and Across Documents. In Working Notes Papers of the CLEF 2016 Evaluation Labs, volume 1609 of CEUR Workshop Proceedings, September 2016. ISSN 1613-0073.