Published May 5, 2016 | Version v1
Dataset Open

PAN16 Author Identification: Clustering

Description

We provide a collection of up to 100 documents; the task is to identify authorship links and groups of documents by the same author. All documents are single-authored, written in the same language, and belong to the same genre. However, the topic and text length of the documents may vary. The number of distinct authors represented in the collection is not given.
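The relationship between the two outputs named above (groups of documents and authorship links) can be illustrated with a minimal sketch: once documents are grouped by presumed author, every pair of documents inside a group is an authorship link. The document names and the function `links_from_clusters` are hypothetical and are not taken from the dataset or from any official task format.

```python
from itertools import combinations


def links_from_clusters(clusters):
    """Return the sorted list of document pairs that share a cluster.

    `clusters` is a list of lists of document names; each inner list is
    assumed to hold the documents attributed to one (unknown) author.
    """
    links = set()
    for cluster in clusters:
        # Sorting makes each pair canonical: (a, b) with a < b.
        for first, second in combinations(sorted(cluster), 2):
            links.add((first, second))
    return sorted(links)


# Hypothetical grouping of six documents into three author clusters.
example_clusters = [
    ["doc1.txt", "doc4.txt", "doc5.txt"],
    ["doc2.txt"],
    ["doc3.txt", "doc6.txt"],
]
example_links = links_from_clusters(example_clusters)
```

A cluster with a single document contributes no links, and a cluster of n documents contributes n(n-1)/2 of them, so the three-document cluster here yields three links and the two-document cluster yields one.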

More information: Link

Files

pan16-author-clustering-test-and-training.zip (5.3 MB)
md5:711e95fed2a865a82faffcf77475c3e9
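The MD5 checksum listed for the archive can be used to check that a download is intact. The following is a minimal sketch using only the Python standard library; the function names and the assumption that the archive was saved locally under its listed filename are illustrative.

```python
import hashlib

# Checksum as listed for the archive on this page.
EXPECTED_MD5 = "711e95fed2a865a82faffcf77475c3e9"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("pan16-author-clustering-test-and-training.zip")` should return `True` for an uncorrupted copy, assuming the file sits in the current working directory.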

Additional details

References

  • Efstathios Stamatatos, Michael Tschuggnall, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast. Clustering by Authorship Within and Across Documents. In Working Notes Papers of the CLEF 2016 Evaluation Labs, volume 1609 of CEUR Workshop Proceedings, September 2016. CEUR-WS.org. ISSN 1613-0073.