Published September 11, 2017 | Version v1
Dataset Open

PAN17 Author Identification: Clustering

  • 1. Universität Leipzig
  • 2. Bauhaus-Universität Weimar

Description

Given a collection of up to 50 short documents (paragraphs extracted from larger documents), the task is to identify authorship links and groups of documents by the same author. All documents are single-authored, written in the same language, and belong to the same genre. However, the topic and text length of the documents may vary. The number of distinct authors whose documents are included in the collection is not given.

More information: Link
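
The description names two outputs: groups of documents by the same author, and authorship links between documents. As a minimal sketch of how the two relate, the Python snippet below derives the links implied by a grouping, treating every pair of documents in the same group as linked. The document names, the example grouping, and the function name `links_from_clusters` are hypothetical and not taken from the dataset; the archive's actual file layout and answer format are not described here.

```python
from itertools import combinations


def links_from_clusters(clusters):
    """Return the set of unordered document pairs that share a cluster.

    Each cluster is a collection of document names assumed to be
    written by the same author.
    """
    links = set()
    for docs in clusters:
        # Sorting makes each pair appear in one canonical order.
        for first, second in combinations(sorted(docs), 2):
            links.add((first, second))
    return links


# Hypothetical grouping of six documents into three author clusters.
example_clusters = [
    {"doc1", "doc4", "doc5"},
    {"doc2"},
    {"doc3", "doc6"},
]

example_links = links_from_clusters(example_clusters)
print(sorted(example_links))
```

A cluster with a single document contributes no links, so a collection in which every document has a different author yields an empty link set.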

Files

pan17-author-clustering-test-and-training.zip (961.4 kB)
md5:bf825e50ccd9581d72ae09345bd4de65

Additional details

References

  • Martin Potthast, Francisco Rangel, Michael Tschuggnall, Efstathios Stamatatos, Paolo Rosso, and Benno Stein. Overview of PAN 2017: Author Identification, Author Profiling, and Author Obfuscation. In Gareth J. F. Jones et al., editors, Experimental IR Meets Multilinguality, Multimodality, and Interaction. 8th International Conference of the CLEF Initiative (CLEF 2017), Berlin Heidelberg New York, September 2017. Springer.