PAN23 Multi-Author Writing Style Analysis
- 1. University of Innsbruck
- 2. Leipzig University
- 3. Bauhaus-Universität Weimar
Description
This is the dataset for the shared task on Multi-Author Writing Style Analysis PAN@CLEF2023. Please consult the task's page for further details on the format, the dataset's creation, and links to baselines and utility code.
Task: We ask participants to solve the following intrinsic style change detection task: for a given text, find all positions of writing style changes on the paragraph level (i.e., for each pair of consecutive paragraphs, assess whether there was a style change). The simultaneous change of authorship and topic will be carefully controlled, and we will provide participants with datasets of three difficulty levels:
- Easy: The paragraphs of a document cover a variety of topics, allowing approaches to make use of topic information to detect authorship changes.
- Medium: The topical variety in a document is small (though still present) forcing the approaches to focus more on style to effectively solve the detection task.
- Hard: All paragraphs in a document are on the same topic.
All documents are provided in English and may contain an arbitrary number of style changes. However, style changes may only occur between paragraphs (i.e., a single paragraph is always authored by a single author and contains no style changes).
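Because a paragraph never contains a style change, a document with n paragraphs has exactly n - 1 decision points, one per consecutive pair. The minimal sketch below shows that relationship, assuming (hypothetically) that each paragraph has been labeled with an integer author id; the function name `changes_from_authors` is illustrative and not part of the official utility code.

```python
from typing import List


def changes_from_authors(paragraph_authors: List[int]) -> List[int]:
    """Return one 0/1 label per pair of consecutive paragraphs.

    paragraph_authors[i] is a hypothetical author id for paragraph i.
    A label of 1 means the author (and hence the writing style) changes
    between paragraph i and paragraph i + 1; 0 means it stays the same.
    """
    return [
        int(prev != curr)
        for prev, curr in zip(paragraph_authors, paragraph_authors[1:])
    ]


if __name__ == "__main__":
    # Four paragraphs by authors 1, 1, 2, 1 give three boundaries.
    print(changes_from_authors([1, 1, 2, 1]))  # [0, 1, 1]
```

Note that the task asks only whether the style changes at each boundary, so a return to an earlier author (the final boundary above) still counts as a change.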
Data: To develop and then test your algorithms, three datasets including ground truth information are provided (dataset1 for the easy task, dataset2 for the medium task, and dataset3 for the hard task).
Each dataset is split into three parts:
- training set: Contains 70% of the whole dataset and includes ground truth data. Use this set to develop and train your models.
- validation set: Contains 15% of the whole dataset and includes ground truth data. Use this set to evaluate and optimize your models.
- test set: Contains 15% of the whole dataset; no ground truth data is given. This set is used for evaluation.
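The authoritative file format is described on the task's page, not in this record. As a rough sketch only, the loader below assumes a layout in which each document is stored as `problem-<id>.txt` with one paragraph per non-empty line, and its ground truth (training and validation sets only) as `truth-problem-<id>.json` containing a `"changes"` list. Both file-name patterns and the JSON key are assumptions to check against the task page, and `load_split` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import List, Tuple


def load_split(split_dir: str) -> List[Tuple[List[str], List[int]]]:
    """Load (paragraphs, changes) pairs from one dataset split.

    ASSUMED layout (verify on the task page):
      problem-<id>.txt        one paragraph per non-empty line
      truth-problem-<id>.json {"changes": [0/1 per consecutive pair]}
    For the test split no truth files exist, so `changes` stays empty.
    """
    root = Path(split_dir)
    samples: List[Tuple[List[str], List[int]]] = []
    for text_path in sorted(root.glob("problem-*.txt")):
        text = text_path.read_text(encoding="utf-8")
        paragraphs = [line for line in text.splitlines() if line.strip()]

        changes: List[int] = []
        truth_path = root / f"truth-{text_path.stem}.json"
        if truth_path.exists():
            changes = json.loads(truth_path.read_text(encoding="utf-8"))["changes"]
            # Style changes occur only between paragraphs, so there must be
            # exactly one label per pair of consecutive paragraphs.
            if len(changes) != len(paragraphs) - 1:
                raise ValueError(f"{truth_path.name}: label count does not match paragraphs")
        samples.append((paragraphs, changes))
    return samples
```

The length check is a cheap guard: if it fails, the assumed paragraph separator probably does not match the real format.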
You are free to use additional external data for training your models. However, we ask you to make any additional data you use freely available under a suitable license.
Versioning:
- 1.0: initial upload
Files
Name | Size | Checksum
---|---|---
pan23-multi-author-analysis.zip | 26.1 MB | md5:3af611691569c82891b6fc7a53ad04f2
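To confirm that the archive downloaded intact, its MD5 digest can be compared with the checksum listed for pan23-multi-author-analysis.zip on this record. The sketch below uses only Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` are illustrative.

```python
import hashlib

# Checksum listed for pan23-multi-author-analysis.zip on this record.
EXPECTED_MD5 = "3af611691569c82891b6fc7a53ad04f2"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that large archives are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches `expected`."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("pan23-multi-author-analysis.zip")` should return `True` for an uncorrupted copy saved under that name in the working directory.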