Dataset Restricted Access
REYNIER ORTEGA BUENO; BERTA CHULVI; FRANCISCO RANGEL; PAOLO ROSSO; ELISABETTA FERSINI
TASK
With irony, language is used in a figurative and subtle way to mean the opposite of what is literally stated. In sarcasm, a more aggressive type of irony, the intent is to mock or scorn a victim, without excluding the possibility of hurting them. Stereotypes are often used, especially in discussions about controversial issues such as immigration, sexism, and misogyny. At PAN’22, we will focus on profiling ironic authors on Twitter. Special emphasis will be given to authors who employ irony to spread stereotypes, for instance, towards women or the LGTB community. The goal is to classify authors as ironic or not depending on the number of their tweets with ironic content. Among those authors, we will consider a subset that employs irony to convey stereotypes, in order to investigate whether state-of-the-art models can also distinguish these cases. Therefore, given Twitter authors together with their tweets, the goal is to profile those authors who can be considered ironic.
DATA
Input
The uncompressed dataset consists of a folder that contains the authors' XML files and a truth.txt file.
The format of the XML files is:
<author lang="en">
<documents>
<document>Tweet 1 textual contents</document>
<document>Tweet 2 textual contents</document>
...
</documents>
</author>
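The XML layout above can be read with a standard XML parser. The following is a minimal sketch, assuming each file holds a single `<author>` element as shown; the function name `parse_author_xml` is hypothetical and not part of any official PAN tooling.

```python
import xml.etree.ElementTree as ET


def parse_author_xml(xml_text):
    """Return (lang, tweets) for one author's XML document.

    Hypothetical helper: assumes the root element is <author> with a
    `lang` attribute and that each tweet sits in a <document> element.
    """
    root = ET.fromstring(xml_text)
    lang = root.get("lang")
    # Empty <document/> elements have text None; normalise them to "".
    tweets = [doc.text or "" for doc in root.iter("document")]
    return lang, tweets


# Sample input copied from the format description (without the elided "..." line).
sample = """<author lang="en">
<documents>
<document>Tweet 1 textual contents</document>
<document>Tweet 2 textual contents</document>
</documents>
</author>"""

lang, tweets = parse_author_xml(sample)
print(lang, len(tweets))
```

To read a file from disk instead of a string, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.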
The format of the truth.txt file is as follows. Each line has two columns separated by ":::". The first column is the author id. The second column contains the truth label (I for ironic, NI for non-ironic).
2d0d4d7064787300c111033e1d2270cc:::I
b9eccce7b46cc0b951f6983cc06ebb8:::NI
f41251b3d64d13ae244dc49d8886cf07:::I
47c980972060055d7f5495a5ba3428dc:::NI
d8ed8de45b73bbcf426cdc9209e4bfbc:::I
2746a9bf36400367b63c925886bc0683:::NI
...
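The truth.txt lines can be loaded into a dictionary by splitting each line on the `:::` separator. This is a sketch only; the helper name `parse_truth` is an assumption made for illustration, and the sample lines are taken from the excerpt above.

```python
def parse_truth(lines):
    """Map author id -> label from lines shaped like '<author_id>:::<label>'.

    Hypothetical helper: blank lines are skipped, and a line without
    exactly one ':::' separator raises ValueError.
    """
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        author_id, label = line.split(":::")
        labels[author_id] = label
    return labels


sample_lines = [
    "2d0d4d7064787300c111033e1d2270cc:::I",
    "f41251b3d64d13ae244dc49d8886cf07:::I",
    "47c980972060055d7f5495a5ba3428dc:::NI",
]
truth = parse_truth(sample_lines)
print(truth["47c980972060055d7f5495a5ba3428dc"])
```

With the real file, the same function can be called on an open file handle, for example `parse_truth(open(path, encoding="utf-8"))`, where `path` points to truth.txt.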
Evaluation
The performance of your system will be ranked by accuracy.
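Accuracy here is the fraction of authors whose predicted label matches the truth label. The sketch below is not the official evaluator; the function name `accuracy`, the toy author ids, and the choice to count a missing prediction as an error are all assumptions.

```python
def accuracy(truth, predictions):
    """Fraction of authors in `truth` whose predicted label is correct.

    Assumption: an author with no prediction counts as misclassified.
    """
    if not truth:
        raise ValueError("truth must contain at least one author")
    correct = sum(
        1 for author_id, label in truth.items()
        if predictions.get(author_id) == label
    )
    return correct / len(truth)


# Toy data with made-up author ids, for illustration only.
truth = {"a1": "I", "a2": "NI", "a3": "I", "a4": "NI"}
predictions = {"a1": "I", "a2": "I", "a3": "I"}  # "a4" has no prediction

score = accuracy(truth, predictions)
print(score)
```

In this toy case two of the four authors are classified correctly, so the score is 0.5.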
More info on the task: https://pan.webis.de/clef22/pan22-web/author-profiling.html
You may request access to the files in this upload, provided that you fulfil the conditions below. The decision whether to grant/deny access is solely under the responsibility of the record owner.
Please request access to the data with a short statement on how you want to use it.
The use of the data is limited to research purposes.
Please use the following to reference the data:
NOT AVAILABLE YET
Regarding anonymization, we recommend reading the following paper:
Rangel, F., & Rosso, P. (2019). On the Implications of the General Data Protection Regulation on the Organisation of Evaluation Tasks. Language and Law (Linguagem e Direito), 5(2), 95-117.
We would like to point out that you can register on pan.webis.de to be part of the PAN community.