Disco-Annotation
Creators
- 1. Idiap Research Institute
- 2. University of Geneva
- 3. Université Catholique de Louvain
Description
Disco-Annotation is a collection of training and test sets with manually annotated discourse relations for 8 discourse connectives in Europarl texts.
The 8 connectives with their annotated relations are:
- although (contrast|concession)
- as (prep|causal|temporal|comparison|concession)
- however (contrast|concession)
- meanwhile (contrast|temporal)
- since (causal|temporal|temporal-causal)
- though (contrast|concession)
- while (contrast|concession|temporal|temporal-contrast|temporal-causal)
- yet (adv|contrast|concession)
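The label inventory above can be encoded as a lookup table, for example to check labels before training a classifier. This is a minimal sketch: the names `INVENTORY_TEXT`, `parse_inventory`, `LABELS`, and `is_valid` are hypothetical and are not part of the dataset or of DiscoConn-Classifier, and nothing here assumes a particular file format for the annotated sets.

```python
# Label inventory copied verbatim from the list above.
INVENTORY_TEXT = """\
although (contrast|concession)
as (prep|causal|temporal|comparison|concession)
however (contrast|concession)
meanwhile (contrast|temporal)
since (causal|temporal|temporal-causal)
though (contrast|concession)
while (contrast|concession|temporal|temporal-contrast|temporal-causal)
yet (adv|contrast|concession)
"""


def parse_inventory(text):
    """Map each connective to the list of labels given in parentheses."""
    inventory = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # "since (causal|temporal)" -> "since", "causal|temporal)"
        connective, _, labels = line.partition(" (")
        inventory[connective] = labels.rstrip(")").split("|")
    return inventory


LABELS = parse_inventory(INVENTORY_TEXT)


def is_valid(connective, label):
    """Return True if the label belongs to the connective's inventory."""
    return label in LABELS.get(connective.lower(), [])
```

A call such as `is_valid("since", "temporal-causal")` checks a single connective and label pair against this table.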
For each connective there is a training set and a test set. The relations were annotated by two trained annotators using a translation spotting method. The fixed division into training and test sets also makes results comparable if you train your own models.
If you need software for the latter, have a look at: https://github.com/idiap/DiscoConn-Classifier
Citation
Please cite the following papers if you make use of these datasets; they also describe the annotation method in more detail:
@INPROCEEDINGS{Popescu-Belis-LREC-2012,
author = {Popescu-Belis, Andrei and Meyer, Thomas and Liyanapathirana, Jeevanthi and Cartoni, Bruno and Zufferey, Sandrine},
title = {{D}iscourse-level {A}nnotation over {E}uroparl for {M}achine {T}ranslation:
{C}onnectives and {P}ronouns},
booktitle = {Proceedings of the eighth international conference on Language Resources and Evaluation ({LREC})},
year = {2012},
address = {Istanbul, Turkey}
}
@Article{Cartoni-DD-2013,
Author = {Cartoni, Bruno and Zufferey, Sandrine and Meyer, Thomas},
Title = {{Annotating the meaning of discourse connectives by looking at their translation: The translation-spotting technique}},
Journal = {Dialogue \& Discourse},
Volume = {4},
Number = {2},
pages = {65--86},
year = {2013}
}
@ARTICLE{Meyer-TSLP-submitted,
author = {Meyer, Thomas and Hajlaoui, Najeh and Popescu-Belis, Andrei},
title = {{Disambiguating Discourse Connectives for Statistical Machine Translation in Several Languages}},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
year = {submitted},
volume = {},
pages = {},
number = {}
}
Files
MD5SUM.TXT
Files (203.8 kB)
| Checksum | Size |
|---|---|
| md5:c0f89ff312dc0a9c9bab688211c13f34 | 203.7 kB |
| md5:61e38d3f7a644a8159351a3052a611e7 | 58 Bytes |
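The MD5 checksums listed above can be used to verify a download. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the path of the downloaded file has to be supplied by the user because it is not given on this page.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:c0f8...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

To check a download, call `matches_checksum` with the local path of the file and the corresponding `md5:` value from the table.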