Published September 25, 2013 | Version v1
Dataset Open

Disco-Annotation

  • 1. Idiap Research Institute
  • 2. University of Geneva
  • 3. Université Catholique de Louvain

Description

Disco-Annotation is a collection of training and test sets with manually annotated discourse relations for 8 discourse connectives in Europarl texts.

The 8 connectives with their annotated relations are:

  • although (contrast|concession)
  • as (prep|causal|temporal|comparison|concession)
  • however (contrast|concession)
  • meanwhile (contrast|temporal)
  • since (causal|temporal|temporal-causal)
  • though (contrast|concession)
  • while (contrast|concession|temporal|temporal-contrast|temporal-causal)
  • yet (adv|contrast|concession)
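
The label inventory above can be written down as a small lookup table, for example to check labels when loading the data. This is only an illustrative sketch: the names `CONNECTIVE_LABELS` and `is_valid_label` are hypothetical and are not part of the dataset or of the DiscoConn-Classifier software, and nothing is assumed here about the file format of the training and test sets.

```python
# Label inventory transcribed from the list of connectives above.
# CONNECTIVE_LABELS and is_valid_label are hypothetical helper names.
CONNECTIVE_LABELS = {
    "although": {"contrast", "concession"},
    "as": {"prep", "causal", "temporal", "comparison", "concession"},
    "however": {"contrast", "concession"},
    "meanwhile": {"contrast", "temporal"},
    "since": {"causal", "temporal", "temporal-causal"},
    "though": {"contrast", "concession"},
    "while": {
        "contrast",
        "concession",
        "temporal",
        "temporal-contrast",
        "temporal-causal",
    },
    "yet": {"adv", "contrast", "concession"},
}


def is_valid_label(connective: str, label: str) -> bool:
    """Return True if `label` is one of the relations listed for `connective`."""
    allowed = CONNECTIVE_LABELS.get(connective.lower(), set())
    return label.lower() in allowed
```

For instance, `is_valid_label("since", "temporal-causal")` holds, while `is_valid_label("however", "temporal")` does not, because "temporal" is not among the relations annotated for "however".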

For each connective there is a training set and a test set. The relations were annotated by two trained annotators using a translation spotting method. The fixed division into training and test sets also makes results comparable if you train your own models.

If you need software for training such models, have a look at: https://github.com/idiap/DiscoConn-Classifier

 

Citation
 

If you make use of these datasets, please cite the following papers, which also describe the annotation method in more detail:

@INPROCEEDINGS{Popescu-Belis-LREC-2012,
  author = {Popescu-Belis, Andrei and Meyer, Thomas and Liyanapathirana, Jeevanthi and Cartoni, Bruno and Zufferey, Sandrine},
  title = {{D}iscourse-level {A}nnotation over {E}uroparl for {M}achine {T}ranslation:
    {C}onnectives and {P}ronouns},
  booktitle = {Proceedings of the eighth international conference on Language Resources and Evaluation ({LREC})},
  year = {2012},
  address = {Istanbul, Turkey}
}

@ARTICLE{Cartoni-DD-2013,
  author = {Cartoni, Bruno and Zufferey, Sandrine and Meyer, Thomas},
  title = {{Annotating the meaning of discourse connectives by looking at their translation: The translation-spotting technique}},
  journal = {Dialogue \& Discourse},
  volume = {4},
  number = {2},
  pages = {65--86},
  year = {2013}
}

@ARTICLE{Meyer-TSLP-submitted,
  author = {Meyer, Thomas and Hajlaoui, Najeh and Popescu-Belis, Andrei},
  title = {{Disambiguating Discourse Connectives for Statistical Machine Translation in Several Languages}},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year = {submitted},
  volume = {},
  pages = {},
  number = {}
}

Files

MD5SUM.TXT

Files (203.8 kB)

  • md5:c0f89ff312dc0a9c9bab688211c13f34 (203.7 kB)
  • md5:61e38d3f7a644a8159351a3052a611e7 (58 Bytes)
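
The MD5 checksums listed above can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` are hypothetical, and because the listing does not say which checksum belongs to which file name, the check only tests whether a file's digest is one of the two listed values.

```python
import hashlib

# Checksums copied from the file listing above (without the "md5:" prefix).
EXPECTED_MD5 = {
    "c0f89ff312dc0a9c9bab688211c13f34",
    "61e38d3f7a644a8159351a3052a611e7",
}


def md5_of_file(path, chunk_size=65536):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return True if the file's MD5 digest is one of the listed checksums."""
    return md5_of_file(path) in EXPECTED_MD5
```

To use it, call `matches_listing` with the local path of a downloaded file; a result of `False` suggests the download is incomplete or corrupted.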