Published October 6, 2020 | Version v1
Dataset Open

Tense-Annotation

  • 1. Idiap Research Institute

Description

This dataset contains parallel English and French texts from the Europarl corpus (Koehn, 2005).

The files align EN and FR verbs and record each verb's position, tense and voice. They can therefore be used in translation studies for these languages and/or to train translation systems that make use of the labels in this resource.

Although the resource was created semi-automatically, the verb alignments and inferred tenses are of high precision, especially in the second of the two files contained in the package (Tense-Annotation-gold.txt):

Tense-Annotation-full.txt: complete alignment.

Tense-Annotation-gold.txt: alignments only for cases where both an EN and an FR tense were inferred from the verbs.

 

Both files use the following format, where \tab denotes a tab separator:

EN sentence
FR sentence
Position_in_EN    \tab    EN_verb1    \tab    EN_tense    \tab    EN_voice    \tab    FR_verb    \tab    FR_tense    \tab    FR_voice
Position_in_EN    \tab    EN_verb2    \tab    EN_tense    \tab    EN_voice    \tab    FR_verb    \tab    FR_tense    \tab    FR_voice

EN sentence
FR sentence
Position_in_EN    \tab    EN_verb1    \tab    EN_tense    \tab    EN_voice    \tab    FR_verb    \tab    FR_tense    \tab    FR_voice

...
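
The block layout above can be read with a short parser. The following is a minimal Python sketch, not part of the dataset: it assumes that the fields are separated by literal tab characters, that blocks are separated by blank lines, and that the first two lines of a block are the EN and FR sentences. The names VerbAlignment, SentencePair and parse_blocks are hypothetical, and the sample lines are invented for illustration rather than taken from the corpus.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class VerbAlignment:
    position: str  # kept as text; the indexing convention is not stated
    en_verb: str
    en_tense: str
    en_voice: str
    fr_verb: str
    fr_tense: str
    fr_voice: str


@dataclass
class SentencePair:
    en: str
    fr: str
    verbs: List[VerbAlignment] = field(default_factory=list)


def _to_pair(block: List[str]) -> SentencePair:
    """Turn one block (EN line, FR line, verb rows) into a SentencePair."""
    en = block[0]
    fr = block[1] if len(block) > 1 else ""
    verbs = []
    for row in block[2:]:
        cols = [c.strip() for c in row.split("\t")]
        if len(cols) == 7:  # skip rows that do not match the documented layout
            verbs.append(VerbAlignment(*cols))
    return SentencePair(en, fr, verbs)


def parse_blocks(lines: Iterable[str]) -> Iterator[SentencePair]:
    """Yield one SentencePair per blank-line-separated block (assumed layout)."""
    block: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip():
            block.append(line)
        elif block:
            yield _to_pair(block)
            block = []
    if block:
        yield _to_pair(block)


# Invented sample in the documented layout (not real corpus content).
sample = [
    "I have seen it .\n",
    "Je l' ai vu .\n",
    "1\thave seen\tpres_perf\tactive\tai vu\tpasse_comp\tactive\n",
    "\n",
    "We will vote .\n",
    "Nous voterons .\n",
    "1\twill vote\tfut\tactive\tvoterons\tfutur\tactive\n",
]
pairs = list(parse_blocks(sample))
```

To read one of the real files, an open file handle can be passed to parse_blocks in place of the sample list; the text encoding (UTF-8 would be a reasonable guess) is not stated in this description and should be checked first.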

 

The labels used are explained below:

EN_tense:
past_perf_cont = Past Perfect Continuous
past_perf = Past Perfect
past_cont = Past Continuous
sim_past = Simple Past
pres_perf = Present Perfect
pres_perf_cont = Present Perfect Continuous
pres_cont = Present Continuous
pres = Present
fut_perf_cont = Future Perfect Continuous
fut_perf = Future Perfect
fut_cont = Future Continuous
fut = Future
cond_perf_cont = conditional verb group in continuous past tense
cond_perf = conditional verb group in past tense
cond_cont = conditional verb group in continuous present tense
cond = conditional verb group in present tense
infinitif = base verb form
no_tag = tense not found

EN_voice:
active, passive, unknown

FR_tense:
pres = présent
passe_comp = passé composé
imparfait = imparfait
plus_que_parf = plus-que-parfait
passe_sim = passé simple
passe_rec = passé récent
passe_ant = passé antérieur
imperatif = impératif
subjonctif = subjonctif
conditionnel = conditionnel
futur_proche = futur proche
futur = futur
futur_ant = futur antérieur
no_tag = tense not found

FR_voice:
active, passive, unknown

@ = unaligned words
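
For translation studies, a typical first step with these labels is to count how often each EN tense is paired with each FR tense. The following Python sketch illustrates one way to do so. It assumes that "@" appears in the EN_verb or FR_verb field of an unaligned row and that rows carrying no_tag on either side should be left out; both are assumptions, not documented behaviour. The function name count_tense_pairs is hypothetical and the rows are invented.

```python
from collections import Counter
from typing import Iterable, Tuple

# (EN_verb, EN_tense, FR_verb, FR_tense), taken from the columns described above
Row = Tuple[str, str, str, str]


def count_tense_pairs(rows: Iterable[Row]) -> Counter:
    """Count (EN_tense, FR_tense) pairs, skipping unaligned or untagged rows."""
    counts: Counter = Counter()
    for en_verb, en_tense, fr_verb, fr_tense in rows:
        if "@" in (en_verb, fr_verb):  # assumed marker position for unaligned words
            continue
        if "no_tag" in (en_tense, fr_tense):  # tense not found on one side
            continue
        counts[(en_tense, fr_tense)] += 1
    return counts


# Invented rows for illustration only.
rows = [
    ("have seen", "pres_perf", "ai vu", "passe_comp"),
    ("said", "sim_past", "a dit", "passe_comp"),
    ("have done", "pres_perf", "avons fait", "passe_comp"),
    ("go", "pres", "@", "no_tag"),
    ("to be", "infinitif", "être", "no_tag"),
]
tense_counts = count_tense_pairs(rows)
```

On Tense-Annotation-gold.txt the no_tag filter should rarely apply, since that file is described as keeping only cases where both tenses were inferred.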

 

Files

CorpusAnnotatedTenseVoice-Partial.txt

Files (305.0 MB)

md5:d586b79325cb27612200ba79461a613e 99.5 MB
md5:b316b731966312775e5bee738a8ae311 205.5 MB
md5:9f97bde9199086cb3076b004990ec61b 2.4 kB
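
The MD5 checksums listed above can be used to check a download. The following Python sketch uses the standard hashlib module and reads the file in chunks, so that the larger files do not need to fit in memory. The function names and any local file path passed to them are hypothetical; this record does not state which file name belongs to which checksum, so the pairing has to be confirmed on the download page.

```python
import hashlib
from typing import Iterable, Iterator


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hexadecimal MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: str, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the content of a file in blocks of at most `size` bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(size)
            if not chunk:
                break
            yield chunk


def matches_listed_md5(path: str, listed: str) -> bool:
    """Compare a local file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_chunks(read_chunks(path)) == expected
```

For example, matches_listed_md5("local_copy.txt", "md5:d586b79325cb27612200ba79461a613e") would return True only if the hypothetical local_copy.txt is byte-identical to the 99.5 MB file listed above.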