Published April 6, 2021 | Version v1
Dataset Open

Semantically tagged Europarl-sv.v7 with POS information, UD2

  • 1. UEF

Description

Semantically tagged Europarl-sv.v7 with POS data (UD2)

More than 49.7 million lines

Lexical coverage of the tagging: 87.79%

No semantic disambiguation was performed; all candidate tags are listed

POS tagging for the semantic tagging was performed with UD2: https://turkunlp.org/finnish_nlp.html#parser

Output contains both the base form and the original running text; the original sentences are kept separate

Documentation of the FiST semantic tagger: https://www.aclweb.org/anthology/W19-0306/

Semantic tagging

Tagging of the data was performed in the Puhti computing environment of CSC – IT Center for Science Ltd. https://research.csc.fi/-/puhti

Format: text form, base form, semantic tag(s), POS information

Unknown words are marked with the tag Z99

Example:

# sent_id = 2
# text = Jag förklarar Europaparlamentets session återupptagen efter avbrottet den 17 december.
1 Jag #jag#nn#S1.2.3+ Q4.1 PRON PERS-P1SG-NOM Case=Nom|Definite=Def|Gender=Com|Number=Sing|PronType=Prs 2 nsubj _ _
2 förklarar #förklara#vb#Q2.2 K5.1% VERB PRES-ACT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 0 root _ _
3 Europaparlamentets Europaparlamentets Z99 PROPN SG-GEN Case=Gen 4 nmod:poss _ _
4 session #session#nn#T1.3 NOUN SG-IND-NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 2 obj _ _
5 återupptagen återupptå Z99 VERB AD-SG-IND Mood=Ind|VerbForm=Inf|Voice=Pass 2 xcomp _ _
6 efter #efter#pp#M6 N4 T4- X7 A6.1+ ADP _ _ 7 case _ _
7 avbrottet #avbrott#nn#T2- T1.2 NOUN SG-DEF-NOM Case=Nom|Definite=Def|Gender=Neut|Number=Sing 5 obl _ _
8 den #den#al#Z5 PRON PERS-P3SG Definite=Def|Number=Plur|PronType=Prs 7 nmod _ _
9 NUMB
10 december decemb Z99 NOUN PL-IND-NOM Case=Nom|Definite=Ind|Gender=Com|Number=Plur 8 nmod _ SpaceAfter=No
. PUNCT
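The token lines in the example above can be read with a short script. The following is a minimal sketch, not the tagger's own tooling. It assumes that fields are whitespace-separated as shown here (the distributed file may use tabs), that known words carry a `#base form#lexicon POS#tags` field, and that unknown words carry a plain base form followed by `Z99`; these are inferences from the example, not documented guarantees. The function name `parse_token_line` and the returned keys are hypothetical.

```python
# Universal POS tags (UD), used to find where the semantic-tag field ends.
UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def parse_token_line(line):
    """Parse one token line; return None for comments or truncated lines.

    Assumed layout (inferred from the example, not from a specification):
    ID, text form, base form + semantic tag(s), then the UD columns.
    """
    tokens = line.split()
    if len(tokens) < 4 or not tokens[0].isdigit():
        return None  # comment line such as "# sent_id = 2", or incomplete line

    # Start at index 3 so that the text form and the first base-form token
    # are never mistaken for a POS tag.
    upos_idx = next(
        (i for i in range(3, len(tokens)) if tokens[i] in UPOS), None
    )
    if upos_idx is None:
        return None

    middle = " ".join(tokens[2:upos_idx])
    if middle.startswith("#"):
        # Known word: "#base form#lexicon POS#tag tag ..."
        parts = middle.split("#", 3)
        if len(parts) != 4:
            return None
        base, lex_pos, semtags = parts[1], parts[2], parts[3].split()
    else:
        # Unknown word: plain base form followed by its tag(s), e.g. Z99
        pieces = middle.split()
        base, lex_pos, semtags = pieces[0], None, pieces[1:]

    return {
        "id": int(tokens[0]),
        "form": tokens[1],
        "base": base,
        "lexicon_pos": lex_pos,
        "semtags": semtags,
        "unknown": "Z99" in semtags,
        "ud_fields": tokens[upos_idx:],  # UPOS, XPOS, features, head, ...
    }


if __name__ == "__main__":
    sample = ("6 efter #efter#pp#M6 N4 T4- X7 A6.1+ ADP _ _ 7 case _ _")
    print(parse_token_line(sample))
```

Because no disambiguation was performed, `semtags` holds every candidate tag for a word, and callers have to choose among them themselves.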
 

Files

Files (310.3 MB)

md5:cca7beb77cea42a691742fdf8665aa36