Published April 1, 2021 | Version v1
Dataset Open

Semantically tagged Europarl-fi.v7 with POS data (UD1)

  • 1. UEF

Description

Semantically tagged Europarl-fi.v7 with POS data (UD1)

More than 37.5 million lines

Lexical coverage of the tagging: 91.31%

No semantic disambiguation was performed; all candidate tags are marked

POS tagging used for the semantic tagging was performed with UD1: https://turkunlp.org/finnish_nlp.html#parser

Output includes both the base form and the original running text

Documentation of the FiST semantic tagger: 

https://www.aclweb.org/anthology/W19-0306/

Semantic tagging

Tagging of the data was performed in the Puhti computing environment of CSC – IT Center for Science Ltd. https://research.csc.fi/-/puhti

Format (one token per line): text form, base form, semantic tag, POS information

Unknown words are marked with the tag Z99

Example:

1 Istuntokauden istuntokausi Z99 NOUN _ Case=Gen|Number=Sing 2 nmod:poss _ _
2 uudelleenavaaminen uudelleenavaaminen Z99 NOUN _ Case=Nom|Number=Sing 0 root _ _
1 Julistan #julistaa#Verb#Q2.1 VERB _ Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 0 root _ _
2 perjantaina #perjantai#Noun#T1.3 NOUN _ Case=Ess|Number=Sing 1 nmod _ _
3 joulukuun #joulukuu#Noun#T1.3 NOUN _ Case=Gen|Number=Sing 5 nmod:poss _ _
4 NUMB
5 päivänä #päivä#Noun#T1.3 NOUN _ Case=Ess|Number=Sing 6 nmod _ _
6 keskeytetyn #keskeyttää#Verb#T2- VERB _ Case=Gen|Degree=Pos|Number=Sing|PartForm=Past|VerbForm=Part|Voice=Pass 8 acl _ _
7 Euroopan #Eurooppa#Proper#Z2 PROPN _ Case=Gen|Number=Sing 8 nmod:poss _ _
8 parlamentin #parlamentti#Noun#G1.1/S5+ NOUN _ Case=Gen|Number=Sing 9 nmod:poss _ _
9 istunnon #istunto#Noun#G1.1 Y2 NOUN _ Case=Gen|Number=Sing 10 dobj _ _
10 avatuksi #avata#Verb#A10+ T2+ A1.1.1 VERB _ Case=Tra|Degree=Pos|Number=Sing|PartForm=Past|VerbForm=Part|Voice=Pass 1 xcomp:ds _ _
11 ja #ja#Conjunction#Z5 CONJ _ _ 1 cc _ _
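
The example lines above can be read with a short script. The following is a minimal sketch in Python, assuming the fields are separated by whitespace exactly as shown in the example (the original file may use tabs instead) and that the POS column uses the UD1 universal tag set. The function name parse_line and the dictionary keys are hypothetical, chosen for illustration only; they are not part of the dataset or of the FiST tagger.

```python
# Universal POS tags of Universal Dependencies v1 (assumed to be the tag set
# used in the POS column of this dataset).
UD1_POS = {
    "ADJ", "ADP", "ADV", "AUX", "CONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def parse_line(line):
    """Parse one token line; return a dict, or None if the layout is not recognised."""
    tokens = line.split()
    if len(tokens) < 4:
        return None  # e.g. a short row such as "4 NUMB"

    index, form = tokens[0], tokens[1]

    # The semantic tag field may hold several space-separated tags, so the POS
    # column is located by searching for the first UD1 tag after the base form.
    pos_at = next((i for i in range(3, len(tokens)) if tokens[i] in UD1_POS), None)
    if pos_at is None:
        return None

    middle = tokens[2:pos_at]
    if middle[0].startswith("#"):
        # Known word: "#lemma#LexiconPOS#FIRSTTAG", further tags follow as separate tokens.
        parts = middle[0].split("#", 3)
        if len(parts) != 4:
            return None
        lemma = parts[1]
        semtags = [parts[3]] + middle[1:]
    else:
        # Unknown word: plain base form followed by the tag Z99.
        lemma = middle[0]
        semtags = middle[1:]

    return {
        "index": int(index) if index.isdigit() else index,
        "form": form,
        "lemma": lemma,
        "semtags": semtags,
        "unknown": semtags == ["Z99"],
        "upos": tokens[pos_at],
        "rest": tokens[pos_at + 1:],  # remaining columns, left uninterpreted
    }


if __name__ == "__main__":
    sample = "9 istunnon #istunto#Noun#G1.1 Y2 NOUN _ Case=Gen|Number=Sing 10 dobj _ _"
    print(parse_line(sample))
```

Rows that do not match the assumed layout, such as the short `4 NUMB` row in the example, are returned as None rather than guessed at.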

Files

Total size: 202.7 MB

Checksum: md5:b4f9169eab5cb1de0a18e5187eb09d10
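
A downloaded copy can be checked against the MD5 checksum listed for the file. The following is a minimal sketch in Python using the standard hashlib module; the function name md5_of_file and the file name in the usage comment are hypothetical, since the record does not show the file name here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


# Usage (hypothetical file name; substitute the name of the downloaded file):
# expected = "b4f9169eab5cb1de0a18e5187eb09d10"
# print(md5_of_file("downloaded_file") == expected)
```

Reading in chunks keeps memory use low for a file of this size (202.7 MB).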