Semantically tagged Europarl-fi.v7 with POS data (UD1)
Description
37.5+ M lines
Lexical coverage of the tagging: 91.31%
No semantic disambiguation is performed; all candidate tags are listed for each word
POS tagging for the semantic tagging was performed with UD1: https://turkunlp.org/finnish_nlp.html#parser
Output in base form and original running text
Documentation of the FiST semantic tagger:
https://www.aclweb.org/anthology/W19-0306/
Semantic tagging
Tagging of the data was performed in the Puhti computing environment of CSC – IT Center for Science Ltd. https://research.csc.fi/-/puhti
Format: text form, base form, semantic tag(s), POS information
Unknown words are marked with the tag Z99
Example:
1 Istuntokauden istuntokausi Z99 NOUN _ Case=Gen|Number=Sing 2 nmod:poss _ _
2 uudelleenavaaminen uudelleenavaaminen Z99 NOUN _ Case=Nom|Number=Sing 0 root _ _
1 Julistan #julistaa#Verb#Q2.1 VERB _ Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin|Voice=Act 0 root _ _
2 perjantaina #perjantai#Noun#T1.3 NOUN _ Case=Ess|Number=Sing 1 nmod _ _
3 joulukuun #joulukuu#Noun#T1.3 NOUN _ Case=Gen|Number=Sing 5 nmod:poss _ _
4 NUMB
5 päivänä #päivä#Noun#T1.3 NOUN _ Case=Ess|Number=Sing 6 nmod _ _
6 keskeytetyn #keskeyttää#Verb#T2- VERB _ Case=Gen|Degree=Pos|Number=Sing|PartForm=Past|VerbForm=Part|Voice=Pass 8 acl _ _
7 Euroopan #Eurooppa#Proper#Z2 PROPN _ Case=Gen|Number=Sing 8 nmod:poss _ _
8 parlamentin #parlamentti#Noun#G1.1/S5+ NOUN _ Case=Gen|Number=Sing 9 nmod:poss _ _
9 istunnon #istunto#Noun#G1.1 Y2 NOUN _ Case=Gen|Number=Sing 10 dobj _ _
10 avatuksi #avata#Verb#A10+ T2+ A1.1.1 VERB _ Case=Tra|Degree=Pos|Number=Sing|PartForm=Past|VerbForm=Part|Voice=Pass 1 xcomp:ds _ _
11 ja #ja#Conjunction#Z5 CONJ _ _ 1 cc _ _
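The token lines in the example can be read with a short parser. The following is a minimal Python sketch, not part of the dataset or of FiST. It assumes that fields are separated by whitespace as printed in the example (the actual file may use tabs), that a known word carries its analysis as `#base form#word class#semtag`, with any further tags following as separate fields, and that the first UD1 POS label after the base form marks the end of the semantic tags. The names `Token`, `parse_line` and `UD1_POS` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Universal POS labels of UD version 1 (assumed to be the tag set in use).
UD1_POS = {
    "ADJ", "ADP", "ADV", "AUX", "CONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


@dataclass
class Token:
    index: int
    form: str
    lemma: str
    semtags: List[str]
    upos: str
    feats: str


def parse_line(line: str) -> Optional[Token]:
    """Parse one token line; return None for lines that do not fit the pattern."""
    parts = line.split()
    if len(parts) < 5 or not parts[0].isdigit():
        return None  # e.g. short lines such as "4 NUMB"
    # The first UD1 POS label after the base form ends the semantic tag list.
    pos_at = next((i for i in range(3, len(parts)) if parts[i] in UD1_POS), None)
    if pos_at is None:
        return None
    analysis = parts[2:pos_at]
    if analysis[0].startswith("#"):
        # Known word: "#lemma#WordClass#firsttag", further tags follow.
        fields = analysis[0].split("#")
        if len(fields) < 4:
            return None
        lemma = fields[1]
        semtags = [fields[3]] + analysis[1:]
    else:
        # Unknown word: plain lemma followed by the tag (Z99 in the example).
        lemma = analysis[0]
        semtags = analysis[1:]
    # Morphological features sit two fields after the POS label.
    feats = parts[pos_at + 2] if len(parts) > pos_at + 2 else "_"
    return Token(int(parts[0]), parts[1], lemma, semtags, parts[pos_at], feats)


if __name__ == "__main__":
    sample = ("10 avatuksi #avata#Verb#A10+ T2+ A1.1.1 VERB _ "
              "Case=Tra|Degree=Pos|Number=Sing|PartForm=Past|VerbForm=Part|Voice=Pass "
              "1 xcomp:ds _ _")
    print(parse_line(sample))
```

Because no disambiguation is applied, `semtags` holds every candidate tag; a slash-joined tag such as `G1.1/S5+` is kept as a single string in this sketch.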
Files
Size: 202.7 MB
md5:b4f9169eab5cb1de0a18e5187eb09d10
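The md5 checksum listed for the download can be compared against a local copy. A minimal Python sketch follows; the helper name `md5_of` and the file path in the usage comment are hypothetical, since the record does not give the file name.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "b4f9169eab5cb1de0a18e5187eb09d10"


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (the path is a placeholder for wherever the download was saved):
# assert md5_of("path/to/downloaded_file") == EXPECTED_MD5
```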