Semantically tagged Europarl-sv.v7 with POS information, UD2
Description
Semantically tagged Europarl-sv.v7 with POS data (UD2)
Over 49.7 million lines
Lexical coverage of the tagging: 87.79%
No semantic disambiguation: all candidate tags are listed for each word
POS tagging used as input to the semantic tagging was performed with UD2: https://turkunlp.org/finnish_nlp.html#parser
Output includes both the base form and the original running text; original sentences are kept separate
Documentation of the FiST semantic tagger:
https://www.aclweb.org/anthology/W19-0306/
Semantic tagging
Tagging of the data was performed in the Puhti computing environment of CSC – IT Center for Science Ltd. https://research.csc.fi/-/puhti
Format: text form, base form, semantic tag(s), POS information
Unknown words are marked with the tag Z99
Example:
# sent_id = 2
# text = Jag förklarar Europaparlamentets session återupptagen efter avbrottet den 17 december.
1 Jag #jag#nn#S1.2.3+ Q4.1 PRON PERS-P1SG-NOM Case=Nom|Definite=Def|Gender=Com|Number=Sing|PronType=Prs 2 nsubj _ _
2 förklarar #förklara#vb#Q2.2 K5.1% VERB PRES-ACT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 0 root _ _
3 Europaparlamentets Europaparlamentets Z99 PROPN SG-GEN Case=Gen 4 nmod:poss _ _
4 session #session#nn#T1.3 NOUN SG-IND-NOM Case=Nom|Definite=Ind|Gender=Com|Number=Sing 2 obj _ _
5 återupptagen återupptå Z99 VERB AD-SG-IND Mood=Ind|VerbForm=Inf|Voice=Pass 2 xcomp _ _
6 efter #efter#pp#M6 N4 T4- X7 A6.1+ ADP _ _ 7 case _ _
7 avbrottet #avbrott#nn#T2- T1.2 NOUN SG-DEF-NOM Case=Nom|Definite=Def|Gender=Neut|Number=Sing 5 obl _ _
8 den #den#al#Z5 PRON PERS-P3SG Definite=Def|Number=Plur|PronType=Prs 7 nmod _ _
9 NUMB
10 december decemb Z99 NOUN PL-IND-NOM Case=Nom|Definite=Ind|Gender=Com|Number=Plur 8 nmod _ SpaceAfter=No
. PUNCT
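The format above can be read with a short script. The following is a minimal sketch, not the tool used to produce the data. It assumes that each token line is whitespace-separated as in the example, that a known word carries its annotation as `#base form#tagger POS#first semtag` followed by any further candidate tags, that an unknown word carries a plain base form followed by Z99, and that the first Universal Dependencies POS tag on the line marks the end of the semantic annotation. The function names `parse_token_line` and `known_share` are hypothetical. Whether the share computed by `known_share` corresponds to the lexical coverage figure reported in the description is not stated in the source.

```python
# Universal Dependencies coarse POS tags, used here only to find where
# the semantic annotation ends on a line (an assumption of this sketch).
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def parse_token_line(line):
    """Return a dict for one token line, or None if the line does not
    match the layout assumed above (comments, truncated lines, etc.)."""
    parts = line.split()
    if len(parts) < 4 or not parts[0].isdigit():
        return None
    # Position of the first UD POS tag after the text form.
    upos_index = next(
        (i for i in range(2, len(parts)) if parts[i] in UPOS_TAGS), None
    )
    if upos_index is None or upos_index == 2:
        return None
    annotation = parts[2:upos_index]
    if annotation[0].startswith("#"):
        # Known word: "#base#pos#tag" plus optional further candidate tags.
        pieces = annotation[0].split("#")
        if len(pieces) != 4:
            return None
        _, base, tagger_pos, first_tag = pieces
        semtags = [first_tag] + annotation[1:]
    else:
        # Unknown word: plain base form followed by the tag(s), e.g. Z99.
        base, tagger_pos, semtags = annotation[0], None, annotation[1:]
    return {
        "id": int(parts[0]),
        "form": parts[1],
        "base": base,
        "tagger_pos": tagger_pos,
        "semtags": semtags,
        "upos": parts[upos_index],
    }


def known_share(lines):
    """Share of parsed tokens that did not receive the unknown-word tag Z99."""
    tokens = [t for t in map(parse_token_line, lines) if t is not None]
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if "Z99" not in t["semtags"])
    return known / len(tokens)
```

Because no disambiguation is applied, `semtags` holds every candidate tag for the word; a downstream user who needs a single tag per token has to choose a selection strategy themselves.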
Files
(310.3 MB)
Name | Size |
---|---|
md5:cca7beb77cea42a691742fdf8665aa36 | 310.3 MB |