Published March 18, 2021 | Version v1
Dataset Open

Semantically tagged Europarl-fi.v7

  • 1. UEF

Description

Semantically tagged Europarl-fi.v7 

More than 37 million lines

Lexical coverage of the tagging: 92.88%

No semantic disambiguation has been performed; all candidate tags are listed for each word

POS tagging for the semantic tagging was performed with TreeTagger: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

Output is in base form (words are lemmatized)

Documentation of the FiST semantic tagger: 

https://www.aclweb.org/anthology/W19-0306/

https://zenodo.org/record/3676372#.YFNwIa8zY2w

Semantic tagging

Tagging of the data was performed in the Puhti computing environment of CSC – IT Center for Science Ltd. https://research.csc.fi/-/puhti

Format of each line: base form, POS tag, semantic tag(s)

Unknown words are marked with the tag Z99

Example:

istunto    Noun    G1.1 Y2
uudelleen    Adverb    N6+
julistaa    Verb    Q2.1
perjantai    Noun    T1.3
joulu    Noun    S9/T1.3
17 NUMB
. PUNCT
päivä    Noun    T1.3
keskeyttää    Verb    T2-
Eurooppa    Proper    Z2
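
The line format shown in the example above could be read with a sketch like the following. It is an illustration, not part of the dataset or of FiST. It assumes that fields are separated by whitespace, that base forms contain no internal whitespace, and that every field after the POS tag is a semantic tag. The names `Token`, `parse_line` and `known_share` are hypothetical. The last sample line is made up to show a Z99 entry, and `known_share` is only one possible way to measure the share of known words, not necessarily how the reported lexical coverage was computed.

```python
from typing import Iterable, List, NamedTuple

UNKNOWN_TAG = "Z99"  # tag the dataset description gives for unknown words


class Token(NamedTuple):
    lemma: str           # base form
    pos: str             # POS tag, e.g. Noun, Verb, NUMB, PUNCT
    semtags: List[str]   # zero or more semantic tags, left unresolved


def parse_line(line: str) -> Token:
    """Split one line into base form, POS tag and semantic tags.

    Assumption: whitespace-separated fields, no whitespace inside the lemma.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected at least a base form and a POS tag: {line!r}")
    lemma, pos, *semtags = fields
    return Token(lemma, pos, semtags)


def known_share(tokens: Iterable[Token]) -> float:
    """Share of semantically tagged tokens that are not marked Z99.

    Tokens without any semantic tag (e.g. NUMB, PUNCT lines) are ignored.
    """
    tagged = [t for t in tokens if t.semtags]
    if not tagged:
        return 0.0
    known = sum(1 for t in tagged if UNKNOWN_TAG not in t.semtags)
    return known / len(tagged)


sample = [
    "istunto    Noun    G1.1 Y2",
    "joulu    Noun    S9/T1.3",
    "17 NUMB",
    ". PUNCT",
    "tuntematon    Noun    Z99",  # hypothetical line, not taken from the data
]
tokens = [parse_line(line) for line in sample]
```

Tags joined with a slash, such as S9/T1.3, are kept as a single string here; splitting them further is left to the caller.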

Files

Files (807.6 MB)

md5:c857c608d36546bc38dbe20228f06a6c (807.6 MB)
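
A downloaded copy can be checked against the MD5 checksum listed above. The sketch below uses only the Python standard library. The function name `md5_of_file` is hypothetical, and the path of the downloaded file is left to the caller because the file name is not given here.

```python
import hashlib

# Checksum as listed on this record
EXPECTED_MD5 = "c857c608d36546bc38dbe20228f06a6c"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `md5_of_file(path)` with `EXPECTED_MD5` for the local file indicates whether the download is intact.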