Published March 19, 2021 | Version v1
Dataset | Open access

Semantically tagged Europarl-sv.v7

1. UEF

Description

Semantically tagged version of the Swedish part of the Europarl corpus, release v7 (Europarl-sv.v7).

Size: more than 45.6 million lines

Lexical coverage of the tagging: 83.90%
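The description does not define lexical coverage. The following is a minimal Python sketch of one plausible reading. It assumes coverage is the share of tokens that received at least one tag other than Z99 (the tag used for unknown words), and that NUMB and PUNCT tokens are left out of the count. Both choices are assumptions, and the name `lexical_coverage` is hypothetical. The sample reuses the tag lists of the example output given in this description.

```python
from typing import Iterable, Sequence

UNKNOWN_TAG = "Z99"  # tag for unknown words, per the dataset description
# Assumption: number/punctuation markers are not semantic tags and are left
# out of the denominator; the published 83.90% may use another convention.
NON_LEXICAL = frozenset({"NUMB", "PUNCT"})


def lexical_coverage(token_tags: Iterable[Sequence[str]]) -> float:
    """Return the share of lexical tokens that got a tag other than Z99."""
    total = 0
    known = 0
    for tags in token_tags:
        if any(tag in NON_LEXICAL for tag in tags):
            continue  # skip numbers and punctuation
        total += 1
        if UNKNOWN_TAG not in tags:
            known += 1
    return known / total if total else 0.0


# Tag lists for the example sentence shown in this description.
sample = [
    ["Z99"], ["Z5"], ["Z99"], ["S1.2.3+", "Q4.1"], ["Q2.2", "K5.1%"],
    ["Z99"], ["T1.3"], ["Z99"], ["X9.1-"], ["Z99"], ["N1", "Z8"],
    ["NUMB"], ["T1.3"], ["PUNCT"], ["S1.2.3+", "Q4.1"],
]
print(round(lexical_coverage(sample), 4))  # 8 of 13 lexical tokens -> 0.6154
```

On a short sample like this the figure naturally differs from the corpus-wide 83.90%.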

No semantic disambiguation was performed: every candidate tag is listed for each word.
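Because no disambiguation was done, a token can carry several candidate tags. For example, *jag* has `S1.2.3+ Q4.1` in the example output. The sketch below counts tokens by their number of candidates. The helper names are hypothetical. `first_tag` simply keeps the first listed tag. This is only a placeholder, because the description does not say whether candidates are ordered by likelihood.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def ambiguity_profile(token_tags: Iterable[Sequence[str]]) -> Counter:
    """Map 'number of candidate tags' -> 'number of tokens with that many'."""
    return Counter(len(tags) for tags in token_tags)


def first_tag(tags: Sequence[str]) -> str:
    """Naive single-tag choice: keep the first listed candidate.

    Assumption: candidate ordering is not documented, so this is only a
    stand-in for a real disambiguation step.
    """
    return tags[0]


sample: List[List[str]] = [
    ["S1.2.3+", "Q4.1"], ["Z5"], ["Q2.2", "K5.1%"], ["N1", "Z8"], ["T1.3"],
]
profile = ambiguity_profile(sample)
ambiguous = sum(n for k, n in profile.items() if k > 1)
print(profile)                   # Counter({2: 3, 1: 2})
print(ambiguous / len(sample))   # 0.6
print([first_tag(t) for t in sample])
```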

Part-of-speech tagging, used as input to the semantic tagger, was performed with TreeTagger: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

Words are output in their base form.

Documentation of the semantic tagger (written for Finnish, but the same principles apply to Swedish):

https://www.aclweb.org/anthology/W19-0306/

https://zenodo.org/record/3676372#.YFNwIa8zY2w

Semantic tagging

The data were tagged in the Puhti computing environment of CSC – IT Center for Science Ltd. (https://research.csc.fi/-/puhti).

Format: one token per line, giving the base form, the POS tag and the semantic tag(s)

Unknown words are marked with the tag Z99.

Example output:

Återupptagande# Z99
av    pp    Z5
sessionen# Z99
jag    nn    S1.2.3+ Q4.1
förklara    vb    Q2.2 K5.1%
Europaparlamentets# Z99
session    nn    T1.3
återuppta# Z99
efter    av    X9.1-
avbrottet# Z99
en    nl    N1 Z8
17 NUMB
december    nn    T1.3
. PUNCT
jag    nn    S1.2.3+ Q4.1
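A minimal sketch of a reader for this format follows. It assumes fields are separated by whitespace (tabs or spaces) and that the layout matches the example above exactly. Lines with three or more fields hold the base form, the POS tag and one or more semantic tags. Two-field lines such as `Återupptagande# Z99` or `17 NUMB` have no POS column. The class and function names are hypothetical, and UTF-8 file encoding is assumed.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

UNKNOWN_TAG = "Z99"  # tag for unknown words, per the dataset description


@dataclass
class TaggedToken:
    base_form: str
    pos: Optional[str]  # None on two-field lines (no POS column)
    semtags: List[str] = field(default_factory=list)  # all candidate tags

    @property
    def unknown(self) -> bool:
        return UNKNOWN_TAG in self.semtags


def parse_line(line: str) -> Optional[TaggedToken]:
    """Parse one output line, splitting on any run of whitespace.

    Assumptions: 3+ fields are base form, POS, then semantic tags; 2 fields
    are a form plus one tag with no POS. A trailing "#" is kept unchanged.
    """
    fields = line.split()
    if len(fields) < 2:
        return None  # blank or malformed line
    if len(fields) == 2:
        return TaggedToken(fields[0], None, [fields[1]])
    return TaggedToken(fields[0], fields[1], fields[2:])


def read_tokens(path: str) -> Iterator[TaggedToken]:
    """Stream tokens from the large data file one line at a time."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            token = parse_line(line)
            if token is not None:
                yield token


for raw in ["Återupptagande# Z99", "jag    nn    S1.2.3+ Q4.1", "17 NUMB"]:
    print(parse_line(raw))
```

Streaming with a generator keeps memory use flat, which matters for a file of more than 45 million lines.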

 

Files

Total size: 843.9 MB
Checksum: md5:867643ea6b035cd4f2a50b444a2dd37c
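The following short Python sketch checks a download against the listed checksum using the standard `hashlib` module. It reads the file in chunks, so the 843.9 MB file is never loaded into memory at once. The helper names are hypothetical.

```python
import hashlib

# Checksum listed for the data file on this record.
EXPECTED_MD5 = "867643ea6b035cd4f2a50b444a2dd37c"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks instead of reading it whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected hex digest."""
    return md5_of_file(path) == expected.lower()
```

For example, call `verify_download(path)` with `path` set to the downloaded file. The file name is not shown on this page. The call should return `True` for an intact download.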