Published March 22, 2021 | Version v1
Dataset Open

Semantically tagged Europarl-cs.v7

  • 1. UEF

Description

Semantically tagged Europarl-cs.v7

More than 14.9 million lines

Lexical coverage of the tagging: 83.90%

No semantic disambiguation was performed: all candidate tags are listed for each word

POS tagging for the semantic tagging was performed with TreeTagger: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

Output is in base form (lemmas)

Documentation of the semantic tagger (written for Finnish, but the same principles hold for Czech):

https://www.aclweb.org/anthology/W19-0306/

https://zenodo.org/record/3676372#.YFNwIa8zY2w

Format (one token per line): base form, POS tag, semantic tag(s)

Unknown words are marked with the tag Z99
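
The line format described above can be read with a few lines of Python. The following is only a minimal sketch, not part of the dataset or the tagger: the names `Token`, `parse_line`, `is_unknown` and the pattern `SEMTAG_RE` are hypothetical, and the sketch assumes that fields are separated by whitespace and that a semantic tag always starts with a capital letter followed by a digit, which is how it tells a POS label from a semantic tag on lines that have only two fields.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of a semantic tag: capital letter, dotted digits, optional +/- markers
# (for example "X4.2", "T2+", "A9-", "Z99"). This pattern is an assumption.
SEMTAG_RE = re.compile(r"^[A-Z]\d+(?:\.\d+)*[+-]*$")


class Token(NamedTuple):
    lemma: str            # base form of the word
    pos: Optional[str]    # POS label, or None if the line carries none
    semtags: list         # all candidate semantic tags, in file order


def parse_line(line: str) -> Optional[Token]:
    """Split one line into base form, POS label and semantic tags."""
    fields = line.split()
    if not fields:
        return None  # blank line
    lemma, rest = fields[0], fields[1:]
    pos = None
    # If the second field does not look like a semantic tag, treat it as the POS label.
    if rest and not SEMTAG_RE.match(rest[0]):
        pos, rest = rest[0], rest[1:]
    return Token(lemma, pos, rest)


def is_unknown(token: Token) -> bool:
    """True if the word is marked with the unknown-word tag Z99."""
    return "Z99" in token.semtags
```

Because no disambiguation was performed, `semtags` keeps every tag on the line; choosing one of them is left to the user.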

Semantic tagging

Tagging of the data was performed in the Puhti computing environment of CSC – IT Center for Science Ltd. https://research.csc.fi/-/puhti

Example output

následný    A    N4
postup    N    X4.2
na    R    Z5
základ    N    A2.2 T2+ X4.2
usnesení    N    X6+ X9.2+
parlament# Z99
:     PUNCT
viz    V    X3.4 X2.1 S1.1.1 X2.5+ X2.3+ X3 A7+ Z4 S3.2
zápis    N    M7 Q1.2 S7.3 M1 T2+
předložení    N    A9- A2.2 Q2.2 S1.1.3+ Q4.3 O4.1 K4
dokument    N    Q1.2 X2.2+ Y2
:     PUNCT
viz    V    X3.4 X2.1 S1.1.1 X2.5+ X2.3+ X3 A7+ Z4 S3.2
zápis    N    M7 Q1.2 S7.3 M1 T2+
písemný    A    Q1.2
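
As a rough illustration of how output like the example above could be summarised, the sketch below counts unknown, untagged and ambiguous tokens in the first seven example lines. It is an assumption-laden sketch: `summarize`, `semantic_tags` and `known_share` are hypothetical names, semantic tags are recognised simply as fields that contain a digit (on the assumption that POS labels do not), and the share it computes is not necessarily how the lexical coverage figure reported for the dataset was derived.

```python
# First seven lines of the example output, copied as they appear.
SAMPLE = """\
následný    A    N4
postup    N    X4.2
na    R    Z5
základ    N    A2.2 T2+ X4.2
usnesení    N    X6+ X9.2+
parlament# Z99
:     PUNCT
"""

UNKNOWN_TAG = "Z99"


def semantic_tags(line):
    """Return the fields after the base form that contain a digit (assumed to be semantic tags)."""
    fields = line.split()
    return [f for f in fields[1:] if any(ch.isdigit() for ch in f)]


def summarize(lines):
    """Count tokens, unknown words, tokens without tags, and tokens with several tags."""
    stats = {"tokens": 0, "unknown": 0, "untagged": 0, "ambiguous": 0}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        tags = semantic_tags(line)
        stats["tokens"] += 1
        if UNKNOWN_TAG in tags:
            stats["unknown"] += 1
        elif not tags:
            stats["untagged"] += 1
        elif len(tags) > 1:
            stats["ambiguous"] += 1
    return stats


def known_share(stats):
    """Share of tokens not marked as unknown (an assumed, simplified measure)."""
    return 1 - stats["unknown"] / stats["tokens"] if stats["tokens"] else 0.0


stats = summarize(SAMPLE.splitlines())
```

For the full file, the same function can be given an open file handle instead of `SAMPLE.splitlines()`, so the data is streamed line by line rather than loaded into memory.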

Files

Total size: 251.1 MB
md5:5ef2aa183deccefc95e0e69556223779
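
After downloading, the file can be checked against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module; the function names `md5_of_file` and `verify` are hypothetical, and the local file name in the usage comment is a placeholder, since the record does not show the file name here.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "5ef2aa183deccefc95e0e69556223779"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


# Usage (placeholder file name, replace with the actual downloaded file):
# print(verify("downloaded_file"))
```

Reading in chunks keeps memory use low, which matters for a file of this size.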