Published March 19, 2021 | Version v1
Dataset Open

Semantically tagged Europarl-it.v7

  • 1. UEF

Description

Semantically tagged version of the Italian Europarl v7 corpus (Europarl-it.v7)

More than 54.2 million lines

Lexical coverage of the tagging: 94.06%

No semantic disambiguation was performed; all candidate tags are listed for each word

POS tagging for the semantic tagging was performed with TreeTagger: https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

Output words are given in base form (lemmas)

Documentation of the semantic tagger (written for Finnish, but the same principles hold for Italian):

https://www.aclweb.org/anthology/W19-0306/

https://zenodo.org/record/3676372#.YFNwIa8zY2w

Semantic tagging

Tagging of the data was performed in the Puhti computing environment of CSC – IT Center for Science Ltd. https://research.csc.fi/-/puhti

Format: one token per line, with the fields base form, POS, Semtag

Unknown words are marked with the tag Z99
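
Because unknown words carry the tag Z99, a coverage figure can be estimated directly from the tagged file. The sketch below is only one plausible reading: it assumes whitespace-separated fields, skips lines that have no semantic tag field, and counts a token as covered when Z99 is not among its tags. The function name lexical_coverage is hypothetical, and this record does not state how the 94.06% figure was actually computed.

```python
from typing import Iterable

UNKNOWN_TAG = "Z99"  # tag this dataset uses for unknown words


def lexical_coverage(lines: Iterable[str]) -> float:
    """Return the share of semantically tagged tokens that are not Z99."""
    tagged = 0
    unknown = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            # Assumption: lines without a third field have no semantic tag
            # and are left out of the count.
            continue
        tagged += 1
        if UNKNOWN_TAG in fields[2:]:
            unknown += 1
    return (tagged - unknown) / tagged if tagged else 0.0
```

Since the function accepts any iterable of lines, an open file handle can be passed in directly, so the full file does not need to fit in memory.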

Example output

ripresa    noun    I1.1+
del    art    Z5
sessione    noun    Q3 T1.3
dichiarare    verb    Q2.2
riprendere    verb    M2
IL    noun    Z5
sessione    noun    Q3 T1.3
del    art    Z5
parlamento    noun    G2.1
europeo    adj    Z2
PON PUNCT
interrompere    verb    T2-
Venerdì    abr    T1.3
17 NUMB
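
A minimal sketch of how lines like those above might be read in Python. It assumes that fields are separated by whitespace, that the first field is the base form and the second the POS, and that every remaining field is a candidate semantic tag. Lines with only two fields are treated as having no semantic tags. The names Token and parse_line are hypothetical, and a base form containing spaces would not parse correctly under these assumptions.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    lemma: str           # base form of the word
    pos: str             # part-of-speech label
    semtags: List[str]   # all candidate semantic tags (no disambiguation)


def parse_line(line: str) -> Token:
    """Split one line of the tagged output into its fields."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected at least base form and POS: {line!r}")
    # Assumption: everything after the POS field is a semantic tag.
    return Token(fields[0], fields[1], fields[2:])


token = parse_line("sessione    noun    Q3 T1.3")
print(token.lemma, token.pos, token.semtags)
```

Keeping all tags in a list reflects the fact that the data lists every candidate tag; any disambiguation would have to be applied downstream.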

Files

Files (902.3 MB)

md5:f6164577e3440a66891e21d9f647212e
902.3 MB
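
The MD5 checksum listed above can be used to check that a download is complete. The following is a sketch using Python's standard hashlib module; the function name md5_of_file is hypothetical, and the path of the downloaded file must be supplied by the user because the file name is not given here.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "f6164577e3440a66891e21d9f647212e"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so a 900 MB file is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing md5_of_file(path) with EXPECTED_MD5 then indicates whether the local copy matches the published file.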