Dataset Open Access

French Word Sense Disambiguation with Princeton WordNet Identifiers

Loïc Vial

This is a dataset for French Word Sense Disambiguation using Princeton WordNet identifiers. It contains two training corpora, SemCor and the WordNet Gloss Corpus, both automatically translated from their original English versions, with sense tags automatically aligned. It also contains a test corpus: task 12 of SemEval 2013, originally sense-annotated with BabelNet identifiers and converted to Princeton WordNet 3.0 identifiers.
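The XML schema of the corpora is not described on this page, so a first look at a downloaded file can be done without assuming any element or attribute names. The following is a minimal sketch, using only the Python standard library, that streams through a file and counts whichever tags and attributes it finds. The function name `count_elements` is hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(path):
    """Stream through an XML file and count element tags and attribute names.

    No tag or attribute name is assumed: the function only reports
    what is actually present in the file.
    """
    tag_counts = Counter()
    attribute_counts = Counter()
    # iterparse yields each element once its closing tag has been read,
    # so the whole file never has to be parsed in a single call.
    for _event, element in ET.iterparse(path, events=("end",)):
        tag_counts[element.tag] += 1
        attribute_counts.update(element.attrib.keys())
        # Drop the element's text, attributes and children to limit memory use.
        element.clear()
    return tag_counts, attribute_counts
```

Calling `count_elements` on the smallest file, `semeval2013task12.fr.xml`, and printing both counters should be enough to see how sentences, words, and sense tags are encoded before writing a dedicated loader.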

Files (271.6 MB)

Name                       Size       MD5
semcor.fr.xml              86.0 MB    87f0a390ccfda959de51063aad08082d
semeval2013task12.fr.xml   834.6 kB   17530c037c658dd56eccb00bd8e66b7b
wngt.fr.xml                184.7 MB   8a8d772b920667ab375c2460c519c9f0
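
Since each file is published with an MD5 checksum, a download can be checked for corruption before use. The following is a minimal sketch using only the Python standard library. The checksums are copied from the listing above. The function names `md5_of_file` and `verify_downloads` are hypothetical, and the files are assumed to sit in one directory under their original names.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "semcor.fr.xml": "87f0a390ccfda959de51063aad08082d",
    "semeval2013task12.fr.xml": "17530c037c658dd56eccb00bd8e66b7b",
    "wngt.fr.xml": "8a8d772b920667ab375c2460c519c9f0",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks keeps memory use flat, which matters for the 184.7 MB `wngt.fr.xml` file.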
                   All versions   This version
Views              225            225
Downloads          271            271
Data volume        24.4 GB        24.4 GB
Unique views       198            198
Unique downloads   87             87
