# WordNet–Wikipedia–Wiktionary alignment

Miller, Tristan; Gurevych, Iryna

This distribution contains the three-way alignments between WordNet 3.0, the English edition of Wikipedia, and the English edition of Wiktionary, as described in the LREC 2014 paper by Tristan Miller and Iryna Gurevych (see below).

## Format

This distribution contains two tab-delimited text files, alignment_3way.tsv and alignment_3way_conjoint.tsv. The first contains the full alignment of WordNet, Wikipedia, and Wiktionary, omitting singleton senses (those not aligned to any other sense). The second contains the conjoint alignment of WordNet, Wikipedia, and Wiktionary.

The format of both files is the same: each line consists of a tab-delimited list of “sense” identifiers which refer to the same concept. Identifiers for Wiktionary are prefixed with a # character, and take the form of the unique sense identifier generated by the JWKTL library for a 3 April 2010 dump of the English edition of Wiktionary. Identifiers for Wikipedia are prefixed with a % character, and take the form of the article title (with underscores replacing spaces) as found in a 22 August 2009 snapshot of the English edition of Wikipedia. Identifiers for WordNet are prefixed with a = character and take the form of a synset offset, followed by a hyphen (-), followed by a part of speech label (a, n, r, or v, for adjectives, nouns, adverbs, and verbs, respectively).
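
As an illustration of the line format described above, the following is a minimal Python sketch of how one line might be split into per-resource identifiers. The function names `parse_line` and `parse_wordnet_id`, the resource labels, and the sample identifiers in the usage note are assumptions made for this example; they are not part of the distribution.

```python
from collections import defaultdict

# Prefix characters as documented above; the resource labels on the
# right-hand side are names chosen for this sketch.
PREFIXES = {"#": "wiktionary", "%": "wikipedia", "=": "wordnet"}


def parse_line(line):
    """Group the tab-delimited identifiers on one line by resource.

    Returns a dict mapping a resource label to the list of identifiers
    (prefix character removed) found for that resource.
    """
    groups = defaultdict(list)
    for token in line.rstrip("\n").split("\t"):
        if not token:
            continue  # tolerate stray empty fields
        resource = PREFIXES.get(token[0])
        if resource is None:
            raise ValueError(f"unrecognised identifier prefix in {token!r}")
        groups[resource].append(token[1:])
    return dict(groups)


def parse_wordnet_id(identifier):
    """Split a WordNet identifier such as '02084071-n' into (offset, pos).

    The offset is kept as a string so that leading zeros are preserved.
    """
    offset, _, pos = identifier.rpartition("-")
    if not offset or pos not in {"a", "n", "r", "v"}:
        raise ValueError(f"malformed WordNet identifier: {identifier!r}")
    return offset, pos
```

For instance, calling `parse_line("=02084071-n\t%Dog\t#12345:0:1\n")` on this made-up line would group the three identifiers under `wordnet`, `wikipedia`, and `wiktionary` respectively. The Wiktionary identifier shown here is a placeholder and may not match the form of the sense identifiers that JWKTL actually generates.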

## Citing this resource

If you use this resource in your own work, please cite the following paper:

Tristan Miller and Iryna Gurevych. WordNet–Wikipedia–Wiktionary: Construction of a three-way alignment. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), pages 2094–2100. European Language Resources Association, May 2014. ISBN 978-2-9517408-8-4.

You can use the following BibTeX entry:

@inproceedings{miller2014wordnet,
author       = {Tristan Miller and Iryna Gurevych},
title        = {{WordNet}--{Wikipedia}--{Wiktionary}: Construction of a Three-way Alignment},
booktitle    = {Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014)},
year         = 2014,
editor       = {Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis},
pages        = {2094--2100},
month        = may,
publisher    = {European Language Resources Association},
pdf          = {http://www.lrec-conf.org/proceedings/lrec2014/pdf/4_Paper.pdf},
isbn         = {978-2-9517408-8-4},
}

This work has been supported by the Volkswagen Foundation as part of the Lichtenberg Professorship Program under grant № I/82806 and by the German Ministry of Education and Research under grant № 01IS10054G.

Files: MillerGurevych2014_alignment.tar.xz (652.0 kB)