Published February 22, 2016 | Version v1
Dataset Open

Palmetto position storing Lucene index of Dutch Wikipedia

  • 1. Netherlands eScience Center
  • 2. University of Amsterdam

Description

Dutch language resource for calculating topic coherence with Palmetto [1, 2]. The dataset is a position storing Lucene index of the Dutch Wikipedia [3]. It was created in the context of the Netherlands eScience Center Dilipad project [4]. The PDF file contains the results of a case study showing that NPMI is the best topic coherence measure for topics consisting of Dutch nouns.

More details can be found in the README.
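
For readers unfamiliar with NPMI (normalized pointwise mutual information), the following is a minimal sketch of the idea, not Palmetto's implementation. It assumes probabilities are estimated from plain document-level co-occurrence in a small in-memory corpus, whereas Palmetto estimates them from the Lucene index; the function names `npmi` and `topic_coherence_npmi` and the toy documents are hypothetical.

```python
import math
from itertools import combinations


def npmi(p_i, p_j, p_ij):
    """Normalized PMI of two words given their marginal and joint probabilities."""
    if p_ij == 0.0:
        return -1.0  # the words never co-occur: lowest possible score
    if p_ij == 1.0:
        return 0.0   # degenerate case (assumption): avoids division by zero
    pmi = math.log(p_ij / (p_i * p_j))
    return pmi / -math.log(p_ij)


def topic_coherence_npmi(topic_words, documents):
    """Average NPMI over all word pairs of a topic (illustrative sketch).

    Assumption: a probability is the fraction of documents that contain
    all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = [
        npmi(prob(a), prob(b), prob(a, b))
        for a, b in combinations(topic_words, 2)
    ]
    return sum(scores) / len(scores)


# Toy corpus (hypothetical): "kat" and "hond" always appear together.
docs = [["kat", "hond"], ["kat", "hond"], ["fiets"], ["auto"]]
coherence = topic_coherence_npmi(["kat", "hond"], docs)
```

Scores range from -1 (words never co-occur) to 1 (words only occur together); a topic's coherence here is the mean over its word pairs.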

[1] M. Roeder, A. Both, and A. Hinneburg. Exploring the space of topic coherence measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pages 399–408, 2015.

[2] http://aksw.org/Projects/Palmetto.html

[3] https://dumps.wikimedia.org/nlwiki/20151102/

[4] https://www.esciencecenter.nl/project/dilipad

Files

case_study.pdf

Files (658.3 MB)

Checksum Size
md5:11b81cdd6ed9520fbc46ada4bf0012b5 146.6 kB
md5:c7762b00271203e5fde48816cf1f9f03 658.2 MB
md5:17f782f72275d98e71f4eb901ae26146 1.8 kB
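
The MD5 checksums listed above can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and the path passed in is whatever local path the file was saved to.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files are never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 digest against a listed value such as 'md5:<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, calling `matches_listed_checksum(local_path, "md5:c7762b00271203e5fde48816cf1f9f03")` on the downloaded 658.2 MB file should return `True` if the file is intact.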