Dataset Open Access

Palmetto position storing Lucene index of Dutch Wikipedia

van der Zwaan, Janneke M.; Marx, Maarten; Kamps, Jaap

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xmlns:xsi="" xsi:schemaLocation="">
  <dc:creator>van der Zwaan, Janneke M.</dc:creator>
  <dc:creator>Marx, Maarten</dc:creator>
  <dc:creator>Kamps, Jaap</dc:creator>
  <dc:description>Dutch-language resource for calculating topic coherence with Palmetto [1, 2]. The dataset is a position-storing Lucene index of the Dutch Wikipedia [3]. It was created in the context of the Netherlands eScience Center Dilipad project [4]. The PDF file contains the results of a case study showing that, among the coherence measures evaluated, NPMI works best for topics consisting of Dutch nouns.

More details can be found in the README.

[1] M. Roeder, A. Both, and A. Hinneburg. Exploring the space of topic coherence measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pages 399–408, 2015.</dc:description>
  <dc:subject>topic modeling</dc:subject>
  <dc:subject>topic coherence</dc:subject>
  <dc:title>Palmetto position storing Lucene index of Dutch Wikipedia</dc:title>
</oai_dc:dc>
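One reason a coherence index would store token positions is that window-based probability estimates need them: co-occurrence is counted inside sliding windows of text rather than across whole documents. NPMI is the measure the case study found to work best. What follows is a minimal plain-Python sketch of window-based NPMI coherence, not Palmetto's implementation, and it does not read the Lucene index. It assumes documents are already tokenized into lists of words, uses Boolean sliding windows (a window counts a word once, however often it occurs), uses a window size of 10 (an assumption), and averages NPMI over all word pairs of a topic. The function names are hypothetical.

```python
import math
from itertools import combinations


def sliding_windows(tokens, size):
    """Yield Boolean windows (sets of words) of `size` consecutive tokens.

    A document shorter than `size` yields one window with all its tokens.
    """
    if len(tokens) <= size:
        yield set(tokens)
        return
    for start in range(len(tokens) - size + 1):
        yield set(tokens[start:start + size])


def npmi(p_a, p_b, p_ab):
    """Normalized PMI: log(p_ab / (p_a * p_b)) / -log(p_ab), in [-1, 1]."""
    if p_ab == 0.0:
        return -1.0  # convention: words that never co-occur score the minimum
    if p_ab == 1.0:
        return 1.0  # convention: both words occur in every window
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


def npmi_coherence(topic_words, documents, window_size=10):
    """Average NPMI over all pairs of topic words (hypothetical helper).

    Probabilities are estimated as the fraction of sliding windows, across
    all documents, that contain the word (or both words).
    """
    if len(topic_words) < 2:
        raise ValueError("a topic needs at least two words")
    windows = [w for doc in documents for w in sliding_windows(doc, window_size)]
    if not windows:
        raise ValueError("no windows to count")

    def prob(*words):
        # Fraction of windows containing every word in `words`.
        return sum(all(x in w for x in words) for w in windows) / len(windows)

    scores = [npmi(prob(a), prob(b), prob(a, b))
              for a, b in combinations(topic_words, 2)]
    return sum(scores) / len(scores)
```

For example, with `docs = [["kat", "hond"], ["kat", "hond"], ["auto", "fiets"], ["auto", "fiets"]]` every document fits in a single window, so `npmi_coherence(["kat", "hond"], docs)` is approximately 1.0 and `npmi_coherence(["kat", "auto"], docs)` is -1.0. Over a full Wikipedia dump, scanning every window like this would be slow, which is where an index that can answer positional queries helps.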
                  All versions   This version
Views             4,034          4,035
Downloads         186            186
Data volume       22.4 GB        22.4 GB
Unique views      3,989          3,990
Unique downloads  150            150

