Dataset Open Access

Palmetto position storing Lucene index of Dutch Wikipedia

van der Zwaan, Janneke M.; Marx, Maarten; Kamps, Jaap

Dutch language resource for calculating topic coherence with Palmetto [1, 2]. The dataset is a position storing Lucene index of the Dutch Wikipedia [3]. It was created in the context of the Netherlands eScience Center Dilipad project [4]. The PDF file contains the results of a case study showing that NPMI is the best topic coherence measure for topics consisting of Dutch nouns.

More details can be found in the README.

[1] M. Roeder, A. Both, and A. Hinneburg. Exploring the space of topic coherence measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pages 399–408, 2015.
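For readers unfamiliar with NPMI (normalized pointwise mutual information), the following is a minimal sketch of how a pairwise NPMI score and a mean-over-pairs topic coherence could be computed. It is not Palmetto's implementation: Palmetto estimates word probabilities from the Lucene index, whereas this sketch assumes simple document-level co-occurrence over an in-memory list of tokenized documents. The function names `npmi` and `topic_coherence` and the toy Dutch documents are illustrative assumptions.

```python
import math
from itertools import combinations


def npmi(p_i, p_j, p_ij):
    """NPMI of two words given their marginal and joint probabilities."""
    if p_ij == 0.0:
        return -1.0  # convention: the words never co-occur
    if p_ij == 1.0:
        return 1.0  # degenerate case: both words occur everywhere (avoids 0/0)
    pmi = math.log(p_ij / (p_i * p_j))
    return pmi / -math.log(p_ij)


def topic_coherence(topic_words, documents):
    """Mean pairwise NPMI of the topic words.

    Assumption: probabilities are document frequencies, i.e. the fraction
    of documents that contain the word (or both words of a pair).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        hits = sum(all(w in doc for w in words) for doc in doc_sets)
        return hits / n_docs

    scores = [
        npmi(prob(a), prob(b), prob(a, b))
        for a, b in combinations(topic_words, 2)
    ]
    return sum(scores) / len(scores)


# Toy corpus (hypothetical): "kat" and "hond" always appear together.
docs = [["kat", "hond"], ["kat", "hond"], ["vis"], ["vis"]]
coherence = topic_coherence(["kat", "hond"], docs)
```

Scores range from -1 (the words never co-occur) through 0 (independent) to 1 (the words always co-occur), so higher values indicate a more coherent topic.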

Files (658.3 MB)
Name                      Checksum                                Size
case_study.pdf            md5:11b81cdd6ed9520fbc46ada4bf0012b5    146.6 kB
nlwiki-palmetto.tar.gz    md5:c7762b00271203e5fde48816cf1f9f03    658.2 MB
(name missing)            md5:17f782f72275d98e71f4eb901ae26146    1.8 kB
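The MD5 checksums listed above can be used to check that a download completed without corruption. The sketch below assumes the files were saved under their listed names in the current working directory; the helper names `md5_of_file` and `verify` are illustrative and not part of the dataset.

```python
import hashlib
import os

# Checksums as listed in the file table above.
EXPECTED = {
    "case_study.pdf": "11b81cdd6ed9520fbc46ada4bf0012b5",
    "nlwiki-palmetto.tar.gz": "c7762b00271203e5fde48816cf1f9f03",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    # Assumption: downloads are in the current directory.
    for name, checksum in EXPECTED.items():
        if not os.path.exists(name):
            print(f"{name}: not found")
        else:
            print(f"{name}: {'OK' if verify(name, checksum) else 'MISMATCH'}")
```

Reading in chunks matters here because the archive is about 658 MB; loading it whole would work but uses far more memory than needed.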


Cite as