Dataset Open Access

Palmetto position storing Lucene index of Dutch Wikipedia

van der Zwaan, Janneke M.; Marx, Maarten; Kamps, Jaap

Dutch-language resource for calculating topic coherence with Palmetto [1, 2]. The dataset is a position-storing Lucene index of the Dutch Wikipedia [3]. It was created in the context of the Netherlands eScience Center Dilipad project [4]. The PDF file (case_study.pdf) contains the results of a case study showing that the best topic coherence measure for topics consisting of Dutch nouns is NPMI (normalized pointwise mutual information).
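For readers unfamiliar with the measure: NPMI normalizes the pointwise mutual information of two words, NPMI(wi, wj) = log[P(wi, wj) / (P(wi) P(wj))] / -log P(wi, wj), so that scores range from -1 (the words never co-occur) to 1 (they always co-occur). The sketch below computes the mean pairwise NPMI of a topic over a small in-memory corpus. It is an illustration only, not Palmetto's implementation: the function names `npmi` and `topic_npmi` are hypothetical, probabilities are estimated as the fraction of documents containing the words (boolean document co-occurrence), and Palmetto's own NPMI configuration may estimate them differently, for example from sliding windows over the position-storing index.

```python
import math
from itertools import combinations


def npmi(p_ij: float, p_i: float, p_j: float) -> float:
    """Normalized pointwise mutual information of a word pair, in [-1, 1]."""
    if p_i == 0 or p_j == 0:
        return 0.0  # a word never occurs: treated as no evidence (assumption)
    if p_ij == 0:
        return -1.0  # the words never co-occur: limiting value of NPMI
    if p_ij == 1:
        return 1.0  # both words occur in every document
    return math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)


def topic_npmi(topic: list[str], documents: list[list[str]]) -> float:
    """Mean NPMI over all word pairs of a topic.

    Probabilities use boolean document co-occurrence (an assumption made
    for illustration): P(w) is the fraction of documents containing w.
    """
    if len(topic) < 2:
        raise ValueError("a topic needs at least two words")
    docs = [set(doc) for doc in documents]

    def prob(*words: str) -> float:
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in docs) / len(docs)

    scores = [npmi(prob(a, b), prob(a), prob(b)) for a, b in combinations(topic, 2)]
    return sum(scores) / len(scores)
```

For example, `topic_npmi(["kat", "hond"], [["kat", "hond"], ["vis"]])` scores a topic whose two nouns always appear together; with a real corpus the counts would instead come from the Wikipedia index.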

More details can be found in the README.

[1] M. Roeder, A. Both, and A. Hinneburg. Exploring the space of topic coherence measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pages 399–408, 2015.

[2] http://aksw.org/Projects/Palmetto.html

[3] https://dumps.wikimedia.org/nlwiki/20151102/

[4] https://www.esciencecenter.nl/project/dilipad

Files (658.3 MB)
Name                     Size      MD5 checksum
case_study.pdf           146.6 kB  11b81cdd6ed9520fbc46ada4bf0012b5
nlwiki-palmetto.tar.gz   658.2 MB  c7762b00271203e5fde48816cf1f9f03
README.md                1.8 kB    17f782f72275d98e71f4eb901ae26146
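Each file is listed with an MD5 checksum. A minimal Python sketch for checking downloaded copies against those checksums follows; it assumes the three files were saved under their listed names in a single directory, and the helper names `md5sum` and `verify` are hypothetical. The archive is read in chunks so the 658 MB index is never loaded into memory at once.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "case_study.pdf": "11b81cdd6ed9520fbc46ada4bf0012b5",
    "nlwiki-palmetto.tar.gz": "c7762b00271203e5fde48816cf1f9f03",
    "README.md": "17f782f72275d98e71f4eb901ae26146",
}


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading 1 MiB at a time."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(directory: str = ".") -> dict[str, bool]:
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5sum(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Only unpack nlwiki-palmetto.tar.gz once its checksum reports OK.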
