Dataset Open Access
van der Zwaan, Janneke M.; Marx, Maarten; Kamps, Jaap
Dutch-language resource for calculating topic coherence with Palmetto [1, 2]. The dataset is a position-storing Lucene index of the Dutch Wikipedia [3]. It was created in the context of the Netherlands eScience Center Dilipad project [4]. The PDF file contains the results of a case study showing that NPMI is the best topic coherence measure for topics consisting of Dutch nouns.
More details can be found in the README.
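
To make the NPMI measure mentioned above concrete, here is a minimal sketch of how an NPMI-based coherence score could be computed for one topic. It is an illustration only, not Palmetto's implementation: it assumes boolean document-level co-occurrence over a small in-memory corpus, whereas Palmetto estimates probabilities from the Lucene index. The names `npmi`, `topic_npmi`, and the toy documents are hypothetical.

```python
import math
from itertools import combinations


def npmi(p_i, p_j, p_ij):
    """Normalized pointwise mutual information of two words.

    p_i, p_j: marginal probabilities of each word
    p_ij:     probability that both words occur together
    """
    if p_ij <= 0.0:
        return -1.0  # words never co-occur: lower bound of NPMI
    if p_ij >= 1.0:
        return 1.0   # words co-occur everywhere: limit value of NPMI
    pmi = math.log(p_ij / (p_i * p_j))
    return pmi / -math.log(p_ij)


def topic_npmi(topic_words, documents):
    """Mean NPMI over all word pairs of a topic.

    documents: list of sets of words (assumption: boolean
    document-level co-occurrence, a simplification).
    """
    n_docs = len(documents)

    def prob(*words):
        # Fraction of documents containing all the given words.
        hits = sum(1 for doc in documents if all(w in doc for w in words))
        return hits / n_docs

    scores = [
        npmi(prob(w1), prob(w2), prob(w1, w2))
        for w1, w2 in combinations(topic_words, 2)
    ]
    return sum(scores) / len(scores)


# Toy corpus of Dutch nouns (hypothetical data, for illustration only).
toy_documents = [
    {"kat", "hond"},
    {"kat", "hond"},
    {"fiets"},
    {"fiets"},
]
```

With this toy corpus, `topic_npmi(["kat", "hond"], toy_documents)` scores a topic whose words always appear together, while `topic_npmi(["kat", "fiets"], toy_documents)` scores one whose words never do.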
[1] M. Roeder, A. Both, and A. Hinneburg. Exploring the space of topic coherence measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pages 399–408, 2015.
[2] http://aksw.org/Projects/Palmetto.html
[3] https://dumps.wikimedia.org/nlwiki/20151102/
[4] https://www.esciencecenter.nl/project/dilipad
Name | Checksum | Size |
---|---|---|
case_study.pdf | md5:11b81cdd6ed9520fbc46ada4bf0012b5 | 146.6 kB |
nlwiki-palmetto.tar.gz | md5:c7762b00271203e5fde48816cf1f9f03 | 658.2 MB |
README.md | md5:17f782f72275d98e71f4eb901ae26146 | 1.8 kB |
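
The MD5 checksums listed for the files can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_download` are hypothetical, and the sketch assumes the files were saved locally under the names shown in the listing.

```python
import hashlib

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "case_study.pdf": "11b81cdd6ed9520fbc46ada4bf0012b5",
    "nlwiki-palmetto.tar.gz": "c7762b00271203e5fde48816cf1f9f03",
    "README.md": "17f782f72275d98e71f4eb901ae26146",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks so that
    the 658 MB archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify_download("nlwiki-palmetto.tar.gz", EXPECTED_MD5["nlwiki-palmetto.tar.gz"])` would return `True` for an intact copy of the index archive in the current directory.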
 | All versions | This version |
---|---|---|
Views | 3,928 | 3,929 |
Downloads | 134 | 134 |
Data volume | 17.1 GB | 17.1 GB |
Unique views | 3,900 | 3,901 |
Unique downloads | 105 | 105 |