Dataset Open Access

Palmetto position storing Lucene index of Dutch Wikipedia

van der Zwaan, Janneke M.; Marx, Maarten; Kamps, Jaap


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>van der Zwaan, Janneke M.</dc:creator>
  <dc:creator>Marx, Maarten</dc:creator>
  <dc:creator>Kamps, Jaap</dc:creator>
  <dc:date>2016-02-22</dc:date>
  <dc:description>Dutch-language resource for calculating topic coherence with Palmetto [1, 2]. The dataset is a position-storing Lucene index of the Dutch Wikipedia [3]. It was created in the context of the Netherlands eScience Center Dilipad project [4]. The PDF file contains the results of a case study showing that NPMI is the best topic coherence measure for topics consisting of Dutch nouns.

More details can be found in the README.

[1] M. Roeder, A. Both, and A. Hinneburg. Exploring the space of topic coherence measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pages 399–408, 2015.

[2] http://aksw.org/Projects/Palmetto.html

[3] https://dumps.wikimedia.org/nlwiki/20151102/

[4] https://www.esciencecenter.nl/project/dilipad</dc:description>
  <dc:identifier>https://zenodo.org/record/46377</dc:identifier>
  <dc:identifier>10.5281/zenodo.46377</dc:identifier>
  <dc:identifier>oai:zenodo.org:46377</dc:identifier>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/legalcode</dc:rights>
  <dc:subject>topic modeling</dc:subject>
  <dc:subject>topic coherence</dc:subject>
  <dc:subject>Palmetto</dc:subject>
  <dc:subject>Dutch</dc:subject>
  <dc:subject>Wikipedia</dc:subject>
  <dc:title>Palmetto position storing Lucene index of Dutch Wikipedia</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
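
The description in the record above names NPMI (normalized pointwise mutual information) as the best-performing coherence measure. As a rough illustration of what that measure computes, here is a minimal Python sketch. It is not Palmetto's implementation: Palmetto derives its probabilities from the Lucene index, whereas this toy version assumes whole-document co-occurrence over a small in-memory corpus, and the function names `npmi` and `topic_npmi` are hypothetical.

```python
import math
from itertools import combinations


def npmi(p_i, p_j, p_ij):
    """NPMI of two words from their marginal and joint probabilities.

    Ranges from -1 (never co-occur) through 0 (independent) to 1 (always co-occur).
    """
    if p_ij <= 0.0:
        return -1.0  # convention for pairs that never co-occur
    if p_ij >= 1.0:
        return 1.0   # both words occur in every document
    pmi = math.log(p_ij / (p_i * p_j))
    return pmi / -math.log(p_ij)


def topic_npmi(topic_words, documents):
    """Mean NPMI over all word pairs of a topic.

    Assumption: probabilities are estimated as the fraction of documents
    containing the word(s); `documents` is a list of token lists.
    """
    if len(topic_words) < 2:
        raise ValueError("a topic needs at least two words")
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every given word.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = [
        npmi(prob(a), prob(b), prob(a, b))
        for a, b in combinations(topic_words, 2)
    ]
    return sum(scores) / len(scores)
```

With a tokenized corpus in `docs`, a call such as `topic_npmi(["kat", "hond", "paard"], docs)` would return the average pairwise score for that topic; higher values indicate words that co-occur more often than chance would predict.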
                  All versions   This version
Views             3,655          3,656
Downloads         57             57
Data volume       7.2 GB         7.2 GB
Unique views      3,641          3,642
Unique downloads  43             43
