00000nmm##2200000uu#4500 3250669 doi 10.5281/zenodo.3250669 oai:zenodo.org:3250669 user-webis Gollub, Tim (orcid)0000-0003-1737-6517 Bauhaus-Universität Weimar Busse, Matthias Bauhaus-Universität Weimar Webis-Ambient-15 Hagen, Matthias (orcid)0000-0002-9733-2890 Bauhaus-Universität Weimar info:eu-repo/semantics/openAccess Creative Commons Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/legalcode cc-by-4.0 spdx subtopic information retrieval subtopic documents ambient <p>This corpus is an extension of the <a href="http://search.fub.it/ambient/">Ambient data set created by Carpineto and Romano</a>. For each subtopic, the websites of the given URLs were downloaded (if accessible). These documents keep the names of the original documents, for example, 1/1.4/1.3.html. Each subtopic was then manually enriched to ten documents with websites retrieved by Google (for example, 1/1.1/g00.html: 'g' for Google, 00 for the first Google result). Some subtopics could not be sufficiently enriched and were discarded. Moreover, some subtopics were duplicates or not interpretable and were also discarded.</p> <p>The data set consists of 44 topics (topics.txt) and 481 subtopics (subtopics.txt). Some subtopics are topically very similar and therefore rather difficult to cluster. These 13 subtopics (11.2, 12.13, 14.2, 19.33, 20.2, 20.5, 21.2, 24.3, 24.4, 27.26, 31.16, 36.7, 44.9) are omitted from the file subtopics-filtered.txt, which lists only the remaining 468 subtopics.</p> eng Tim Gollub, Matthias Busse, Benno Stein, and Matthias Hagen. Keyqueries for Clustering and Labeling. In 12th Asia Information Retrieval Societies Conference (AIRS 2016), pages 42-55, November 2016. Springer.
Zenodo 2015-03-13 user-webis info:eu-repo/semantics/other 20200124192255.0 18786945 md5:7f5489c7aa9b9df0ced802a3d59f5637 https://zenodo.org/records/3250669/files/webis-ambient-main-content.tar.gz 20282119 md5:9b65a761c86456f5de99e0eee76e4ff9 https://zenodo.org/records/3250669/files/webis-ambient-plain-text.tar.gz 80294291 md5:69bfbc52d51b0b84c433b9f1f9950200 https://zenodo.org/records/3250669/files/webis-ambient-html.tar.gz open 10.5281/zenodo.3250668 isVersionOf doi
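The three archives above are listed with MD5 checksums, so a download can be checked before unpacking. The following is a minimal sketch, not part of the record itself: the helper names `md5_of` and `verify` are hypothetical, and the number preceding each checksum is assumed to be the file size in bytes.

```python
import hashlib
from pathlib import Path

# File name -> (size, MD5), copied from the record above.
# Assumption: the number listed before each checksum is the size in bytes.
EXPECTED = {
    "webis-ambient-main-content.tar.gz": (18786945, "7f5489c7aa9b9df0ced802a3d59f5637"),
    "webis-ambient-plain-text.tar.gz": (20282119, "9b65a761c86456f5de99e0eee76e4ff9"),
    "webis-ambient-html.tar.gz": (80294291, "69bfbc52d51b0b84c433b9f1f9950200"),
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if size and MD5 of a downloaded archive match the record.

    Raises KeyError if the file name is not one of the three listed archives.
    """
    path = Path(path)
    size, checksum = EXPECTED[path.name]
    return bool(path.stat().st_size == size and md5_of(path) == checksum)
```

For instance, `verify("webis-ambient-html.tar.gz")` would be expected to return `True` for an intact download in the current directory, and `False` for a truncated or corrupted one.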