Published May 26, 2021 | Version v1
Dataset | Open access

Long document similarity dataset, Wikipedia excerptions for wine collections

Description

Wine-related articles extracted from Wikipedia.

In all articles, figures, tables, categories, and "See also" sections have been filtered out.

The article structure, in particular the subtitles and paragraphs, is preserved in this dataset.


Wines

The Wikipedia wines dataset consists of 1635 articles from the wine domain. It contains a non-trivial mixture of articles, covering wine categories, brands, wineries, grape varieties, and more. The ground-truth recommendations were crafted by a human sommelier, who annotated 92 source articles with roughly 10 ground-truth recommendations each. Examples of expert-based ground-truth recommendations are:

  • Dom Pérignon - Moët & Chandon
  • Pinot Meunier - Chardonnay
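Because each annotated source article comes with roughly ten expert recommendations, a similarity model's ranked output for that article can be scored against the expert set. The sketch below is illustrative only: it assumes the ground truth has already been loaded into a Python dict mapping a source title to a set of recommended titles (the file layout of wines.txt is not described here, so no loader is shown). The `ground_truth` dict, the treatment of each listed pair as "source → recommendation", the hypothetical ranking, and the choice of precision@k and recall@k as metrics are all assumptions, not the dataset's documented evaluation protocol.

```python
from typing import Dict, List, Set

# Hypothetical ground truth: source article title -> expert-recommended titles.
# Only the two example pairs from the description are used, read left-to-right
# as source -> recommendation (an assumption).
ground_truth: Dict[str, Set[str]] = {
    "Dom Pérignon": {"Moët & Chandon"},
    "Pinot Meunier": {"Chardonnay"},
}


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked titles that appear in the expert set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for title in ranked[:k] if title in relevant)
    return hits / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the expert set recovered within the top-k ranked titles."""
    if not relevant:
        return 0.0
    hits = sum(1 for title in ranked[:k] if title in relevant)
    return hits / len(relevant)


# Hypothetical ranking produced by some similarity model for "Dom Pérignon".
ranking = ["Moët & Chandon", "Krug", "Bollinger"]
print(precision_at_k(ranking, ground_truth["Dom Pérignon"], k=3))
print(recall_at_k(ranking, ground_truth["Dom Pérignon"], k=3))
```

With about ten expert recommendations per source, recall@10 is a natural cut-off to report, though the choice of k is left to the user.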

Files (11.4 MB)

Name        Size     Checksum
wines.txt   11.4 MB  md5:617fa73f2a331ab160d7653c81daf17e
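The record publishes an MD5 checksum for wines.txt, so a download can be checked before use. The following is a minimal sketch using only Python's standard `hashlib` and `pathlib`; the local path `wines.txt` (current working directory) is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# Checksum published on this record for wines.txt.
EXPECTED_MD5 = "617fa73f2a331ab160d7653c81daf17e"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    target = Path("wines.txt")  # assumed download location
    if target.exists():
        print("checksum OK" if verify_download(target) else "checksum MISMATCH")
    else:
        print(f"{target} not found")
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters little for an 11.4 MB file but makes the helper reusable for larger downloads.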