Long document similarity dataset, Wikipedia excerptions for wine collections
Creators
Description
Wine-related articles extracted from Wikipedia.
For all articles, the figures and tables have been filtered out, as well as the categories and "see also" sections.
The article structure, particularly the sub-titles and paragraphs, is kept in these datasets.
Wines
The Wikipedia wines dataset consists of 1635 articles from the wine domain. The extracted dataset is a non-trivial mixture of articles covering wine categories, brands, wineries, grape types, and more. The ground-truth recommendations were crafted by a human sommelier, who annotated 92 source articles with ~10 ground-truth recommendations each. Examples of ground-truth expert-based recommendations are:
- Dom Pérignon - Moët & Chandon
- Pinot Meunier - Chardonnay
Files (11.4 MB)
Name | Size |
---|---|
wines.txt (md5:617fa73f2a331ab160d7653c81daf17e) | 11.4 MB |
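The MD5 checksum listed for wines.txt can be used to confirm that a downloaded copy is intact. The following is a minimal sketch, not part of the dataset itself: the local path `wines.txt` and the helper names `md5_of_file` and `verify_download` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed for wines.txt in the file table.
EXPECTED_MD5 = "617fa73f2a331ab160d7653c81daf17e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    local_copy = Path("wines.txt")  # assumed download location
    if local_copy.exists():
        print("checksum OK" if verify_download(local_copy) else "checksum mismatch")
    else:
        print("wines.txt not found in the current directory")
```

Reading in chunks keeps memory use flat regardless of file size. A mismatch usually points to an interrupted or altered download, in which case fetching the file again is the simplest remedy.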