Trend Dataset: Evaluation Datasets for Trend Forecasting Studies
Description
A conference paper published at AAAI-ICWSM 2023 details the method used to create the data.
Overview
This dataset was created for the purpose of evaluating trend forecasting methods. It contains 400 entities in 21 domains grouped into 5 categories (geography, organization, person, product, and content). Each entity is annotated with three types of trend attributes: trending status (trending or non-trending), degree of trending (how well it is recognized), and trend period (when the trend started and ended).
The dataset has three main features. First, a questionnaire-based recognition rate is used as the gold standard for annotating trend attributes. Second, the entities are collected from a wide range of domains and cover both trending and non-trending cases without significant imbalance. Third, trend periods are annotated at weekly resolution through interpolation using Internet search volume data.
See our paper "Construction of Evaluation Datasets for Trend Forecasting Studies" *1 (hereafter, "the paper") for details.
*1 This paper is currently under single-blind review.
Conditions of dataset construction
Target Trending Phenomenon
attribute | value |
---|---|
Target Country | Japan |
Survey Period | from 2015 to 2019 |
# of entities | 400 |
Target Domains | 21 domains of 5 categories (See Target Domain Section) |
Target Domain
Category | Domain |
---|---|
Location/Geography | City/region/landmark |
Organization | Restaurant/facility, company/brand |
Person/Group | Politician/political party, researcher, athlete, actor/actress, celebrity/entertainer/comedian, music band/music group |
Products | Cosmetics, daily necessities, clothing, beverage, foodstuff, others |
Art/Content | Game, publication, comic/animation, movie, broadcast program, music |
Dataset Files
The dataset contains two TSV files and a single Markdown file. The description and metadata about the dataset are provided in a Markdown file. The list of files is shown in the table below.
data label | file name | file format | explanation |
---|---|---|---|
Metadata | README.md | Markdown | Overview of this dataset and the schema of each file in English and Japanese |
Trend Dataset | trend_dataset.tsv | TSV | Main body of this dataset |
Master Entity list | master_entity_list.tsv | TSV | Information of entities included in Trend Dataset |
TSV File format
attribute | value |
---|---|
header row | exists (first row) |
index column | “ID” column (first column) |
encoding | UTF-8 |
delimiter | \t (tab) |
quoting | None *2 |
escape character | \ (backslash) |
line terminator | \n |
*2: csv.QUOTE_NONE in Python
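The format table above maps directly onto the parameters of Python's standard `csv` module. The following is a minimal sketch of a reader under those settings, not an official loader. The function name `read_trend_tsv` is hypothetical, and the in-memory sample uses made-up columns (`name`, `note`); only the `ID` column is taken from the table.

```python
import csv
import io


def read_trend_tsv(stream):
    """Parse a TSV stream with the dialect listed in the format table.

    Returns a dict that maps each row's "ID" value to the full row.
    """
    reader = csv.DictReader(
        stream,
        delimiter="\t",          # delimiter: tab
        quoting=csv.QUOTE_NONE,  # quoting: None, so quote marks are literal text
        escapechar="\\",         # escape character: backslash
    )
    return {row["ID"]: row for row in reader}


# Hypothetical sample data; the real column names may differ.
sample = 'ID\tname\tnote\n1\tfoo\tsays "hi"\n2\tbar\ta\\\tb\n'
rows = read_trend_tsv(io.StringIO(sample, newline=""))

print(rows["1"]["note"])        # quote characters are kept as-is
print(repr(rows["2"]["note"]))  # an escaped tab stays inside the field
```

To read an actual file, open it with the listed encoding and without newline translation, for example `open("trend_dataset.tsv", encoding="utf-8", newline="")`, and pass the file object to the function.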
Files (438.3 kB)
checksum | size |
---|---|
md5:a038ce2ddd959faf7401ef11d7ff0869 | 215.9 kB |
md5:d6b603111be5703d88c634d2b1dca2e5 | 25.7 kB |
md5:8b689f22a1da3696ba8430dcba95bcae | 196.7 kB |
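The md5 values in the file listing can be used to check a download for corruption. The sketch below computes a checksum in the same `md5:<hex>` form using Python's `hashlib`. The helper name `md5_of_file` is hypothetical, and because the listing does not say which checksum belongs to which file, the comparison is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 checksum of a file, formatted as 'md5:<hex digest>'."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "md5:" + digest.hexdigest()
```

For example, `md5_of_file("trend_dataset.tsv")` should equal one of the three listed values if the file was downloaded intact.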