Trend Dataset: Evaluation Datasets for Trend Forecasting Studies

Matsuno, Shogo; Sakae Mizuki; Takeshi Sakaki

doi:10.5281/zenodo.7014424

Published January 15, 2023 | Version 1.0

Dataset Open

1. Gunma University
2. Hotto Link Inc.

A conference paper published at AAAI-ICWSM 2023 details the method used to create the data.

Overview

This dataset was created to evaluate trend forecasting methods. It contains 400 entities across 21 domains grouped into 5 categories (geography, organization, person, product, and content). Each entity is annotated with three types of trend attributes: trending status (trending or non-trending), degree of trending (how widely it is recognized), and trend period (when the trend started and ended).

This dataset has three main features. First, a questionnaire-based recognition rate serves as the gold standard for annotating trend attributes. Second, the entities are drawn from a wide range of domains and cover both trending and non-trending cases without significant imbalance. Third, trend periods are annotated at weekly resolution through interpolation using Internet search volume data.

See our paper "Construction of Evaluation Datasets for Trend Forecasting Studies" *1 (hereafter "the paper") for details.

*1 This paper is currently under single-blind review.

Conditions of Dataset Construction

Target Trending Phenomenon

attribute	value
Target Country	Japan
Survey Period	from 2015 to 2019
# of entities	400
Target Domains	21 domains of 5 categories (See Target Domain Section)

Target Domain

Category	Domain
Location/Geography	City/region/landmark
Organization	Restaurant/facility, company/brand
Person/Group	Politician/political party, researcher, athlete, actor/actress, celebrity/entertainer/comedian, music band/music group
Products	Cosmetics, daily necessities, clothing, beverage, foodstuff, others
Art/Content	Game, publication, comic/animation, movie, broadcast program, music

Dataset Files

The dataset contains two TSV files and a single Markdown file. The Markdown file provides the description and metadata of the dataset. The files are listed in the table below.

data label	file name	file format	explanation
Metadata	README.md	Markdown	Overview of this dataset and the schema of each file in English and Japanese
Trend Dataset	trend_dataset.tsv	TSV	Main body of this dataset
Master Entity list	master_entity_list.tsv	TSV	Information about the entities included in the Trend Dataset

TSV File format

attribute	value
header row	exists (first row)
index column	“ID” column (first column)
encoding	UTF-8
delimiter	\t (tab)
quoting	None *2
escape character	\ (backslash)
line terminator	\n

*2: csv.QUOTE_NONE in Python
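
The format settings above map directly onto Python's standard csv module. The following is a minimal sketch, not the authors' own loading code: the helper name read_tsv and the sample column "name" are hypothetical, and only the "ID" column is taken from the table above.

```python
import csv
import io


def read_tsv(stream):
    """Parse a TSV stream with the settings listed above, keyed by the "ID" column."""
    reader = csv.DictReader(
        stream,
        delimiter="\t",          # tab-separated fields
        quoting=csv.QUOTE_NONE,  # quote characters are treated as ordinary data
        escapechar="\\",         # a backslash escapes the following character
    )
    return {row["ID"]: row for row in reader}


# Hypothetical two-row sample; "name" is an invented column, not the real schema.
sample = 'ID\tname\n1\tfoo "bar"\n2\tbaz\n'
rows = read_tsv(io.StringIO(sample))
print(rows["1"]["name"])  # the double quotes are kept as literal characters
```

To read one of the actual files, a stream opened with open("trend_dataset.tsv", encoding="utf-8", newline="") can be passed in place of the in-memory sample.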

Files (438.3 kB)

Name	MD5	Size
master_entity_list.tsv	a038ce2ddd959faf7401ef11d7ff0869	215.9 kB
README.md	d6b603111be5703d88c634d2b1dca2e5	25.7 kB
trend_dataset.tsv	8b689f22a1da3696ba8430dcba95bcae	196.7 kB
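
The MD5 checksums in the file listing can be used to confirm that a download is intact. The sketch below assumes the three files were saved under their listed names in a single directory; the function names md5_of_file and verify_downloads are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "master_entity_list.tsv": "a038ce2ddd959faf7401ef11d7ff0869",
    "README.md": "d6b603111be5703d88c634d2b1dca2e5",
    "trend_dataset.tsv": "8b689f22a1da3696ba8430dcba95bcae",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Return {file name: True, False, or None}; None means the file was not found."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, status in verify_downloads().items():
        print(name, labels[status])
```

A MISMATCH result would suggest a corrupted or modified download, in which case fetching the file again is the simplest remedy.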
