Published January 15, 2023 | Version 1.0
Dataset Open

Trend Dataset: Evaluation Datasets for Trend Forecasting Studies

  • 1. Gunma University
  • 2. Hotto Link Inc.

Description

A conference paper published at AAAI-ICWSM 2023 details the method used to create the data.

Overview

This dataset was created to evaluate trend forecasting methods. It contains 400 entities across 21 domains grouped into 5 categories (geography, organization, person, product, and content). Each entity is annotated with three types of trend attributes: trending status (trending or non-trending), degree of trending (how well it is recognized), and trend period (when the trend started and ended).

This dataset has three main features. First, a questionnaire-based recognition rate is used as the gold standard for annotating trend attributes. Second, the entities are drawn from a wide range of domains and cover both trending and non-trending cases without significant imbalance. Third, trend periods are annotated at weekly resolution through interpolation using Internet search volume data.

See our paper "Construction of Evaluation Datasets for Trend Forecasting Studies" *1 (hereafter "the paper") for details.

*1 This paper is currently under single-blind review.

Conditions of dataset construction

Target Trending Phenomenon

| Attribute | Value |
|---|---|
| Target country | Japan |
| Survey period | From 2015 to 2019 |
| Number of entities | 400 |
| Target domains | 21 domains in 5 categories (see the Target Domain section) |

Target Domain

| Category | Domain |
|---|---|
| Location/Geography | City/region/landmark |
| Organization | Restaurant/facility, company/brand |
| Person/Group | Politician/political party, researcher, athlete, actor/actress, celebrity/entertainer/comedian, music band/music group |
| Products | Cosmetics, daily necessities, clothing, beverage, foodstuff, others |
| Art/Content | Game, publication, comic/animation, movie, broadcast program, music |

Dataset Files

The dataset consists of two TSV files and one Markdown file. The Markdown file provides the description and metadata of the dataset. The files are listed in the table below.

| Data label | File name | File format | Explanation |
|---|---|---|---|
| Metadata | README.md | Markdown | Overview of this dataset and the schema of each file, in English and Japanese |
| Trend Dataset | trend_dataset.tsv | TSV | Main body of this dataset |
| Master Entity List | master_entity_list.tsv | TSV | Information about the entities included in the Trend Dataset |

TSV File format

| Attribute | Value |
|---|---|
| Header row | Exists (first row) |
| Index column | "ID" column (first column) |
| Encoding | UTF-8 |
| Delimiter | \t (tab) |
| Quoting | None *2 |
| Escape character | \ (backslash) |
| Line terminator | \n |

*2: csv.QUOTE_NONE in Python
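
The format settings above map directly onto Python's standard csv module. The following is a minimal sketch, not an official loader: the helper name `read_trend_tsv` and the sample column `name` are hypothetical, and only the "ID" column is taken from the format description. See README.md for the actual schema of each file.

```python
import csv
import io


def read_trend_tsv(stream):
    """Parse a TSV text stream with the documented settings.

    Returns a dict that maps each "ID" value to its row (a dict of strings).
    """
    reader = csv.DictReader(
        stream,
        delimiter="\t",          # tab-separated fields
        quoting=csv.QUOTE_NONE,  # quote characters are ordinary data
        escapechar="\\",         # backslash is the escape character
    )
    return {row["ID"]: row for row in reader}


# Hypothetical in-memory sample; "name" is an illustrative column only.
sample = 'ID\tname\n1\t"Example" entity\n2\tAnother entity\n'
rows = read_trend_tsv(io.StringIO(sample))
print(rows["1"]["name"])
```

To read a downloaded file, pass an open file object instead of the in-memory sample, for example `open("trend_dataset.tsv", encoding="utf-8", newline="")`. Because quoting is disabled, double quotes inside a field are returned unchanged.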

Files (438.3 kB)

| MD5 checksum | Size |
|---|---|
| md5:a038ce2ddd959faf7401ef11d7ff0869 | 215.9 kB |
| md5:d6b603111be5703d88c634d2b1dca2e5 | 25.7 kB |
| md5:8b689f22a1da3696ba8430dcba95bcae | 196.7 kB |
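
The MD5 checksums listed above can be used to check that a downloaded file is intact. The sketch below uses Python's hashlib; the helper name `md5_of_file` is hypothetical, and which checksum belongs to which file is not stated here, so compare the result against the value shown for the file you downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large files are not loaded
    into memory all at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("trend_dataset.tsv")` returns a 32-character hexadecimal string that should equal the part after `md5:` in the corresponding entry.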