Published October 29, 2020 | Version v1
Dataset Open

Tourism Forecast with Weather, Event, and Cross-Industry Data

Description

Introduction

Substantial short-term demand fluctuations are common in the tourism industry. Tourism businesses such as accommodation providers, transportation operators, caterers, and leisure facilities therefore have a vital interest in precise forecasts of customer numbers.

We present a novel forecasting dataset for tourism covering four Swiss companies (one accommodation business, two transportation businesses, and one indoor leisure business), all located in the same tourist region. It spans ten years, starting in 2007 and ending in 2016. The dataset allows the use of cross-series information and includes explanatory variables such as calendar effects, event data, and weather forecast information.

Machine learning (ML) practitioners, statisticians, and tourism-sector experts are invited to investigate our dataset for new insights into short-term forecasting for tourism businesses.

The Algorithmic Business Research Lab (ABIZ) of the Lucerne University of Applied Sciences and Arts, Switzerland, researches ML algorithms for businesses to support industry partners in developing business models and services based on complex algorithms, as well as in the resulting digital transformation. In a joint effort with the Institute of Tourism (ITW) of the School of Business of the Lucerne University of Applied Sciences and Arts, Switzerland, we provide the present tourism dataset as part of a publication.

Contacts and further information can be found at http://www.abiz.ch/.

The dataset

The dataset comprises 3653 days of customer numbers of four Swiss companies in the tourism sector. There are 556 feature columns, four target columns, and two mask columns, 562 columns in total.
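The figures above can be cross-checked with a little arithmetic. The sketch below assumes the series runs from 1 January 2007 to 31 December 2016 inclusive; the description only gives the years, so the exact start and end days are an assumption. The names START and END are illustrative.

```python
from datetime import date

# Assumed first and last day of the series (the description only names the years).
START = date(2007, 1, 1)
END = date(2016, 12, 31)

# Number of calendar days in the period, counting both endpoints.
# The leap years 2008, 2012, and 2016 each contribute one extra day.
n_days = (END - START).days + 1

# Column count as stated: feature columns + target columns + mask columns.
n_columns = 556 + 4 + 2

print(n_days, n_columns)
```

Under that assumption, the day count agrees with the 3653 days stated above, and the column groups add up to the stated total of 562.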

Target variables

The customer volume data has daily resolution and contains, at worst, minor interruptions of a few days over a common period of ten years, starting in 2007 and ending in 2016. The missing values for one transportation company and the indoor leisure company are masked, and the masks are available as indicator variables.
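As an illustration of how a mask column might be used, the following minimal sketch drops masked days before computing a statistic on a target series. The function names are hypothetical, and the mask encoding is an assumption: the description does not say whether a mask value of 1 marks a missing or a valid day, so the flag is a parameter.

```python
from typing import Iterable, List, Optional


def apply_mask(values: Iterable[float], mask: Iterable[int],
               missing_flag: int = 1) -> List[Optional[float]]:
    """Replace masked entries of a target series with None.

    missing_flag is the mask value ASSUMED to mark a missing day;
    check the data description before relying on this default.
    """
    return [None if m == missing_flag else v for v, m in zip(values, mask)]


def mean_observed(values: Iterable[float], mask: Iterable[int],
                  missing_flag: int = 1) -> Optional[float]:
    """Mean of the target over days that are not masked, or None if all are masked."""
    observed = [v for v in apply_mask(values, mask, missing_flag) if v is not None]
    return sum(observed) / len(observed) if observed else None
```

Excluding masked days in this way keeps placeholder values out of training losses and evaluation metrics.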

Feature Variables

The dataset contains both numerical and categorical features:

  • calendar effects, such as day of the week, weekend, and month features,
  • event data, e.g., public and school holidays, regional leisure events, and promotions or revisions at the facilities under examination,
  • weather forecast features provided by the Federal Office of Meteorology and Climatology (MeteoSwiss), which encode information about conditions at the locations of the four businesses and in neighboring regions.
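For readers who want to reproduce calendar effects of the kind listed above, the following sketch derives day of the week, a weekend flag, and the month from a date. The function name and the output encoding are illustrative assumptions; the description does not specify how the published calendar features are encoded.

```python
from datetime import date
from typing import Dict


def calendar_features(day: date) -> Dict[str, int]:
    """Derive simple calendar features for one day (hypothetical encoding)."""
    weekday = day.weekday()  # Monday = 0, ..., Sunday = 6
    return {
        "day_of_week": weekday,
        "is_weekend": int(weekday >= 5),  # Saturday or Sunday
        "month": day.month,
    }


print(calendar_features(date(2016, 12, 31)))
```

Categorical variants, such as one-hot encoded weekdays, can be built from the same three values.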

The weather forecast data consist of information about temperature, sunshine, precipitation, and wind, forecast up to 3 days in advance. The weather forecast model is updated regularly, so many features do not cover the entire period. In addition, the categorical weather summary annotations created by meteorologists are provided only for the last year.
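Because many weather features cover only part of the ten-year period, it can help to measure the coverage of each column before modeling. The sketch below reports the first and last observed row and the observed fraction of one column. The function name and the set of strings treated as missing are assumptions; the description does not state how gaps are encoded in the CSV.

```python
from typing import Optional, Sequence, Tuple

# Strings ASSUMED to denote a missing value in the raw CSV cells.
MISSING = {"", "nan", "NaN", "NA"}


def coverage(values: Sequence[str]) -> Optional[Tuple[int, int, float]]:
    """Return (first_index, last_index, observed_fraction) for one column.

    Returns None when the column has no observed values at all.
    """
    observed = [i for i, v in enumerate(values) if v.strip() not in MISSING]
    if not observed:
        return None
    return observed[0], observed[-1], len(observed) / len(values)
```

Columns with short coverage can then be dropped, imputed, or restricted to the sub-period in which they are available, depending on the forecasting setup.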

Dataset Download

In the published version of the dataset, feature names are replaced with pseudonyms, but descriptions are given to identify feature groups with similar meaning. The content available for download consists of:

  • data.csv,
    CSV format, 11.3 MB, 3654 rows, and 562 columns with the time series data;
     
  • data-description.csv,
    CSV format, 36 KB, 563 rows and 12 columns with feature name and short description, minimal statistics for cross-checking, and indicator variables that specify which single-company datasets a feature belongs to.
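The two files can be inspected with the Python standard library alone. The sketch below counts data rows and columns of a CSV stream and selects the features flagged for one company in the description file. It assumes comma-separated files with a header row; the column names "name" and "company_a" in the usage example are hypothetical placeholders, since the actual header names are not given here.

```python
import csv
import io
from typing import List, TextIO, Tuple


def csv_shape(stream: TextIO) -> Tuple[int, int]:
    """Return (number of data rows, number of columns), assuming a header row."""
    reader = csv.reader(stream)
    header = next(reader)
    n_rows = sum(1 for _ in reader)
    return n_rows, len(header)


def features_for(description: TextIO, indicator: str,
                 name_column: str = "name") -> List[str]:
    """List feature names whose indicator column is set.

    Both column names are placeholders; replace them with the real
    headers found in data-description.csv.
    """
    reader = csv.DictReader(description)
    return [row[name_column] for row in reader
            if row[indicator] in {"1", "True", "true"}]


# Tiny in-memory example standing in for data-description.csv.
demo = io.StringIO("name,company_a\nf1,1\nf2,0\nf3,1\n")
print(features_for(demo, "company_a"))
```

If the first line of data.csv is a header, calling csv_shape on the opened file (for example with open("data.csv", newline="")) should be consistent with the 3654 rows and 562 columns listed above, that is, one header row plus 3653 daily records.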

 

Files

Files (11.9 MB)

Checksum and size:
md5:934be8ad4f541d36a1d2340498f451f5 (53.6 kB)
md5:d03bca6fe54d243a899c18bcbeaec59c (11.9 MB)