Dataset Open Access

Tourism Forecast with Weather, Event, and Cross-Industry Data

Pfäffli, Daniel; Lionetti, Simone; Pouly, Marc; Wegelin, Philipp; vor der Brück, Tim

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.4133644", 
  "language": "eng", 
  "title": "Tourism Forecast with Weather, Event, and Cross-Industry Data", 
  "issued": {
    "date-parts": [
  "abstract": "<p><strong>Introduction</strong></p>\n\n<p>Substantial short-term demand fluctuations are common in the tourism industry. Therefore, tourism companies such as accommodation, transportation, catering, and leisure facilities have a vital interest in precise forecasts of the number of customers.</p>\n\n<p>We present a novel forecasting dataset for tourism, consisting of four Swiss companies, one accommodation, two transportation, and one indoor leisure business, all located in the same touristic region. It covers a total of ten years starting in 2007 and ending in 2016. The dataset allows using cross-series information and includes explanatory variables, such as calendar effects, event data, and weather forecast information.</p>\n\n<p>Machine learning (ML) practitioners, statisticians, and experts in the tourism sector are invited to investigate our dataset for new insights on short-term forecasting for industries in tourism.</p>\n\n<p>The&nbsp;Algorithmic Business Research Lab (ABIZ) of the Lucerne University of Applied Sciences and Arts, Switzerland, researches ML algorithms for businesses to support industry partners in developing business models and services based on complex algorithms as well as in the induced digital transformation. In a joint effort with the Institute of Tourism (ITW) of the School of Business of the&nbsp;Lucerne University of Applied Sciences and Arts, Switzerland, we provide the present tourism dataset as part of a publication.</p>\n\n<p>Contacts and further&nbsp;information can be found at&nbsp;<a href=\"\"></a></p>\n\n<p><strong>The dataset</strong></p>\n\n<p>The dataset comprises 3653 days of customer numbers of four Swiss companies in the tourism sector. 
There are 556 feature columns, four target columns, and two mask columns, 562&nbsp;columns in total.</p>\n\n<p><strong>Target variables</strong></p>\n\n<p>The customer volume data has daily resolution and has at worst minor interruptions of a few days over a common period of ten years, starting in 2007 and ending in 2016. The missing values for one transportation company and the indoor leisure company are masked, and the masks are available as indicator variables.</p>\n\n<p><strong>Feature Variables</strong></p>\n\n<p>The dataset contains both numerical and categorical features:</p>\n\n<ul>\n\t<li>calendar effects, such as day of the week, weekend, and month features,</li>\n\t<li>event data, e.g., public and school holidays, free-time regional events, promotions or revisions for the facilities under examination,</li>\n\t<li>weather forecast features provided by the Federal Office of Meteorology and Climatology (MeteoSwiss), which encode information about conditions in the locations of the four businesses and neighboring regions.</li>\n</ul>\n\n<p>The weather forecast data consist of information about temperature, sunshine, precipitation, and wind, forecast up to 3 days in advance. Note that the weather forecast model is updated regularly, and therefore many features do not cover the entire period. In addition, the categorical weather summary annotations created by meteorologists are provided only for the last year.</p>\n\n<p><strong>Dataset Download</strong></p>\n\n<p>In the published version of the dataset, feature names are replaced with pseudonyms, but descriptions are given to identify feature groups with similar meaning. 
The content available for download consists of&nbsp;</p>\n\n<ul>\n\t<li>data.csv,<br>\n\tCSV format, 11.3 MB, 3654 rows, and 562 columns with the time series data;<br>\n\t&nbsp;</li>\n\t<li>data-description.csv,<br>\n\tCSV format, 36 KB, 563 rows and 12&nbsp;columns with feature name and short description, minimal statistics for cross-checking, and indicator variables that specify which single-company datasets a feature belongs to.</li>\n</ul>\n\n<p>&nbsp;</p>", 
  "author": [
      "family": "Pf\u00e4ffli, Daniel"
      "family": "Lionetti, Simone"
      "family": "Pouly, Marc"
      "family": "Wegelin, Philipp"
      "family": "vor der Br\u00fcck, Tim"
  "type": "dataset", 
  "id": "4133644"