Published September 6, 2022 | Version v5
Dataset Open

Weather prediction dataset

  • 1. Centre for Digitalization and Digitality, University of Applied Sciences Düsseldorf
  • 2. Netherlands eScience Center
  • 3. Helmholtz AI
  • 4. Department of Computer Science, Aberystwyth University

Description

This dataset was created for machine learning and deep learning training and teaching purposes.
It can be used, for instance, for classification, regression, and forecasting tasks.
It is complex enough to demonstrate realistic issues such as overfitting and unbalanced data, while remaining intuitively accessible.

Description and units of weather features:

Data includes the following features/variables for several European cities:

Feature (type)     Column name         Description              Physical unit
mean temperature   _temp_mean          mean daily temperature   in 1 °C
max temperature    _temp_max           max daily temperature    in 1 °C
min temperature    _temp_min           min daily temperature    in 1 °C
cloud_cover        _cloud_cover        cloud cover              oktas
global_radiation   _global_radiation   global radiation         in 100 W/m2
humidity           _humidity           humidity                 in 1 %
pressure           _pressure           pressure                 in 1000 hPa
precipitation      _precipitation      daily precipitation      in 10 mm
sunshine           _sunshine           sunshine hours           in 0.1 hours
wind_gust          _wind_gust          wind gust                in 1 m/s
wind_speed         _wind_speed         wind speed               in 1 m/s
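
The column-name suffixes and unit factors in the table above lend themselves to a small lookup helper. The following is a minimal sketch, not part of the dataset's own tooling. It assumes that each column in the CSV is named as a location prefix followed by one of the suffixes (for example `BASEL_temp_mean`, a hypothetical name), and that a unit such as "in 10 mm" means a stored value of 1.0 corresponds to 10 mm. The names `UNIT_SCALE`, `split_column`, and `to_plain_units` are invented for illustration.

```python
# Scale factor and unit per column suffix, copied from the table above.
# Assumption: "in 10 mm" means stored_value * 10 gives millimetres, and so on.
UNIT_SCALE = {
    "_temp_mean": (1.0, "°C"),
    "_temp_max": (1.0, "°C"),
    "_temp_min": (1.0, "°C"),
    "_cloud_cover": (1.0, "oktas"),
    "_global_radiation": (100.0, "W/m2"),
    "_humidity": (1.0, "%"),
    "_pressure": (1000.0, "hPa"),
    "_precipitation": (10.0, "mm"),
    "_sunshine": (0.1, "hours"),
    "_wind_gust": (1.0, "m/s"),
    "_wind_speed": (1.0, "m/s"),
}


def split_column(column):
    """Split a column name into (location, suffix); return None if no suffix matches."""
    for suffix in UNIT_SCALE:
        if column.endswith(suffix) and len(column) > len(suffix):
            return column[: -len(suffix)], suffix
    return None


def to_plain_units(column, value):
    """Convert a stored value to the plain physical unit listed in the table."""
    parts = split_column(column)
    if parts is None:
        raise ValueError(f"no known feature suffix in column {column!r}")
    factor, unit = UNIT_SCALE[parts[1]]
    return value * factor, unit
```

Under these assumptions, a stored precipitation value of 0.5 would be read as 5 mm. If metadata.txt defines the scaling differently, the factors in `UNIT_SCALE` should be adjusted accordingly.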

File descriptions

  • weather_prediction_dataset.csv - Main data file, tabular data, comma-separated CSV. Contains the data for different weather features (daily observations, see below for more details) for 18 European cities or places through the years 2000 to 2010.
  • weather_prediction_picnic_labels.csv - Optional data to be used as potential labels for classification tasks. Contains booleans to characterize the daily weather conditions as suitable for a picnic (True) or not (False) for all 18 locations in the dataset.
  • weather_prediction_dataset_map.png - Simple map showing all 18 locations in Europe.
  • metadata.txt - Further information on the dataset, the data processing, and conversion, as well as the description and units of all weather features.
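
To use the picnic labels as classification targets, rows from the two CSV files have to be paired up. A minimal sketch using only Python's standard `csv` module follows. It assumes that both files share a date column (called `DATE` here) and that label cells hold the text `True` or `False`. The column names in the usage comment are placeholders, so check the actual file headers or metadata.txt before relying on them.

```python
import csv


def load_rows(lines, key="DATE"):
    """Read CSV text (an open file or any iterable of lines) into {key: row}."""
    return {row[key]: row for row in csv.DictReader(lines)}


def build_examples(features, labels, feature_col, label_col):
    """Pair one feature column with one boolean label column, matched on the key."""
    examples = []
    for key, row in features.items():
        if key not in labels:
            continue  # skip days that have no label
        x = float(row[feature_col])
        y = labels[key][label_col].strip().lower() == "true"
        examples.append((key, x, y))
    return examples


# Usage (column names are assumptions, not confirmed by this page):
# with open("weather_prediction_dataset.csv", newline="") as f:
#     features = load_rows(f)
# with open("weather_prediction_picnic_labels.csv", newline="") as f:
#     labels = load_rows(f)
# examples = build_examples(features, labels,
#                           "BASEL_temp_mean", "BASEL_picnic_weather")
```

Each resulting tuple holds the key, one feature value, and a boolean label. Counting the `True` and `False` labels in this list is a quick way to see how unbalanced a given location is.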

 

ORIGINAL DATA TAKEN FROM:

EUROPEAN CLIMATE ASSESSMENT & DATASET (ECA&D), file created on 22-04-2021
THESE DATA CAN BE USED FREELY PROVIDED THAT THE FOLLOWING SOURCE IS ACKNOWLEDGED:

Klein Tank, A.M.G. and Coauthors, 2002. Daily dataset of 20th-century surface
air temperature and precipitation series for the European Climate Assessment.
Int. J. of Climatol., 22, 1441-1453.
Data and metadata available at http://www.ecad.eu

For more information see metadata.txt file.
The dataset has also been presented at the Teaching Machine Learning Workshop at ECML 2022: https://teaching-ml.github.io/2022/.

The Python code used to create the weather prediction dataset from the ECA&D data can be found on GitHub: https://github.com/florian-huber/weather_prediction_dataset
(this repository also contains Jupyter notebooks with teaching examples)

Versions:

  • v5: updated metadata.txt file.
  • v4: to be more future-proof in times of climate change/crisis, "BBQ weather" prediction is now "picnic weather" prediction. The data itself remains unchanged.
  • v3: added a "light" version of the dataset with fewer features (only 11 locations and fewer variables, a reduction from 163 to 89 features). This is meant to be used if training times during hands-on sessions become an issue.
  • v2: now also contains additional `BBQ_weather` labels; the dataset itself has not changed between versions v1 and v2.

 

Files

Files (5.1 MB)

MD5 checksum                           Size
md5:a6c8a1d3022b6731f7b27e6f7f69f53a   4.7 kB
md5:0a121a1e75fb8df7f39011bfb9272f15   6.3 kB
md5:94cf8d8d1f6233ebde011d6062b1af5d   2.8 MB
md5:ae96c3912f24caa097867e9c4da034d8   1.5 MB
md5:40114391d126ec09993b41447d101038   337.8 kB
md5:e9d63bdd7522d91846de24a34b3e7f98   394.3 kB
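
The MD5 checksums listed above can be compared against a downloaded file to confirm that the download is complete. The sketch below uses Python's standard `hashlib` module. The function names are invented for illustration, and since the listing does not state which checksum belongs to which file, the expected value has to be taken from the record page for the file in question.

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of byte strings."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def matches_listing(listed, actual_hex):
    """Compare a digest with a listed value such as 'md5:0123...' (prefix optional)."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return expected == actual_hex.lower()
```

For a local copy, `matches_listing(listed_value, md5_of_file(path))` returns `True` only when the two digests agree.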