Weather prediction dataset
Creators
- 1. Centre for Digitalization and Digitality, University of Applied Sciences Düsseldorf
- 2. Netherlands eScience Center
- 3. Helmholtz AI
- 4. Department of Computer Science, Aberystwyth University
Description
Dataset created for machine learning and deep learning training and teaching purposes.
It can, for instance, be used for classification, regression, and forecasting tasks.
It is complex enough to demonstrate realistic issues such as overfitting and unbalanced data, while remaining intuitively accessible.
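As a small illustration of the "unbalanced data" point, the sketch below measures how unevenly a set of boolean labels is distributed. The helper name `class_shares` and the example labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of samples that falls into each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up labels for ten days (assumption, not real data).
example_labels = [False, False, True, False, False,
                  False, True, False, False, False]
shares = class_shares(example_labels)
print(shares)
```

A strongly skewed result suggests that plain accuracy may be a misleading metric and that a stratified split or class weighting could be worth considering.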
Description and units of weather features:
Data includes the following features/variables for several European cities:
Feature (type) | Column name | Description | Physical Unit |
---|---|---|---|
mean temperature | _temp_mean | mean daily temperature | in 1 °C |
max temperature | _temp_max | max daily temperature | in 1 °C |
min temperature | _temp_min | min daily temperature | in 1 °C |
cloud_cover | _cloud_cover | cloud cover | oktas |
global_radiation | _global_radiation | global radiation | in 100 W/m2 |
humidity | _humidity | humidity | in 1 % |
pressure | _pressure | pressure | in 1000 hPa |
precipitation | _precipitation | daily precipitation | in 10 mm |
sunshine | _sunshine | sunshine hours | in 0.1 hours |
wind_speed | _wind_speed | wind speed | in 1 m/s |
wind_gust | _wind_gust | wind gust | in 1 m/s |
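The unit column can be turned into a lookup of scale factors. The sketch below reads "in N unit" as "stored value times N gives the value in that unit". That reading is an assumption and should be checked against metadata.txt before relying on it. The `CITY` prefix, the `DATE` column, and the example values are hypothetical.

```python
# Scale factors copied from the unit column of the feature table.
# Assumption: stored value * factor = value in the named unit.
UNIT_FACTORS = {
    "_temp_mean": 1.0,           # degrees Celsius
    "_temp_max": 1.0,
    "_temp_min": 1.0,
    "_cloud_cover": 1.0,         # oktas
    "_global_radiation": 100.0,  # W/m2
    "_humidity": 1.0,            # listed as "in 1 %"
    "_pressure": 1000.0,         # hPa
    "_precipitation": 10.0,      # mm
    "_sunshine": 0.1,            # hours
    "_wind_speed": 1.0,          # m/s
    "_wind_gust": 1.0,           # m/s
}


def to_base_units(row):
    """Rescale every value whose column name ends in a known suffix."""
    converted = {}
    for column, value in row.items():
        factor = next(
            (f for suffix, f in UNIT_FACTORS.items() if column.endswith(suffix)),
            None,
        )
        # Columns without a known suffix (e.g. a date) are passed through.
        converted[column] = value if factor is None else float(value) * factor
    return converted


# Hypothetical row as it might come out of csv.DictReader (all strings).
example_row = {
    "DATE": "20000101",
    "CITY_pressure": "1.0286",
    "CITY_sunshine": "23",
    "CITY_precipitation": "0.03",
}
converted_row = to_base_units(example_row)
print(converted_row)
```

Matching on the column suffix keeps the lookup independent of how many locations a file contains.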
File descriptions
weather_prediction_dataset.csv
- Main data file, tabular data, comma-separated CSV. Contains the data for different weather features (daily observations, see the feature table above for more details) for 18 European cities or places through the years 2000 to 2010.
weather_prediction_picnic_labels.csv
- Optional data to be used as potential labels for classification tasks. Contains booleans to characterize the daily weather conditions as suitable for a picnic (True) or not (False) for all 18 locations in the dataset.
weather_prediction_dataset_map.png
- Simple map showing all 18 locations in Europe.
metadata.txt
- Further information on the dataset, the data processing, and conversion, as well as the description and units of all weather features.
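For a classification task, the feature file and the label file have to be aligned row by row. The sketch below pairs them on a shared date column using only the standard library. The column names `DATE`, `CITY_temp_mean`, `CITY_humidity`, and `CITY_picnic_weather`, as well as the two-row excerpts, are assumptions made for illustration; the actual headers should be taken from the CSV files themselves.

```python
import csv
import io

# Hypothetical two-day excerpts standing in for the real files.
DATA_CSV = """DATE,CITY_temp_mean,CITY_humidity
20000101,2.9,0.89
20000102,3.6,0.87
"""
LABELS_CSV = """DATE,CITY_picnic_weather
20000101,False
20000102,True
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_date(features, labels, label_column):
    """Pair each feature row with the boolean label of the same date."""
    labels_by_date = {row["DATE"]: row[label_column] == "True" for row in labels}
    return [
        (row, labels_by_date[row["DATE"]])
        for row in features
        if row["DATE"] in labels_by_date
    ]


pairs = join_on_date(
    read_rows(DATA_CSV), read_rows(LABELS_CSV), "CITY_picnic_weather"
)
for features, is_picnic_day in pairs:
    print(features["DATE"], is_picnic_day)
```

With the real files, `read_rows` would be replaced by opening each CSV from disk; the join logic stays the same.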
ORIGINAL DATA TAKEN FROM:
EUROPEAN CLIMATE ASSESSMENT & DATASET (ECA&D), file created on 22-04-2021
THESE DATA CAN BE USED FREELY PROVIDED THAT THE FOLLOWING SOURCE IS ACKNOWLEDGED:
Klein Tank, A.M.G. and Coauthors, 2002. Daily dataset of 20th-century surface
air temperature and precipitation series for the European Climate Assessment.
Int. J. of Climatol., 22, 1441-1453.
Data and metadata available at http://www.ecad.eu
For more information see metadata.txt file.
The dataset has also been presented at the Teaching Machine Learning Workshop at ECML 2022: https://teaching-ml.github.io/2022/.
The Python code used to create the weather prediction dataset from the ECA&D data can be found on GitHub: https://github.com/florian-huber/weather_prediction_dataset
(this repository also contains Jupyter notebooks with teaching examples)
Versions:
- v5: updated metadata.txt file.
- v4: to be more future-proof in times of climate change/crisis, "BBQ weather" prediction is now "picnic weather" prediction. The data itself remains unchanged.
- v3: added a "light" version of the dataset with fewer features (only 11 locations and fewer variables, a reduction from 163 to 89 features). It is meant to be used if training times become an issue in hands-on sessions.
- v2: now also contains additional `BBQ_weather` labels. The dataset itself has not changed between versions v1 and v2.
Files (5.1 MB)
Checksum | Size |
---|---|
md5:a6c8a1d3022b6731f7b27e6f7f69f53a | 4.7 kB |
md5:0a121a1e75fb8df7f39011bfb9272f15 | 6.3 kB |
md5:94cf8d8d1f6233ebde011d6062b1af5d | 2.8 MB |
md5:ae96c3912f24caa097867e9c4da034d8 | 1.5 MB |
md5:40114391d126ec09993b41447d101038 | 337.8 kB |
md5:e9d63bdd7522d91846de24a34b3e7f98 | 394.3 kB |