Wind Turbine SCADA Data For Early Fault Detection
Description
Note: Please check out Version 5 of this dataset since some labels have been corrected.
Note: In version 2, the timestamps of the datasets and the event_info files do not match, and there are duplicate timestamps in datasets where the date was converted from a leap year to a non-leap year. Please use version 1 (original dataset) or version 4 (without future timestamps) instead.
-----
This dataset is published together with the paper "CARE to Compare: A real-world dataset for anomaly detection in wind turbine data" which explains the dataset in detail and defines the CARE score that can be used to evaluate anomaly detection algorithms on this dataset. When referring to this dataset, please cite the paper mentioned in the related work section.
This new version of the dataset contains one deviation from version 1 regarding the anonymization procedure. Instead of shifting the timestamps of each sub-dataset by a random number of years, the time shift is now chosen as the number of years that makes each sub-dataset start in 2022. This change makes the timestamp anonymization more consistent and avoids future timestamps in the data.
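The year shift described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' anonymization code: the function names are hypothetical, and the handling of 29 February (falling back to 28 February when the target year is not a leap year) is an assumption made for this sketch.

```python
from datetime import datetime

TARGET_START_YEAR = 2022  # start year stated in the dataset description


def shift_year(ts: datetime, years: int) -> datetime:
    """Shift a timestamp by whole years, keeping month, day and time of day."""
    try:
        return ts.replace(year=ts.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year.
        # Assumption for this sketch only: fall back to 28 February.
        return ts.replace(year=ts.year + years, day=28)


def shift_to_target_start(timestamps: list[datetime]) -> list[datetime]:
    """Shift all timestamps by the same number of years so the earliest one
    falls in TARGET_START_YEAR."""
    if not timestamps:
        return []
    years = TARGET_START_YEAR - min(timestamps).year
    return [shift_year(ts, years) for ts in timestamps]


# Hypothetical sub-dataset starting in 2016: the shift is 6 years.
original = [datetime(2016, 3, 1, 0, 0), datetime(2016, 3, 1, 0, 10), datetime(2017, 1, 1)]
shifted = shift_to_target_start(original)
```

Note that the fallback in `shift_year` maps 29 February onto timestamps that already exist for 28 February, so any real pipeline would need to treat leap days deliberately to avoid duplicate timestamps.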
The data consists of 95 datasets, containing 89 years of SCADA time series distributed across 36 different wind turbines from the three wind farms A, B and C. The number of features depends on the wind farm: wind farm A has 86 features, wind farm B has 257 features and wind farm C has 957 features.
The overall dataset is balanced, as 44 out of the 95 datasets contain a labeled anomaly event that leads up to a turbine fault and the other 51 datasets represent normal behavior. Additionally, the quality of the training data is ensured by turbine-status-based labels for each data point, and further information about some of the given turbine faults is included.
The data for wind farm A is based on data from the EDP open data platform (https://www.edp.com/en/innovation/open-data/data) and covers 5 wind turbines of an onshore wind farm in Portugal. It contains SCADA data and information derived from a fault logbook, which defines start timestamps for specified faults. From this data, 22 datasets were selected for inclusion in this data collection.
The other two wind farms are offshore wind farms located in Germany. The data of all three wind farms was anonymized because of confidentiality requirements for wind farms B and C.
Each dataset is provided in the form of a CSV file, with columns defining the features and rows representing the data points of the time series.
More detailed information can be found in the included README file.
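As a minimal sketch of reading one such CSV file with the Python standard library: the delimiter and the column names used here (`time_stamp`, `sensor_0`) are assumptions for illustration only, and the actual file layout should be taken from the README.

```python
import csv
from typing import Iterable, Iterator


def iter_rows(lines: Iterable[str], delimiter: str = ",") -> Iterator[dict]:
    """Yield one dict per data point, keyed by the feature names in the header.

    `lines` can be an open text file or any iterable of strings.
    The delimiter is an assumption; check the README for the real one.
    """
    yield from csv.DictReader(lines, delimiter=delimiter)


# Small in-memory stand-in for a dataset file (hypothetical columns and values).
sample = [
    "time_stamp;sensor_0",
    "2022-01-01 00:00:00;1.5",
    "2022-01-01 00:10:00;1.7",
]
rows = list(iter_rows(sample, delimiter=";"))
feature_names = list(rows[0].keys())
```

For a real file, the same function can be applied to a file object opened with `open(path, newline="")`, which streams the rows instead of loading the whole file at once.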
Files
Name | Size |
---|---|
CARE_To_Compare.zip (md5:537a66be609ea5cb471d0d759048e3a2) | 5.5 GB |
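After downloading, the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the standard library; the function names, the chunk size and the local file path in the usage note are assumptions.

```python
import hashlib
from typing import Iterable

# Checksum listed for CARE_To_Compare.zip on this page.
EXPECTED_MD5 = "537a66be609ea5cb471d0d759048e3a2"


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hex MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a multi-gigabyte archive
    never has to fit in memory."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))
```

With the archive in the current directory (a hypothetical location), `md5_of_file("CARE_To_Compare.zip") == EXPECTED_MD5` should evaluate to `True` for an intact download.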
Additional details
Additional titles
- Alternative title (English)
- CARE To Compare Data
Related works
- Is described by
- Journal article: 10.3390/data9120138 (DOI)