The Device Activity Report with Complete Knowledge (DARCK) for NILM
1. Abstract
This dataset contains aggregated and sub-metered power consumption data from a two-person apartment in Germany. Data was collected from March 5 to September 4, 2025, spanning 6 months. It includes an aggregate reading from a main smart meter and individual readings from 40 smart plugs, smart relays, and smart power meters monitoring various appliances.
2. Dataset Overview
- Apartment: Two-person apartment, approx. 58m², located in Aachen, Germany.
- Aggregate Meter: eBZ DD3
- Sub-meters: 31 Shelly Plus Plug S, 6 Shelly Plus 1PM, 3 Shelly Plus PM Mini Gen3
- Sampling Rate: 1 Hz
- Measured Quantity: Active Power
- Unit of Measurement: Watt
- Duration: 6 months
- Format: Single CSV file (`DARCK.csv`)
- Structure: Timestamped rows with columns for the aggregate meter and each sub-metered appliance.
- Completeness: The main power meter has a completeness of 99.3%. Missing values were linearly interpolated.
3. Download and Usage
The dataset can be downloaded here: https://doi.org/10.5281/zenodo.17159850
Because it contains long off periods consisting of zeros, the CSV file compresses well.
To extract it, use: `xz -d DARCK.csv.xz`.
Compression reduces the file size by about 97% (from 4 GB to 90.9 MB).
To use the dataset in Python, you can, e.g., load the CSV file into a pandas dataframe.
```python
import pandas as pd
df = pd.read_csv("DARCK.csv", parse_dates=["time"])
```
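The full file is large once decompressed, so it can help to load only the columns you need. The following is a minimal sketch, not part of the original tooling: the helper name `load_darck` is hypothetical, and it assumes the file keeps its `.xz` suffix so that pandas can infer the compression.

```python
import pandas as pd


def load_darck(path, columns=None):
    """Load DARCK from a CSV or xz-compressed CSV into a time-indexed frame.

    `columns` is an optional list of power columns to keep (e.g. ["main"]).
    """
    usecols = None if columns is None else ["time", *columns]
    df = pd.read_csv(
        path,
        usecols=usecols,
        parse_dates=["time"],
        index_col="time",
        compression="infer",  # picks xz from a ".xz" suffix
    )
    # All remaining columns are power values in Watt; float32 halves the
    # memory of the resulting frame compared to the float64 default.
    return df.astype("float32")
```

For example, `load_darck("DARCK.csv.xz", columns=["main", "fridge"])` would read the compressed download directly, without running `xz -d` first.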
4. Measurement Setup
The main meter was monitored using an infrared reading head magnetically attached to the infrared interface of the meter. An ESP8266 flashed with Tasmota decodes the binary datagrams and forwards the Watt readings to the MQTT broker. Individual appliances were monitored using a combination of Shelly Plugs (for outlets), Shelly 1PM (for wired-in devices like ceiling lights), and Shelly PM Mini (for each of the three phases of the oven). All devices reported to a central InfluxDB database via Home Assistant running in docker on a Dell OptiPlex 3020M.
5. File Format (DARCK.csv)
The dataset is provided as a single comma-separated value (CSV) file.
- The first row is a header containing the column names.
- All power values are rounded to the first decimal place.
- There are no missing values in the final dataset.
- Each row represents 1 second, from start of measuring in March until the end in September.
Column Descriptions
| Column Name | Data Type | Unit | Description |
|---|---|---|---|
| time | datetime | - | Timestamp for the reading in YYYY-MM-DD HH:MM:SS |
| main | float | Watt | Total aggregate power consumption for the apartment, measured at the main electrical panel. |
| [appliance_name] | float | Watt | Power consumption of an individual appliance (e.g., lightbathroom, fridge, sherlockpc). See Section 8 for a full list. |
| Aggregate Columns | | | |
| aggr_chargers | float | Watt | The sum of sherlockcharger, sherlocklaptop, watsoncharger, watsonlaptop, watsonipadcharger, kitchencharger. |
| aggr_stoveplates | float | Watt | The sum of stoveplatel1 and stoveplatel2. |
| aggr_lights | float | Watt | The sum of lightbathroom, lighthallway, lightsherlock, lightkitchen, lightlivingroom, lightwatson, lightstoreroom, fcob, sherlockalarmclocklight, sherlockfloorlamphue, sherlockledstrip, livingfloorlamphue, sherlockglobe, watsonfloorlamp, watsondesklamp and watsonledmap. |
| Analysis Columns | | | |
| inaccuracy | float | Watt | As no electrical device bypasses a power meter, the true inaccuracy can be assessed. It is the absolute error between the sum of individual measurements and the mains reading. A 30W offset is applied to the sum since the measurement devices themselves draw power which is otherwise unaccounted for. |
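The definition of `inaccuracy` can be sketched as follows. This is an illustration under assumptions, not the authors' script: it assumes the 30W offset is added to the sub-meter sum, that the individual appliance columns are present alongside the `aggr_` columns (which are therefore skipped to avoid double counting), and the function name is hypothetical.

```python
import pandas as pd

# Columns that are not individual appliance readings.
NON_APPLIANCE = {"time", "main", "inaccuracy"}
# Self-consumption of the metering devices, per the column description.
METER_OFFSET_W = 30.0


def recompute_inaccuracy(df: pd.DataFrame) -> pd.Series:
    """Absolute error between (sub-meter sum + offset) and the main reading."""
    appliance_cols = [
        c for c in df.columns
        if c not in NON_APPLIANCE and not c.startswith("aggr_")  # assumption
    ]
    submeter_sum = df[appliance_cols].sum(axis=1) + METER_OFFSET_W
    return (submeter_sum - df["main"]).abs()
```

Comparing the result against the published `inaccuracy` column is a quick way to check whether these assumptions match the file.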
6. Data Postprocessing Pipeline
The final dataset was generated from two raw data sources (meter.csv and shellies.csv) using a comprehensive postprocessing pipeline.
6.1. Main Meter (main) Postprocessing
The aggregate power data required several cleaning steps to ensure accuracy.
- Outlier Removal: Readings below 10W or above 10,000W were removed (merely 3 occurrences).
- Timestamp Burst Correction: The source data contained bursts of delayed readings. A custom algorithm was used to identify these bursts (large time gap followed by rapid readings) and back-fill the timestamps to create an evenly spaced time series.
- Alignment & Interpolation: The smart meter pushes a new value via infrared every second. To align these readings to whole seconds, the series was resampled to a 1-second frequency by taking the mean of all readings within each second (99.5% of seconds contain exactly one reading). Any resulting gaps (0.7% outage ratio) were filled using linear interpolation.
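The outlier removal and the alignment step can be sketched in pandas as shown below. This is a minimal sketch assuming the raw readings are a Series of Watt values on a `DatetimeIndex`; the function name is hypothetical, and the custom burst correction is left out because its details are not given here.

```python
import pandas as pd


def clean_main(raw: pd.Series) -> pd.Series:
    """Clean raw main-meter readings (Watt, indexed by timestamp)."""
    # Outlier removal: keep readings between 10 W and 10,000 W.
    kept = raw[(raw >= 10) & (raw <= 10_000)]
    # Alignment: average all readings that fall into the same whole second.
    # Seconds without any reading become NaN.
    per_second = kept.resample("1s").mean()
    # Gap filling: linear interpolation over the evenly spaced series.
    return per_second.interpolate(method="linear")
```

Taking the mean rather than the last value means that the rare seconds with two readings are represented by their average.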
6.2. Sub-metered Devices (shellies) Postprocessing
The Shelly devices are not prone to the burst issue seen with the ESP8266. They push a new reading whenever the power draw changes. If no change is observed, or the change is too small (less than a few watts), a reading is pushed once a minute together with a heartbeat. When a device turns on or off, intermediate power values are published, producing sub-second readings that need to be handled.
- Grouping: Data was grouped by the unique device identifier.
- Resampling & Filling: The data for each device was resampled to a 1-second frequency using `.resample('1s').last().ffill()`. This method was chosen for two reasons. First, taking the last value captures the last known state of the device within each second, which handles rapid on/off events. Second, forward-filling carries the last state across periods without new data, modeling that the device's consumption remained constant until a new reading was sent.
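The grouping and resampling steps can be illustrated with a short sketch. The long-format layout with columns `time`, `device` and `power` is an assumption about `shellies.csv`, not a documented schema, and the function name is hypothetical.

```python
import pandas as pd


def resample_devices(events: pd.DataFrame) -> pd.DataFrame:
    """Turn event-driven readings into one 1 Hz column per device.

    `events` is assumed to have the columns 'time', 'device' and 'power'.
    """
    per_device = {}
    for device, group in events.groupby("device"):
        series = group.set_index("time")["power"].sort_index()
        # Last reading within each second, then hold it until the next one.
        per_device[device] = series.resample("1s").last().ffill()
    # Devices cover different time spans; seconds outside a device's span
    # stay NaN in the combined frame.
    return pd.DataFrame(per_device)
```

In this sketch, a kettle that reports 0 W and then 1800 W within the same second appears as 1800 W for that second and stays at that value until its next reading.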
6.3. Merging and Finalization
- Merge: The cleaned main meter and all sub-metered device dataframes were merged into a single dataframe on the `time` index.
- Final Fill: Any remaining `NaN` values (e.g., from before a device was installed) were filled with `0.0`, assuming zero consumption.
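A minimal sketch of the merge and the final fill is given below. It assumes the main meter series defines the time axis (a left join), which the description does not state explicitly; the function name is hypothetical.

```python
import pandas as pd


def merge_and_fill(main: pd.Series, devices: pd.DataFrame) -> pd.DataFrame:
    """Join 1 Hz device columns onto the 1 Hz main series and fill gaps."""
    # Assumption: the main meter's index is the reference time axis.
    merged = main.rename("main").to_frame().join(devices, how="left")
    merged.index.name = "time"
    # Seconds without device data (e.g. before installation) count as 0 W.
    device_cols = list(devices.columns)
    merged[device_cols] = merged[device_cols].fillna(0.0)
    return merged
```

Only the device columns are filled with zeros here, so a gap in `main` would remain visible instead of being silently turned into 0 W.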
7. Manual Corrections and Known Data Issues
During analysis, two significant unmetered load events were identified and manually corrected to improve the accuracy of the aggregate reading. The error column (inaccuracy) was recalculated after these corrections.
- March 10th - Unmetered Bulb: An unmetered 107W bulb was active. Its consumption was subtracted from the main reading, so the data reads as if the bulb had never been on.
- May 31st - Unmetered Air Pump: An unmetered 101W air-mattress pump was plugged directly into an outlet with no intermediary plug. Its consumption was manually added to the respective plug's readings.
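Both corrections amount to shifting one column by a constant load over a time window. The sketch below shows the idea; the helper name is hypothetical, and the exact start and end times of the two events are not given here, so any window passed in is a placeholder.

```python
import pandas as pd


def add_constant_load(df, column, start, end, watts):
    """Return a copy of `df` with `watts` added to `column` in [start, end].

    Use a negative `watts` to subtract a load. `df` needs a DatetimeIndex.
    """
    corrected = df.copy()
    window = (corrected.index >= pd.Timestamp(start)) & (
        corrected.index <= pd.Timestamp(end)
    )
    corrected.loc[window, column] = corrected.loc[window, column] + watts
    return corrected
```

Subtracting the bulb would correspond to `watts=-107.0` on `main`, and adding the pump to `watts=101.0` on the affected plug's column, after which the `inaccuracy` column has to be recalculated.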
8. Appliance Details and Multipurpose Plugs
The following table lists the column names, with explanations where needed. Because Watson moved at the beginning of June, some metering plugs were reassigned to different appliances. Their columns were split into new ones, each reflecting the appliance in use at the time. Hence, there are more columns than physical metering devices.
| Column Name | Location / Appliance | Notes |
|---|---|---|
| lightkitchen, lightwatson, lighthallway, lightbathroom, lightsherlock, lightstoreroom, lightlivingroom, fcob, watsonfloorlamp, livingfloorlamphue, watsonledmap, watsondesklamp, sherlockglobe, sherlockledstrip, sherlockfloorlamphue, sherlockalarmclocklight | Misc / Lights | The seven lights starting with light... are all ceiling lights. fcob is an FCOB (Flip Chip on Board) LED Strip that illuminates the counter in the kitchen. |
| stoveplatel1, stoveplatel2, stoveovenl3 | Kitchen / Stove and Oven | Individual phases of the kitchen stove and oven. The stove has 4 plates. The 2 bottom ones are on phase 1, the 2 top ones are on phase 2. The oven is on phase 3. |
| fridge, washingmachine, microwave, kettle, wheymixer, mixer, blender, kitchensoundsystem | Kitchen / Other Appliances | |
| router, watsonvinylplayer, watsoncharger, watsonfan, watsonlaptop, watsondesklamp, watsonipadcharger, watsonmonitor, watsonpiano, htpc | Watson's Room, later Living Room | |
| sherlockpc, sherlocktv, sherlockmonitor, sherlockserver, sherlockguitaramp, sherlockhairdryer, sherlockcharger, sherlocklaptop, sherlockdesk, solderingiron | Sherlock's Room | Sherlock's desk can be adjusted in height via an electrical motor. |
| printerscanner, vacuum, drill, airmattress | Storeroom | |
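Splitting the readings of a reassigned plug into two appliance columns can be sketched as follows. Which plugs were reassigned and at what time is not listed here, so the function name, the switch time and the column names are placeholders. Filling the inactive period with 0.0 is an assumption, chosen to match the zero-fill convention used for devices that were not yet installed.

```python
import pandas as pd


def split_plug_column(plug, switch_time, before_name, after_name):
    """Split one physical plug's 1 Hz series into two appliance columns.

    Readings before `switch_time` go to `before_name`, readings from
    `switch_time` onward go to `after_name`; the other column is 0.0.
    """
    switch = pd.Timestamp(switch_time)
    before = plug.where(plug.index < switch, 0.0).rename(before_name)
    after = plug.where(plug.index >= switch, 0.0).rename(after_name)
    return pd.concat([before, after], axis=1)
```

At every timestamp the two resulting columns add up to the original plug reading, so the sub-meter sum is unchanged by the split.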
9. Known Limitations
- The dataset contains two manually corrected periods for known unmetered loads. Other, smaller unmetered loads may still exist, although this is unlikely: special care was taken never to plug devices directly into outlets without an intermediary smart plug.
- The measurement integrated circuits in the individual meters are inexpensive and do not offer industry-level precision. Hence, the individual data might be inaccurate. The meters were calibrated using a purely resistive load, but linear behavior (especially with changing humidity and temperature) cannot be assumed.
- The resolution is 1 Hz. High-frequency analyses are therefore not possible.
- The measured quantity is active power only.
10. License
This dataset is made available under the Creative Commons Attribution license.
11. How to Cite
If you use this dataset in your research, please cite the corresponding paper linked to the Zenodo upload.
Files (91.0 MB)
| Size | Checksum |
|---|---|
| 16.6 kB | md5:6f9d54dd1ef0cbf631ba71714564c6a9 |
| 90.9 MB | md5:b10e208b719188276a9cc5278247c4d1 |
Additional details
Related works
- Is published in
- Conference proceeding: 10.1145/3736425.3771959 (DOI)