Published September 19, 2025 | Version v1
Dataset Open

The Device Activity Report with Complete Knowledge (DARCK) for NILM

Description

1. Abstract

This dataset contains aggregated and sub-metered power consumption data from a two-person apartment in Germany. Data was collected from March 5 to September 4, 2025, spanning 6 months. It includes an aggregate reading from a main smart meter and individual readings from 40 smart plugs, smart relays, and smart power meters monitoring various appliances.

2. Dataset Overview

  • Apartment: Two-person apartment, approx. 58m², located in Aachen, Germany.
  • Aggregate Meter: eBZ DD3
  • Sub-meters: 31 Shelly Plus Plug S, 6 Shelly Plus 1PM, 3 Shelly Plus PM Mini Gen3
  • Sampling Rate: 1 Hz
  • Measured Quantity: Active Power
  • Unit of Measurement: Watt
  • Duration: 6 months
  • Format: Single CSV file (`DARCK.csv`)
  • Structure: Timestamped rows with columns for the aggregate meter and each sub-metered appliance.
  • Completeness: The main power meter has a completeness of 99.3%. Missing values were linearly interpolated.

3. Download and Usage

The dataset can be downloaded here: https://doi.org/10.5281/zenodo.17159850

Because it contains long off periods with zero readings, the CSV file compresses well.

To extract it, use: xz -d DARCK.csv.xz
Compression reduces the file size by 97% (from 4 GB to 90.9 MB).


To use the dataset in Python, you can, for example, load the CSV file into a pandas DataFrame.

```python
import pandas as pd

# Parse the "time" column as datetimes while reading
df = pd.read_csv("DARCK.csv", parse_dates=["time"])
```
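As the uncompressed file is about 4 GB, it can help to read the compressed file directly and to load only the columns needed. The following is a minimal sketch under that assumption; the helper name `load_darck` and the choice of `float32` are illustrative and not part of the dataset.

```python
import pandas as pd


def load_darck(path, columns=None):
    """Load DARCK.csv or DARCK.csv.xz into a DataFrame indexed by time.

    `columns` is an optional list of power columns to keep (hypothetical
    convenience parameter); the "time" column is always read.
    """
    usecols = None if columns is None else ["time", *columns]
    df = pd.read_csv(
        path,
        usecols=usecols,
        parse_dates=["time"],
        index_col="time",
        compression="infer",  # pandas detects .xz from the file extension
    )
    # Assumption: float32 is precise enough for values rounded to 0.1 W
    # and roughly halves the memory footprint compared to float64.
    return df.astype("float32")
```

For example, `load_darck("DARCK.csv.xz", columns=["main", "fridge"])` would read only the aggregate and the fridge column without extracting the archive first.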

4. Measurement Setup

The main meter was monitored using an infrared reading head magnetically attached to the infrared interface of the meter. An ESP8266 flashed with Tasmota decodes the binary datagrams and forwards the Watt readings to the MQTT broker. Individual appliances were monitored using a combination of Shelly Plugs (for outlets), Shelly 1PM (for wired-in devices like ceiling lights), and Shelly PM Mini (for each of the three phases of the oven). All devices reported to a central InfluxDB database via Home Assistant running in Docker on a Dell OptiPlex 3020M.

5. File Format (DARCK.csv)

The dataset is provided as a single comma-separated value (CSV) file.

  • The first row is a header containing the column names.
  • All power values are rounded to the first decimal place.
  • There are no missing values in the final dataset.
  • Each row represents 1 second, from the start of measurement in March until its end in September.

Column Descriptions

| Column Name | Data Type | Unit | Description |
|---|---|---|---|
| time | datetime | - | Timestamp for the reading in YYYY-MM-DD HH:MM:SS |
| main | float | Watt | Total aggregate power consumption for the apartment, measured at the main electrical panel. |
| [appliance_name] | float | Watt | Power consumption of an individual appliance (e.g., lightbathroom, fridge, sherlockpc). See Section 8 for a full list. |
| Aggregate Columns | | | |
| aggr_chargers | float | Watt | The sum of sherlockcharger, sherlocklaptop, watsoncharger, watsonlaptop, watsonipadcharger, kitchencharger. |
| aggr_stoveplates | float | Watt | The sum of stoveplatel1 and stoveplatel2. |
| aggr_lights | float | Watt | The sum of lightbathroom, lighthallway, lightsherlock, lightkitchen, lightlivingroom, lightwatson, lightstoreroom, fcob, sherlockalarmclocklight, sherlockfloorlamphue, sherlockledstrip, livingfloorlamphue, sherlockglobe, watsonfloorlamp, watsondesklamp and watsonledmap. |
| Analysis Columns | | | |
| inaccuracy | float | Watt | As no electrical device bypasses a power meter, the true inaccuracy can be assessed. It is the absolute error between the sum of individual measurements and the mains reading. A 30W offset is applied to the sum since the measurement devices themselves draw power which is otherwise unaccounted for. |
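The definition of the inaccuracy column can be sketched as follows. This is an illustration under assumptions, not the script used to build the dataset: it assumes that the summed columns are all columns except time, main, inaccuracy and the aggr_* columns (which would otherwise be double-counted), and the function names are hypothetical.

```python
import pandas as pd

SELF_CONSUMPTION_W = 30.0  # offset for the meters' own draw, per the column description


def appliance_columns(df: pd.DataFrame) -> list[str]:
    """Assumed selection: every column except time, main, inaccuracy and aggr_*."""
    return [
        c for c in df.columns
        if c not in ("time", "main", "inaccuracy") and not c.startswith("aggr_")
    ]


def recompute_inaccuracy(df: pd.DataFrame) -> pd.Series:
    """Absolute error between (sum of sub-meters + offset) and the main meter."""
    submeter_sum = df[appliance_columns(df)].sum(axis=1) + SELF_CONSUMPTION_W
    return (submeter_sum - df["main"]).abs().round(1)


# Tiny made-up example (values are not taken from the dataset)
example = pd.DataFrame({
    "main": [130.0, 200.0],
    "stoveplatel1": [50.0, 100.0],
    "stoveplatel2": [50.0, 60.0],
    "aggr_stoveplates": [100.0, 160.0],
})
example["inaccuracy"] = recompute_inaccuracy(example)
```

Comparing such a recomputed series with the shipped inaccuracy column is one way to check which columns were actually included in the sum.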

 

6. Data Postprocessing Pipeline

The final dataset was generated from two raw data sources (meter.csv and shellies.csv) using a comprehensive postprocessing pipeline.

6.1. Main Meter (main) Postprocessing

The aggregate power data required several cleaning steps to ensure accuracy.

  1. Outlier Removal: Readings below 10W or above 10,000W were removed (merely 3 occurrences).
  2. Timestamp Burst Correction: The source data contained bursts of delayed readings. A custom algorithm was used to identify these bursts (large time gap followed by rapid readings) and back-fill the timestamps to create an evenly spaced time series.
  3. Alignment & Interpolation: The smart meter pushes a new value via infrared every second. To align these readings to whole seconds, the series was resampled to a 1-second frequency by taking the mean of all readings within each second (99.5% of seconds contain only one value). Any resulting gaps (0.7% outage ratio) were filled using linear interpolation.
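Steps 1 and 3 above can be sketched in pandas as shown below. This is a minimal illustration, not the original pipeline: the function name `clean_main` and its parameters are hypothetical, and the custom timestamp burst correction of step 2 is deliberately left out because its details are not described here.

```python
import pandas as pd


def clean_main(raw: pd.Series, low: float = 10.0, high: float = 10_000.0) -> pd.Series:
    """Clean raw main-meter readings (W) indexed by a DatetimeIndex."""
    # Step 1: drop readings below 10 W or above 10,000 W
    kept = raw[(raw >= low) & (raw <= high)]
    # Step 3a: align to whole seconds by averaging all readings in each second
    per_second = kept.resample("1s").mean()
    # Step 3b: fill seconds without any reading by linear interpolation
    return per_second.interpolate(method="linear")
```

On a regular 1-second grid, `method="linear"` interpolates evenly between the neighboring valid values, which matches the description of the gap filling.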

6.2. Sub-metered Devices (shellies) Postprocessing

The Shelly devices are not prone to the burst issue that affects the ESP8266. They push a new reading whenever the power drawn changes. If no change is observed, or the change is too small (less than a few watts), the reading is pushed once a minute together with a heartbeat. When a device turns on or off, intermediate power values are published, which leads to sub-second values that need to be handled.

  1. Grouping: Data was grouped by the unique device identifier.
  2. Resampling & Filling: The data for each device was resampled to a 1-second frequency using .resample('1s').last().ffill()
    This method was chosen for two reasons. First, taking the last value captures the last known state of the device within each second, which handles rapid on/off events. Second, forward-filling carries that state across periods without new data, modeling the device's consumption as constant until a new reading is sent.
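The grouping and resampling steps above could look roughly like this. The sketch assumes a long-format input with the hypothetical column names `device` and `power` and a DatetimeIndex; the layout of the actual shellies.csv is not documented here.

```python
import pandas as pd


def resample_devices(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn event-based readings into one 1-second column per device.

    `raw` is assumed to have a DatetimeIndex and the columns
    "device" (identifier) and "power" (W); both names are hypothetical.
    """
    per_device = {
        # last() keeps the final reading within each second,
        # ffill() holds that value until the next reading arrives
        device: group["power"].resample("1s").last().ffill()
        for device, group in raw.groupby("device")
    }
    # Series with different time ranges are aligned on the union of their
    # indexes; seconds outside a device's own range stay NaN at this stage.
    return pd.DataFrame(per_device)
```

Note that the forward fill is applied per device, so a device is never filled with values from before its first reading.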

6.3. Merging and Finalization

  1. Merge: The cleaned main meter and all sub-metered device dataframes were merged into a single dataframe on the time index.
  2. Final Fill: Any remaining NaN values (e.g., from before a device was installed) were filled with 0.0, assuming zero consumption.
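A possible shape of the merge and final fill is sketched below. It assumes that the main meter's time index defines the rows of the final table (a left join); whether the original pipeline used a left or an outer join is not stated, and the function name is hypothetical. The rounding to one decimal place follows the file format description in Section 5.

```python
import pandas as pd


def merge_and_fill(main: pd.Series, devices: pd.DataFrame) -> pd.DataFrame:
    """Combine the cleaned main series with the per-device columns."""
    # Assumption: keep exactly the timestamps of the main meter (left join)
    merged = main.to_frame("main").join(devices, how="left")
    # Remaining NaN values (e.g., before a device was installed) become 0.0 W
    merged = merged.fillna(0.0)
    # All power values are rounded to the first decimal place
    return merged.round(1)
```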

7. Manual Corrections and Known Data Issues

During analysis, two significant unmetered load events were identified and manually corrected to improve the accuracy of the aggregate reading. The error column (inaccuracy) was recalculated after these corrections.

  1. March 10th - Unmetered Bulb: An unmetered 107W bulb was active. Its consumption was subtracted from the main reading, as if the bulb had never been switched on.
  2. May 31st - Unmetered Air Pump: A 101W pump for an air mattress was plugged directly into an outlet without an intermediary metering plug. Its consumption was therefore added manually to the respective plug's column.
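Both corrections amount to adding or subtracting a constant load within a time window. The sketch below illustrates that idea only: the function name is hypothetical, and the exact start and end times of the two events are not given in this description, so any window passed to it is an assumption of the caller.

```python
import pandas as pd


def add_constant_load(df: pd.DataFrame, column: str, start, end, watts: float) -> pd.DataFrame:
    """Return a copy of df with `watts` added to `column` for start <= time < end.

    Use a negative value to subtract a load (as for the unmetered bulb).
    """
    out = df.copy()
    window = (out.index >= pd.Timestamp(start)) & (out.index < pd.Timestamp(end))
    out.loc[window, column] = out.loc[window, column] + watts
    return out
```

After such a correction, any derived column such as inaccuracy would need to be recalculated, as noted above.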

8. Appliance Details and Multipurpose Plugs

The following table lists the column names, with explanations where needed. As Watson moved at the beginning of June, some metering plugs were reassigned to other appliances. The affected columns were split into new ones that reflect the appliance in use at the time. Hence, there are more columns than physical metering devices.

| Column Name | Location / Appliance | Notes |
|---|---|---|
| lightkitchen, lightwatson, lighthallway, lightbathroom, lightsherlock, lightstoreroom, lightlivingroom, fcob, watsonfloorlamp, livingfloorlamphue, watsonledmap, watsondesklamp, sherlockglobe, sherlockledstrip, sherlockfloorlamphue, sherlockalarmclocklight | Misc / Lights | The seven lights starting with light... are all ceiling lights. fcob is an FCOB (Flip Chip on Board) LED Strip that illuminates the counter in the kitchen. |
| stoveplatel1, stoveplatel2, stoveovenl3 | Kitchen / Stove and Oven | Individual phases of the kitchen stove and oven. The stove has 4 plates. The 2 bottom ones are on phase 1, the 2 top ones are on phase 2. The oven is on phase 3. |
| fridge, washingmachine, microwave, kettle, wheymixer, mixer, blender, kitchensoundsystem | Kitchen / Other Appliances | |
| router, watsonvinylplayer, watsoncharger, watsonfan, watsonlaptop, watsondesklamp, watsonipadcharger, watsonmonitor, watsonpiano, htpc | Watson's Room, later Living Room | |
| sherlockpc, sherlocktv, sherlockmonitor, sherlockserver, sherlockguitaramp, sherlockhairdryer, sherlockcharger, sherlocklaptop, sherlockdesk, solderingiron | Sherlock's Room | Sherlock's desk can be adjusted in height via an electrical motor. |
| printerscanner, vacuum, drill, airmattress | Storeroom | |

9. Known Limitations

  • The dataset contains two manually corrected periods for known unmetered loads. Other, smaller unmetered loads may still exist, although this is unlikely: special care was taken not to plug devices directly into outlets without an intermediary smart plug.
  • The measurement integrated circuits in the individual meters are inexpensive and do not offer industry-level precision. Hence, the individual data might be inaccurate. The meters were calibrated using a purely resistive load, but linear behavior (especially with changing humidity and temperature) cannot be assumed.
  • The resolution is 1 Hz. High-frequency analyses are therefore not possible.
  • The measured quantity is active power only.

10. License

This dataset is made available under the Creative Commons Attribution license.

11. How to Cite

If you use this dataset in your research, please cite the corresponding paper linked to the Zenodo upload.

Files

_README.md

Files (91.0 MB)

  • 16.6 kB, md5:6f9d54dd1ef0cbf631ba71714564c6a9
  • 90.9 MB, md5:b10e208b719188276a9cc5278247c4d1

Additional details

Related works

Is published in
Conference proceeding: 10.1145/3736425.3771959 (DOI)