Authors:
Advanced Integrated Sensing lab (ADVISE) / Department of Electrical Engineering (ESAT) / KU Leuven
Other (Recording, supervision, ...):
The dataset is a derivative of the SINS dataset. The SINS dataset contains a continuous recording of one person living in a vacation home over a period of one week. It was collected using a network of 13 microphone arrays distributed over the entire home. Each microphone array consists of 4 linearly arranged microphones. For this dataset, 4 microphone arrays in the combined living room and kitchen area are used. Figure 2 shows the floorplan of the recorded environment along with the positions of the sensor nodes used.
Approximately 200 hours of data from 4 sensor nodes are taken from the SINS dataset. The partitioning of the data was done randomly. The segments belonging to one particular consecutive activity (e.g. a full session of cooking) were kept together. The data provided for each sensor node contain recordings of the same time period. This means that the performed activities are observed from multiple microphone arrays at the same time instant.
The recordings were split into audio segments of 10 s. Each segment represents one activity. These audio segments are provided as individual files along with the ground truth. The nine daily activities in this dataset are shown in Table 1, along with the number of available 10 s segments and the number of full sessions of each activity (e.g. a cooking session).
Activity | # 10s segments | # sessions |
---|---|---|
Absence (nobody present in the room) | 18860 | 42 |
Cooking | 5124 | 13 |
Dishwashing | 1424 | 10 |
Eating | 2308 | 13 |
Other (present but not doing any relevant activity) | 2060 | 118 |
Social activity (visit, phone call) | 4944 | 21 |
Vacuum cleaning | 972 | 9 |
Watching TV | 18648 | 9 |
Working (typing, mouse click, ...) | 18644 | 33 |
Total | 72984 | 268 |
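As a quick sanity check on Table 1, the segment counts can be converted into hours of audio. The sketch below hard-codes the counts from the table; the names `SEGMENTS`, `SEGMENT_SECONDS` and `hours` are illustrative and not part of the dataset.

```python
# Segment counts per activity, copied from Table 1.
SEGMENTS = {
    "Absence": 18860,
    "Cooking": 5124,
    "Dishwashing": 1424,
    "Eating": 2308,
    "Other": 2060,
    "Social activity": 4944,
    "Vacuum cleaning": 972,
    "Watching TV": 18648,
    "Working": 18644,
}
SEGMENT_SECONDS = 10  # each segment is 10 s long


def hours(n_segments: int) -> float:
    """Convert a number of 10 s segments into hours of audio."""
    return n_segments * SEGMENT_SECONDS / 3600


total_segments = sum(SEGMENTS.values())
total_hours = hours(total_segments)

if __name__ == "__main__":
    for name, count in SEGMENTS.items():
        share = count / total_segments
        print(f"{name:16s} {count:6d} segments {hours(count):6.1f} h {share:6.1%}")
    print(f"Total: {total_segments} segments, {total_hours:.1f} h")
```

The counts sum to 72984 segments, or roughly 203 hours over all sensor nodes, which agrees with the approximately 200 hours stated above. Absence, Watching TV and Working each account for about a quarter of the segments, so the class distribution is far from uniform.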
The sensor node configuration used in this setup is a control board together with a linear microphone array. The control board contains an EFM32 ARM Cortex-M4 microcontroller from Silicon Labs (EFM32WG980), which is used for sampling the analog audio. The microphone array contains four Sonion N8AC03 MEMS low-power (±17µW) microphones with an inter-microphone distance of 5 cm. Each audio channel is sampled sequentially at a rate of 16 kHz with a bit depth of 12 bits.
The annotation was performed in two phases. First, during the data collection, a smartphone application was used to let the monitored person(s) annotate the activities while being recorded. The person could only select from a fixed set of activities. The application was easy to use and did not significantly influence the transition between activities. Second, the start and stop timestamps of each activity were refined using our own annotation software.
Postprocessing and sharing the database involves privacy-related aspects. Besides the person(s) living there, multiple people visited the home. Moreover, during a phone call, one can partially hear the person on the other end. Written informed consent was obtained from all participants.
The content of the dataset is structured in the following manner:
dataset root
│   EULA.pdf                End user license agreement
│   meta.txt                meta data, tsv-format, [audio file (str)][tab][label (str)][tab][session (str)]
│   readme.md               Dataset description (markdown)
│   readme.html             Dataset description (HTML)
│
└───audio                   72984 audio segments, 16-bit 16kHz
│   │   DevNode1_ex1_1.wav  name format DevNode{NodeID}_ex{sessionID}_{segmentID}.wav
│   │   DevNode2_ex1_2.wav
│   │   ...
│
└───evaluation_setup        cross-validation setup, 4 folds
    │   fold1_train.txt     training file list, tsv-format, [audio file (str)][tab][label (str)][tab][session (str)]
    │   fold1_test.txt      test file list, tsv-format, [audio file (str)][tab][label (str)]
    │   ...
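To confirm that a downloaded segment matches the layout described above, its header can be inspected with Python's standard `wave` module. This is a minimal sketch; the helper name `describe_wav` and the example path are illustrative, and the code assumes the files are plain PCM WAV files.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample, duration_s) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        bits = 8 * wav.getsampwidth()  # sample width is reported in bytes
        duration = wav.getnframes() / sample_rate
    return channels, sample_rate, bits, duration


if __name__ == "__main__":
    # Path is an example; adjust it to where the dataset was unpacked.
    print(describe_wav("audio/DevNode1_ex1_1.wav"))
```

If a file follows the description in this document, the result should report four channels (one per microphone), a 16 kHz sample rate, 16-bit samples and a duration of about 10 s.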
The multi-channel audio files can be found under the directory audio and are named in the following manner:
DevNode{NodeID}_ex{sessionID}_{segmentID}.wav
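The naming scheme can be parsed with a regular expression, for instance to group the segments of one session across sensor nodes. The sketch below assumes that all three IDs are decimal integers, as in the example names `DevNode1_ex1_1.wav` and `DevNode2_ex1_2.wav`; `SegmentName` and `parse_segment_name` are hypothetical helper names.

```python
import re
from typing import NamedTuple

# Assumption: NodeID, sessionID and segmentID are all decimal integers.
FILENAME_RE = re.compile(
    r"^DevNode(?P<node>\d+)_ex(?P<session>\d+)_(?P<segment>\d+)\.wav$"
)


class SegmentName(NamedTuple):
    node_id: int
    session_id: int
    segment_id: int


def parse_segment_name(filename: str) -> SegmentName:
    """Split a name like 'DevNode1_ex1_1.wav' into its three numeric IDs."""
    # Accept a bare file name or a path such as 'audio/DevNode1_ex1_1.wav'.
    base = filename.replace("\\", "/").rsplit("/", 1)[-1]
    match = FILENAME_RE.match(base)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return SegmentName(
        int(match["node"]), int(match["session"]), int(match["segment"])
    )


if __name__ == "__main__":
    print(parse_segment_name("DevNode2_ex1_2.wav"))
```

Raising `ValueError` on a mismatch, rather than returning `None`, makes it obvious when a file does not follow the expected scheme.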
The file meta.txt and the files in the folder evaluation_setup contain filenames along with ground truth labels and an identifier of the session to which each segment belongs. These are arranged in the following manner:
[filename (str)][tab][activity label (str)][tab][session (str)]
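A file in this three-column layout can be read with Python's `csv` module using a tab delimiter. The sketch below parses a few placeholder rows and counts segments and sessions per activity; the label and session strings in `SAMPLE` are made up for illustration and may not match the spelling used in meta.txt.

```python
import csv
import io
from collections import Counter


def read_meta(handle):
    """Yield (filename, label, session) tuples from a tab-separated text stream."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        filename, label, session = row[:3]
        yield filename, label, session


# Placeholder rows for illustration only; labels and session names are invented.
SAMPLE = (
    "DevNode1_ex1_1.wav\tcooking\tsession_a\n"
    "DevNode2_ex1_2.wav\tcooking\tsession_a\n"
    "DevNode1_ex2_1.wav\teating\tsession_b\n"
)

rows = list(read_meta(io.StringIO(SAMPLE)))
segments_per_label = Counter(label for _, label, _ in rows)
sessions_per_label = {
    label: len({session for _, lab, session in rows if lab == label})
    for label in segments_per_label
}

if __name__ == "__main__":
    print(dict(segments_per_label))
    print(sessions_per_label)
```

For the real file, pass `open("meta.txt", newline="")` to `read_meta` instead of the `io.StringIO` object.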
The directory evaluation_setup provides cross-validation folds for the development dataset. More information on the usage can be read here.
The dataset includes multi-channel audio segments along with the ground truth and cross-validation folds.
Cross-validation folds are provided for the dataset in order to make results reported with this dataset uniform. The setup consists of four folds distributing the available files.
Segments belonging to a particular session of an activity (e.g. a session of cooking collected by multiple sensor nodes) are kept together to minimize leakage between folds. The folds are provided with the dataset in the directory evaluation_setup. For each fold a training, testing and evaluation subset is provided.
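The session-level split can be checked by looking up the session of every file in meta.txt and verifying that no session appears on both sides of a fold. This is a minimal sketch under the assumption that a session is identified by its (label, session) pair; the function names and the toy rows are illustrative.

```python
def sessions_of(files, lookup):
    """Return the set of (label, session) pairs that the given files belong to."""
    return {lookup[name] for name in files}


def shared_sessions(train_files, test_files, meta_rows):
    """Return the sessions that occur in both the training and the test files."""
    lookup = {name: (label, session) for name, label, session in meta_rows}
    return sessions_of(train_files, lookup) & sessions_of(test_files, lookup)


# Toy metadata for illustration; real rows would come from meta.txt.
META = [
    ("a.wav", "cooking", "s1"),
    ("b.wav", "cooking", "s1"),
    ("c.wav", "eating", "s2"),
]

if __name__ == "__main__":
    # Clean split: the cooking session stays entirely in the training set.
    print(shared_sessions(["a.wav", "b.wav"], ["c.wav"], META))
    # Leaky split: session s1 is divided between training and test.
    print(shared_sessions(["a.wav"], ["b.wav", "c.wav"], META))
```

An empty result means that no session is shared between the two subsets. Any returned pair points to a session whose segments were separated.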
evaluation_setup\fold[1-4]_train.txt: training file list (in tsv-format)
Format: [filename (str)][tab][activity label (str)][tab][session (str)]
evaluation_setup\fold[1-4]_test.txt: testing file list (in tsv-format)
Format: [filename (str)][tab]
evaluation_setup\fold[1-4]_evaluate.txt: evaluation file list (in tsv-format), same as fold[1-4]_test.txt but with additional reference information. These two files are provided separately to prevent contamination with ground truth when testing the system.
Format: [filename (str)][tab][activity label (str)]
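The three lists of a fold differ only in their number of columns, so one reader can handle all of them. The sketch below is an assumption-laden example rather than an official loader: `read_rows` and `load_fold` are hypothetical names, and empty trailing fields are dropped because the test list format above ends in a tab.

```python
import csv
from pathlib import Path


def read_rows(path):
    """Read a tab-separated list file, dropping blank lines and empty trailing fields."""
    rows = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            while row and row[-1] == "":
                row.pop()  # a trailing tab produces an empty last field
            if row:
                rows.append(tuple(row))
    return rows


def load_fold(setup_dir, fold):
    """Load the train, test and evaluate lists of one fold (fold is 1 to 4)."""
    setup_dir = Path(setup_dir)
    return {
        subset: read_rows(setup_dir / f"fold{fold}_{subset}.txt")
        for subset in ("train", "test", "evaluate")
    }


if __name__ == "__main__":
    fold = load_fold("evaluation_setup", 1)
    print({subset: len(rows) for subset, rows in fold.items()})
```

A system would typically be trained on the `train` rows, run on the file names in `test`, and scored against the labels in `evaluate`, so that the reference labels are never read during inference.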
See file EULA.pdf