
# EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching

Shiwei Fang; Tamzeed Islam; Sirajum Munir; Shahriar Nirjon

EyeFi Dataset

This dataset was collected as part of the EyeFi project at Bosch Research and Technology Center, Pittsburgh, PA, USA. It contains WiFi CSI values of human motion trajectories along with ground-truth location information captured through a camera. The dataset is used in the paper "EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching", published at the IEEE International Conference on Distributed Computing in Sensor Systems 2020 (DCOSS '20). We also published a dataset paper titled "Dataset: Person Tracking and Identification using Cameras and Wi-Fi Channel State Information (CSI) from Smartphones" in the Data: Acquisition to Analysis 2020 (DATA '20) workshop, which describes the data collection in detail. Please refer to it for more information on the dataset.

Data Collection Setup

In our experiments, we used an Intel 5300 WiFi Network Interface Card (NIC) installed in an Intel NUC, together with the Linux CSI tools [1], to extract the WiFi CSI packets. The (x,y) coordinates of the subjects were collected from a Bosch Flexidome IP Panoramic 7000 camera mounted on the ceiling, and the Angles of Arrival (AoAs) were derived from these (x,y) coordinates. The WiFi card and the camera are located at the same origin coordinates but at different heights: the camera is about 2.85 m above the ground and the WiFi antennas are about 1.12 m above the ground.
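As a rough illustration of how an AoA can be derived from a camera (x,y) position when the camera and the antenna array share an origin, the sketch below uses `atan2`. The axis convention (array broadside along +y, positive angles toward +x), the degree unit, and the function name `aoa_from_xy` are assumptions made for illustration; the description above does not state the convention actually used in the dataset.

```python
import math


def aoa_from_xy(x, y):
    """Angle in degrees between the assumed broadside (+y axis) and the point (x, y).

    Assumptions (not taken from the dataset description): the array sits at
    the origin, its broadside points along +y, and positive angles open
    toward +x. The dataset's own convention may differ.
    """
    return math.degrees(math.atan2(x, y))


# A subject straight ahead of the assumed broadside, and one off to the side.
print(aoa_from_xy(0.0, 3.0))
print(aoa_from_xy(2.0, 2.0))
```

When working with the real files, comparing the output of such a function against the stored cam_aoa values is a quick way to check whether the assumed convention matches.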

The data collection environment consists of two areas. The first is a rectangular lab space measuring 11.8 m x 8.74 m; the second is an irregularly shaped kitchen area with maximum distances of 19.74 m and 14.24 m between two walls. The kitchen also has numerous obstacles and materials with different RF reflection characteristics, including strong reflectors such as metal refrigerators and dishwashers.

To collect the WiFi data, we used a Google Pixel 2 XL smartphone as an access point and connected the Intel 5300 NIC to it for WiFi communication. The transmission rate is about 20-25 packets per second. The same WiFi card and phone are used in both the lab and the kitchen area.

List of Files
Here is a list of files included in the dataset:

|- 1_person
    |- 1_person_1.h5
    |- 1_person_2.h5
|- 2_people
    |- 2_people_1.h5
    |- 2_people_2.h5
    |- 2_people_3.h5
|- 3_people
    |- 3_people_1.h5
    |- 3_people_2.h5
    |- 3_people_3.h5
|- 5_people
    |- 5_people_1.h5
    |- 5_people_2.h5
    |- 5_people_3.h5
    |- 5_people_4.h5
|- 10_people
    |- 10_people_1.h5
    |- 10_people_2.h5
    |- 10_people_3.h5
|- Kitchen
    |- 1_person
        |- kitchen_1_person_1.h5
        |- kitchen_1_person_2.h5
        |- kitchen_1_person_3.h5
    |- 3_people
        |- kitchen_3_people_1.h5
|- training
    |- shuffuled_train.h5
    |- shuffuled_valid.h5
    |- shuffuled_test.h5
|- View-Dataset-Example.ipynb



In this dataset, the folders 1_person/, 2_people/, 3_people/, 5_people/, and 10_people/ contain data collected in the lab area, whereas the Kitchen/ folder contains data collected in the kitchen area. For the structure of each file, see the section "Access the data" below.

The training folder contains the training dataset we used to train the neural network discussed in our paper. Its files are generated by shuffling all the data from the 1_person/ folder collected in the lab area (1_person_1.h5 and 1_person_2.h5).
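When several arrays describe the same packets, shuffling them with one shared permutation keeps their rows paired. The sketch below shows that general technique with NumPy on synthetic data; the helper name `shuffle_aligned` and the fixed seed are illustrative assumptions, not the procedure used to create the training files.

```python
import numpy as np


def shuffle_aligned(arrays, seed=0):
    """Shuffle several equally long arrays with one shared permutation."""
    arrays = [np.asarray(a) for a in arrays]
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of rows")
    # One permutation, applied to every array, keeps row i of each array paired.
    perm = np.random.default_rng(seed).permutation(n)
    return [a[perm] for a in arrays]


# Synthetic stand-ins: the second array is always twice the first.
aoa = np.arange(10)
other = aoa * 2
shuffled_aoa, shuffled_other = shuffle_aligned([aoa, other], seed=1)
print(np.array_equal(shuffled_other, shuffled_aoa * 2))
```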

Why multiple files in one folder?

Each folder contains multiple files. For example, the 1_person folder has two files: 1_person_1.h5 and 1_person_2.h5. Files in the same folder always have the same number of human subjects present simultaneously in the scene. However, the person holding the phone can differ between files. The data may also have been collected on different days, and the data collection system sometimes had to be rebooted due to stability issues. As a result, we provide separate files (such as 1_person_1.h5 and 1_person_2.h5) to distinguish between different people holding the phone and between possible system reboots, which introduce different phase offsets (see below) into the system.

Special note:

1_person_1.h5 was generated with the same person holding the phone throughout, whereas 1_person_2.h5 contains different people holding the phone, with only one person present in the area at a time. The two files were also collected on different days.

Access the data
To access the data, an HDF5 library is needed to open the files. A free HDF5 viewer is available on the official website: https://www.hdfgroup.org/downloads/hdfview/. We also provide an example Python notebook, View-Dataset-Example.ipynb, that demonstrates how to access the data.

Each file is structured as follows (except the files under the *"training/"* folder):

|- csi_imag
|- csi_real
|- nPaths_1
    |- offset_00
        |- spotfi_aoa
    |- offset_11
        |- spotfi_aoa
    |- offset_12
        |- spotfi_aoa
    |- offset_21
        |- spotfi_aoa
    |- offset_22
        |- spotfi_aoa
|- nPaths_2
    |- offset_00
        |- spotfi_aoa
    |- offset_11
        |- spotfi_aoa
    |- offset_12
        |- spotfi_aoa
    |- offset_21
        |- spotfi_aoa
    |- offset_22
        |- spotfi_aoa
|- nPaths_3
    |- offset_00
        |- spotfi_aoa
    |- offset_11
        |- spotfi_aoa
    |- offset_12
        |- spotfi_aoa
    |- offset_21
        |- spotfi_aoa
    |- offset_22
        |- spotfi_aoa
|- nPaths_4
    |- offset_00
        |- spotfi_aoa
    |- offset_11
        |- spotfi_aoa
    |- offset_12
        |- spotfi_aoa
    |- offset_21
        |- spotfi_aoa
    |- offset_22
        |- spotfi_aoa
|- num_obj
|- obj_0
    |- cam_aoa
    |- coordinates
|- obj_1
    |- cam_aoa
    |- coordinates
...
|- timestamp


The csi_real and csi_imag datasets hold the real and imaginary parts of the CSI measurements. The 90 csi_real and csi_imag values are ordered by subcarrier and antenna as follows: [subcarrier1-antenna1, subcarrier1-antenna2, subcarrier1-antenna3, subcarrier2-antenna1, subcarrier2-antenna2, subcarrier2-antenna3, …, subcarrier30-antenna1, subcarrier30-antenna2, subcarrier30-antenna3]. The nPaths_x groups hold the WiFi Angle of Arrival (AoA) calculated by SpotFi [2], where x is the number of multipath components specified during the calculation. Under each nPaths_x group are offset_xx subgroups, where xx stands for the offset combination used to correct the phase offset during the SpotFi calculation. We measured the offsets as:

| Antennas | Offset 1 (rad) | Offset 2 (rad) |
|:--------:|:--------------:|:--------------:|
|  1 & 2   |     1.1899     |    -2.0071     |
|  1 & 3   |     1.3883     |    -1.8129     |



The measurement is based on the work in [3], whose authors state that there are two possible offsets between two antennas; we measured both by booting the device multiple times. The combination of offsets determines the offset_xx name. For example, offset_12 means that offset 1 between antennas 1 & 2 and offset 2 between antennas 1 & 3 were used in the SpotFi calculation.
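The naming rule can be sketched as a small lookup that turns a group name into the pair of offsets from the table. The function name `offsets_for` and the dictionary layout are illustrative assumptions. The description does not define what the digit 0 in offset_00 means, so this sketch returns `None` for it rather than guessing.

```python
# Measured offsets in radians, copied from the table above.
OFFSETS = {
    "1&2": {1: 1.1899, 2: -2.0071},
    "1&3": {1: 1.3883, 2: -1.8129},
}


def offsets_for(group_name):
    """Return (offset for antennas 1 & 2, offset for antennas 1 & 3) in radians.

    The first digit after "offset_" selects the offset for antennas 1 & 2,
    the second digit the offset for antennas 1 & 3. A digit of 0 is not
    explained in the dataset description, so it maps to None here.
    """
    digits = group_name.rsplit("_", 1)[1]
    if len(digits) != 2 or not digits.isdigit():
        raise ValueError(f"unexpected group name: {group_name!r}")
    first, second = int(digits[0]), int(digits[1])
    return OFFSETS["1&2"].get(first), OFFSETS["1&3"].get(second)


print(offsets_for("offset_12"))
print(offsets_for("offset_21"))
```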

The num_obj field stores the number of human subjects present in the scene. obj_0 is always the subject holding the phone. Each file contains num_obj groups named obj_x. For each obj_x, we provide the coordinates reported by the camera and cam_aoa, the AoA estimated from the camera-reported coordinates. The (x,y) coordinates and AoAs listed here are chronologically ordered (except in the files in the training folder). They reflect how the person carrying the phone moved through the space (for obj_0) and how everyone else walked (for the other obj_y, where y > 0).
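A minimal sketch of walking over all subjects in one file follows. The helper `load_tracks` is hypothetical; it accepts an open `h5py.File` or any nested mapping with the same keys, which is what the synthetic `demo` dictionary stands in for. It assumes that num_obj holds a single count (a scalar or a constant array) and that coordinates has one (x,y) row per sample; neither shape is confirmed by the description.

```python
import numpy as np


def load_tracks(f):
    """Return a list of (cam_aoa, coordinates) pairs, one per subject.

    Index 0 of the result is obj_0, the subject holding the phone.
    Assumption: num_obj is a scalar or an array whose first entry is the count.
    """
    num_obj = int(np.asarray(f["num_obj"]).ravel()[0])
    tracks = []
    for i in range(num_obj):
        group = f[f"obj_{i}"]
        tracks.append((np.asarray(group["cam_aoa"]), np.asarray(group["coordinates"])))
    return tracks


# Synthetic stand-in with two subjects and four samples each.
demo = {
    "num_obj": np.array(2),
    "obj_0": {"cam_aoa": np.zeros(4), "coordinates": np.zeros((4, 2))},
    "obj_1": {"cam_aoa": np.ones(4), "coordinates": np.ones((4, 2))},
}
tracks = load_tracks(demo)
print(len(tracks), tracks[0][1].shape)
```

With a real file, the same function could be called inside `with h5py.File(path, "r") as f:`.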

The timestamp is provided as a time reference for each WiFi packet.
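The subcarrier and antenna ordering of the 90 CSI values described above maps directly onto a reshape. The sketch below combines the real and imaginary parts into complex values and reshapes them to (packets, 30 subcarriers, 3 antennas). It assumes the stored arrays have shape (N, 90), which the description implies but does not state; the helper name `to_complex_csi` is illustrative.

```python
import numpy as np


def to_complex_csi(csi_real, csi_imag):
    """Combine real/imag parts and reshape (N, 90) -> (N, 30, 3).

    Axis 1 is the subcarrier index and axis 2 the antenna index, matching
    the documented order [sc1-ant1, sc1-ant2, sc1-ant3, sc2-ant1, ...].
    """
    csi = np.asarray(csi_real) + 1j * np.asarray(csi_imag)
    return csi.reshape(-1, 30, 3)


# Synthetic stand-in for one packet: the value at flat position k is k.
real = np.arange(90, dtype=float).reshape(1, 90)
imag = np.zeros((1, 90))
csi = to_complex_csi(real, imag)
print(csi.shape)
# Subcarrier 2, antenna 3 sits at flat index 1 * 3 + 2.
print(csi[0, 1, 2].real)
```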

To access the data (Python):

import h5py

data = h5py.File('3_people_3.h5','r')

csi_real = data['csi_real'][()]
csi_imag = data['csi_imag'][()]

cam_aoa = data['obj_0/cam_aoa'][()]
cam_loc = data['obj_0/coordinates'][()]
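The SpotFi estimates sit two groups deep, so it can help to enumerate their paths first. The sketch below builds all 20 dataset paths from the group names listed in the file structure; the function name `spotfi_paths` is illustrative.

```python
from itertools import product

# Group names taken from the file structure listed above.
N_PATHS = (1, 2, 3, 4)
OFFSET_COMBOS = ("00", "11", "12", "21", "22")


def spotfi_paths():
    """Return every 'nPaths_x/offset_xx/spotfi_aoa' dataset path."""
    return [
        f"nPaths_{n}/offset_{combo}/spotfi_aoa"
        for n, combo in product(N_PATHS, OFFSET_COMBOS)
    ]


paths = spotfi_paths()
print(len(paths))
print(paths[0])
```

With `data` opened as in the snippet above, `data[path][()]` would read one set of estimates for each `path` in the list.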


For files inside the training/ folder:

Files inside the training folder have a different data structure:


|- nPath-1
    |- aoa
    |- csi_imag
    |- csi_real
    |- spotfi
|- nPath-2
    |- aoa
    |- csi_imag
    |- csi_real
    |- spotfi
|- nPath-3
    |- aoa
    |- csi_imag
    |- csi_real
    |- spotfi
|- nPath-4
    |- aoa
    |- csi_imag
    |- csi_real
    |- spotfi


The nPath-x group name gives the number of multipath components specified during the SpotFi calculation. aoa is the camera-generated angle of arrival (AoA), which can be considered ground truth; csi_imag and csi_real are the imaginary and real components of the CSI values; spotfi holds the SpotFi-calculated AoA values. The SpotFi values were chosen based on the lowest median and mean error across 1_person_1.h5 and 1_person_2.h5. All rows under the same nPath-x group are aligned (i.e., the first row of aoa corresponds to the first row of csi_imag, csi_real, and spotfi). No timestamp is recorded, and the data are not in chronological order because they were randomly shuffled from the 1_person_1.h5 and 1_person_2.h5 files.
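One way to compare candidate SpotFi outputs against the camera AoA is to rank them by absolute error. The sketch below sorts candidates by median error and uses the mean error as a tie-breaker. How the median and mean were actually combined for the dataset is not specified, so this ordering is an assumption, as are the helper name `rank_by_error` and the requirement that all inputs are 1-D, aligned row by row, and in the same angular unit.

```python
import numpy as np


def rank_by_error(cam_aoa, candidates):
    """Rank candidate AoA estimates by (median, mean) absolute error.

    cam_aoa: 1-D array of camera AoA values used as ground truth.
    candidates: dict mapping a label to a 1-D array of estimates.
    Returns (labels sorted best first, dict of label -> (median, mean)).
    """
    truth = np.asarray(cam_aoa, dtype=float)
    scores = {}
    for name, est in candidates.items():
        err = np.abs(np.asarray(est, dtype=float) - truth)
        scores[name] = (float(np.median(err)), float(np.mean(err)))
    # Tuples compare by median first, then by mean.
    return sorted(scores, key=scores.get), scores


# Synthetic example: "a" is off by 1 everywhere, "b" is exact except one outlier.
truth = np.array([0.0, 10.0, 20.0, 30.0])
order, scores = rank_by_error(
    truth,
    {"a": truth + 1.0, "b": truth + np.array([0.0, 0.0, 0.0, 20.0])},
)
print(order)
print(scores)
```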

Citation
If you use the dataset, please cite our paper:

@inproceedings{eyefi2020,
title={EyeFi: Fast Human Identification Through Vision and WiFi-based Trajectory Matching},
author={Fang, Shiwei and Islam, Tamzeed and Munir, Sirajum and Nirjon, Shahriar},
booktitle={2020 IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS)},
year={2020},
organization={IEEE}
}


Thanks!

References

1. Halperin, Daniel, et al. "Tool release: Gathering 802.11n traces with channel state information." ACM SIGCOMM Computer Communication Review 41.1 (2011): 53-53.

2. Kotaru, Manikanta, et al. "Spotfi: Decimeter level localization using wifi." Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication. 2015.

3. Zhang, Dongheng, et al. "Calibrating Phase Offsets for Commodity WiFi." IEEE Systems Journal (2019).

Files (1.5 GB)

| Name | Size |
|:-----|:-----|
| EyeFi_Dataset.zip (md5:9856f245dfc157079b08e13ae368bc89) | 1.5 GB |
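The published MD5 checksum can be used to verify a download. The sketch below hashes a file in chunks with the standard library; the helper name `md5_of_file` and the chunk size are illustrative choices, and the local path of the archive is up to the reader.

```python
import hashlib

# Checksum copied from the file listing above.
EXPECTED_MD5 = "9856f245dfc157079b08e13ae368bc89"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

After downloading, `md5_of_file("EyeFi_Dataset.zip") == EXPECTED_MD5` should evaluate to `True` for an intact archive.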