BED: Biometric EEG dataset
- 1. University of the West of Scotland
- 2. Durham University
- 3. Universitat de València
Description
The BED dataset
Version 1.0.0
Please cite as: Arnau-González, P., Katsigiannis, S., Arevalillo-Herráez, M., Ramzan, N., "BED: A new dataset for EEG-based biometrics", IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12219 - 12230, 2021.
Disclaimer
While every care has been taken to ensure the accuracy of the data included in the BED dataset, the authors and the University of the West of Scotland, Durham University, and Universitat de València do not provide any guaranties and disclaim all responsibility and all liability (including without limitation, liability in negligence) for all expenses, losses, damages (including indirect or consequential damage) and costs which you might incur as a result of the provided data being inaccurate or incomplete in any way and for any reason. 2020, University of the West of Scotland, Scotland, United Kingdom.
Contact
For inquiries regarding the BED dataset, please contact:
- Dr Pablo Arnau-González, arnau.pablo [*AT*] gmail.com
- Dr Stamos Katsigiannis, stamos.katsigiannis [*AT*] durham.ac.uk
- Prof. Miguel Arevalillo-Herráez, miguel.arevalillo [*AT*] uv.es
- Prof. Naeem Ramzan, Naeem.Ramzan [*AT*] uws.ac.uk
Dataset summary
BED (Biometric EEG Dataset) is a dataset specifically designed to test EEG-based biometric approaches that use relatively inexpensive consumer-grade devices, in this case the Emotiv EPOC+. The dataset includes EEG responses from 21 subjects to 12 different stimuli, across 3 chronologically disjoint sessions. It also includes stimuli aimed at eliciting different affective states, so as to facilitate future research on the influence of emotions on EEG-based biometric tasks. In addition, we provide a baseline performance analysis to outline the potential of consumer-grade EEG devices for subject identification and verification. Note that, in this work, EEG data were acquired in a controlled environment in order to reduce the variability stemming from external conditions.
The stimuli include:
- Images selected to elicit specific emotions
- Mathematical computations (2-digit additions)
- Resting-state with eyes closed
- Resting-state with eyes open
- Visual Evoked Potentials at 2, 5, 7, 10 Hz - Standard checker-board pattern with pattern reversal
- Visual Evoked Potentials at 2, 5, 7, 10 Hz - Flashing with a plain colour, set as black
For more details regarding the experimental protocol and the design of the dataset, please refer to the associated publication: Arnau-González, P., Katsigiannis, S., Arevalillo-Herráez, M., Ramzan, N., "BED: A new dataset for EEG-based biometrics", IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12219 - 12230, 2021.
Dataset structure and contents
The BED dataset contains EEG recordings from 21 subjects, acquired during 3 similar sessions for each subject. The sessions were spaced one week apart from each other.
The BED dataset includes:
- The raw EEG recordings with no pre-processing and the log files of the experimental procedure, in text format
- The EEG recordings with no pre-processing, segmented, structured and annotated according to the presented stimuli, in Matlab format
- The features extracted from each EEG segment, as described in the associated publication
The dataset is organised in 3 folders:
- RAW
- RAW_PARSED
- Features
RAW/ Contains the RAW files
RAW/sN/ Contains the RAW files associated with subject N
Each folder sN contains the following files:
- sN_s1.csv, sN_s2.csv, sN_s3.csv -- Files containing the EEG recordings for subject N and session 1, 2, and 3, respectively. These files contain 39 columns:
COUNTER INTERPOLATED F3 FC5 AF3 F7 T7 P7 O1 O2 P8 T8 F8 AF4 FC6 F4 ...UNUSED DATA... UNIX_TIMESTAMP
- subject_N_session_1_time_X.log, subject_N_session_2_time_X.log, subject_N_session_3_time_X.log -- Log files containing the sequence of events for subject N and session 1, 2, and 3, respectively.
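As a minimal sketch of how the documented columns could be picked out of a RAW file, the Python snippet below keeps the first 16 columns and the last column of each 39-column row. The comma delimiter, the absence of a header row, and the helper names documented_columns and read_raw are assumptions for illustration, not part of the dataset specification; please inspect an actual file before relying on them.

```python
import csv
import io

# The 16 leading column names, as listed in the RAW file description.
LEADING = ["COUNTER", "INTERPOLATED", "F3", "FC5", "AF3", "F7", "T7", "P7",
           "O1", "O2", "P8", "T8", "F8", "AF4", "FC6", "F4"]
N_COLUMNS = 39


def documented_columns(row):
    """Map one 39-value row to the 17 documented columns (hypothetical helper)."""
    if len(row) != N_COLUMNS:
        raise ValueError(f"expected {N_COLUMNS} columns, got {len(row)}")
    named = dict(zip(LEADING, (float(v) for v in row[:len(LEADING)])))
    named["UNIX_TIMESTAMP"] = float(row[-1])  # last column of the row
    return named


def read_raw(handle, delimiter=","):
    """Yield one dict per row; assumes no header row and the given delimiter."""
    for row in csv.reader(handle, delimiter=delimiter):
        if row:  # skip blank lines
            yield documented_columns(row)


# Synthetic two-row sample standing in for an sN_sM.csv file (not real data).
sample = "\n".join(",".join(str(r * 100 + c) for c in range(N_COLUMNS)) for r in range(2))
rows = list(read_raw(io.StringIO(sample)))
```

For a real file, the io.StringIO object would be replaced by an open file handle, and the delimiter adjusted if the files turn out not to be comma-separated.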
RAW_PARSED/
Contains Matlab files named sN_sM.mat. Each file contains the recordings for subject N in session M, stored in two variables:
- recording: size (time@256Hz x 17), Columns: COUNTER INTERPOLATED F3 FC5 AF3 F7 T7 P7 O1 O2 P8 T8 F8 AF4 FC6 F4 UNIX_TIMESTAMP
- events: cell array with size (events x 3), Columns: START_UNIX END_UNIX ADDITIONAL_INFO
START_UNIX is the UNIX timestamp at which the event starts
END_UNIX is the UNIX timestamp at which the event ends
ADDITIONAL_INFO contains a struct with additional information regarding the specific event: for the images, the expected score and the voted score; for the cognitive task, the input; for the VEP, the pattern and the frequency; etc.
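The following Python sketch illustrates how the two variables could be combined to cut out the EEG samples that belong to one event. It assumes the recording has already been loaded as a sequence of 17-value rows and each event as a (START_UNIX, END_UNIX, ADDITIONAL_INFO) triple; how the .mat file is read into Python is left out. The inclusive bounds and the function names are assumptions for illustration.

```python
TIMESTAMP_COL = 16    # UNIX_TIMESTAMP is the last of the 17 columns
SAMPLE_RATE_HZ = 256  # sampling rate stated for the recording variable


def segment_for_event(recording, event):
    """Return the rows whose timestamp lies within [START_UNIX, END_UNIX]."""
    start, end, _info = event
    return [row for row in recording if start <= row[TIMESTAMP_COL] <= end]


def duration_seconds(segment):
    """Nominal duration of a segment, from its sample count at 256 Hz."""
    return len(segment) / SAMPLE_RATE_HZ


# Synthetic stand-in (not real data): 4 s of rows at 256 Hz, starting at t = 1000 s.
# Each row is COUNTER, INTERPOLATED, 14 channel values, UNIX_TIMESTAMP.
recording = [[i, 0] + [0.0] * 14 + [1000.0 + i / SAMPLE_RATE_HZ]
             for i in range(4 * SAMPLE_RATE_HZ)]
event = (1001.0, 1002.0, {"note": "hypothetical event info"})
segment = segment_for_event(recording, event)
```

Selecting by timestamp rather than by row index keeps the sketch independent of any dropped or interpolated samples, at the cost of a linear scan per event.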
Features/
Features/Identification
Features/Identification/[ARRC|MFCC|SPEC]/: Each of these folders contains the extracted features, ready for classification, for each of the stimuli. Each file contains the following variables, where "feat" is the feature matrix and "Y" is the label matrix:
- feat: N x number of features
- Y: N x 2 (the #subject and the #session)
- INFO: Contains details about the event same as the ADDITIONAL INFO
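Because Y stores both the subject and the session of every row, the features can be partitioned by session. The Python sketch below holds out one session for testing; it assumes feat and Y have already been loaded as row-aligned sequences, with Y ordered as (subject, session). The function name split_by_session is hypothetical, and this split is only an illustration, not necessarily the evaluation protocol used in the associated publication.

```python
def split_by_session(feat, Y, test_session):
    """Split rows into train/test sets using the session column of Y."""
    train, test = [], []
    for features, (subject, session) in zip(feat, Y):
        target = test if session == test_session else train
        target.append((features, subject))  # keep the subject as the class label
    return train, test


# Synthetic stand-in (not real data): 4 rows with 2 features each.
feat = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
Y = [[1, 1], [1, 3], [2, 2], [2, 3]]  # (subject, session) per row
train, test = split_by_session(feat, Y, test_session=3)
```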
Features/Verification: This folder contains 3 files, each holding a different set of extracted features. Each file contains one struct array with the following fields:
- data: the time-series features, as described in the paper
- y: the #subject
- stimuli: the stimuli by name
- session: the #session
- INFO: Contains details about the event
The features provided are in sequential order, so consecutive indices (1, 2, etc.) are consecutive in time if they belong to the same stimulus.
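The sequential ordering can be exploited by grouping entries without re-sorting them. The Python sketch below assumes the struct array has been converted to a list of dictionaries with the fields listed above. Grouping by subject and session in addition to the stimulus is an assumption about how the ordering applies, and the stimulus names in the sample are made up.

```python
from collections import defaultdict


def group_sequences(entries):
    """Group 'data' items by (subject, session, stimulus), keeping file order."""
    groups = defaultdict(list)
    for entry in entries:
        key = (entry["y"], entry["session"], entry["stimuli"])
        groups[key].append(entry["data"])  # append preserves the original order
    return dict(groups)


# Synthetic stand-in (not real data); "data" would hold time-series features.
entries = [
    {"y": 1, "session": 1, "stimuli": "rest_open", "data": "seg0"},
    {"y": 1, "session": 1, "stimuli": "rest_open", "data": "seg1"},
    {"y": 1, "session": 1, "stimuli": "rest_closed", "data": "seg2"},
    {"y": 2, "session": 1, "stimuli": "rest_open", "data": "seg3"},
]
sequences = group_sequences(entries)
```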
Additional information
For additional information regarding the creation of the BED dataset, please refer to the associated publication: Arnau-González, P., Katsigiannis, S., Arevalillo-Herráez, M., Ramzan, N., "BED: A new dataset for EEG-based biometrics", IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12219 - 12230, 2021.