Dataset Open Access
Alexandros Papadopoulos; Dimitrios Iakovakis; Lisa Klingelhoefer; Sevasti Bostantjopoulou; Kallol Ray Chaudhuri; Konstantinos Kyritsis; Stelios Hadjidimitriou; Vasileios Charisis; Leontios J. Hadjileontiadis; Anastasios Delopoulos
For a detailed description of the dataset, see the relevant journal article.
Python code for model inference and training is available here.
The dataset contains accelerometer recordings and keyboard typing data contributed by Parkinson's Disease (PD) patients and Healthy Controls. The accelerometer data consist of acceleration values recorded during phone calls, and the typing data consist of virtual keyboard press and release timestamps. The dataset is divided into two parts: the first part, called SData, contains data from a small, medically evaluated set of users, while the second part, called GData, contains recordings from a large body of users with self-reported PD labels.
The dataset is organized into 5 pickle files:
1. imu_sdata.pickle: Contains the tri-axial accelerometer recordings for the SData part of the dataset in the form of a list of Python dictionaries, one for each participating subject. Accelerometer data have been pre-processed to a sampling frequency of 100 Hz and come segmented into non-overlapping 5-second windows. Hence, each segment has dimensions 500 x 3 (samples x axes).
Sample Python code for accessing the acceleration data of a subject:

import pickle

with open('imu_sdata.pickle', 'rb') as f:
    sdata = pickle.load(f)
subject_list = list(sdata.keys())

## Data for first subject
subject_data = sdata[subject_list[0]]  # subject_data is a list of length 4

## The actual data is in the last element of the list
acc_segments = subject_data[-1]
num_acc_sessions_for_subject = len(acc_segments)
acc_segments_for_first_session = acc_segments[0]
acc_segments_for_second_session = acc_segments[1]
# ..etc

In: print(acc_segments_for_first_session.shape)
Out: (3, 500, 3)
## The first accelerometer session for this subject consists of 3 five-second segments.

In: print(acc_segments_for_second_session.shape)
Out: (8, 500, 3)
## The second accelerometer session for this subject consists of 8 five-second segments.
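Since each segment covers 5 seconds at 100 Hz, the recorded duration of a session follows directly from its number of segments. The sketch below works through that arithmetic using the shapes from the sample output above. The helper name session_duration_seconds is hypothetical and is not part of the released code.

```python
SAMPLING_RATE_HZ = 100  # from the dataset description
WINDOW_SECONDS = 5      # from the dataset description
SAMPLES_PER_SEGMENT = SAMPLING_RATE_HZ * WINDOW_SECONDS  # 500 samples
NUM_AXES = 3            # tri-axial accelerometer


def session_duration_seconds(session_shape):
    """Return the recorded duration of one session, given its array shape.

    session_shape is expected to be (num_segments, 500, 3).
    This helper is hypothetical and only illustrates the arithmetic.
    """
    num_segments, num_samples, num_axes = session_shape
    if (num_samples, num_axes) != (SAMPLES_PER_SEGMENT, NUM_AXES):
        raise ValueError(f"unexpected segment shape: {session_shape}")
    return num_segments * WINDOW_SECONDS


# Shapes taken from the sample output above.
print(session_duration_seconds((3, 500, 3)))  # 15
print(session_duration_seconds((8, 500, 3)))  # 40
```

With real data, the shape tuple would come from the .shape attribute of a session array, for example acc_segments_for_first_session.shape.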
2. imu_gdata.pickle: Same layout as imu_sdata.pickle but with data from GData subjects.
3. typing_sdata.pickle: This file contains the typing data originating from the SData part of the dataset. It is a list of dictionaries with one entry per subject. The typing data are given in the form of concatenated hold time (the time elapsed between the press and release of a virtual key) and flight time (the time between releasing a key and pressing the next) histograms, computed over 10 ms bins in the range [0, 1] s for hold time and [0, 4] s for flight time. Each histogram also has one additional bin that collects the values falling in the (1, +∞) and (4, +∞) intervals, respectively. So, the total length of the concatenated histogram is 1000/10 + 1 + 4000/10 + 1 = 502.
Sample Python code for accessing the typing data of a subject:

import pickle

with open('typing_sdata.pickle', 'rb') as f:
    sdata = pickle.load(f)
subject_list = list(sdata.keys())

## Data for first subject
subject_data = sdata[subject_list[0]]

## The actual data is in the first element of the list
typing_histograms = subject_data[0]
num_typing_sessions_for_subject = len(typing_histograms)
typing_hist_for_first_session = typing_histograms[0]
typing_hist_for_second_session = typing_histograms[1]
# ..etc

In: print(typing_hist_for_first_session.shape)
Out: (502,)

ht_hist = typing_hist_for_first_session[:101]  # Hold time histogram of the session
ft_hist = typing_hist_for_first_session[101:]  # Flight time histogram of the session
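To make the 502-element layout easier to interpret, the following sketch splits a concatenated histogram into its hold time and flight time parts and maps a bin index back to the time range it covers. The function names are hypothetical. The sketch also assumes that the additional overflow bin is the last bin of each histogram, which the description above suggests but does not state explicitly.

```python
BIN_WIDTH_MS = 10
HOLD_BINS = 1000 // BIN_WIDTH_MS + 1    # 100 regular bins + 1 overflow bin = 101
FLIGHT_BINS = 4000 // BIN_WIDTH_MS + 1  # 400 regular bins + 1 overflow bin = 401


def split_typing_histogram(hist):
    """Split a concatenated histogram into (hold_time_hist, flight_time_hist)."""
    if len(hist) != HOLD_BINS + FLIGHT_BINS:
        raise ValueError(f"expected length 502, got {len(hist)}")
    return hist[:HOLD_BINS], hist[HOLD_BINS:]


def bin_range_ms(bin_index, num_bins):
    """Return (start_ms, end_ms) for a bin; end_ms is None for the overflow bin.

    Assumption: the overflow bin is the last bin of each histogram.
    """
    if not 0 <= bin_index < num_bins:
        raise IndexError(f"bin index {bin_index} out of range")
    start_ms = bin_index * BIN_WIDTH_MS
    if bin_index == num_bins - 1:
        return start_ms, None
    return start_ms, start_ms + BIN_WIDTH_MS


# Placeholder histogram of the right length (not real data).
hist = [0] * 502
ht_hist, ft_hist = split_typing_histogram(hist)
print(len(ht_hist), len(ft_hist))        # 101 401
print(bin_range_ms(0, HOLD_BINS))        # (0, 10)
print(bin_range_ms(100, HOLD_BINS))      # (1000, None)
print(bin_range_ms(400, FLIGHT_BINS))    # (4000, None)
```

The slicing also works unchanged on a NumPy array of shape (502,), such as the one printed in the sample code above.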
4. typing_gdata.pickle: Same layout as typing_sdata.pickle but with data from GData subjects.
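For readers who want to see how a histogram with this layout could be produced from raw timings, here is a minimal sketch. It is not the authors' preprocessing code. The function timing_histogram is hypothetical, the input timings are made-up values, and two details are assumptions: a value exactly at the upper limit is counted in the last regular bin, and the output holds raw counts (the released histograms may be normalized differently).

```python
def timing_histogram(times_ms, max_ms, bin_width_ms=10):
    """Count timings into fixed-width bins plus one trailing overflow bin.

    Assumptions (not confirmed by the dataset description):
    - values above max_ms go to the last (overflow) bin,
    - a value exactly equal to max_ms goes to the last regular bin,
    - the result holds raw counts, with no normalization.
    """
    num_regular = max_ms // bin_width_ms
    counts = [0] * (num_regular + 1)
    for t in times_ms:
        if t < 0:
            raise ValueError(f"negative timing: {t}")
        if t > max_ms:
            counts[-1] += 1
        else:
            index = min(int(t // bin_width_ms), num_regular - 1)
            counts[index] += 1
    return counts


# Made-up timings in milliseconds, for illustration only.
hold_times_ms = [85, 92, 110, 1500]
flight_times_ms = [230, 240, 4000, 5200]

hist = timing_histogram(hold_times_ms, 1000) + timing_histogram(flight_times_ms, 4000)
print(len(hist))  # 502
```

Concatenating the two lists reproduces the 101 + 401 = 502 layout described for the typing files.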
5. subject_metadata.pickle: A list of dictionaries with one entry per subject containing demographic information. The relevant demographic fields have the following interpretation:
'age': Year of birth
'gender_id': 0 indicates male, 1 indicates female
'healthstatus_id': 0 indicates PD patient, 1 indicates Healthy with PD family history, 2 indicates Healthy without PD family history
In the case of SData subjects, UPDRS symptom scores from one or two medical examinations are also included. These are encoded in the fields med_eval_1 and med_eval_2.
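The integer codes above can be turned into readable labels with a small lookup, as in the sketch below. The helper describe_subject and the example record are hypothetical. The record is a made-up dictionary that only mimics the three documented fields, and real entries may contain additional keys.

```python
GENDER_LABELS = {0: "male", 1: "female"}
HEALTH_STATUS_LABELS = {
    0: "PD patient",
    1: "Healthy with PD family history",
    2: "Healthy without PD family history",
}


def describe_subject(metadata):
    """Translate the coded demographic fields of one subject into labels."""
    return {
        # Despite its name, the 'age' field stores the year of birth.
        "year_of_birth": metadata["age"],
        "gender": GENDER_LABELS[metadata["gender_id"]],
        "health_status": HEALTH_STATUS_LABELS[metadata["healthstatus_id"]],
        # Binary PD label: only healthstatus_id == 0 denotes a PD patient.
        "is_pd": metadata["healthstatus_id"] == 0,
    }


# Made-up record for illustration only (not taken from the dataset).
example = {"age": 1950, "gender_id": 1, "healthstatus_id": 0}
print(describe_subject(example))
```

Collapsing the two healthy categories into a single negative class, as the is_pd flag does here, is one possible choice for a binary classification setup.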
ETHICS & FUNDING
The present dataset was collected during a multi-center study that was approved in each country where it was available (for more info visit: http://www.i-prognosis.eu/?page_id=3606). Informed consent, including permission for third-party access to pseudo-anonymised data, was obtained from all subjects prior to their engagement with the study. The work has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No 690494 - i-PROGNOSIS: Intelligent Parkinson early detection guiding novel supportive interventions (i-prognosis.eu).
Any inquiries regarding this dataset should be addressed to:
Mr. Alexandros Papadopoulos (Electrical & Computer Engineer, PhD candidate)
Multimedia Understanding Group
Department of Electrical & Computer Engineering
Aristotle University of Thessaloniki
University Campus, Building C, 3rd floor
Thessaloniki, Greece, GR54124
Tel: +30 2310 996359, 996365
Fax: +30 2310 996398
Papadopoulos, A., Iakovakis, D., Klingelhoefer, L. et al. Unobtrusive detection of Parkinson's disease from multi-modal and in-the-wild sensor data using deep learning techniques. Sci Rep 10, 21370 (2020). https://doi.org/10.1038/s41598-020-78418-8