Published December 7, 2020 | Version 1.0
Dataset Open

Smartphone sensor data (accelerometer, virtual keyboard) collected in-the-wild by Parkinson's Disease patients and Healthy Controls

  • 1. Department of Electrical & Computer Engineering Aristotle University of Thessaloniki
  • 2. Department of Neurology, Technical University of Dresden, Dresden, Germany
  • 3. Third Neurological Clinic, Papanikolaou Hospital, Thessaloniki, Greece
  • 4. International Parkinson Excellence Research Centre, King's College Hospital NHS Foundation Trust, London, United Kingdom
  • 5. Department of Electrical Engineering and Computer Science/Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, UAE


For a detailed description of the dataset, see the relevant journal article.

Python code for model inference and training is available here.



The dataset contains accelerometer recordings and keyboard typing data contributed by Parkinson's Disease patients and Healthy Controls. Accelerometer data consist of acceleration values recorded during phone calls, and typing data consist of virtual keyboard press and release timestamps. The dataset is divided into two parts: the first part, called SData, contains data from a small, medically evaluated set of users, while the second part, called GData, contains recordings from a large body of users with self-reported PD labels.

The dataset is organized into 5 pickle files:

1. imu_sdata.pickle: Contains the tri-axial accelerometer recordings for the SData part of the dataset in the form of a list of Python dictionaries, one for each participating subject. Accelerometer data have been pre-processed to a sampling frequency of 100 Hz and come segmented into non-overlapping 5-second windows. Hence, each segment has dimensions 500 x 3 (samples x axes).

Sample Python code for accessing the acceleration data of a subject:

import pickle

with open('imu_sdata.pickle', 'rb') as f:
    sdata = pickle.load(f)
subject_list = list(sdata.keys())

## Data for first subject
subject_data = sdata[subject_list[0]]  # subject_data is a list of length 4

## The actual data is in the last element of the list
acc_segments = subject_data[-1]
num_acc_sessions_for_subject = len(acc_segments)

acc_segments_for_first_session = acc_segments[0]
acc_segments_for_second_session = acc_segments[1]
# ..etc

In: print(acc_segments_for_first_session.shape)
Out: (3, 500, 3)
## The first accelerometer session for this subject consists of 3 five-second segments.

In: print(acc_segments_for_second_session.shape)
Out: (8, 500, 3)

## The second accelerometer session for this subject consists of 8 five-second segments.
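
To illustrate where the (num_segments, 500, 3) shape comes from, here is a minimal sketch of splitting a 100 Hz tri-axial recording into non-overlapping 5-second windows. It is not the pre-processing code used for the dataset: the function name segment_recording, the synthetic input, and the choice to drop trailing samples that do not fill a whole window are all assumptions made for illustration.

```python
import numpy as np

FS_HZ = 100                      # sampling frequency stated in the description
WINDOW_SEC = 5                   # window length stated in the description
WINDOW_LEN = FS_HZ * WINDOW_SEC  # 500 samples per segment


def segment_recording(recording):
    """Split a (num_samples, 3) array into non-overlapping (500, 3) windows.

    Assumption: trailing samples that do not fill a whole window are dropped.
    """
    recording = np.asarray(recording)
    num_windows = recording.shape[0] // WINDOW_LEN
    trimmed = recording[: num_windows * WINDOW_LEN]
    return trimmed.reshape(num_windows, WINDOW_LEN, recording.shape[1])


# Synthetic stand-in for one session: 17.3 seconds of tri-axial data.
rng = np.random.default_rng(0)
synthetic = rng.normal(size=(1730, 3))

segments = segment_recording(synthetic)
print(segments.shape)  # (3, 500, 3)
```

With 1730 samples, three full windows fit and the remaining 230 samples are discarded, which mirrors the per-session layout shown above.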

2. imu_gdata.pickle: Same layout as imu_sdata.pickle but with data from GData subjects.

3. typing_sdata.pickle: This file contains the typing data originating from the SData part of the dataset. It is a list of dictionaries with one entry per subject. The typing data are given in the form of concatenated hold time (the time elapsed between the press and release of a virtual key) and flight time (the time between releasing one key and pressing the next) histograms, computed over 10 ms bins in the range [0, 1] s for hold time and [0, 4] s for flight time. Each histogram has one additional bin that collects the values above its range, i.e. in (1, +∞) for hold time and (4, +∞) for flight time. The total length of the concatenated histogram is therefore 1000/10 + 1 + 4000/10 + 1 = 502.

Sample Python code for accessing the typing data of a subject:

import pickle

with open('typing_sdata.pickle', 'rb') as f:
    sdata = pickle.load(f)
subject_list = list(sdata.keys())

## Data for first subject
subject_data = sdata[subject_list[0]]

## The actual data is in the first element of the list
typing_histograms = subject_data[0]
num_typing_sessions_for_subject = len(typing_histograms)

typing_hist_for_first_session = typing_histograms[0]
typing_hist_for_second_session = typing_histograms[1]
# ..etc

In: print(typing_hist_for_first_session.shape)
Out: (502,)

ht_hist = typing_hist_for_first_session[:101] # Hold time histogram of the session
ft_hist = typing_hist_for_first_session[101:] # Flight time histogram of the session
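
The 502-element layout can be reproduced from raw hold and flight times with a short sketch like the one below. This is an illustration of the binning described above, not the authors' code: the helper names binned_with_overflow and session_histogram are hypothetical, the input values are made up, and the sketch assumes raw counts (the description does not say whether the stored histograms are normalized).

```python
import numpy as np

BIN_SEC = 0.01  # 10 ms bins, as stated in the description


def binned_with_overflow(values_sec, upper_sec):
    """Histogram over [0, upper_sec] in 10 ms bins plus one overflow bin.

    Values above upper_sec are counted in the extra last bin; negative
    values are ignored (assumption, not specified in the description).
    """
    values_sec = np.asarray(values_sec, dtype=float)
    num_bins = int(round(upper_sec / BIN_SEC))
    edges = np.linspace(0.0, upper_sec, num_bins + 1)
    counts, _ = np.histogram(values_sec, bins=edges)
    overflow = np.count_nonzero(values_sec > upper_sec)
    return np.append(counts, overflow)


def session_histogram(hold_times_sec, flight_times_sec):
    """Concatenate hold time (101 bins) and flight time (401 bins) histograms."""
    ht = binned_with_overflow(hold_times_sec, 1.0)
    ft = binned_with_overflow(flight_times_sec, 4.0)
    return np.concatenate([ht, ft])


# Made-up timings in seconds, for illustration only.
hist = session_histogram([0.085, 0.12, 1.7], [0.25, 0.31, 5.2])
print(hist.shape)  # (502,)
```

The hold time of 1.7 s and the flight time of 5.2 s fall outside their ranges, so each lands in the overflow bin of its own histogram (indices 100 and 501 of the concatenated vector).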

4. typing_gdata.pickle: Same layout as typing_sdata.pickle but with data from GData subjects.

5. subject_metadata.pickle: A list of dictionaries with one entry per subject containing demographic information. The relevant demographic fields have the following interpretation:
 'age': Year of birth,
 'gender_id': 0 indicates male, 1 indicates female
 'healthstatus_id': 0 indicates PD patient, 1 indicates Healthy with PD family history, 2 indicates Healthy without PD family history

In the case of SData subjects, there are also UPDRS symptom scores from one or two medical examinations. These are encoded in the fields med_eval_1 and med_eval_2.
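
As a small aid for reading the coded demographic fields, the sketch below maps the integer codes listed above to readable labels. The record used here is invented for illustration, and the helper describe_subject is hypothetical; how individual records are reached inside subject_metadata.pickle should be checked against the actual file.

```python
# Code-to-label mappings taken from the field descriptions above.
GENDER = {0: "male", 1: "female"}
HEALTH_STATUS = {
    0: "PD patient",
    1: "Healthy with PD family history",
    2: "Healthy without PD family history",
}


def describe_subject(record):
    """Translate one subject's coded demographic fields into readable values."""
    return {
        "year_of_birth": record["age"],  # 'age' holds the year of birth
        "gender": GENDER[record["gender_id"]],
        "health_status": HEALTH_STATUS[record["healthstatus_id"]],
        "is_pd": record["healthstatus_id"] == 0,
    }


# Invented record, for illustration only (not taken from the dataset).
example_record = {"age": 1955, "gender_id": 1, "healthstatus_id": 0}
summary = describe_subject(example_record)
print(summary)
```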



The study during which the present dataset was collected is a multi-center study approved in each country available (for more info visit: Informed consent, including permission for third-party access to pseudo-anonymised data, was obtained from all subjects prior to their engagement with the study. The work has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No 690494 - i-PROGNOSIS: Intelligent Parkinson early detection guiding novel supportive interventions (



Any inquiries regarding this dataset should be addressed to:

Mr. Alexandros Papadopoulos (Electrical & Computer Engineer, PhD candidate)

Multimedia Understanding Group
Department of Electrical & Computer Engineering
Aristotle University of Thessaloniki
University Campus, Building C, 3rd floor
Thessaloniki, Greece, GR54124

Tel: +30 2310 996359, 996365 
Fax: +30 2310 996398





Files (6.5 GB)


Additional details

Related works

Dataset: 10.5281/zenodo.2571623 (DOI)
Is supplement to
Journal article: 10.1038/s41598-020-78418-8 (DOI)
Dataset: 10.5281/zenodo.3519213 (DOI)


i-PROGNOSIS – Intelligent Parkinson eaRly detectiOn Guiding NOvel Supportive InterventionS 690494
European Commission


  • Papadopoulos, A., Iakovakis, D., Klingelhoefer, L. et al. Unobtrusive detection of Parkinson's disease from multi-modal and in-the-wild sensor data using deep learning techniques. Sci Rep 10, 21370 (2020).