Published December 7, 2022 | Version v1
Dataset Open

Processed Synthetic Real-World Data for binary modelling

  • 1. Innovation Sprint

Description

This model-learning dataset is derived from the Raw Synthetic RWD dataset and includes some of the original attributes. It is distributed as JOBLIB files: files ending in .joblib contain the vectors, and the corresponding files ending in _ids.joblib contain the ID of the person from whom each vector is extracted.

The IDs are useful when the vectors need to be mapped to metadata about the people, which is found in the original raw dataset. The first part of each file name indicates the dataset it belongs to (training, validation or testing). Roughly 60% of the people are in the training dataset, and 20% in each of the validation and testing datasets. The input attributes are the age; the short-term averages and trends of the current week’s BMI, steps walked, calories burned, sleep quality, mood and water consumption; and the previous week’s short-term average and trend of the answer to the health self-assessment question.
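As a rough illustration of how the vectors and IDs might be paired, the sketch below loads one pair of files with joblib and groups the row indices by person. The file-name stem "example", the helper names load_split and group_rows_by_person, and the assumed layout (a 2-D array of vectors plus a same-length sequence of IDs) are assumptions made for illustration; the actual names and layouts are those found in the archive. The stand-in files are written to a temporary directory only so the sketch runs on its own.

```python
import os
import tempfile

import joblib
import numpy as np


def load_split(directory, prefix):
    """Load the vectors and person IDs of one dataset.

    `prefix` is a hypothetical file-name stem; the real stems are the
    ones found in the archive.
    """
    vectors = np.asarray(joblib.load(os.path.join(directory, f"{prefix}.joblib")))
    ids = list(joblib.load(os.path.join(directory, f"{prefix}_ids.joblib")))
    if len(vectors) != len(ids):
        raise ValueError("expected one person ID per vector")
    return vectors, ids


def group_rows_by_person(ids):
    """Map each person ID to the row indices of that person's vectors."""
    rows = {}
    for row, person_id in enumerate(ids):
        rows.setdefault(person_id, []).append(row)
    return rows


# Stand-in files with made-up content (assumed layout, not the real data).
with tempfile.TemporaryDirectory() as tmp:
    joblib.dump(np.arange(12, dtype=float).reshape(4, 3),
                os.path.join(tmp, "example.joblib"))
    joblib.dump(["p1", "p1", "p2", "p3"],
                os.path.join(tmp, "example_ids.joblib"))
    vectors, ids = load_split(tmp, "example")

rows_by_person = group_rows_by_person(ids)
# All vectors of one person can then be selected by row index.
vectors_of_p1 = vectors[rows_by_person["p1"]]
```

With the rows grouped by person, metadata from the raw dataset can be joined on the person ID, provided the same identifiers are used there.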

The outcome to be predicted is the binary-quantized answer to the health self-assessment question to be given in the current week. The dataset is normalized based on the training set; the means and standard deviations used can be found in the train_statistics.joblib file. Finally, the output_descriptions.joblib file contains descriptions of the outcomes to be predicted (these are not strictly needed, since the outcome is described here).
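The normalization described above can be sketched as a standard z-score transform in which every dataset is scaled with the training statistics. This is a minimal illustration under assumptions: the internal layout of train_statistics.joblib is not specified here, so per-attribute means and standard deviations are computed from made-up values instead of being loaded, and the use of the population standard deviation is likewise an assumption.

```python
import numpy as np


def normalize(values, means, stds):
    """Scale raw attribute values to zero mean and unit variance."""
    return (np.asarray(values, dtype=float) - means) / stds


def denormalize(normalized, means, stds):
    """Invert normalize(), recovering values in the original units."""
    return np.asarray(normalized, dtype=float) * stds + means


# Made-up raw values for two attributes; not taken from the dataset.
train_raw = np.array([[30.0, 4000.0], [50.0, 8000.0], [40.0, 6000.0]])
means = train_raw.mean(axis=0)
stds = train_raw.std(axis=0)  # population standard deviation (assumption)

train_norm = normalize(train_raw, means, stds)

# Validation and testing rows are scaled with the *training* statistics.
valid_raw = np.array([[45.0, 7000.0]])
valid_norm = normalize(valid_raw, means, stds)
recovered = denormalize(valid_norm, means, stds)
```

Applying denormalize with the stored means and standard deviations is one way to bring the distributed vectors back to their original units when interpreting a model.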

Files

Processed_RWD_binary_modelling_20211120.zip

Files (11.0 MB)

Name: Processed_RWD_binary_modelling_20211120.zip
Size: 11.0 MB
md5:ea5f0e83acbe662c3c589f63b3764a2a

Additional details

Funding

European Commission
INFINITECH – Tailored IoT & BigData Sandboxes and Testbeds for Smart, Autonomous and Personalized Services in the European Finance and Insurance Services Ecosystem 856632

References

  • Pnevmatikakis, Aristodemos, Stathis Kanavos, George Matikas, Konstantina Kostopoulou, Alfredo Cesario, and Sofoklis Kyriazakos. 2021. "Risk Assessment for Personalized Health Insurance Based on Real-World Data" Risks 9, no. 3: 46. https://doi.org/10.3390/risks9030046