Published March 27, 2024 | Version v2
Dataset Open

TERMINET eHealth post-operation complications synthetic dataset

Description

1. Introduction

Older adults with cancer often need to undergo operations. Post-surgery complications may arise, and Real-World Data (RWD) collected from such patients during a two-week pre-operation monitoring period can help identify the risk of such complications. The RWD involved span behavioral data (measured or reported) as well as clinical data (collected during clinical tests). This dataset is synthesized by Innovation Sprint from actual data collected from eligible Fondazione Policlinico Gemelli patients participating in the SUPERO study. The clinical data is collected by the hospital, while the behavioral data is collected using Healthentia, a medical decision support software developed by Innovation Sprint that facilitates the collection, analysis and presentation of behavioral data.

2. Dataset description

The provided TERMINET eHealth post-operation complications synthetic dataset contains 10,000 synthetic patients, one per row of the CSV file. The attributes of the dataset are organized in columns.

The attributes are summarized as follows:

  • 6 columns of step data statistics
  • 20 columns of clinical attributes
  • 2 columns of demographics attributes
  • 12 columns of questionnaire attributes
  • 1 column of outcome attribute
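The group sizes above add up to 41 attributes, which gives a quick sanity check after downloading the file. The following is a minimal sketch, assuming the CSV is comma-separated, has a single header row and contains no extra index or identifier column (none of this is stated in the description). The function name `check_layout` is illustrative.

```python
import csv

# Attribute group sizes as listed in the dataset description.
EXPECTED_GROUPS = {
    "step statistics": 6,
    "clinical": 20,
    "demographics": 2,
    "questionnaire": 12,
    "outcome": 1,
}
EXPECTED_COLUMNS = sum(EXPECTED_GROUPS.values())  # 6 + 20 + 2 + 12 + 1 = 41


def check_layout(csv_file, expected_rows=10_000, expected_columns=EXPECTED_COLUMNS):
    """Count header columns and non-empty data rows of an open CSV file.

    Assumes a comma-separated file with one header row (an assumption,
    not something the dataset description confirms).
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # skip blank lines
    return {
        "columns": len(header),
        "rows": n_rows,
        "columns_ok": len(header) == expected_columns,
        "rows_ok": n_rows == expected_rows,
    }
```

To use it, open TERMINET_UC2_synthetic_data_202403.csv with `open(path, newline="")` and pass the file object to `check_layout`. A mismatch in the column count may simply mean the file carries an additional index column.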

2.1. Step statistics

Step data is collected per day of the pre-hospitalization period. The final two weeks of that period are used to derive the step statistics. For each of the two weeks, the mean, the standard deviation and the slope of a linear regression fitted to the daily step counts are reported: 3 attributes per week, 6 attributes in total.
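As an illustration of how such weekly statistics can be derived from 14 daily step counts, here is a minimal sketch. It assumes the sample standard deviation, a regression of steps on the day index 0 to 6 within each week (so the slope is in steps per day), and no missing days. The description does not specify these choices, and the function and feature names are hypothetical.

```python
from statistics import mean, stdev


def weekly_step_stats(daily_steps):
    """Return mean, sample standard deviation and regression slope for one week."""
    n = len(daily_steps)
    if n < 2:
        raise ValueError("need at least two days of step data")
    days = range(n)  # day index used as the regressor (assumption)
    day_mean = mean(days)
    step_mean = mean(daily_steps)
    # Ordinary least-squares slope: cov(day, steps) / var(day).
    sxx = sum((d - day_mean) ** 2 for d in days)
    sxy = sum((d - day_mean) * (s - step_mean) for d, s in zip(days, daily_steps))
    return {"mean": step_mean, "std": stdev(daily_steps), "slope": sxy / sxx}


def two_week_step_features(last_14_days):
    """Split the final 14 days into two weeks and compute 3 statistics for each."""
    if len(last_14_days) != 14:
        raise ValueError("expected exactly 14 daily step counts")
    features = {}
    for name, week in (("week1", last_14_days[:7]), ("week2", last_14_days[7:])):
        for stat, value in weekly_step_stats(week).items():
            features[f"{name}_{stat}"] = value  # hypothetical feature names
    return features
```

A positive slope indicates that daily activity increased over the week, a negative slope that it declined.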

2.2. Clinical attributes

The 20 clinical attributes collected at the hospital are ALT, Hematocrit (%), AST, Lymphocytes, Hepatitis B, Neutrophils (%), Hepatitis C, Neutrophils, INR (%), INR, White Blood Cells, INR (seconds), Platelets, Sodium, Hemoglobin, Potassium, Lymphocytes (%), Creatinine, Bilirubin and Urea Nitrogen.

2.3. Demographic attributes

The sex and age are the two demographic attributes collected.

2.4. Questionnaire attributes

The three questionnaires involved in the SUPERO study are:

  • G8, spanning the categories of food intake, weight loss, movement, neuropsychological, BMI, multiple medication, health and age
  • SPPB, spanning the categories of balance, speed and strength
  • MiniCog, where only the clock drawing capabilities are assessed

2.5. Outcome attribute

The single outcome attribute is the existence of any post-surgery complications. Please note that the dataset is quite imbalanced, since complications are very rare.
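Before training a classifier on this dataset, it may help to quantify the imbalance. The sketch below computes the share of each outcome class and the common "balanced" class weights, n_samples / (n_classes * class_count). The function name is illustrative, and the example figures in the usage note come from the source patients described in Section 3, not from the synthetic file, whose exact class ratio is not stated here.

```python
from collections import Counter


def class_balance(outcomes):
    """Return per-class prevalence and 'balanced' class weights.

    `outcomes` is any iterable of class labels, e.g. the values of the
    outcome column (its actual name and encoding are not given in the
    description).
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    n_classes = len(counts)
    prevalence = {label: c / total for label, c in counts.items()}
    weights = {label: total / (n_classes * c) for label, c in counts.items()}
    return prevalence, weights
```

For example, 2 complications among 16 operated patients give a prevalence of 0.125 for the positive class and a balanced weight of 4.0, against roughly 0.57 for the negative class.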

3. Data synthesis

This dataset is synthesized from the early data of the SUPERO study. Currently 21 patients are registered, and the decision to operate has been reached for 20 of them. 16 of the patients have already been operated on, 2 of whom exhibited post-surgery complications. Additional vectors have been generated by adding Gaussian noise to the original 16 vectors, resulting in 128 vectors. These vectors have been grouped into 16 clusters using agglomerative clustering. Each cluster has been modelled with a Gaussian Mixture Model (GMM). The resulting set of GMMs has been used to generate the 10,000 synthetic vectors of the dataset.

4. Acknowledgement

The development of this dataset has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).

Files

TERMINET_UC2_synthetic_data_202403.csv
Size: 3.1 MB
md5:7fb0a11ad9149c82c59512b79911ba1c

Additional details

Funding

European Commission: TERMINET – nexT gEneRation sMart INterconnectEd ioT (grant agreement No. 957406)