Published June 15, 2021 | Version V1
Dataset Open

CINECA synthetic cohort Africa H3ABioNet v1

  • 1. H3ABioNet, University of Cape Town

Description

This dataset consists of 100 samples with synthetic subject attributes and 47 phenotypic data fields based on the Human Heredity and Health in Africa (H3Africa) consortium core phenotype model (H3Africa Core phenotype). The genetic data consists of 100 samples of African ancestries, randomly selected from the 2504 samples of the 1000 Genomes Project phase 3 data, spanning 650K randomly selected variants on chromosome 1. This synthetic dataset was developed as part of the CINECA project to increase accessibility to cohort data for standards development, whilst mitigating the ethical and legal privacy concerns that arise with cohort data sharing, including the sharing of pseudonymised data. This dataset should not be used to make any inference whatsoever, as the values of the fields do not entirely reflect reality.

Notes

- The metadata conforms to the structure and schema of the H3Africa Core phenotype, but it is otherwise nonsensical: no checks have been implemented across fields, and values may be completely unrealistic. We did not model any correlation between fields. There is, however, a plan to model correlation for a few variables, such as weight and height.
- Randomly generated dates fall between 1910 and 1990 to avoid confusion with real data.
- This synthetic dataset (with cohort "participants" / "subjects" marked as fake) contains no identifiable data and cannot be used to make any inference about H3Africa cohort data or results.
- This dataset aims to aid the development of technical implementations for cohort data discovery, harmonisation, access, and federated analysis.
- In support of FAIRness in data sharing, this dataset is made freely available under the Creative Commons Licence (CC-BY). Please ensure this preamble is included with this dataset and that the H3Africa project and the CINECA project (funding: EC H2020 grant 825775 and CIHR grant 404896) are acknowledged.
- If you have any questions about this dataset, please contact Mamana Mbiyavanga (mamana.mbiyavanga@uct.ac.za) or Nicola Mulder (nicola.mulder@uct.ac.za).

Files

CINECA_synthetic_cohort_Africa_H3ABioNet_v1.zip (11.8 MB)
md5:7d27e010190eda216b07b3802e9df859
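
The MD5 checksum listed for the archive can be used to confirm that a download is complete and unmodified. The following is a minimal sketch, not part of the dataset itself: it assumes the archive has been saved to the current working directory under its published file name, and the helper names `md5_of_file` and `matches_published_md5` are hypothetical, introduced here only for illustration.

```python
import hashlib
from pathlib import Path

# Checksum and file name as published on this record.
EXPECTED_MD5 = "7d27e010190eda216b07b3802e9df859"
# Assumption: the archive was downloaded into the current working directory.
ARCHIVE = Path("CINECA_synthetic_cohort_Africa_H3ABioNet_v1.zip")


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so the whole file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    if ARCHIVE.exists():
        print("checksum OK" if matches_published_md5(ARCHIVE) else "checksum MISMATCH")
    else:
        print(f"{ARCHIVE} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.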

Additional details

Funding

European Commission
CINECA - Common Infrastructure for National Cohorts in Europe, Canada, and Africa (grant 825775)