15. Rare diseases: making environmental health studies' data as open as possible
Creators
- 1. ADAPT Centre for Digital Content, Trinity College Dublin, Dublin, Ireland
- 2. The European Institute for Innovation Through Health Data (i-HD), Ghent, Belgium
- 3. Department of Rheumatology, Kantonsspital St. Gallen, St. Gallen, Switzerland
Description
Background: Researchers face heightened data protection risks when they try to openly publish data from studies of environmental factors in rare diseases, even when the data are pseudonymised. The patients studied can be re-identified by singling out an individual, by linking the data with other sources, or by inferring attributes from the linked data. In addition, effective anonymisation methods cannot be applied without losing the research value of the data when sample sizes are small, as they are in rare diseases. For example, permuting the environmental observations would disrupt the temporal order of the data, and introducing noise would alter the magnitude of the values. Either method can hide the signal that the researchers are looking for.
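The effect of these two anonymisation methods can be illustrated with a minimal Python sketch. The readings, the noise scale and the function names `permute` and `add_noise` are hypothetical placeholders chosen for illustration; they are not taken from the published dataset or from the authors' pipeline.

```python
import random


def permute(values, rng):
    """Return a shuffled copy: same values, but the time order is lost."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def add_noise(values, rng, scale):
    """Return a copy with zero-mean Gaussian noise added to each value."""
    return [v + rng.gauss(0.0, scale) for v in values]


# Hypothetical daily pollutant readings for the week before a health event
# (placeholder numbers, not from the published dataset).
readings = [10.0, 11.0, 10.5, 12.0, 30.0, 28.0, 11.0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
permuted = permute(readings, rng)
noisy = add_noise(readings, rng, scale=5.0)

print("original:", readings)
print("permuted:", permuted)
print("noisy:   ", [round(v, 1) for v in noisy])
```

Permutation keeps every value but no longer tells us when the peak occurred relative to the event, while noise keeps the timing but changes how large the peak appears. With only a handful of patients, either change may be enough to mask an association.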
Methods: We recommend publishing example linked patient event and environmental data, together with the associated metadata, as Open Data following the Findable, Accessible, Interoperable and Reusable (FAIR) guiding principles. The example data and metadata should be described using the Resource Description Framework (RDF), a standard graph data model, following W3C standards and recommendations for statistical data (RDF Data Cube), dataset descriptors (DCAT), provenance and lineage (PROV-O), and data protection (DPV). The data and associated metadata should then be published in an open repository preferred by the research community, one that generates a unique Digital Object Identifier (DOI) for the dataset.
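As a rough illustration of how these vocabularies can be combined, the following Python sketch assembles a handful of RDF triples and serialises them as N-Triples using only the standard library. It is not the structure of the published dataset: the `http://example.org/` resources, the title and the placeholder DOI are hypothetical, and the choice of the DPV terms `hasTechnicalOrganisationalMeasure` and `Pseudonymisation` is an assumption that should be checked against the current DPV specification.

```python
# Namespace IRIs of the vocabularies named in the Methods.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"
QB = "http://purl.org/linked-data/cube#"
DPV = "https://w3id.org/dpv#"
EX = "http://example.org/"  # hypothetical namespace for illustration


def iri(namespace, local):
    """Format an IRI term in N-Triples syntax."""
    return f"<{namespace}{local}>"


def lit(text):
    """Format a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


dataset = iri(EX, "dataset/example")    # hypothetical dataset resource
activity = iri(EX, "activity/linkage")  # hypothetical pre-processing step
observation = iri(EX, "observation/1")  # hypothetical data point

triples = [
    # DCAT: describe the dataset so that it can be found and catalogued.
    (dataset, iri(RDF, "type"), iri(DCAT, "Dataset")),
    (dataset, iri(DCT, "title"), lit("Example event-environment linked data")),
    (dataset, iri(DCT, "identifier"), lit("https://doi.org/10.0000/example")),
    # PROV-O: record the activity that generated the dataset.
    (activity, iri(RDF, "type"), iri(PROV, "Activity")),
    (dataset, iri(PROV, "wasGeneratedBy"), activity),
    # RDF Data Cube: attach an observation to the statistical dataset.
    (dataset, iri(RDF, "type"), iri(QB, "DataSet")),
    (observation, iri(RDF, "type"), iri(QB, "Observation")),
    (observation, iri(QB, "dataSet"), dataset),
    # DPV (assumed terms): state the data protection measure applied.
    (dataset, iri(DPV, "hasTechnicalOrganisationalMeasure"),
     iri(DPV, "Pseudonymisation")),
]


def to_ntriples(rows):
    """Serialise (subject, predicate, object) rows as N-Triples text."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in rows)


ntriples = to_ntriples(triples)
print(ntriples)
```

In practice an RDF library would normally handle the serialisation and validation; plain strings are used here only to keep the sketch free of dependencies.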
Results: An example dataset published openly according to the proposed approach is available at the following DOI: https://doi.org/10.5281/zenodo.5544257. The dataset is an example of the result of associating subsets of air pollution and weather data with particular health events within a region of the Republic of Ireland, together with the relevant metadata fields. The FAIR guiding principles were applied to the metadata, and a Data Protection Impact Assessment was created in accordance with the GDPR to assess the risks that could arise and to ensure that good data protection practices were met.
Conclusions: The proposed approach would help researchers studying environmental risk factors for rare diseases to publish their data as openly as possible. Furthermore, the data pre-processing steps will be recorded in the metadata as a transparent record, making the data easier to reuse for researchers in the community and for other stakeholders.
Disclosures: None.