Published August 4, 2025 | Version v1
Conference paper
Open
Collaborative Data Anonymization Process
- 1. RWTH Aachen University
- 2. RWTH Aachen University; Fraunhofer FIT
Contributors
Editors:
- 1. Nationale Forschungsdateninfrastruktur (NFDI) e.V.
- 2. University of Amsterdam
Description
Data anonymization is an important component of data privacy, allowing organizations to use data while protecting sensitive information. For instance, with the widespread adoption of smart meters and Internet of Things (IoT) devices, the energy industry collects a large amount of personal data, such as electricity usage patterns, geographic locations, and identifying information. Due to privacy protection regulations and business interests, these data must be anonymized for data analysis and exchange. Research data pose additional challenges, e.g. ethical and security concerns [1]. Although many methods and tools for data anonymization [2] [3] have been developed, it remains unclear how to connect these tools into a complete anonymization process and how to collaborate effectively with data providers. To address these problems, this paper proposes a framework for a collaborative data anonymization process based on open-source tools. The framework presented in Figure 1 consists of the following steps. First, data, e.g. energy consumption and user information, are collected from industry partners. Sensitive data discovery can be automated with open-source tools, e.g. [4]. At the same time, metadata can be obtained from the industry partners, such as whether the data have already been anonymized by other collaborators. Based on the results of the automated discovery and the partners' metadata, it is determined whether the data need to be anonymized. If the two results disagree, an administrator reviews the data to decide whether anonymization is necessary. After the data are anonymized with [5], the integrity [6] of this process must be verified to ensure that individuals cannot be re-identified.
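The decision step described above, which combines the automated discovery result with partner metadata and escalates disagreements to an administrator, could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `Decision` and `decide` are hypothetical, both inputs are simplified to a yes/no verdict on whether anonymization is needed, and routing missing partner metadata to review is an assumption.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    ANONYMIZE = "anonymize"
    SKIP = "skip"
    ADMIN_REVIEW = "admin_review"


def decide(tool_needs_anonymization: bool,
           partner_needs_anonymization: Optional[bool]) -> Decision:
    """Reconcile the automated discovery verdict with partner metadata.

    tool_needs_anonymization: verdict of the sensitive data discovery tool.
    partner_needs_anonymization: verdict derived from partner metadata,
        or None if the partner supplied no metadata (hypothetical case).
    """
    # Assumption: without partner metadata there is nothing to compare
    # against, so the case is handed to the administrator.
    if partner_needs_anonymization is None:
        return Decision.ADMIN_REVIEW

    # Disagreement between the two sources triggers a manual review.
    if tool_needs_anonymization != partner_needs_anonymization:
        return Decision.ADMIN_REVIEW

    # Both sources agree, so their shared verdict is applied directly.
    if tool_needs_anonymization:
        return Decision.ANONYMIZE
    return Decision.SKIP
```

For example, `decide(True, False)` yields `Decision.ADMIN_REVIEW`, because the tool flags the data as sensitive while the partner metadata indicates that no anonymization is needed.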
Following this verification step, the next phase is data contribution, which organizes and prepares the format and structure of the anonymized data. The final step is to upload the data to the energy data sharing platform in accordance with the FAIR principles, which ensure that shared data can be easily located, accessed by authorized users, integrated with other datasets, and reused for various research purposes [7]. In summary, this process provides a toolchain that facilitates the identification of sensitive data, the collection of metadata, and the verification of anonymization integrity before the data are uploaded to the energy data sharing platform. The framework fosters a collaborative environment for industry partners and facilitates their active engagement in the data anonymization process. In future work, it will be essential to evaluate the tools that can be used in this framework against the requirements of industry partners, so that the framework effectively addresses the diverse needs of all stakeholders involved.
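As one illustration of what verifying the anonymization before upload might involve, the sketch below checks k-anonymity, a common re-identification criterion. The abstract does not state which criterion or tool [6] applies, so k-anonymity is an assumption here, and the function names, column names, and sample records are hypothetical.

```python
from collections import Counter
from typing import Mapping, Sequence


def smallest_group_size(records: Sequence[Mapping[str, str]],
                        quasi_identifiers: Sequence[str]) -> int:
    """Return the size of the smallest group of records that share the
    same combination of quasi-identifier values (0 for an empty dataset)."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


def is_k_anonymous(records: Sequence[Mapping[str, str]],
                   quasi_identifiers: Sequence[str],
                   k: int) -> bool:
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group_size(records, quasi_identifiers) >= k


# Hypothetical generalized smart meter records.
records = [
    {"postcode": "520**", "usage_band": "low"},
    {"postcode": "520**", "usage_band": "low"},
    {"postcode": "521**", "usage_band": "high"},
]

# The third record is unique on (postcode, usage_band), so the dataset
# is not 2-anonymous and would need further generalization or suppression.
ready_for_upload = is_k_anonymous(records, ["postcode", "usage_band"], k=2)
```

A check of this kind addresses only one aspect of re-identification risk. A real deployment would rely on the verification tooling selected for the framework.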
Files
Name | Size |
---|---|
CoRDI_2025_paper_115.pdf (md5:9b8d325106cfea94939ddf9fa65f2c85) | 209.9 kB |