Published September 26, 2024 | Version v1
Conference proceeding | Open Access

SURE: A New Privacy and Utility Assessment Library for Synthetic Data

  • 1. Clearbox AI

Contributors

Contact person:

  • 1. Clearbox AI

Description

This paper introduces SURE, a comprehensive open-source library for assessing the privacy risks and utility of synthetic datasets. SURE addresses critical privacy concerns associated with synthetic data, such as the risk that individuals can be re-identified through techniques like membership inference attacks. It offers a user-centric framework for evaluating both the statistical properties and the privacy guarantees of synthetic data. With it, SURE aims to help users manage the privacy-utility trade-off that often affects data anonymization efforts. The library gives data scientists and compliance officers tools to check whether synthetic datasets retain enough utility for AI training and data analysis while protecting privacy and complying with the GDPR and other regulatory standards. SURE's functionality includes statistical similarity tests, machine learning utility evaluations and privacy risk assessments, all accessible through a user-friendly Python interface. Following extensive testing and validation, the finalized version of SURE will be released as open source.
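Two of the checks described above, statistical similarity and privacy risk, can be illustrated with a small, library-agnostic sketch. This is not SURE's API, whose Python interface is not reproduced in this record. The function names `ks_statistic` and `distance_mia` and the `threshold` parameter are hypothetical. The sketch assumes numeric columns, with records represented as tuples of floats. It uses the two-sample Kolmogorov-Smirnov statistic as one example of a similarity test. As a simplified stand-in for a membership inference attack, it uses a naive distance-to-nearest-synthetic-record heuristic.

```python
from bisect import bisect_right
from math import dist


def ks_statistic(real, synth):
    """Two-sample Kolmogorov-Smirnov statistic for one numeric column:
    the largest gap between the empirical CDFs (0 = identical, 1 = disjoint)."""
    a, b = sorted(real), sorted(synth)
    d = 0.0
    for x in a + b:
        # Empirical CDF of each sample evaluated at x (share of values <= x).
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def distance_mia(candidates, synthetic, threshold):
    """Hypothetical, naive distance-based membership inference: flag a
    candidate record as a likely training member when some synthetic record
    lies within `threshold` (Euclidean distance) of it."""
    flags = []
    for rec in candidates:
        nearest = min(dist(rec, s) for s in synthetic)
        flags.append(nearest <= threshold)
    return flags


if __name__ == "__main__":
    # Utility side: how closely a synthetic column follows the real one.
    print(ks_statistic([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]))  # 0.25

    # Privacy side: the first training record has a near-copy in the
    # synthetic data, so the attack flags it.
    training = [(0.0, 0.0), (5.0, 5.0)]
    synthetic = [(0.1, 0.0), (9.0, 9.0)]
    print(distance_mia(training, synthetic, threshold=0.5))  # [True, False]
```

A KS statistic near 0 suggests that a synthetic column closely follows the real marginal distribution. A large share of flagged training records suggests that the generator may be reproducing individual records. A real assessment would also calibrate the threshold, for example against records known not to be in the training set.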

Abstract

This post-print has been published as the following version of record:

D. Brunelli, S. Kurapati and L. Gilli, "SURE: A New Privacy and Utility Assessment Library for Synthetic Data," in 2024 IEEE International Conference on Blockchain (Blockchain), Copenhagen, Denmark, 2024, pp. 643-648.
doi: 10.1109/Blockchain62396.2024.00094

Files (233.6 kB)

SURE A New Privacy and Utility Assessment Library for Synthetic Data.pdf

Additional details

Funding

European Commission
TrustChain – Fostering a Human-Centered, Trustworthy and Sustainable Internet (grant no. 101093274)