Published May 13, 2019 | Version v1
Presentation Open

Differential Privacy: What's all the noise about?

  • 1. Privitar
  • 2. Georgetown

Description

The public is becoming increasingly concerned with how their sensitive data is handled. Many sensitive datasets are useful for evidence-based policy making, but before data can be used to inform policy decisions, privacy concerns must be addressed. There is a tension between (a) the desire to share data widely (publicly, even) in order to get the most insight out of it and (b) the desire to protect the privacy of the data subjects by keeping the data secured and locked down.

 

Sharing data about groups--in the form of aggregate statistics like counts, sums and averages--is widely considered to be a good balance of privacy and utility. Aggregate statistics are commonly shared by statistical agencies, such as the Office for National Statistics, and other government organisations, such as the Department for Work and Pensions and the Department for Education. Aggregate statistics are also common in non-profits and private companies conducting research or business intelligence analytics.

 

However, new theoretical and empirical knowledge suggests that aggregation alone is insufficient to protect privacy and that the traditional disclosure control methods used to further protect aggregates are, in many cases, also insufficient. A solution is needed to continue to reap the benefits of sharing aggregate statistics while preserving individual privacy and, furthermore, to preserve privacy in a transparent and accountable way. One way forward may lie in a new field of research called differential privacy.

 

It’s the discovery of a new, serious type of attack called a reconstruction attack that’s accelerated the need for a new approach to privacy. Reconstruction attacks use the information in all of the statistics released from a dataset in order to recover the underlying source data in an approach similar to solving a system of equations.
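To make the "system of equations" idea concrete, here is a hypothetical toy example (not taken from the talk): an attacker who sees only subgroup counts can sometimes pin down every individual's secret value by keeping only the candidate datasets consistent with all of the published statistics. The dataset, subgroups, and names below are invented for illustration.

```python
from itertools import product

# Four people, each with a secret 0/1 attribute the publisher never releases.
secret = [1, 0, 1, 1]
names = ["A", "B", "C", "D"]

# Published aggregate statistics: each is a count over a subgroup,
# and each acts as one equation over the unknown records.
releases = [
    (("A", "B", "C", "D"), sum(secret)),   # overall total
    (("A", "B"), secret[0] + secret[1]),   # subgroup count
    (("B", "C"), secret[1] + secret[2]),   # subgroup count
    (("C", "D"), secret[2] + secret[3]),   # subgroup count
]

# The attacker enumerates all candidate datasets and keeps those that
# satisfy every published statistic -- brute-force equation solving.
consistent = [
    cand for cand in product([0, 1], repeat=4)
    if all(sum(cand[names.index(n)] for n in group) == total
           for group, total in releases)
]
print(consistent)  # -> [(1, 0, 1, 1)]: only the true dataset survives
```

With just four exact counts, every record is fully reconstructed. Real attacks face much larger systems, but statistical agencies also publish vastly more statistics, which is exactly the vulnerability described above.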

 

Once considered only a theoretical risk, real-world reconstruction attacks have now been conducted in practice on US Census data by the US Census Bureau and independently by a New York Times journalist [1]. The US Census Bureau recently reported “serious vulnerabilities” to reconstruction attacks in its 2000 and 2010 Census data releases [2].

 

John Abowd, the US Census Bureau’s chief scientist and associate director of research and methodology, described reconstruction attacks as “the death knell for traditional data publications systems” [3]. And a recent paper co-authored by Abowd concludes: “The vast quantity of data products published by statistical agencies each year may give a determined attacker more than enough information to reconstruct some or all of a target database and breach the privacy of millions of people” [4].

 

In response to the threat of reconstruction attacks, the US Census Bureau has decided to use differential privacy for the 2020 Census.

 

Differential privacy accounts for the fact that any release of statistics about sensitive datasets carries some element of risk – it’s the ability to quantify and manage that risk that’s important.

 

It provides a mathematical guarantee to individuals that their privacy risk (how much information specific to them is revealed) is limited. By limiting the information revealed about individuals, differential privacy can ensure that attacks such as reconstruction fail. Differential privacy is a guarantee, not an algorithm, but it comes with a set of algorithms for tasks like releasing aggregate statistics, creating synthetic data, and training machine learning models. These algorithms typically rely on introducing a small, controlled amount of noise into the statistics or models before releasing them.
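As a minimal sketch of the noise-addition idea, here is the Laplace mechanism, the canonical algorithm for releasing a count with an (epsilon)-differential privacy guarantee. The function name and structure below are illustrative, not any particular library's API; the mathematical facts (a count has sensitivity 1, and Laplace noise with scale 1/epsilon suffices) are standard.

```python
import math
import random

def laplace_count(true_count, epsilon):
    """Release a count with Laplace noise calibrated to epsilon.

    Adding or removing one person changes a count by at most 1, so the
    query's sensitivity is 1 and the noise scale is 1/epsilon: a smaller
    epsilon means a stronger privacy guarantee and noisier output.
    """
    scale = 1.0 / epsilon
    # Sample Laplace(0, scale) by inverse transform sampling.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

noisy = laplace_count(1000, epsilon=0.5)  # a value near 1000, but perturbed
```

Because each released value is randomised, the exact "system of equations" that powers a reconstruction attack no longer holds, and the degree of protection is controlled explicitly by epsilon.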

 

Differential privacy has several benefits, including the ability to:

  • Gain insights from data which would otherwise be too sensitive to be used

  • Quantify the level of privacy and utility for each use case, informing decision-making about the benefits and costs of data-sharing

  • Defend against even the most sophisticated privacy attacks, including reconstruction attacks

  • Provide future-proof protection, with privacy-preserving methods that make no assumptions about attack strategies

  • Quantify privacy risk across multiple statistical releases

  • Disclose algorithms and parameters without risk, for complete transparency
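The ability to quantify risk across multiple releases follows from differential privacy's basic sequential composition property: the epsilons of independent differentially private releases add up, so an organisation can track a cumulative privacy "budget". The release names and epsilon values below are invented for illustration.

```python
# Illustrative privacy accounting across several independent DP releases.
releases = [
    ("monthly counts", 0.5),
    ("regional averages", 0.25),
    ("annual summary table", 0.25),
]

# Basic sequential composition: total privacy loss is the sum of epsilons.
total_epsilon = sum(eps for _, eps in releases)
print(f"cumulative privacy loss: epsilon = {total_epsilon}")  # epsilon = 1.0
```

This is what makes the guarantee auditable: a publisher can decide in advance how much total privacy loss is acceptable and stop (or add more noise) once the budget is spent.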

 

In practical terms, differential privacy is an emerging technology and is not trivial to deploy today. Nevertheless, it is important to understand and consider using, because it allows organisations to address existing vulnerabilities to new threats and prepare with confidence for the privacy challenges of the future. We recommend that organisations:

  • Gauge the risk of reconstruction attacks, and other state-of-the-art privacy attacks, in existing statistical releases.

  • Identify the right use cases for piloting differential privacy. Lower-sensitivity datasets can be a good place to start while gaining experience with differential privacy.

  • Engage with policymakers, legal scholars, differential privacy researchers, and other relevant stakeholders to discuss appropriate levels of privacy protection.

  • Strengthen relationships with differential privacy communities in academia and industry, to influence research in directions that are relevant for the organisation, and help bridge the gap between theory and practice.

 

This work was conducted by Privitar and Professor Kobbi Nissim on behalf of the Government Statistical Service (GSS) and served as a chapter in the National Statistician’s Quality Review into Privacy and Data Confidentiality Methods [5].


 

References

 

[1] Hansen, Mark. “To Reduce Privacy Risks, the Census Plans to Report Less Accurate Data.” The New York Times. https://www.nytimes.com/2018/12/05/upshot/to-reduce-privacy-risks-the-census-plans-to-report-less-accurate-data.html

[2] Abowd, John M. "The US Census Bureau Adopts Differential Privacy." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2018. https://dl.acm.org/citation.cfm?doid=3219819.3226070

[3] “Staring-Down the Database Reconstruction Theorem” https://www.census.gov/content/dam/Census/newsroom/press-kits/2018/jsm/jsm-presentation-database-reconstruction.pdf

[4] Garfinkel, Simson L., John M. Abowd, and Christian Martindale. "Understanding Database Reconstruction Attacks on Public Data." (2018). https://digitalcommons.ilr.cornell.edu/cgi/viewcontent.cgi?article=1051&context=ldi

[5] “National Statistician’s Quality Review into Privacy and Data Confidentiality Methods” https://gss.civilservice.gov.uk/guidances/quality/nsqr/privacy-and-data-confidentiality-methods-a-national-statisticians-quality-review/
