Applying Data Synthesis for Longitudinal Business Data across Three Countries
- 1. Truman State University
- 2. HEC Montréal
- 3. Institute for Employment Research
- 4. Cornell University
Description
Data on businesses collected by statistical agencies are challenging to protect. Many businesses have unique characteristics, and distributions of employment, sales, and profits are highly skewed. Attackers wishing to conduct identification attacks often have access to much more information about a business than they would have about any individual. As a consequence, most disclosure avoidance mechanisms fail to strike an acceptable balance between usefulness and confidentiality protection. Detailed aggregate statistics by geography or detailed industry classes are rare, public-use microdata on businesses are virtually nonexistent, and access to confidential microdata can be burdensome. Synthetic microdata have been proposed as a secure mechanism to publish microdata, as part of a broader discussion of how to give researchers wider access to such datasets. In this article, we document an experiment to create analytically valid synthetic data, using the exact same model and methods previously employed for the United States, for data from two different countries: Canada (Longitudinal Employment Analysis Program (LEAP)) and Germany (Establishment History Panel (BHP)). We assess utility and protection, and provide an assessment of the feasibility of extending such an approach in a cost-effective way to other data.
Files
AlamDostieDrechslerVilhuber-online-appendix.pdf
Additional details
Funding
- National Science Foundation, award 1042181: Synthetic Data User Testing and Dissemination
- National Science Foundation, award 0427889: ITR-(ECS+ASE)-(dmc+int): Info Tech Challenges for Secure Access to Confidential Social Science Data
- National Science Foundation, award 1131848: NCRN-MN: Cornell Census-NSF Research Node: Integrated Research Support, Training and Data Documentation