3785744
doi
10.5281/zenodo.3785744
oai:zenodo.org:3785744
user-labordynamicsinstitute
Dostie, Benoit
HEC Montréal
Drechsler, Jörg
Institute for Employment Research
Vilhuber, Lars
Cornell University
Applying Data Synthesis for Longitudinal Business Data across Three Countries
Alam, M. Jahangir
Truman State University
url:https://github.com/labordynamicsinstitute/SyntheticLEAP/releases/tag/v20200504
url:https://productivitypartnership.ca/sites/default/files/documents/wps_-_data_synthesis_for_longitudinal_data_alam_dostie_drechsler_vilhuber_0.pdf
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial 4.0 International
https://creativecommons.org/licenses/by-nc/4.0/legalcode
business data
confidentiality
LBD
LEAP
BHP
synthetic data
<p>Data on businesses collected by statistical agencies are challenging to protect. Many businesses have unique characteristics, and distributions of employment, sales, and profits are highly skewed. Attackers wishing to conduct identification attacks often have access to much more information than is available for any individual. As a consequence, most disclosure avoidance mechanisms fail to strike an acceptable balance between usefulness and confidentiality protection. Detailed aggregate statistics by geography or detailed industry classes are rare, public-use microdata on businesses are virtually nonexistent, and access to confidential microdata can be burdensome. Synthetic microdata have been proposed as a secure mechanism to publish microdata, as part of a broader discussion of how to provide broader access to such datasets to researchers. In this article, we document an experiment to create analytically valid synthetic data, using the exact same model and methods previously employed for the United States, for data from two different countries: Canada (Longitudinal Employment Analysis Program (LEAP)) and Germany (Establishment History Panel (BHP)). We assess utility and protection, and provide an assessment of the feasibility of extending such an approach in a cost-effective way to other data.</p>
The opinions expressed here are those of the authors and do not reflect the opinions of any of the statistical agencies involved. All results were reviewed for disclosure risks by their respective custodians and released to the authors. Alam thanks Claudiu Motoc and Danny Leung for help with the Canadian data. Vilhuber acknowledges funding through NSF Grants SES-1131848 and SES-1042181, and a grant from the Alfred P. Sloan Foundation (G-2015-13903). Alam and Dostie acknowledge funding through the SSHRC Partnership Grant "Productivity, Firms and Incomes". The creation of the Synthetic LBD was funded by NSF Grant SES-0427889.
Zenodo
2020-05-05
info:eu-repo/semantics/other
3785743
user-labordynamicsinstitute
v20200504
award_title=Synthetic Data User Testing and Dissemination; award_number=1042181; funder_id=021nxhr62; funder_name=National Science Foundation;
award_title=ITR-(ECS+ASE)-(dmc+int): Info Tech Challenges for Secure Access to Confidential Social Science Data; award_number=0427889; funder_id=021nxhr62; funder_name=National Science Foundation;
award_title=NCRN-MN: Cornell Census-NSF Research Node: Integrated Research Support, Training and Data Documentation; award_number=1131848; funder_id=021nxhr62; funder_name=National Science Foundation;
1673277088.856139
7564924
md5:ce857b8bb476997f4b2eddc77cd26010
https://zenodo.org/records/3785744/files/SyntheticLEAP-20200504.zip
public
https://github.com/labordynamicsinstitute/SyntheticLEAP/releases/tag/v20200504
Is part of
url
https://productivitypartnership.ca/sites/default/files/documents/wps_-_data_synthesis_for_longitudinal_data_alam_dostie_drechsler_vilhuber_0.pdf
Is source of
url
10.5281/zenodo.3785743
isVersionOf
doi