The key data sources used in the article are described and cited in the article. All source data is confidential, available on restricted-access servers.
A 50% sample of the BHP is accessible to external researchers through the FDZ (Research Data Center) of the IAB. Applicants must fill out a form, subject to approval, and can access the data via the various access mechanisms of the FDZ, including physical locations in Germany, elsewhere in Europe, and North America.
Synthetic data derived from both confidential datasets were never released to the public, and are accessible only via the same access mechanisms as above. A related synthetic LEAP was made available through the Canadian Research Data Center system as part of a pilot program to prepare for access to the confidential data. We are not aware of current access options. The outcomes of the pilot program have not yet been made public.
As a small part of the post-processing, we count the (theoretical) number of Canadian NAICS industry groups (Statistics Canada, 2012). The file can be downloaded from https://www.statcan.gc.ca/eng/subjects/standard/naics/2012/index.
## Parsed with column specification:
## cols(
## Level = col_double(),
## `Hierarchical structure` = col_character(),
## Code = col_character(),
## `Class title` = col_character(),
## Superscript = col_character(),
## `Class definition` = col_character()
## )
Level | Hierarchical structure | Code | Class title | Superscript | Class definition
---|---|---|---|---|---
Min. :1.000 | Length:2078 | Length:2078 | Length:2078 | Length:2078 | Length:2078
1st Qu.:4.000 | Class :character | Class :character | Class :character | Class :character | Class :character
Median :4.000 | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character
Mean :4.161 | NA | NA | NA | NA | NA
3rd Qu.:5.000 | NA | NA | NA | NA | NA
Max. :5.000 | NA | NA | NA | NA | NA
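The count of industry groups can be illustrated with a minimal sketch. This is not the post-processing code from programs/Post. It assumes that `Level` 3 marks industry groups in the Statistics Canada structure file, which should be checked against the file's documentation. The rows below are a toy excerpt that only mimics the column names shown in the column specification above.

```python
import csv
import io
from collections import Counter

# Assumption: Level 3 denotes industry groups (4-digit codes) in the
# structure file. Verify this against the Statistics Canada documentation.
INDUSTRY_GROUP_LEVEL = 3


def count_by_level(handle):
    """Count classification entries per hierarchy level in a structure CSV."""
    reader = csv.DictReader(handle)
    return Counter(int(row["Level"]) for row in reader)


# Toy excerpt with the same column names as the parsed file; not the real data.
sample = io.StringIO(
    "Level,Hierarchical structure,Code,Class title,Superscript,Class definition\n"
    '1,Sector,11,"Agriculture, forestry, fishing and hunting",,"placeholder"\n'
    '2,Subsector,111,Crop production,,"placeholder"\n'
    '3,Industry group,1111,Oilseed and grain farming,,"placeholder"\n'
    '3,Industry group,1112,Vegetable and melon farming,,"placeholder"\n'
)

counts = count_by_level(sample)
n_industry_groups = counts[INDUSTRY_GROUP_LEVEL]
print(n_industry_groups)
```

To apply the same function to the downloaded file, pass it a handle opened with `open(path, newline="")`, where `path` is wherever the CSV was saved locally. The `csv` module is used rather than splitting on commas because class titles and definitions can contain commas inside quoted fields.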
The analytic outcomes described in the article were released through the respective disclosure avoidance mechanisms, subject to the disclosure avoidance procedures of each statistical institution. These outcomes, in the form of figures, CSV files, and other formats, are available in this repository. Some were extracted from figures or released tables by the programs in this directory.
The data directory contains materials released by Statistics Canada and the IAB. These consist mostly of highly aggregated synthetic data, as well as regression coefficients. All data releases were authorized by the respective statistical agencies.
The graphs directory contains mostly pre-rendered figures released as part of the agency data releases. The programs to generate these figures can be found in programs/Canada, and were run on the confidential data.
GPH (Stata) format files, the source for the PDFs in the graphs directory, are also provided. They were produced by the same programs in programs/Canada, run on the confidential data.
Programs for analysis (programs/Canada, used for both Canada and Germany) and for post-processing (programs/Post) are provided.
Graphs generated through post-processing (programs/Post) are available in r-graphs.
Tables generated both by tabulation of confidential data (programs/Canada, used for both Canada and Germany) and by post-processing (programs/Post) can be found in the tables directory.
The software used to generate the synthetic data is described in Kinney et al. (2011b). A copy of the code can be obtained on request.
The raw synthetic and confidential data served as input to the various analyses described in the paper. These analyses occurred within the secure environments of the respective agencies. The code for the analysis is common to both countries (with minor adjustments to account for different variable names). The code used in the Canadian context is provided as a single Stata file in the [programs/Canada](programs/Canada) directory.
The following programs are used to post-process the analytic results:
Numbered programs should be executed in the natural order. Other programs define locations and/or subroutines, and should not be executed. A convenience bash script run_all.sh is provided.
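As a hedged illustration of "natural order", the sketch below selects only files with a leading number and sorts them numerically, so that 2 comes before 10. It does not reproduce the contents of run_all.sh. The file names and the helper `numbered_in_natural_order` are hypothetical and chosen only for this example.

```python
import re


def numbered_in_natural_order(filenames):
    """Return only files with a leading number, sorted by that number."""
    numbered = []
    for name in filenames:
        match = re.match(r"(\d+)", name)
        if match:
            # Sort on the integer value, not the string, so 2 precedes 10.
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


# Hypothetical file names; the actual programs in programs/Post may differ.
files = ["10_tables.R", "config.R", "2_figures.R", "1_prepare.R"]
run_order = numbered_in_natural_order(files)
print(run_order)
```

Unnumbered files such as the hypothetical `config.R` are left out of the run order, matching the instruction that programs defining locations or subroutines should not be executed directly.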
Vilhuber acknowledges funding through NSF Grants SES-1131848 and SES-1042181, and a grant from the Alfred P. Sloan Foundation (G-2015-13903). Alam and Dostie acknowledge funding through SSHRC Partnership Grant “Productivity, Firms and Incomes”. The creation of the Synthetic LBD was funded by NSF Grant SES-0427889.
These data are licensed under a Creative Commons Attribution-NonCommercial 4.0 International license. See [citation] for attribution.