Data Availability Statement

The key data sources used in the article are described and cited in the article. All source data are confidential and are available only on restricted-access servers.

LEAP

  • Confidential LEAP: Access to the LEAP can be requested at Statistics Canada/CDER (https://www.statcan.gc.ca/eng/cder/index).
  • Synthetic LEAP: Release of the data was not requested, so the data are still considered confidential. The data were generated on internal servers and accessed by one of the authors during his stay at Statistics Canada. None of the authors currently has access to the data. Data may be accessible through the CDER access process outlined above. The Canadian LEAP is accessible at CDER.
  • Applicants must be Canadian residents and are subject to a security check. Access is available only at Statistics Canada offices in Ottawa, ON, Canada.

BHP

  • Access to the BHP (in English: Establishment History Panel) is possible through the Research Data Center of the IAB (https://fdz.iab.de/en.aspx). One of the authors, as an IAB employee, had access to the internal version of the data, which is not available to external researchers. Release of the data was not requested, so the data are still considered confidential.

A 50% sample of the BHP is accessible to external researchers through the FDZ (Research Data Center) of the IAB. Applicants must fill out a form, subject to approval, and can access the data via the various access mechanisms of the FDZ, including physical locations in Germany, elsewhere in Europe, and North America.

Synthetic data

Synthetic data derived from both confidential data sets were never released to the public and are accessible only via the same access mechanisms as above. A related synthetic LEAP was made available through the Canadian Research Data Center system, as part of a pilot program, to prepare access to the confidential data. We are not aware of any current access. The outcomes of the pilot program have not yet been made public.

NAICS data

As a small part of the post-processing, we count the (theoretical) number of Canadian NAICS industry groups (Statistics Canada, 2012). The file can be downloaded from https://www.statcan.gc.ca/eng/subjects/standard/naics/2012/index.

## Parsed with column specification:
## cols(
##   Level = col_double(),
##   `Hierarchical structure` = col_character(),
##   Code = col_character(),
##   `Class title` = col_character(),
##   Superscript = col_character(),
##   `Class definition` = col_character()
## )
| Level | Hierarchical structure | Code | Class title | Superscript | Class definition |
|---|---|---|---|---|---|
| Min. :1.000 | Length:2078 | Length:2078 | Length:2078 | Length:2078 | Length:2078 |
| 1st Qu.:4.000 | Class :character | Class :character | Class :character | Class :character | Class :character |
| Median :4.000 | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character |
| Mean :4.161 | NA | NA | NA | NA | NA |
| 3rd Qu.:5.000 | NA | NA | NA | NA | NA |
| Max. :5.000 | NA | NA | NA | NA | NA |
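
The counting step can be illustrated outside of R. The following is a minimal shell sketch, not the post-processing code in this repository (which uses R). It assumes that the downloaded CSV has the columns shown above, that the second column (`Hierarchical structure`) holds a level name such as "Industry group", and that no quoted field spans several lines. The function name `count_naics_level` and the level name are assumptions made for illustration.

```shell
# Hypothetical helper: count the rows of a NAICS structure CSV whose
# "Hierarchical structure" column (assumed to be column 2) matches a name.
# Assumes columns 1 and 2 contain no embedded commas and that no quoted
# field spans multiple lines; a full CSV parser is safer for real use.
count_naics_level() {
  file="$1"
  level_name="$2"
  awk -F',' -v want="$level_name" '
    NR > 1 {                      # skip the header row
      field = $2
      gsub(/^"|"$/, "", field)    # strip optional surrounding quotes
      if (field == want) n++
    }
    END { print n + 0 }           # print 0 when nothing matched
  ' "$file"
}
```

A call such as `count_naics_level naics.csv "Industry group"` would then print the number of matching rows, where `naics.csv` stands in for the downloaded file.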

Analytic outcomes

The analytic outcomes described in the article were released through the respective disclosure avoidance mechanisms, subject to the disclosure avoidance procedures of each statistical institution. These outcomes (figures, CSV files, and other formats) are available in this repository. Some were extracted from figures or released tables by the programs in this directory.

Files

Data directory

The data directory contains materials released from Statistics Canada and the IAB. These consist mostly of highly aggregated synthetic data, along with regression coefficients. All data releases were authorized by the respective statistical agencies.

graphs directory

The graphs directory contains mostly pre-rendered figures released as part of the agency data releases. The programs to generate these figures can be found in programs/Canada, and were run on the confidential data.

stata-graphs directory

The stata-graphs directory contains GPH (Stata) format files, which are the sources for the PDFs in the graphs directory. The programs to generate these figures can be found in programs/Canada, and were run on the confidential data.

Programs

Programs for analysis (programs/Canada, used for both Canada and Germany) and for post-processing (programs/Post) are provided.

Derived graphs

Graphs generated through post-processing (programs/Post) are available in r-graphs.

Tables

Tables generated both by tabulation of confidential data (programs/Canada, used for both Canada and Germany) and by post-processing (programs/Post) can be found in the tables directory.

Computation

Computational Requirements

  • R (for post-processing)
  • Stata (for analysis of synthetic and confidential data)
  • bash shell (optional, to execute all programs in order)
  • SAS (for the synthetic data generation)

Synthetic generation

The software used to generate the synthetic data is described in Kinney et al. (2011b). A copy of the code can be obtained on request.

Intra-mural Analysis

The raw synthetic and confidential data served as input to the various analyses described in the paper. These analyses occurred within the secure environments of the respective agencies. The code for the analysis is common to both countries (with minor adjustments to account for different variable names). The code used in the Canadian context is provided as a single Stata file in the [programs/Canada](programs/Canada) directory.

Extra-mural post-processing

The following programs are used to post-process the analytic results:

Stata programs

R programs

Execution of programs

Numbered programs should be executed in the natural order. Other programs define locations and/or subroutines, and should not be executed. A convenience bash script run_all.sh is provided.
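
The execution order described above can be sketched as a small shell script. This is an illustration only, not the contents of the run_all.sh shipped with the repository. It assumes that numbered programs have zero-padded numeric prefixes, that R programs end in `.R` and Stata programs in `.do`, and that the Stata executable is called `stata` (it may be `stata-mp` or `stata-se` on a given system). The function names and the `DRY_RUN` variable are hypothetical.

```shell
# List programs whose file names begin with a digit, in sorted order.
# Assumes zero-padded prefixes (01_, 02_, ...) so that plain sorting
# matches the intended execution order.
list_numbered_programs() {
  dir="$1"
  for f in "$dir"/[0-9]*; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
    fi
  done | LC_ALL=C sort
}

# Run one program with the interpreter implied by its extension.
# With DRY_RUN=1 the command is printed instead of executed.
run_program() {
  prog="$1"
  case "$prog" in
    *.R)  cmd="Rscript" ;;
    *.do) cmd="stata -b do" ;;   # assumption: executable is named "stata"
    *)    echo "skipping $prog" >&2; return 0 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$cmd $prog"
  else
    $cmd "$prog"
  fi
}

# Run every numbered program in a directory, stopping at the first failure.
# Unnumbered files (configuration, subroutines) are never executed.
run_all() {
  dir="$1"
  list_numbered_programs "$dir" | while IFS= read -r prog; do
    run_program "$prog" || exit 1
  done
}
```

Calling `DRY_RUN=1 run_all programs/Post` prints the commands in order without running them, which is a cheap way to check the sequence before a full run.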

Funding

Vilhuber acknowledges funding through NSF Grants SES-1131848 and SES-1042181, and a grant from the Alfred P. Sloan Foundation (G-2015-13903). Alam and Dostie acknowledge funding through the SSHRC Partnership Grant “Productivity, Firms and Incomes”. The creation of the Synthetic LBD was funded by NSF Grant SES-0427889.

License

These data are licensed under a Creative Commons Attribution-NonCommercial 4.0 International license. See [citation] for attribution.