Published May 1, 2024 | Version v1
Dataset Open

Synthetic business population containing simulated business variables

  • 1. Australian National University
  • 2. Australian Bureau of Statistics

Description

This dataset contains simulated data for a fully synthetic business population. It holds 900,000 records, each representing one simulated business. The population resembles the real-world population of employing businesses in Australia in its distribution of businesses across size categories, industry classes and geographic regions (states). It was generated from a combination of published survey outputs available on the Australian Bureau of Statistics (ABS) website, and employee tax data and survey data sourced from the Business Longitudinal Analysis Data Environment (BLADE) in the ABS DataLab.
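The distribution across size categories, industry classes and states mentioned above can be inspected with a few lines of Python. The following is a minimal sketch, not part of the dataset's documentation: it assumes the CSV header uses the variable names listed under Technical info (`empgrp`, `indgrp`, `state`), and the helper names `tabulate` and `tabulate_csv` are hypothetical. The three sample rows are made up purely to exercise the function.

```python
import csv
import io
from collections import Counter


def tabulate(rows, keys=("empgrp", "indgrp", "state")):
    """Count records for each distinct value of the given columns."""
    counts = {key: Counter() for key in keys}
    for row in rows:
        for key in keys:
            counts[key][row[key]] += 1
    return counts


def tabulate_csv(path):
    """Tabulate a CSV file whose header row contains the key columns (assumed)."""
    with open(path, newline="") as handle:
        return tabulate(csv.DictReader(handle))


# Made-up rows standing in for the real file, only to show the call pattern.
sample = io.StringIO(
    "id,indgrp,state,empgrp\n"
    "1,2,1,Emp04\n"
    "2,2,3,Emp519\n"
    "3,18,1,Emp04\n"
)
sample_counts = tabulate(csv.DictReader(sample))
```

With the real file, `tabulate_csv("synthetic_population.csv")` would return one counter per column. Values come back as strings because the `csv` module does not infer types.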

Notes (English)

General Disclaimer

The attached dataset has been produced by the creators for the purpose of undertaking research.  It does not contain any real data collected from businesses and does not constitute output published by the Australian Bureau of Statistics.  Where used, the dataset should be attributed clearly to the dataset creators.

 

ABS DataLab Disclaimer

The parameters used to create the synthetic population are based, in part, on data supplied to the ABS under the Taxation Administration Act 1953, A New Tax System (Australian Business Number) Act 1999, Australian Border Force Act 2015, Social Security (Administration) Act 1999, A New Tax System (Family Assistance) (Administration) Act 1999, Paid Parental Leave Act 2010 and/or the Student Assistance Act 1973. Such data may only be used for the purpose of administering the Census and Statistics Act 1905 or performance of functions of the ABS as set out in section 6 of the Australian Bureau of Statistics Act 1975. No individual information collected under the Census and Statistics Act 1905 is provided back to custodians for administrative or regulatory purposes. Any discussion of data limitations or weaknesses is in the context of using the data for statistical purposes and is not related to the ability of the data to support the Australian Taxation Office, Australian Business Register, Department of Social Services and/or Department of Home Affairs’ core operational requirements.

Legislative requirements to ensure privacy and secrecy of these data have been followed. For access to PLIDA and/or BLADE data under Section 16A of the ABS Act 1975 or enabled by section 15 of the Census and Statistics (Information Release and Access) Determination 2018, source data are de-identified and so data about specific individuals has not been viewed in conducting this analysis. In accordance with the Census and Statistics Act 1905, results have been treated where necessary to ensure that they are not likely to enable identification of a particular person or organisation.

Technical info (English)

Variables contained in the dataset include:

| Variable Name | Numeric/Categorical | Range | Description |
|---|---|---|---|
| indgrp | Categorical | 2 to 18 | Industry Class |
| state | Categorical | 1 to 8 | State (geographic region) |
| empgrp | Categorical | 14 labelled size bands (listed below the table) | Employment Size Group |
| frame.emp | Numeric | Numbers 0 or greater | Frame number of employees |
| rep.emp | Numeric | Numbers 0 or greater | Reported number of employees |
| earnings | Numeric | Numbers 0 or greater | Total weekly wages/salaries paid to employees |
| ovt | Numeric | Numbers 0 or greater | Overtime paid to employees |
| earnings.me | Numeric | Numbers 0 or greater | Measurement error version of "earnings" variable |
| rep.emp.me | Numeric | Numbers 0 or greater | Measurement error version of "rep.emp" variable |
| ovt.me | Numeric | Numbers 0 or greater | Measurement error version of "ovt" variable |
| id | | | Identifier for the business |

Values of empgrp:

  • "Emp04" = 0 - 4 employees
  • "Emp519" = 5 - 19 employees
  • "Emp2049" = 20 - 49 employees
  • "Emp5099" = 50 - 99 employees
  • "Emp100149" = 100 - 149 employees
  • "Emp150199" = 150 - 199 employees
  • "Emp200249" = 200 - 249 employees
  • "Emp250299" = 250 - 299 employees
  • "Emp300349" = 300 - 349 employees
  • "Emp350399" = 350 - 399 employees
  • "Emp400449" = 400 - 449 employees
  • "Emp450499" = 450 - 499 employees
  • "Emp500999" = 500 - 999 employees
  • "Emp1000" = 1000+ employees
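The empgrp labels cannot be split into bounds reliably from the string alone (for example "Emp2049" could be read several ways), so an explicit lookup is safer. The sketch below maps an employee count to the label of the band it falls in, using the band boundaries listed for empgrp. The function name `size_band` is hypothetical, and the record does not state whether empgrp was derived from frame.emp or rep.emp, so treat any comparison against the stored empgrp column as exploratory. Fractional counts, if present, are assigned to the band whose lower bound they have reached, which is also an assumption.

```python
import bisect

# Lower bound of each employment size band, ascending, with matching labels.
BAND_LOWER_BOUNDS = [0, 5, 20, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1000]
BAND_LABELS = [
    "Emp04", "Emp519", "Emp2049", "Emp5099", "Emp100149", "Emp150199",
    "Emp200249", "Emp250299", "Emp300349", "Emp350399", "Emp400449",
    "Emp450499", "Emp500999", "Emp1000",
]


def size_band(employees):
    """Return the empgrp label whose range contains the given employee count."""
    if employees < 0:
        raise ValueError("employee count must be 0 or greater")
    # bisect_right gives the number of lower bounds <= employees;
    # subtracting 1 turns that into the index of the containing band.
    index = bisect.bisect_right(BAND_LOWER_BOUNDS, employees) - 1
    return BAND_LABELS[index]
```

For instance, `size_band(4)` gives `"Emp04"` and `size_band(1000)` gives `"Emp1000"`.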

Methods

Details of how the dataset was created are given in the preprint listed under Related works.

Files

synthetic_population.csv (40.7 MB)
md5:a06c085885f4529d5b234d99c53c0900
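The MD5 checksum listed for synthetic_population.csv can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `is_intact` are hypothetical, and the file is assumed to have been saved locally under its original name.

```python
import hashlib

# Checksum as published on the record for synthetic_population.csv.
EXPECTED_MD5 = "a06c085885f4529d5b234d99c53c0900"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected
```

Calling `is_intact("synthetic_population.csv")` should return `True` for an unmodified copy. MD5 is suitable here only as an integrity check against accidental corruption, not as protection against deliberate tampering.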

Additional details

Related works

Is documented by
Preprint: 10.48550/arXiv.2405.14208 (DOI)

Dates

Created: 2023-03-01 (initial creation of dataset)
Available: 2024-05-01 (dataset made available on repository)