Synthetic business population containing simulated business variables
Description
This dataset contains simulated data for a fully synthetic business population. It has 900,000 records, each representing one simulated business. The population resembles the real-world population of employing businesses in Australia in how businesses are distributed across size categories, industry classes and geographic regions (states). It was generated from published survey outputs available on the Australian Bureau of Statistics (ABS) website, together with employee tax data and survey data sourced from the Business Longitudinal Analysis Data Environment (BLADE) in the ABS DataLab.
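One way to inspect the distribution across size categories, industry classes and states is to count records per category. The sketch below assumes that synthetic_population.csv has a header row whose column names match the variable names listed under Technical info; the helper `count_by` is hypothetical and uses only the Python standard library.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by(lines: Iterable[str], column: str) -> Counter:
    """Count records (one per simulated business) for each value of `column`.

    Assumes the first line is a header whose names match the variable
    names in this record, e.g. "state", "indgrp" or "empgrp".
    """
    reader = csv.DictReader(lines)
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Assumed local path to the downloaded file.
    with open("synthetic_population.csv", newline="") as f:
        by_state = count_by(f, "state")
    total = sum(by_state.values())  # should equal the number of records in the file
    for state, n in sorted(by_state.items()):
        print(f"state {state}: {n} businesses ({n / total:.1%})")
```

Passing `"indgrp"` or `"empgrp"` instead of `"state"` gives the corresponding breakdown by industry class or employment size group.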
Technical info
Variables contained in the dataset include:
Variable Name | Numeric/Categorical | Range | Description |
---|---|---|---|
indgrp | Categorical | 2 to 18 | Industry class |
state | Categorical | 1 to 8 | State (geographic region) |
empgrp | Categorical | "Emp04" = 0 to 4 employees; "Emp519" = 5 to 19; "Emp2049" = 20 to 49; "Emp5099" = 50 to 99; "Emp100149" = 100 to 149; "Emp150199" = 150 to 199; "Emp200249" = 200 to 249; "Emp250299" = 250 to 299; "Emp300349" = 300 to 349; "Emp350399" = 350 to 399; "Emp400449" = 400 to 449; "Emp450499" = 450 to 499; "Emp500999" = 500 to 999; "Emp1000" = 1000 or more employees | Employment size group |
frame.emp | Numeric | 0 or greater | Number of employees on the frame |
rep.emp | Numeric | 0 or greater | Reported number of employees |
earnings | Numeric | 0 or greater | Total weekly wages/salaries paid to employees |
ovt | Numeric | 0 or greater | Overtime paid to employees |
earnings.me | Numeric | 0 or greater | Measurement-error version of the "earnings" variable |
rep.emp.me | Numeric | 0 or greater | Measurement-error version of the "rep.emp" variable |
ovt.me | Numeric | 0 or greater | Measurement-error version of the "ovt" variable |
id | | | Identifier for the business |
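The empgrp bands can be reproduced from an employee count with a sorted list of lower bounds. This is a minimal sketch: the record does not say which employment variable (frame.emp or rep.emp) the published empgrp is derived from, and the function name `emp_group` is hypothetical. Non-integer counts are assigned to the band whose lower bound they meet, which is an assumption.

```python
import bisect

# Lower bound of each employment size band, in the order listed for empgrp.
LOWER_BOUNDS = [0, 5, 20, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1000]
LABELS = [
    "Emp04", "Emp519", "Emp2049", "Emp5099", "Emp100149", "Emp150199",
    "Emp200249", "Emp250299", "Emp300349", "Emp350399", "Emp400449",
    "Emp450499", "Emp500999", "Emp1000",
]


def emp_group(employees: float) -> str:
    """Map an employee count to its empgrp label."""
    if employees < 0:
        raise ValueError("employee count must be 0 or greater")
    # bisect_right finds the first lower bound strictly above the count;
    # the band we want is the one just before it.
    idx = bisect.bisect_right(LOWER_BOUNDS, employees) - 1
    return LABELS[idx]
```

For example, `emp_group(17)` returns `"Emp519"` and `emp_group(1000)` returns `"Emp1000"`. Using `bisect` keeps the band boundaries in one list, so the lookup stays correct if the list is edited.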
Methods
Details of how the dataset was created are given in the preprint listed under Related works below.
Files
Name | Size | Checksum |
---|---|---|
synthetic_population.csv | 40.7 MB | md5:a06c085885f4529d5b234d99c53c0900 |
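After downloading, the file can be checked against the listed MD5 checksum. The sketch below uses Python's standard `hashlib` module and reads the file in chunks so the 40.7 MB file need not be loaded at once; the helper names and the local file path are assumptions.

```python
import hashlib
from typing import Iterable

# Checksum listed for synthetic_population.csv in this record.
EXPECTED_MD5 = "a06c085885f4529d5b234d99c53c0900"


def md5_hex(chunks: Iterable[bytes]) -> str:
    """Return the hexadecimal MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks (1 MiB by default)."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read until it returns b"" at EOF.
        return md5_hex(iter(lambda: f.read(chunk_size), b""))


if __name__ == "__main__":
    # Assumes the file was saved to the current directory.
    actual = file_md5("synthetic_population.csv")
    print("checksum OK" if actual == EXPECTED_MD5 else f"checksum mismatch: {actual}")
```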
Additional details
Related works
- Is documented by: Preprint, 10.48550/arXiv.2405.14208 (DOI)
Dates
- Created: 2023-03-01 (initial creation of dataset)
- Available: 2024-05-01 (dataset made available on repository)