Published September 20, 2024
| Version v2.0.0
Software
Open
TabeaSonnenschein/GenSynthPop: R-package for Generating Representative Spatially Explicit Synthetic Populations
Authors/Creators
- 1. Utrecht University
Description
Instructions for R-package: GenSynthPop
This repository contains the implementation of GenSynthPop, a sample-free tool to construct Synthetic Populations from mixed-aggregation contingency tables.
This package contains a set of functions that help prepare stratified census datasets to generate conditional propensities, combines the conditional propensities with spatial marginal distributions to generate a representative population and validates that the produced agents have a similar distribution as the initial spatial marginal datasets and the stratified datasets. The generated population is representative for a city or the spatial extent that is fed into the algorithms and can be used for simulation purposes, such as an agent-based model. The smaller the spatial units of the spatial marginal distributions, the more spatially resolved the agents will be too.
Updates
Changes in Version 2.0.0 of the package GenSynthPop compared to Version 1.0.0
* implements iterative proportionate fitting to fit multi-variable joint distributions to spatial marginal distributions.
* implements deterministic assignment, instead of probability distribution sampling
* fuses all steps into a single function, for ease of use
The work in this repository is described in:
de Mooij, J., Sonnenschein, T., Pellegrino, M. et al. GenSynthPop: generating a spatially explicit synthetic population of individuals and households from aggregated data. Auton Agent Multi-Agent Syst 38, 48 (2024).
An Python implementation of this library is available here
Main Function
Conditional_attribute_adder(): Adds a target attribute to a synthetic population by fitting it to a contingency table, optionally using iterative proportional fitting (IPF) with margins.
Installing package in R
install.packages("devtools")
library(devtools)
install_github("TabeaSonnenschein/GenSynthPop")
library(GenSynthPop)
Looking up documentation for a function
There is extensive documentation for the functions within R
Example:
?Conditional_attribute_adder
help(Conditional_attribute_adder)
Should there be remaining questions, shoot me an email: t.s.sonnenschein@uu.nl
Instructions
1. Start by collecting neighborhood marginal distributions of age_groups. It is recommended to go as spatially resolved as you can (smallest spatial unit) but it depends on what you want to use the synthetic agent population for. You theoretically can even use provincial or national administrative areas, if this is your project scope and goal. We go for neighborhoods because we want to create an urban ABM.
2. generate a population by generating unique agents for each person living in each neighborhood
# Load the library
library(GenSynthPop)
neigh_df = read.csv("Neighborhood_statistics.csv")
# Initialize the agent_df
agent_neighborhoods = list()
agent_count = 0
for (i in 1:nrow(neigh_df)) {
neighb_code = neigh_df[i, "neighb_code"]
neighb_total = neigh_df[i, "nr_residents"]
agent_neighborhoods = c(agent_neighborhoods, rep(neighb_code, neighb_total))
agent_count = agent_count + neighb_total
}
agent_ids = paste0("Agent_", 0:(agent_count - 1))
agent_df = data.frame(agent_id = unlist(agent_ids),
neighb_code = unlist(agent_neighborhoods))
3. use this new agent_df and the neighborhood marginal distribution dataframe to distribute the agents across neighborhoods and age groups.
agecols = c("0-15", "15-25", "25-45", "45-65", "65+")
ageneigh_df = neigh_df[unlist(c("neighb_code", agecols))] %>%
pivot_longer(cols = all_of(agecols),
names_to = "age_group",
values_to = "count") # Create a new column for counts
ageneigh_df = as.data.frame(ageneigh_df)
agent_df = Conditional_attribute_adder(df = agent_df,
df_contingency = ageneigh_df,
target_attribute = "age_group",
group_by = c("neighb_code"))
print(head(agent_df))
4. Read the stratified dataframe with the conditional variable and the variable of interest (that you want to add), for example sex by agegroup, since we already added that one. Make sure that the classes of the conditional variables correspond to the ones in the agent_df. We can now use additional neighborhood margins that we have
sex_age_df = read.csv("sex_age_statistics.csv") # columns age_group, sex, counts
sexneigh_df <- sexneigh_df[unlist(c("neighb_code", sexcols))] %>%
pivot_longer(cols = all_of(sexcols),
names_to = "sex",
values_to = "count")
sexneigh_df <- as.data.frame(sexneigh_df)
agent_df = Conditional_attribute_adder(df = agent_df,
df_contingency = sex_age_df ,
target_attribute = "sex",
group_by = c("neighb_code"),
margins= list(ageneigh_df, sexneigh_df),
margins_names= c("age_group", "sex"))
print(head(agent_df))
5. Now we can add multi-variable contingency tables and repeat the function for any data and variables we would like to add. For example let us add education level based on age and sex. We can now use the neighborhood margins for age_group, sex, or even as well for education_level. The function can take contingency tables with any number of variables and any number of neighborhood marginal data. The only requirement is that the conditional variables of the contingency table and marginal data are represented in the agent_df. So all variables apart from the target attribute. The algorithm can deal with cases when no neighborhood marginal data is available for some conditional variables or target attributes.
edu_age_sex_df = read.csv("edu_sex_age_statistics.csv") # columns age_group, sex, education_level counts
agent_df = Conditional_attribute_adder(df = agent_df,
df_contingency = edu_age_sex_df ,
target_attribute = "education_level",
group_by = c("neighb_code"),
margins= list(ageneigh_df, sexneigh_df),
margins_names= c("age_group", "sex"))
print(head(agent_df))
you can look at the Example_Application_GenSynthPop.R script for an example application of the functions in the package.
License
This package is licensed under the MIT License.
Files
Files
(18.8 kB)
| Name | Size | Download all |
|---|---|---|
|
md5:9975a0eba111bd2facf66e109ab86880
|
18.8 kB | Download |
Additional details
Identifiers
Related works
- Is supplement to
- https://github.com/TabeaSonnenschein/GenSynthPop/tree/v2.0.0 (URL)
Software
- Repository URL
- https://github.com/TabeaSonnenschein/GenSynthPop/tree/main
- Programming language
- R
- Development Status
- Active