Synthetic populations for regions of the World (SPW) is a collection of data sets, each of which is a synthetic population for a country or state; both are referred to below as a region. At a high level, a synthetic population of a region, as provided here, captures the people of the region with selected demographic attributes, their organization into households, their assigned activities for a day, and the locations where those activities take place, and thus where interactions among population members occur (e.g., the contacts through which an epidemic spreads).
Current version: 0.7
Data organization: the synthetic population for a region is shared as a collection of related CSV files as follows:
- Person data: {region}_person_v_{major}_{minor}.csv
- Household data: {region}_household_v_{major}_{minor}.csv
- Residence location data: {region}_residence_locations_v_{major}_{minor}.csv
- Activity location data: {region}_activity_locations_v_{major}_{minor}.csv
- Activity location assignment data: {region}_activity_location_assignment_v_{major}_{minor}.csv
- Contact matrix data: {region}_contact_matrix_v_{major}_{minor}.csv
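The naming pattern above can be expanded programmatically. The following is a minimal sketch, not part of the SPW distribution: the helper names `spw_file_names` and `missing_files` are hypothetical, the region code `usa_va` is an invented placeholder, and mapping version 0.7 to major 0 and minor 7 is an assumption.

```python
from pathlib import Path

# File kinds taken from the naming patterns listed above.
FILE_KINDS = (
    "person",
    "household",
    "residence_locations",
    "activity_locations",
    "activity_location_assignment",
    "contact_matrix",
)


def spw_file_names(region: str, major: int, minor: int) -> dict:
    """Return {kind: file name} following {region}_{kind}_v_{major}_{minor}.csv."""
    return {
        kind: f"{region}_{kind}_v_{major}_{minor}.csv"
        for kind in FILE_KINDS
    }


def missing_files(directory, region: str, major: int, minor: int) -> list:
    """List the expected file names that are not present in `directory`."""
    names = spw_file_names(region, major, minor)
    return sorted(
        name for name in names.values()
        if not (Path(directory) / name).is_file()
    )


# "usa_va" is a hypothetical region code used only for illustration.
names = spw_file_names("usa_va", 0, 7)
print(names["person"])
```

Calling `missing_files` on a download directory gives a quick check that all six files for a region are present before any analysis starts.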
The data are described in detail in [TBD] and in the data dictionary [TBD]; a summary is provided below. Each synthetic population is constructed using several mathematical models, collected data, and statistically imputed data for regions missing one or more of the data sources required for the construction. Each population has an associated metadata file listing the collected and imputed data sources used in its construction.
- Person data: contains data for each person, including attributes such as age, gender, and household ID.
- Household data: contains data at the household level.
- Residence and activity location data: contain data about residence locations and activity locations, including which activity types are supported at each location.
- Activity location assignment data: for each person and for each of their activities, this file specifies the location where the activity takes place.
- Contact matrix data: a POLYMOD-type contact matrix constructed from a network representation of the location assignment data and a within-location contact model.
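To illustrate how an age-group contact matrix can be derived from location assignments, here is a simplified sketch. It is not the SPW construction: it assumes complete mixing (every pair of people assigned to the same location counts as one contact), ignores activity times, and uses hypothetical age groups and in-memory toy data rather than the actual file columns, which are defined in the data dictionary.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical age groups (inclusive bounds), chosen only for illustration.
AGE_BINS = [(0, 17), (18, 64), (65, 200)]


def age_group(age: int) -> int:
    """Return the index of the age bin containing `age`."""
    for index, (low, high) in enumerate(AGE_BINS):
        if low <= age <= high:
            return index
    raise ValueError(f"age {age} is outside all bins")


def contact_matrix(ages: dict, assignments) -> list:
    """Mean contacts per person in group i with people in group j.

    ages: {person_id: age}
    assignments: iterable of (person_id, location_id) pairs
    Assumes complete mixing within each location (a simplification).
    """
    n = len(AGE_BINS)

    # Group people by the locations they are assigned to.
    visitors = defaultdict(set)
    for person_id, location_id in assignments:
        visitors[location_id].add(person_id)

    # Each co-located pair contributes one contact in each direction.
    totals = [[0] * n for _ in range(n)]
    for people in visitors.values():
        for a, b in combinations(sorted(people), 2):
            ga, gb = age_group(ages[a]), age_group(ages[b])
            totals[ga][gb] += 1
            totals[gb][ga] += 1

    # Normalize each row by the number of people in that age group.
    sizes = [0] * n
    for age in ages.values():
        sizes[age_group(age)] += 1
    return [
        [totals[i][j] / sizes[i] if sizes[i] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy data: four people, three locations (all identifiers invented).
ages = {1: 10, 2: 40, 3: 70, 4: 12}
assignments = [
    (1, "home_a"), (2, "home_a"), (4, "home_a"),
    (3, "home_b"),
    (2, "work"), (3, "work"),
]
matrix = contact_matrix(ages, assignments)
for row in matrix:
    print(row)
```

A person who shares two locations with the same individual is counted twice here; a realistic within-location contact model would typically weight contacts by duration and limit the number of contacts at large locations.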
Disclaimer: to be added.
Acknowledgments: This project was supported by the National Science Foundation under the NSF RAPID: COVID-19 Response Support: Building Synthetic Multi-scale Networks (PI: Madhav Marathe, Co-PIs: Henning Mortveit, Srinivasan Venkatramanan; Fund Number: OAC-2027541).
License: Unless otherwise noted, the datasets in this collection are made available under the CC-BY-4.0 license (https://creativecommons.org/licenses/by/4.0/).