Published September 8, 2024 | Version v1
Dataset Open

New York State Synthetic Population

Description

The synthetic population includes nearly 20 million individuals and 7.5 million households across New York State, generated from the Public Use Microdata Sample (PUMS) of the 2021 5-year American Community Survey (ACS). The marginal distributions of the synthetic population closely match the census marginals. For combinations of attributes, the synthetic population still generally follows the patterns in the input sample. In addition, it reconstructs the associations among household members observed in the input sample.
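The description does not say which goodness-of-fit metric was used to compare marginals. One common choice in population synthesis is the standardized root mean square error (SRMSE), where 0 means a perfect match. The sketch below compares two marginal count tables keyed by category; the category names and counts in the usage line are made up for illustration and are not taken from this dataset.

```python
import math


def srmse(synthetic, census):
    """Standardized RMSE between two marginal count tables keyed by category.

    Categories missing from one table are treated as zero counts.
    Returns 0.0 for a perfect match; larger values mean a worse fit.
    """
    categories = set(synthetic) | set(census)
    n = len(categories)
    squared_error = sum(
        (synthetic.get(c, 0) - census.get(c, 0)) ** 2 for c in categories
    )
    rmse = math.sqrt(squared_error / n)
    # Standardize by the mean census count per category.
    mean_census = sum(census.get(c, 0) for c in categories) / n
    return rmse / mean_census


# Illustrative (hypothetical) vehicle-ownership marginals for one area.
census_counts = {"0 vehicles": 100, "1 vehicle": 200}
synthetic_counts = {"0 vehicles": 110, "1 vehicle": 190}
print(round(srmse(synthetic_counts, census_counts), 4))
```

The same function can be applied per attribute (for example HINCP or VEH) and per area, which makes it easy to spot which marginals fit least well.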

We propose a population synthesis framework that combines a deterministic model with ciDATGAN to generate household and corresponding person-level synthetic populations. The framework is illustrated in the figure below.

 

A wide range of socio-demographic variables is included; the variables selected for this study are listed in Table 1. We aggregate the categories of some overly granular attributes, such as age and working industry (NAICS). To capture potential spatial heterogeneity of the population between New York City (NYC) and non-NYC regions, we split the PUMS into NYC and non-NYC subsets according to their Public Use Microdata Areas (PUMAs).
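A minimal pandas sketch of the two preprocessing steps above: aggregating age into 7 groups and splitting records into NYC and non-NYC subsets by PUMA. The age cut points are hypothetical (the description does not give them), and the NYC PUMA range 03701 to 04114 assumes the 2010 PUMA definitions used by the 2021 5-year ACS; both should be checked against the actual study setup. Industry (NAICSP) aggregation would follow the same pattern but is omitted because its grouping is not documented.

```python
import pandas as pd

# Assumption: under 2010-vintage PUMA codes, NYC PUMAs fall in 03701-04114.
NYC_PUMA_MIN, NYC_PUMA_MAX = 3701, 4114

# Hypothetical cut points producing 7 age groups (Table 1 lists 7 AGEP values).
AGE_BINS = [-1, 17, 24, 34, 44, 54, 64, 200]
AGE_LABELS = ["0-17", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"]


def split_and_aggregate(persons: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Add an aggregated age column and return (nyc, non_nyc) subsets."""
    df = persons.copy()
    # Right-closed bins: (-1, 17] -> "0-17", (17, 24] -> "18-24", and so on.
    df["AGE_GRP"] = pd.cut(df["AGEP"], bins=AGE_BINS, labels=AGE_LABELS)
    # PUMA codes may be read as zero-padded strings, so compare numerically.
    in_nyc = pd.to_numeric(df["PUMA"]).between(NYC_PUMA_MIN, NYC_PUMA_MAX)
    return df[in_nyc], df[~in_nyc]
```

Usage: `nyc, non_nyc = split_and_aggregate(pums_persons)`, where `pums_persons` is a hypothetical DataFrame with at least `PUMA` and `AGEP` columns.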

Because NYC is the most densely populated region in the US and has a highly diverse population, a finer spatial resolution is desirable there. We therefore allocate the NYC-specific PUMS records from the PUMA level to the Census Tract (CT) level using PopGen.
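Allocating PUMA-level records to tracts amounts to reweighting a seed sample so that it reproduces tract-level marginal totals. PopGen's own procedure (iterative proportional updating) handles household and person constraints jointly and is more involved; the sketch below is instead plain two-dimensional iterative proportional fitting (IPF), shown only to illustrate the basic idea of fitting a seed table to target marginals. The row and column targets must have the same grand total for the fit to converge.

```python
def ipf_2d(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Scale a 2-D seed table so its row and column sums match the targets.

    This is generic IPF, not PopGen's implementation.
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Row step: rescale each row to its target sum.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Column step: rescale each column to its target sum.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once the rows still match too.
        if all(abs(sum(table[i]) - t) <= tol for i, t in enumerate(row_targets)):
            break
    return table


# Hypothetical tract: rows = two household-size classes, columns = two income classes.
fitted = ipf_2d([[1, 1], [1, 1]], row_targets=[30, 70], col_targets=[40, 60])
print(fitted)
```

With a uniform seed, IPF converges to the independence solution (each cell equals row total times column total divided by the grand total); a seed drawn from PUMS records instead preserves the sample's joint structure while matching the tract totals.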

Table 1. Selected attributes of input samples

Non-NYC region attribute (label name) | No. of values (range if continuous) | NYC region attribute (label name) | No. of values

Household attributes
Residence area (PUMA) | 90 | Residence area (CT) | 2313
Income level (HINCP) | 9 | Income level (HINCP) | 9
Vehicle ownership (VEH) | 4 | Vehicle ownership (VEH) | 4

Personal attributes
Age (AGEP) | 7 | Age (AGEP) | 7
English proficiency (ENG) | 5 | English proficiency (ENG) | 5
Commute trip length (JWMNP) | 0-140 min | Gender (SEX) | 2
Commute mode (JWTRNS) | 13 | Disability (DIS) | 2
School status (SCH) | 3 | Working industry (NAICSP) | 2
Gender (SEX) | 2 | Race white/non-white (RACWHT) | 2
Disability (DIS) | 2 | |
Working industry (NAICSP) | 20 | |
Race white/non-white (RACWHT) | 2 | |

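Table 1 can double as a light sanity check when loading the released files. The sketch below transcribes the table's value counts into dictionaries and flags columns with more distinct values than listed, plus commute lengths outside 0-140 min. It assumes the CSV headers use the PUMS label names from Table 1, and the `CT` key for the NYC residence area is a hypothetical column name; the file headers are not documented here.

```python
# Value counts transcribed from Table 1.
# Assumption: CSV column names match the label names; "CT" is hypothetical.
NON_NYC_COUNTS = {"PUMA": 90, "HINCP": 9, "VEH": 4, "AGEP": 7, "ENG": 5,
                  "JWTRNS": 13, "SCH": 3, "SEX": 2, "DIS": 2, "NAICSP": 20,
                  "RACWHT": 2}
NYC_COUNTS = {"CT": 2313, "HINCP": 9, "VEH": 4, "AGEP": 7, "ENG": 5,
              "SEX": 2, "DIS": 2, "NAICSP": 2, "RACWHT": 2}
JWMNP_RANGE = (0.0, 140.0)  # continuous commute length in minutes (non-NYC)


def check_against_table1(rows, counts):
    """Return problems for rows (dicts of strings) that exceed Table 1.

    Only columns present in the rows are checked, so household and person
    files can be passed separately.
    """
    problems = []
    for col, limit in counts.items():
        seen = {row[col] for row in rows if col in row}
        if len(seen) > limit:
            problems.append(f"{col}: {len(seen)} distinct values, Table 1 lists {limit}")
    lo, hi = JWMNP_RANGE
    for i, row in enumerate(rows):
        value = row.get("JWMNP") or ""
        # Empty values (e.g. non-commuters) are skipped rather than flagged.
        if value != "" and not lo <= float(value) <= hi:
            problems.append(f"row {i}: JWMNP={value} outside {lo}-{hi} min")
    return problems
```

Rows can come from `csv.DictReader`, for example `check_against_table1(list(csv.DictReader(f)), NON_NYC_COUNTS)` on an open file `f`.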
Files (997.1 MB)

synth_household_nonNYC.csv

Size | Checksum
71.9 MB | md5:2779a863b1cde0e06b61875f007cfcff
72.8 MB | md5:568b4a9b0d76cd702ccfd36a6edf7584
629.5 MB | md5:6a575653b88f77d5f4e2bb93c71bb1ea
223.0 MB | md5:0ffd8c61926366163d35416f68e963a7
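Downloaded files can be checked against the listed MD5 checksums with Python's standard `hashlib`. Because the file listing above does not show which checksum belongs to which file name, this sketch simply tests whether a file's digest matches any of the four listed values; the file path passed in is up to the user.

```python
import hashlib

# Checksums as listed for this record (the "md5:" prefix removed).
LISTED_MD5 = {
    "2779a863b1cde0e06b61875f007cfcff",
    "568b4a9b0d76cd702ccfd36a6edf7584",
    "6a575653b88f77d5f4e2bb93c71bb1ea",
    "0ffd8c61926366163d35416f68e963a7",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large CSVs need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """True if the file's MD5 equals one of the checksums listed above."""
    return md5_of_file(path) in LISTED_MD5
```

For example, `matches_listed_checksum("synth_household_nonNYC.csv")` should return True for an intact download of that file.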