Raw data received from data finder as a starting point:
C:\external\SCORE_Fielding-Miller_covid_R3pV
date was long now str8
(57 missing values generated)

Contains data from data-raw\Script\merged_covid_usa_v2.dta
  obs:       340,819
 vars:            16                          4 Oct 2020 20:00
────────────────────────────────────────────────────────────────────────────────
                 storage   display    value
variable name    type      format     label      variable label
────────────────────────────────────────────────────────────────────────────────
date             str8      %9s
date_proper      float     %td..
county           str33     %33s
state            str24     %24s
fips             long      %12.0g
cases            long      %8.0g                 Cases
deaths           int       %8.0g
S1602_C01_001E   double    %10.0g
S2701_C05_011E   double    %10.0g
S2701_C05_012E   double    %10.0g
S1701_C03_001E   double    %10.0g
DP05_0001E       long      %10.0g
DP05_0024PE      double    %10.0g
LABOR            long      %10.0g                Value
A00002_002       float     %9.0g                 Population Density (Per Sq. Mile)
sip_effect       float     %9.0g
────────────────────────────────────────────────────────────────────────────────
Sorted by: fips
     Note: Dataset has changed since last saved.

    Variable │        Obs         Mean    Std. Dev.          Min          Max
─────────────┼─────────────────────────────────────────────────────────────────
 date_proper │    340,762   2020-05-21    33.61868   2020-01-21   2020-07-16
Cleaning up observations with missing geography
. drop if county == "Unknown"
(3,189 observations deleted)

. mdesc

    Variable    │     Missing          Total     Percent Missing
────────────────┼───────────────────────────────────────────────
           date │           0        337,630           0.00
    date_proper │          57        337,630           0.02
         county │          57        337,630           0.02
          state │          57        337,630           0.02
           fips │         279        337,630           0.08
          cases │          57        337,630           0.02
         deaths │          57        337,630           0.02
   S1602_C~001E │         399        337,630           0.12
   S2701_C~011E │         287        337,630           0.09
   S2701_C~012E │         287        337,630           0.09
   S1701_C~001E │         401        337,630           0.12
     DP05_0001E │         287        337,630           0.09
    DP05_0024PE │         287        337,630           0.09
          LABOR │      17,341        337,630           5.14
     A00002_002 │         287        337,630           0.09
     sip_effect │           0        337,630           0.00
────────────────┼───────────────────────────────────────────────
Observations with missing fips codes.

. preserve

. keep if mi(fips)
(337,351 observations deleted)

. fre county

county
──────────────────────┬────────────────────────────────────────────
                      │      Freq.    Percent      Valid       Cum.
──────────────────────┼────────────────────────────────────────────
Valid   Joplin        │         22       7.89       7.89       7.89
        Kansas City   │        119      42.65      42.65      50.54
        New York City │        138      49.46      49.46     100.00
        Total         │        279     100.00     100.00
──────────────────────┴────────────────────────────────────────────

. * fre state

. restore

. preserve

. keep if mi(date_proper)
(337,573 observations deleted)

. * fre fips

. distinct fips

       │        Observations
       │      total   distinct
───────┼──────────────────────
  fips │         57         57

. restore
Are the five New York City boroughs among them?

. count if inlist(fips, 36005, 36047, 36061, 36081, 36085)
  5
Excluding observations for which no fips code is available and observations that have no link to the deaths data.

. drop if mi(fips)
(279 observations deleted)

. drop if mi(date_proper)
(57 observations deleted)
The authors specify that the analysis focuses on the 50 states, which most likely means that the Northern Mariana Islands and Puerto Rico were excluded. All counties from these territories were dropped.
Territory                  Abbr.  State FIPS
American Samoa             AS     60
Guam                       GU     66
Northern Mariana Islands   MP     69
Puerto Rico                PR     72
Virgin Islands             VI     78
. drop if state == "Northern Mariana Islands"
(6 observations deleted)

. drop if state == "Puerto Rico"
(5,657 observations deleted)

.
. * first two could be also achieved with
. * drop if fips >= 60000
. * su fips
Examples of missing covariate values on the date of the analyses:
. * ta county if mi(S1602_C01_001E) // & date == "2020-04-26"
. * ta county if mi(S1701_C03_001E) // & date == "2020-04-26"
. * ta county if mi(LABOR) // & date == "2020-04-26"

. distinct county if mi(S1602_C01_001E) & date_proper == date("2020-07-16", "YMD")

        │        Observations
        │      total   distinct
────────┼──────────────────────
 county │          2          2

. distinct county if mi(S1701_C03_001E) & date_proper == date("2020-07-16", "YMD")

        │        Observations
        │      total   distinct
────────┼──────────────────────
 county │          1          1

. distinct county if mi(LABOR) & date_proper == date("2020-07-16", "YMD")

        │        Observations
        │      total   distinct
────────┼──────────────────────
 county │        114        114

.
. fre state if mi(LABOR) & date_proper == date("2020-07-16", "YMD")

state
─────────────────────────────┬────────────────────────────────────────────
                             │      Freq.    Percent      Valid       Cum.
─────────────────────────────┼────────────────────────────────────────────
Valid   Alaska               │         26      22.81      22.81      22.81
        California           │          2       1.75       1.75      24.56
        Colorado             │          4       3.51       3.51      28.07
        District of Columbia │          1       0.88       0.88      28.95
        Florida              │          3       2.63       2.63      31.58
        Georgia              │         10       8.77       8.77      40.35
        Kentucky             │          3       2.63       2.63      42.98
        Louisiana            │          3       2.63       2.63      45.61
        Maryland             │          1       0.88       0.88      46.49
        Michigan             │          4       3.51       3.51      50.00
        Minnesota            │          1       0.88       0.88      50.88
        Missouri             │          1       0.88       0.88      51.75
        Nevada               │          3       2.63       2.63      54.39
        New Jersey           │          3       2.63       2.63      57.02
        North Carolina       │          2       1.75       1.75      58.77
        Pennsylvania         │          2       1.75       1.75      60.53
        Texas                │          2       1.75       1.75      62.28
        Virginia             │         35      30.70      30.70      92.98
        West Virginia        │          6       5.26       5.26      98.25
        Wisconsin            │          2       1.75       1.75     100.00
        Total                │        114     100.00     100.00
─────────────────────────────┴────────────────────────────────────────────
LABOR is particularly affected.
The data finder provided no reasonable response.
These observations will be excluded from the analyses, since dropping observations with missing values is Stata's default behaviour and the paper does not describe how missingness was handled.
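To make the default behaviour concrete, here is a minimal sketch on toy data (all values and the variable names y, x1 and x2 are invented for illustration). Estimation commands silently drop any observation with a missing value in the outcome or in a covariate. Comparing e(N) with the number of rows makes the loss visible.

```stata
* Toy data: values are invented purely for illustration
clear
input y x1 x2
1 2 3
2 1 .
3 4 1
4 3 5
5 . 2
6 5 4
end

* regress uses only rows where y, x1 and x2 are all non-missing
regress y x1 x2
display "Observations used: " e(N) " of " _N

* e(sample) flags the rows that entered the estimation
count if !e(sample)
assert r(N) == 2
```

The same check (count if !e(sample)) can be run after any estimation command to see how many counties were lost to missing covariates.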
Five of New York's counties were aggregated for counting cases/deaths in the paper, most likely following the data format used by the NYT.
Counties/boroughs and their fips codes are: Bronx (36005), Kings/Brooklyn (36047), New York/Manhattan (36061), Queens (36081) and Richmond/Staten Island (36085).
These data were excluded at an earlier stage, since no information on cases was found for these counties and the missing fips code was not fixed.
Explanation needed from data finder here?
It may be fortunate that we only test the rural counties?
It is unclear what the impact on the spatial setup is. Were these counties aggregated in the shapefiles to form a single New York unit?
Already in the dataset
. ren S1602_C01_001E nonenglish

. order nonenglish, a(deaths)

. la var nonenglish "% nonenglish speaking hh"

. univar nonenglish, d(1)
                                         ────────────── Quantiles ──────────────
Variable        n     Mean     S.D.      Min      .25      Mdn      .75      Max
────────────────────────────────────────────────────────────────────────────────
nonenglish 331517      1.9      3.0      0.0      0.3      0.9      2.1     38.6
────────────────────────────────────────────────────────────────────────────────
Already in the dataset
. gen farmwork = (LABOR / DP05_0001E) * 100
(11,389 missing values generated)

. order farmwork, a(nonenglish)

. la var farmwork "% engaged in farmwork"

. univar farmwork, d(1)
                                         ────────────── Quantiles ──────────────
Variable        n     Mean     S.D.      Min      .25      Mdn      .75      Max
────────────────────────────────────────────────────────────────────────────────
farmwork   320242      2.6      3.5      0.0      0.6      1.4      3.1     45.9
────────────────────────────────────────────────────────────────────────────────

. drop LABOR DP05_0001E
Already in the dataset, but has to be constructed from two vars
. gen uninsured = S2701_C05_011E + S2701_C05_012E

. order uninsured, a(farmwork)

. la var uninsured "% uninsured under 65"

. univar uninsured, d(1)
                                         ────────────── Quantiles ──────────────
Variable        n     Mean     S.D.      Min      .25      Mdn      .75      Max
────────────────────────────────────────────────────────────────────────────────
uninsured  331631     20.6     10.6      2.7     12.7     18.5     25.8     90.0
────────────────────────────────────────────────────────────────────────────────

. drop S2701_C05_011E S2701_C05_012E
Already in the dataset
. ren S1701_C03_001E poverty

. order poverty, a(uninsured)

. la var poverty "% below poverty line"

. univar poverty, d(1)
                                         ────────────── Quantiles ──────────────
Variable        n     Mean     S.D.      Min      .25      Mdn      .75      Max
────────────────────────────────────────────────────────────────────────────────
poverty    331517     15.6      6.4      2.3     11.1     14.8     19.1     55.1
────────────────────────────────────────────────────────────────────────────────
Already in the dataset
. ren DP05_0024PE older

. order older, a(poverty)

. la var older "% aged 65 and above"

. univar older, d(1)
                                         ────────────── Quantiles ──────────────
Variable        n     Mean     S.D.      Min      .25      Mdn      .75      Max
────────────────────────────────────────────────────────────────────────────────
older      331631     17.6      4.4      3.7     14.8     17.3     19.9     54.2
────────────────────────────────────────────────────────────────────────────────
Already in the dataset
. ren A00002_002 pop_dens

. order pop_dens, a(older)

. la var pop_dens "Pop density"

. univar pop_dens, d(1)
                                         ────────────── Quantiles ──────────────
Variable        n     Mean     S.D.      Min      .25      Mdn      .75      Max
────────────────────────────────────────────────────────────────────────────────
pop_dens   331631    255.9    909.6      0.0     22.4     52.6    144.4  18565.5
────────────────────────────────────────────────────────────────────────────────
Defined from pop density
. gen nonurban = pop_dens <= 1000

. order nonurban, a(date_proper)

. la var nonurban "Non-urban counties flag"

. la de nonurban 0 "urban" 1 "non-urban"

. la val nonurban nonurban

. fre nonurban if date_proper == date("2020-07-16", "YMD")

nonurban -- Non-urban counties flag
────────────────────┬────────────────────────────────────────────
                    │      Freq.    Percent      Valid       Cum.
────────────────────┼────────────────────────────────────────────
Valid   0 urban     │        143       4.64       4.64       4.64
        1 non-urban │       2942      95.36      95.36     100.00
        Total       │       3085     100.00     100.00
────────────────────┴────────────────────────────────────────────
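A caveat worth keeping in mind with this kind of flag: Stata treats numeric missing values as larger than any number, so pop_dens <= 1000 evaluates to 0 ("urban") when pop_dens is missing. At this point pop_dens has no missing values (its n of 331631 equals the number of remaining observations), so the flag above is unaffected. A more defensive variant, sketched here on invented toy values, could look like this:

```stata
* Toy data: values are invented purely for illustration
clear
input pop_dens
22.4
1000
18565.5
.
end

* Naive flag: the missing row is silently coded 0 ("urban")
gen nonurban_naive = pop_dens <= 1000

* Defensive flag: left missing when density is unknown
gen nonurban_safe = pop_dens <= 1000 if !mi(pop_dens)

list, clean
assert nonurban_naive == 0 in 4
assert mi(nonurban_safe) in 4
```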
There are no records for counties with 0 cases, so the first date in the dataset for each county is the date needed.

. sort fips date_proper, stable

. by fips: egen date_case1 = min(date_proper)

. format date_case1 %tdCCYY-NN-DD

. gen time_case1 = date_proper - date_case1

. *drop date_case1

. la var date_case1 "Date of case 1"

. la var time_case1 "Days since case 1 in county"
Dates of SIP for each state were provided by the data finder.
Five states had the value imputed to 0, as specified by the paper.
There was one inconsistency between the paper and the data finder; values from the preprint were used (Wyoming).
. gen temp = date_proper if cases >= 100
(226,596 missing values generated)

. sort fips date_proper, stable

. by fips: egen date_case100 = min(temp)
(131,942 missing values generated)

. drop temp

. format date_case100 %tdCCYY-NN-DD

.
. gen time_case100 = date_case100 - sip_effect
(145,316 missing values generated)

. *drop date_case100

.
. * this states are 0 as per preprint specs
. replace time_case100 = 0 if inlist(state, "Arkansas", "Iowa", "Nebraska", "North Dakota", "Wyoming")
(32,871 real changes made)

.
. * this state is not mentioned in the preprint but should be 0 according to data finder
. replace time_case100 = 0 if inlist(state, "South Dakota")
(5,624 real changes made)

.
. la var date_case100 "Date of case 100"

. la var time_case100 "Days between case 100 in county and SIP in state"
There is a problem with generating this variable for counties that had not reached 100 cases by the time the analyses were conducted.
The preprint is silent on what was done in such cases. The only two reasonable strategies are to run the analyses without the counties with missing information (which would drastically reduce the sample size) or to impute the value as 0. The latter was applied here.
. distinct fips if mi(time_case100)

       │        Observations
       │      total   distinct
───────┼──────────────────────
  fips │     106821       1112

.
. replace time_case100 = 0 if mi(time_case100) & mi(date_case100)
(106,821 real changes made)
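The construction and the imputation can be followed on a small example. This is a minimal sketch on invented toy data (two hypothetical counties, 1001 and 1003, with a made-up SIP date), not the replication data. It uses the same direction of subtraction as the log, so a county that reached 100 cases before the SIP date gets a negative value.

```stata
* Toy data: fips codes, dates and counts are invented for illustration
clear
input fips str10 day cases str10 sip
1001 "2020-03-20"  40 "2020-04-04"
1001 "2020-03-30" 120 "2020-04-04"
1001 "2020-04-10" 300 "2020-04-04"
1003 "2020-03-25"   5 "2020-04-04"
1003 "2020-04-10"  60 "2020-04-04"
end
gen date_proper = date(day, "YMD")
gen sip_effect  = date(sip, "YMD")
format date_proper sip_effect %td

* Date on which each county first reports 100 or more cases
gen temp = date_proper if cases >= 100
bysort fips (date_proper): egen date_case100 = min(temp)
drop temp

* Same direction as in the log: case-100 date minus SIP date
gen time_case100 = date_case100 - sip_effect

* Counties that never reached 100 cases: impute 0
replace time_case100 = 0 if mi(date_case100)

list fips date_proper cases time_case100, sepby(fips)
assert time_case100 == -5 if fips == 1001
assert time_case100 == 0 if fips == 1003
```

One consequence of the imputation is that a county that never reached 100 cases becomes indistinguishable from one that reached 100 cases exactly on the SIP date.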
Deaths reported by NY Times
. la var deaths "Deaths"

. univar deaths, d(0)
                                         ────────────── Quantiles ──────────────
Variable        n     Mean     S.D.      Min      .25      Mdn      .75      Max
────────────────────────────────────────────────────────────────────────────────
deaths     331631       22      137        0        0        1        5     4750
────────────────────────────────────────────────────────────────────────────────

. univar deaths, d(0) by(nonurban)

-> nonurban=urban
                                         ────────────── Quantiles ──────────────
Variable        n     Mean     S.D.      Min      .25      Mdn      .75      Max
────────────────────────────────────────────────────────────────────────────────
deaths      18249      255      508        0        3       43      262     4750
────────────────────────────────────────────────────────────────────────────────

-> nonurban=non-urban
                                         ────────────── Quantiles ──────────────
Variable        n     Mean     S.D.      Min      .25      Mdn      .75      Max
────────────────────────────────────────────────────────────────────────────────
deaths     313382        8       37        0        0        1        4     1311
────────────────────────────────────────────────────────────────────────────────
SCORE recommendation regarding the time frame of the data:
For this replication, SCORE recommends that three analyses be performed: one that uses only the dates that have occurred since the original analysis, one that combines all available dates, and a third that uses only the dates used in the original analysis.
The first and second analyses are the same in this case: both would use the data from the latest available date.
This script generates data with the most up-to-date information available, which is 2020-07-16.
(328,544 observations deleted)

    Variable │        Obs         Mean    Std. Dev.          Min          Max
─────────────┼─────────────────────────────────────────────────────────────────
 date_proper │      3,087   2020-07-15    .1609609   2020-07-08   2020-07-16

nonurban -- Non-urban counties flag
────────────────────┬────────────────────────────────────────────
                    │      Freq.    Percent      Valid       Cum.
────────────────────┼────────────────────────────────────────────
Valid   0 urban     │        143       4.63       4.63       4.63
        1 non-urban │       2944      95.37      95.37     100.00
        Total       │       3087     100.00     100.00
────────────────────┴────────────────────────────────────────────

file data\merged_covid_usa_prepared_extended.dta saved

Contains data from data\merged_covid_usa_prepared_extended.dta
  obs:         3,087
 vars:            18                          23 Nov 2020 11:22
────────────────────────────────────────────────────────────────────────────────
                 storage   display    value
variable name    type      format     label      variable label
────────────────────────────────────────────────────────────────────────────────
date             str8      %9s
nonurban         byte      %9.0g      nonurban   Non-urban counties flag
county           str33     %33s
state            str20     %20s
fips             long      %12.0g
cases            long      %8.0g                 Cases
deaths           int       %8.0g                 Deaths
nonenglish       double    %10.0g                % nonenglish speaking hh
farmwork         float     %9.0g                 % engaged in farmwork
uninsured        float     %9.0g                 % uninsured under 65
poverty          double    %10.0g                % below poverty line
older            double    %10.0g                % aged 65 and above
pop_dens         float     %9.0g                 Pop density
date_case1       int       %td..                 Date of case 1
time_case1       int       %9.0g                 Days since case 1 in county
sip_effect       int       %td..                 SIP effect start date
date_case100     int       %td..                 Date of case 100
time_case100     int       %9.0g                 Days between case 100 in county and SIP in state
────────────────────────────────────────────────────────────────────────────────
Sorted by: fips
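The command that reduced the data to one row per county is not shown in the log. Since date_proper still ranges from 2020-07-08 to 2020-07-16 afterwards, one plausible reading is that the last available record of each county was kept, rather than only the records dated 2020-07-16. A minimal sketch of that approach on invented toy data (an assumption, not necessarily the code that was run):

```stata
* Toy data: fips codes, dates and counts are invented for illustration
clear
input fips str10 day deaths
1001 "2020-07-15" 10
1001 "2020-07-16" 11
1003 "2020-07-07"  2
1003 "2020-07-08"  3
end
gen date_proper = date(day, "YMD")
format date_proper %tdCCYY-NN-DD

* Keep the most recent record of every county
bysort fips (date_proper): keep if _n == _N

list, clean

* One row per county must remain
assert _N == 2
isid fips
```

Keeping the last record rather than filtering on a fixed date retains counties whose reporting stopped a few days earlier, at the cost of mixing slightly different dates in one cross-section.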
Spatially contiguous sample, prepared with the 01_spatial-sample.Rmd script.
(1 var, 157 obs)
file data\fips_sample.dta saved

─────────────────────┬─────────────────────────────────────────────────────────
         merge specs │
       matching type │ 1:1
  mv's on match vars │ none
  unmatched obs from │ both
─────────────────────┼─────────────────────────────────────────────────────────
         master file │ data\merged_covid_usa_prepared_extended.dta
                 obs │ 3087
                vars │ 18
          match vars │ fips (key)
─────────────────────┼─────────────────────────────────────────────────────────
          using file │ data\fips_sample.dta
                 obs │ 157
                vars │ 1
          match vars │ fips (key)
─────────────────────┼─────────────────────────────────────────────────────────
         result file │ data\merged_covid_usa_prepared_extended.dta
                 obs │ 3094
                vars │ 20 (including _merge)
         ────────────┼─────────────────────────────────────────────────────────
              _merge │ 2937 obs only in master data (code==1)
                     │    7 obs only in using data (code==2)
                     │  150 obs both in master and using data (code==3)
─────────────────────┴─────────────────────────────────────────────────────────
(2,944 observations deleted)
file data\merged_covid_usa_prepared_extended_sample.dta saved

Contains data from data\merged_covid_usa_prepared_extended_sample.dta
  obs:           150
 vars:            18                          23 Nov 2020 11:22
────────────────────────────────────────────────────────────────────────────────
                 storage   display    value
variable name    type      format     label      variable label
────────────────────────────────────────────────────────────────────────────────
date             str8      %9s
nonurban         byte      %9.0g      nonurban   Non-urban counties flag
county           str33     %33s
state            str20     %20s
fips             long      %12.0g
cases            long      %8.0g                 Cases
deaths           int       %8.0g                 Deaths
nonenglish       double    %10.0g                % nonenglish speaking hh
farmwork         float     %9.0g                 % engaged in farmwork
uninsured        float     %9.0g                 % uninsured under 65
poverty          double    %10.0g                % below poverty line
older            double    %10.0g                % aged 65 and above
pop_dens         float     %9.0g                 Pop density
date_case1       int       %td..                 Date of case 1
time_case1       int       %9.0g                 Days since case 1 in county
sip_effect       int       %td..                 SIP effect start date
date_case100     int       %td..                 Date of case 100
time_case100     int       %9.0g                 Days between case 100 in county and SIP in state
────────────────────────────────────────────────────────────────────────────────
Sorted by:
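The merge table above appears to come from a user-written merge wrapper, and the exact call is not shown. The same restriction to the sampled counties can be sketched with the built-in merge command. The toy data below are invented, and the tempfile name is hypothetical. The option keep(match) drops both the counties outside the sample and the sample codes that have no county record (7 such codes in the log).

```stata
* Toy "using" file: fips codes of the spatial sample (invented values)
clear
input long fips
1001
1005
9999
end
tempfile sample
save `sample'

* Toy "master" file: one row per county (invented values)
clear
input long fips deaths
1001 11
1003  3
1005  0
end

* 1:1 merge on fips; keep(match) retains only counties present in both files
merge 1:1 fips using `sample', keep(match) nogenerate

list, clean
assert _N == 2
```

Here 1003 is dropped because it is not in the sample and 9999 is dropped because it has no county record, leaving the two matched counties.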
file data\fips_extended.csv saved
(2,973 observations deleted)
file data\fips_missing_farmwork.csv saved