Input

Raw data received from data finder as a starting point:

C:\external\SCORE_Fielding-Miller_covid_R3pV

date was long now str8

(57 missing values generated)

Contains data from data-raw\Script\merged_covid_usa_v2.dta
  obs:       340,819                          
 vars:            16                          4 Oct 2020 20:00
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
              storage   display    value
variable name   type    format     label      variable label
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
date            str8    %9s                   
date_proper     float   %td..                 
county          str33   %33s                  
state           str24   %24s                  
fips            long    %12.0g                
cases           long    %8.0g                 Cases
deaths          int     %8.0g                 
S1602_C01_001E  double  %10.0g                
S2701_C05_011E  double  %10.0g                
S2701_C05_012E  double  %10.0g                
S1701_C03_001E  double  %10.0g                
DP05_0001E      long    %10.0g                
DP05_0024PE     double  %10.0g                
LABOR           long    %10.0g                Value
A00002_002      float   %9.0g                 Population Density (Per Sq. Mile)
sip_effect      float   %9.0g                 
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Sorted by: fips
     Note: Dataset has changed since last saved.

All dates available

    Variable │        Obs        Mean    Std. Dev.       Min        Max
─────────────┼─────────────────────────────────────────────────────────
 date_proper │    340,762   2020-05-21    33.61868  2020-01-21  2020-07-16

Missing data

Cleaning up observations with missing geography

. drop if county == "Unknown"
(3,189 observations deleted)
. mdesc

    Variable    │     Missing          Total     Percent Missing
────────────────┼───────────────────────────────────────────────
           date │           0        337,630           0.00
    date_proper │          57        337,630           0.02
         county │          57        337,630           0.02
          state │          57        337,630           0.02
           fips │         279        337,630           0.08
          cases │          57        337,630           0.02
         deaths │          57        337,630           0.02
   S1602_C~001E │         399        337,630           0.12
   S2701_C~011E │         287        337,630           0.09
   S2701_C~012E │         287        337,630           0.09
   S1701_C~001E │         401        337,630           0.12
     DP05_0001E │         287        337,630           0.09
    DP05_0024PE │         287        337,630           0.09
          LABOR │      17,341        337,630           5.14
     A00002_002 │         287        337,630           0.09
     sip_effect │           0        337,630           0.00
────────────────┼───────────────────────────────────────────────

Misisng fips codes.

. preserve

. keep if mi(fips)
(337,351 observations deleted)

. fre county

county
──────────────────────┬────────────────────────────────────────────
                      │      Freq.    Percent      Valid       Cum.
──────────────────────┼────────────────────────────────────────────
Valid   Joplin        │         22       7.89       7.89       7.89
        Kansas City   │        119      42.65      42.65      50.54
        New York City │        138      49.46      49.46     100.00
        Total         │        279     100.00     100.00           
──────────────────────┴────────────────────────────────────────────

. * fre state
. restore

Not linked to list of cases from NYT

. preserve

. keep if mi(date_proper)
(337,573 observations deleted)

. * fre fips
. distinct fips

       │        Observations
       │      total   distinct
───────┼──────────────────────
  fips │         57         57

. restore

Among them NY boroughs?

. count if inlist(fips, 36005, 36047, 36061, 36081, 36085)
  5

Exclusions

Excluding cases where no fips code is available and no link to deaths data.

. drop if mi(fips)
(279 observations deleted)

. drop if mi(date_proper)
(57 observations deleted)

Authors specifythat analysis focuses on 50 states - that most likely means exclusions of Northern Mariana Islands and Puerto Rico. All counties from there were excluded.

American Samoa AS 60 Guam GU 66 Northern Mariana Islands MP 69 Puerto Rico PR 72 Virgin Islands VI 78

. drop if state == "Northern Mariana Islands"
(6 observations deleted)

. drop if state == "Puerto Rico"
(5,657 observations deleted)

.     
. * first two could be also achieved with 
. * drop if fips >= 60000
. * su fips

Still some missings

Examples from the date of analyses

. * ta county if mi(S1602_C01_001E) // & date == "2020-04-26"
. * ta county if mi(S1701_C03_001E) // & date == "2020-04-26"
. * ta county if mi(LABOR)             // & date == "2020-04-26"
. distinct county if mi(S1602_C01_001E) & date_proper == date("2020-07-16", "YMD")

        │        Observations
        │      total   distinct
────────┼──────────────────────
 county │          2          2

. distinct county if mi(S1701_C03_001E) & date_proper == date("2020-07-16", "YMD")

        │        Observations
        │      total   distinct
────────┼──────────────────────
 county │          1          1

. distinct county if mi(LABOR)             & date_proper == date("2020-07-16", "YMD")

        │        Observations
        │      total   distinct
────────┼──────────────────────
 county │        114        114

.     
. fre state if mi(LABOR)             & date_proper == date("2020-07-16", "YMD")

state
─────────────────────────────┬────────────────────────────────────────────
                             │      Freq.    Percent      Valid       Cum.
─────────────────────────────┼────────────────────────────────────────────
Valid   Alaska               │         26      22.81      22.81      22.81
        California           │          2       1.75       1.75      24.56
        Colorado             │          4       3.51       3.51      28.07
        District of Columbia │          1       0.88       0.88      28.95
        Florida              │          3       2.63       2.63      31.58
        Georgia              │         10       8.77       8.77      40.35
        Kentucky             │          3       2.63       2.63      42.98
        Louisiana            │          3       2.63       2.63      45.61
        Maryland             │          1       0.88       0.88      46.49
        Michigan             │          4       3.51       3.51      50.00
        Minnesota            │          1       0.88       0.88      50.88
        Missouri             │          1       0.88       0.88      51.75
        Nevada               │          3       2.63       2.63      54.39
        New Jersey           │          3       2.63       2.63      57.02
        North Carolina       │          2       1.75       1.75      58.77
        Pennsylvania         │          2       1.75       1.75      60.53
        Texas                │          2       1.75       1.75      62.28
        Virginia             │         35      30.70      30.70      92.98
        West Virginia        │          6       5.26       5.26      98.25
        Wisconsin            │          2       1.75       1.75     100.00
        Total                │        114     100.00     100.00           
─────────────────────────────┴────────────────────────────────────────────

LABOR particularly affected.

No reasonable response from data finder.

These cases will be excluded from the analyses since this is default Stata’s behaviour and no description on handling missingness was provided in paper.

Five boroughs of New York City

Five of New York’s counties were aggregated for counting cases/deaths in the paper. Most likely following data format from NYT.

Counties/boroughs and their codes are:

Data was excluded at earlier stage since no info on cases was found for them and no fix for fips code was made.

Explanation needed from data finder here?

It might be lucky that we only test the rural counties?

Not sure what the impact on spatial set up is? Were these counties aggregated in shape files to form one NY?

Regression variables

% nonenglish speaking hh

Already in the dataset

. ren S1602_C01_001E nonenglish

. order nonenglish, a(deaths)

. la var nonenglish "% nonenglish speaking hh"

. univar nonenglish, d(1)
                                        ────────────── Quantiles ──────────────
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
───────────────────────────────────────────────────────────────────────────────
nonenglish  331517      1.9      3.0      0.0      0.3      0.9      2.1     38.6
───────────────────────────────────────────────────────────────────────────────

% engaged in farmwork

Already in the dataset

. gen farmwork = (LABOR / DP05_0001E) * 100
(11,389 missing values generated)

. order farmwork, a(nonenglish)

. la var farmwork "% engaged in farmwork"

. univar farmwork, d(1)
                                        ────────────── Quantiles ──────────────
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
───────────────────────────────────────────────────────────────────────────────
farmwork  320242      2.6      3.5      0.0      0.6      1.4      3.1     45.9
───────────────────────────────────────────────────────────────────────────────

. drop LABOR DP05_0001E

% uninsured under 65

Already in the dataset, but has to be constructed from two vars

. gen uninsured = S2701_C05_011E + S2701_C05_012E

. order uninsured, a(farmwork)

. la var uninsured "% uninsured under 65"

. univar uninsured, d(1)
                                        ────────────── Quantiles ──────────────
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
───────────────────────────────────────────────────────────────────────────────
uninsured  331631     20.6     10.6      2.7     12.7     18.5     25.8     90.0
───────────────────────────────────────────────────────────────────────────────

. drop S2701_C05_011E S2701_C05_012E

% below poverty line

Already in the dataset

. ren S1701_C03_001E poverty

. order poverty, a(uninsured)

. la var poverty "% below poverty line"

. univar poverty, d(1)
                                        ────────────── Quantiles ──────────────
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
───────────────────────────────────────────────────────────────────────────────
 poverty  331517     15.6      6.4      2.3     11.1     14.8     19.1     55.1
───────────────────────────────────────────────────────────────────────────────

% aged 65 and above

Already in the dataset

. ren DP05_0024PE older

. order older, a(poverty)

. la var older "% aged 65 and above"

. univar older, d(1)
                                        ────────────── Quantiles ──────────────
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
───────────────────────────────────────────────────────────────────────────────
   older  331631     17.6      4.4      3.7     14.8     17.3     19.9     54.2
───────────────────────────────────────────────────────────────────────────────

Pop density

Already in the dataset

. ren A00002_002 pop_dens

. order pop_dens, a(older)

. la var pop_dens "Pop density"

. univar pop_dens, d(1)
                                        ────────────── Quantiles ──────────────
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
───────────────────────────────────────────────────────────────────────────────
pop_dens  331631    255.9    909.6      0.0     22.4     52.6    144.4  18565.5
───────────────────────────────────────────────────────────────────────────────

Non urban counties

Defined from pop density

. gen nonurban = pop_dens <= 1000

. order nonurban, a(date_proper)

. la var nonurban "Non-urban counties flag"

. la de nonurban 0 "urban" 1 "non-urban"

. la val nonurban nonurban

. fre nonurban if date_proper == date("2020-07-16", "YMD")

nonurban -- Non-urban counties flag
────────────────────┬────────────────────────────────────────────
                    │      Freq.    Percent      Valid       Cum.
────────────────────┼────────────────────────────────────────────
Valid   0 urban     │        143       4.64       4.64       4.64
        1 non-urban │       2942      95.36      95.36     100.00
        Total       │       3085     100.00     100.00           
────────────────────┴────────────────────────────────────────────

No of days since 1st case

There are no records for states with 0 cases so first date in the database for each county is the date needed.

. sort fips date_proper, stable    

. by fips: egen date_case1 = min(date_proper)

. format date_case1 %tdCCYY-NN-DD

. gen time_case1 = date_proper - date_case1

. *drop date_case1
. la var date_case1 "Date of case 1"

. la var time_case1 "Days since case 1 in county"

No of days between 100th case and shelter in place

Dates of SIP for each state were provided by data finder.

Five states had it imputed to 0 as specified by paper.

There was one incosistency between paper and data finder - using values from preprint (Wyoming).

. gen temp = date_proper if cases >= 100
(226,596 missing values generated)

. sort fips date_proper, stable    

. by fips: egen date_case100 = min(temp)
(131,942 missing values generated)

. drop temp

. format date_case100 %tdCCYY-NN-DD

.     
. gen time_case100 = date_case100 - sip_effect
(145,316 missing values generated)

. *drop date_case100
.     
. * this states are 0 as per preprint specs 
. replace time_case100 = 0 if inlist(state, "Arkansas", "Iowa", "Nebraska", "North Dakota", "Wyoming")
(32,871 real changes made)

. 
. * this state is not mentioned in the preprint but should be 0 according to data finder 
. replace time_case100 = 0 if inlist(state, "South Dakota")
(5,624 real changes made)

.     
. la var date_case100 "Date of case 100"

. la var time_case100 "Days between case 100 in county and SIP in state"

There is a problem with generating this variable for counties that did not reach 100 cases by the time analyses were conducted.

Preprint is silent on what was done in such cases. The only two reasonable strategies would be to run analyses without counties with missing information (which would result in drastic sample size reduction) or impute it to 0. The latter has been applied in this case.

. distinct fips if mi(time_case100)

       │        Observations
       │      total   distinct
───────┼──────────────────────
  fips │     106821       1112

.     
. replace time_case100 = 0 if mi(time_case100) & mi(date_case100)
(106,821 real changes made)

Outcome

Deaths reported by NY Times

. la var deaths "Deaths"

. univar deaths, d(0)
                                        ────────────── Quantiles ──────────────
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
───────────────────────────────────────────────────────────────────────────────
  deaths  331631       22      137        0        0        1        5     4750
───────────────────────────────────────────────────────────────────────────────

. univar deaths, d(0) by(nonurban)

-> nonurban=urban 
                                        ────────────── Quantiles ──────────────
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
───────────────────────────────────────────────────────────────────────────────
  deaths   18249      255      508        0        3       43      262     4750
───────────────────────────────────────────────────────────────────────────────

-> nonurban=non-urban 
                                        ────────────── Quantiles ──────────────
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
───────────────────────────────────────────────────────────────────────────────
  deaths  313382        8       37        0        0        1        4     1311
───────────────────────────────────────────────────────────────────────────────

Final dataset for analysis

SCORE recomendation regarding time frame of the data:

For this replication, SCORE recommends three analyses be performed: one analysis that only uses the dates that have occurred since the original analysis, one analysis that combines all available dates, a third analysis that only uses dates that were used in the original analysis

The first and second analyses are the same in such cases - they both would use data with the latest available date.

This script generates data with the most up to date available information which is 2020-07-16.

(328,544 observations deleted)

    Variable │        Obs        Mean    Std. Dev.       Min        Max
─────────────┼─────────────────────────────────────────────────────────
 date_proper │      3,087   2020-07-15    .1609609  2020-07-08  2020-07-16

nonurban -- Non-urban counties flag
────────────────────┬────────────────────────────────────────────
                    │      Freq.    Percent      Valid       Cum.
────────────────────┼────────────────────────────────────────────
Valid   0 urban     │        143       4.63       4.63       4.63
        1 non-urban │       2944      95.37      95.37     100.00
        Total       │       3087     100.00     100.00           
────────────────────┴────────────────────────────────────────────

file data\merged_covid_usa_prepared_extended.dta saved

Contains data from data\merged_covid_usa_prepared_extended.dta
  obs:         3,087                          
 vars:            18                          23 Nov 2020 11:22
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
              storage   display    value
variable name   type    format     label      variable label
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
date            str8    %9s                   
nonurban        byte    %9.0g      nonurban   Non-urban counties flag
county          str33   %33s                  
state           str20   %20s                  
fips            long    %12.0g                
cases           long    %8.0g                 Cases
deaths          int     %8.0g                 Deaths
nonenglish      double  %10.0g                % nonenglish speaking hh
farmwork        float   %9.0g                 % engaged in farmwork
uninsured       float   %9.0g                 % uninsured under 65
poverty         double  %10.0g                % below poverty line
older           double  %10.0g                % aged 65 and above
pop_dens        float   %9.0g                 Pop density
date_case1      int     %td..                 Date of case 1
time_case1      int     %9.0g                 Days since case 1 in county
sip_effect      int     %td..                 SIP effect start date
date_case100    int     %td..                 Date of case 100
time_case100    int     %9.0g                 Days between case 100 in county and SIP in state
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Sorted by: fips

Sample of 5%

Spatially contiguous. Prepared with 01_spatial-sample.Rmd script.

(1 var, 157 obs)

file data\fips_sample.dta saved

─────────────────────┬─────────────────────────────────────────────────────────
merge specs          │
       matching type │ 1:1
  mv's on match vars │ none
  unmatched obs from │ both
─────────────────────┼─────────────────────────────────────────────────────────
  master        file │ data\merged_covid_usa_prepared_extended.dta
                 obs │   3087
                vars │     18
          match vars │ fips  (key)
  ───────────────────┼─────────────────────────────────────────────────────────
  using         file │ data\fips_sample.dta
                 obs │    157
                vars │      1
          match vars │ fips  (key)
─────────────────────┼─────────────────────────────────────────────────────────
result          file │ data\merged_covid_usa_prepared_extended.dta
                 obs │   3094
                vars │     20  (including _merge)
         ────────────┼─────────────────────────────────────────────────────────
              _merge │   2937  obs only in master data                (code==1)
                     │      7  obs only in using data                 (code==2)
                     │    150  obs both in master and using data      (code==3)
─────────────────────┴─────────────────────────────────────────────────────────

(2,944 observations deleted)

file data\merged_covid_usa_prepared_extended_sample.dta saved

Contains data from data\merged_covid_usa_prepared_extended_sample.dta
  obs:           150                          
 vars:            18                          23 Nov 2020 11:22
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
              storage   display    value
variable name   type    format     label      variable label
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
date            str8    %9s                   
nonurban        byte    %9.0g      nonurban   Non-urban counties flag
county          str33   %33s                  
state           str20   %20s                  
fips            long    %12.0g                
cases           long    %8.0g                 Cases
deaths          int     %8.0g                 Deaths
nonenglish      double  %10.0g                % nonenglish speaking hh
farmwork        float   %9.0g                 % engaged in farmwork
uninsured       float   %9.0g                 % uninsured under 65
poverty         double  %10.0g                % below poverty line
older           double  %10.0g                % aged 65 and above
pop_dens        float   %9.0g                 Pop density
date_case1      int     %td..                 Date of case 1
time_case1      int     %9.0g                 Days since case 1 in county
sip_effect      int     %td..                 SIP effect start date
date_case100    int     %td..                 Date of case 100
time_case100    int     %9.0g                 Days between case 100 in county and SIP in state
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Sorted by: 

Appendix

List of codes to check spatial coverage

file data\fips_extended.csv saved

Tracking missing LABOUR

(2,973 observations deleted)

file data\fips_missing_farmwork.csv saved