title: “SCORE: Fielding-Miller et al. (2020) replication”
subtitle: “Analysis of 5% sample”
author: “Radoslaw Panczak”
date: 23 Nov 2020

Preps

Read data

C:\external\SCORE_Fielding-Miller_covid_R3pV\data

Contains data from merged_covid_usa_prepared_extended_sample.dta
  obs:           150                          
 vars:            18                          23 Nov 2020 11:22
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
              storage   display    value
variable name   type    format     label      variable label
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
date            str8    %9s                   
nonurban        byte    %9.0g      nonurban   Non-urban counties flag
county          str33   %33s                  
state           str20   %20s                  
fips            long    %12.0g                
cases           long    %8.0g                 Cases
deaths          int     %8.0g                 Deaths
nonenglish      double  %10.0g                % nonenglish speaking hh
farmwork        float   %9.0g                 % engaged in farmwork
uninsured       float   %9.0g                 % uninsured under 65
poverty         double  %10.0g                % below poverty line
older           double  %10.0g                % aged 65 and above
pop_dens        float   %9.0g                 Pop density
date_case1      int     %td..                 Date of case 1
time_case1      int     %9.0g                 Days since case 1 in county
sip_effect      int     %td..                 SIP effect start date
date_case100    int     %td..                 Date of case 100
time_case100    int     %9.0g                 Days between case 100 in county and SIP in state
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Sorted by: 

Missing data

. mdesc

    Variable    │     Missing          Total     Percent Missing
────────────────┼───────────────────────────────────────────────
           date │           0            150           0.00
       nonurban │           0            150           0.00
         county │           0            150           0.00
          state │           0            150           0.00
           fips │           0            150           0.00
          cases │           0            150           0.00
         deaths │           0            150           0.00
     nonenglish │           0            150           0.00
       farmwork │           4            150           2.67
      uninsured │           0            150           0.00
        poverty │           1            150           0.67
          older │           0            150           0.00
       pop_dens │           0            150           0.00
     date_case1 │           0            150           0.00
     time_case1 │           0            150           0.00
     sip_effect │          12            150           8.00
   date_case100 │          89            150          59.33
   time_case100 │           0            150           0.00
────────────────┼───────────────────────────────────────────────

States

. ta state, m

               state │      Freq.     Percent        Cum.
─────────────────────┼───────────────────────────────────
             Arizona │         12        8.00        8.00
          California │          6        4.00       12.00
            Colorado │         57       38.00       50.00
              Kansas │         21       14.00       64.00
            Nebraska │         10        6.67       70.67
          New Mexico │         21       14.00       84.67
            Oklahoma │          3        2.00       86.67
               Texas │          5        3.33       90.00
                Utah │         13        8.67       98.67
             Wyoming │          2        1.33      100.00
─────────────────────┼───────────────────────────────────
               Total │        150      100.00

Urban

. ta nonurban, m

  Non-urban │
   counties │
       flag │      Freq.     Percent        Cum.
────────────┼───────────────────────────────────
      urban │          4        2.67        2.67
  non-urban │        146       97.33      100.00
────────────┼───────────────────────────────────
      Total │        150      100.00

Outcome

. univar deaths, dec(0)
                                        ────────────── Quantiles ──────────────
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
───────────────────────────────────────────────────────────────────────────────
  deaths     150       43      136        0        0        1       12     1311
───────────────────────────────────────────────────────────────────────────────

. * hist deaths, width(10)

SAR models

Largely based on Stata’s SP manual.

Creating the Stata-format shapefile

. spshape2dta cb_2018_us_county_20m_prep_sample.shp, replace
  (importing .shp file)
  (importing .dbf file)
  (creating _ID spatial-unit id)
  (creating _CX coordinate)
  (creating _CY coordinate)

  file cb_2018_us_county_20m_prep_sample_shp.dta created
  file cb_2018_us_county_20m_prep_sample.dta     created

. * spshape2dta cb_2018_us_county_20m_prep.shp, replace
.     
. u cb_2018_us_county_20m_prep_sample, clear

. d

Contains data from cb_2018_us_county_20m_prep_sample.dta
  obs:           157                          
 vars:             5                          23 Nov 2020 11:22
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
              storage   display    value
variable name   type    format     label      variable label
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
_ID             int     %12.0g                Spatial-unit ID
_CX             double  %10.0g                x-coordinate of area centroid
_CY             double  %10.0g                y-coordinate of area centroid
fips            long    %9.0f                 fips
state_code      str2    %9s                   state_code
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Sorted by: _ID

. spset fips, modify replace
  (_shp.dta file saved)
  (data in memory saved)
  Sp dataset cb_2018_us_county_20m_prep_sample.dta
                data:  cross sectional
     spatial-unit id:  _ID (equal to fips)
         coordinates:  _CX, _CY (planar)
    linked shapefile:  cb_2018_us_county_20m_prep_sample_shp.dta

. spset, modify coordsys(latlong, miles)
  Sp dataset cb_2018_us_county_20m_prep_sample.dta
                data:  cross sectional
     spatial-unit id:  _ID (equal to fips)
         coordinates:  _CY, _CX (latitude-and-longitude, miles)
    linked shapefile:  cb_2018_us_county_20m_prep_sample_shp.dta


. sa, replace 
file cb_2018_us_county_20m_prep_sample.dta saved

Case selection

In order to test focal hypothesis only nonurban counties are used.

Counties with missing information on farmwork and poverty are also excluded since models will not run in situations in which counties exist in prepared spatial data but are excluded from regression (default Stata behaviour afaik).

. u "merged_covid_usa_prepared_extended_sample.dta" , clear

. * u "merged_covid_usa_prepared_extended.dta" , clear
. keep if nonurban
(4 observations deleted)

.     
. * stata deletes cases with missing obs
. * then the dataset doesnt match the spatial matrix
. * either fix missings or drop
. mdesc

    Variable    │     Missing          Total     Percent Missing
────────────────┼───────────────────────────────────────────────
           date │           0            146           0.00
       nonurban │           0            146           0.00
         county │           0            146           0.00
          state │           0            146           0.00
           fips │           0            146           0.00
          cases │           0            146           0.00
         deaths │           0            146           0.00
     nonenglish │           0            146           0.00
       farmwork │           4            146           2.74
      uninsured │           0            146           0.00
        poverty │           1            146           0.68
          older │           0            146           0.00
       pop_dens │           0            146           0.00
     date_case1 │           0            146           0.00
     time_case1 │           0            146           0.00
     sip_effect │          12            146           8.22
   date_case100 │          89            146          60.96
   time_case100 │           0            146           0.00
────────────────┼───────────────────────────────────────────────

. drop if mi(farmwork)
(4 observations deleted)

. drop if mi(poverty)
(1 observation deleted)

Merging covid data with the Stata-format shapefile

. merge 1:1 fips using cb_2018_us_county_20m_prep_sample

    Result                           # of obs.
    ─────────────────────────────────────────
    not matched                            16
        from master                         0  (_merge==1)
        from using                         16  (_merge==2)

    matched                               141  (_merge==3)
    ─────────────────────────────────────────

. * merge 1:1 fips using cb_2018_us_county_20m_prep
. assert _merge != 1

. keep if _merge == 3
(16 observations deleted)

. drop _merge

.     
. * grmap deaths

Fitting model with a spatial lag of the dependent variable

Using gs2sls option (instead of ml) to guard against potential heteroskedasticity.

. spmatrix create contiguity W, replace 

. 
. spregress deaths nonenglish farmwork uninsured poverty older pop_dens time_case1 time_case100, gs2sls dvarlag(W)
  (141 observations)
  (141 observations (places) used)
  (weighting matrix defines 141 places)

Spatial autoregressive model                    Number of obs     =        141
GS2SLS estimates                                Wald chi2(9)      =      88.76
                                                Prob > chi2       =     0.0000
                                                Pseudo R2         =     0.3184

─────────────┬────────────────────────────────────────────────────────────────
      deaths │      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
deaths       │
  nonenglish │  -.7747344   2.671196    -0.29   0.772    -6.010183    4.460714
    farmwork │   1.437903   1.769458     0.81   0.416    -2.030171    4.905977
   uninsured │   .3591921   1.028018     0.35   0.727    -1.655687    2.374071
     poverty │   2.456317   1.613196     1.52   0.128    -.7054892    5.618123
       older │  -1.734486   1.758349    -0.99   0.324    -5.180786    1.711815
    pop_dens │   .3879637   .0647953     5.99   0.000     .2609672    .5149602
  time_case1 │   .6737964   .3885301     1.73   0.083    -.0877086    1.435301
time_case100 │  -.7328983   .3591166    -2.04   0.041    -1.436754   -.0290428
       _cons │  -77.00937   62.33612    -1.24   0.217    -199.1859    45.16719
─────────────┼────────────────────────────────────────────────────────────────
W            │
      deaths │   .4377769   .1956335     2.24   0.025     .0543424    .8212114
─────────────┴────────────────────────────────────────────────────────────────
Wald test of spatial terms:          chi2(1) = 5.01       Prob > chi2 = 0.0252

.     
. * estat impact
.     
. est sto m_extended_queen

. 

Sensitivity analysis using rook contiguity

. spmatrix create contiguity W, rook replace 

. 
. spregress deaths nonenglish farmwork uninsured poverty older pop_dens time_case1 time_case100, gs2sls dvarlag(W)
  (141 observations)
  (141 observations (places) used)
  (weighting matrix defines 141 places)

Spatial autoregressive model                    Number of obs     =        141
GS2SLS estimates                                Wald chi2(9)      =      89.27
                                                Prob > chi2       =     0.0000
                                                Pseudo R2         =     0.3190

─────────────┬────────────────────────────────────────────────────────────────
      deaths │      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
deaths       │
  nonenglish │  -.9039196   2.670872    -0.34   0.735    -6.138733    4.330894
    farmwork │   1.447637   1.767351     0.82   0.413    -2.016307     4.91158
   uninsured │   .3879726   1.027182     0.38   0.706    -1.625267    2.401212
     poverty │   2.523433    1.60896     1.57   0.117    -.6300706    5.676937
       older │   -1.82808   1.763444    -1.04   0.300    -5.284366    1.628207
    pop_dens │   .3871985   .0647255     5.98   0.000     .2603389    .5140582
  time_case1 │   .6675897   .3880934     1.72   0.085    -.0930594    1.428239
time_case100 │  -.7407261   .3589477    -2.06   0.039    -1.444251   -.0372015
       _cons │  -75.96434   62.27776    -1.22   0.223    -198.0265    46.09782
─────────────┼────────────────────────────────────────────────────────────────
W            │
      deaths │   .4337577   .1878628     2.31   0.021     .0655534    .8019621
─────────────┴────────────────────────────────────────────────────────────────
Wald test of spatial terms:          chi2(1) = 5.33       Prob > chi2 = 0.0209

.     
. est sto m_extended_rook

.     
. est tab m_extended_queen m_extended_rook , b(%6.3f) p(%6.3f)

─────────────┬────────────────────
    Variable │ m_ext~n   m_ext~k  
─────────────┼────────────────────
deaths       │
  nonenglish │  -0.775    -0.904  
             │   0.772     0.735  
    farmwork │   1.438     1.448  
             │   0.416     0.413  
   uninsured │   0.359     0.388  
             │   0.727     0.706  
     poverty │   2.456     2.523  
             │   0.128     0.117  
       older │  -1.734    -1.828  
             │   0.324     0.300  
    pop_dens │   0.388     0.387  
             │   0.000     0.000  
  time_case1 │   0.674     0.668  
             │   0.083     0.085  
time_case100 │  -0.733    -0.741  
             │   0.041     0.039  
       _cons │ -77.009   -75.964  
             │   0.217     0.223  
─────────────┼────────────────────
W            │
      deaths │   0.438     0.434  
             │   0.025     0.021  
─────────────┴────────────────────
                       legend: b/p