Chapter 2 Partial equilibrium trade policy analysis with structural gravity
2.1 Traditional Gravity Estimates
2.1.1 Preparing the data
If you haven’t used R before or, more precisely, if you have only fitted a few regressions without much practice in transforming and cleaning data, check chapters 5 and 18 of Wickham and Grolemund (2016).
Please see the note on page 42 of Yotov et al. (2016). It is an important note, which tells us that we need to carry out four steps:
- Filter observations for a range of years (1986, 1990, 1994, 1998, 2002 and 2006)
- Transform some variables to logarithm scale (trade and dist) and create new variables from those in the original dataset
- Remove cases where both the exporter and the importer are the same
- Drop observations where the trade flow is zero
Because the datasets take up 3 GB on disk, we opted to use a local SQL database (DuckDB) instead of spreadsheets or even native R files. If you are not familiar with SQL, do not worry: we provide the function yotov_data(), so that yotov_data("ch1_application1") goes to the database and returns the data for this exercise.
Step 1 is straightforward:
## ── Attaching packages ───────────────────────────────────────────────────────────────────────── yotover 0.0.0.9000 ──
## ✓ dplyr 1.0.2 ✓ broom 0.7.1
## ✓ tidyr 1.1.2 ✓ msm 1.6.8
## ✓ multiwayvcov 1.2.3 ✓ duckdb 0.2.1.2
## ✓ sandwich 3.0.0 ✓ ggplot2 3.3.2
## ✓ lmtest 0.9.38
## ── ✓ Local Yotov database is OK. ────────────────────────────────────────────────────────────────────────────────────
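The filter itself is not printed above. As a minimal sketch, assuming a toy data frame in place of the table returned by yotov_data("ch1_application1") (the object names and values below are made up for illustration), keeping the six years of the sample amounts to a filter on seq(1986, 2006, 4):

```r
# Toy stand-in for the real table; all values are hypothetical
toy <- data.frame(
  exporter = "ARG",
  importer = "BRA",
  year = 1986:2006,
  trade = seq(100, 300, by = 10)
)

# The six years listed in step 1: 1986, 1990, ..., 2006
years_kept <- seq(1986, 2006, by = 4)

# Step 1: keep only the rows for those years
toy_step1 <- toy[toy$year %in% years_kept, ]
```

With dplyr the same step would read filter(year %in% seq(1986, 2006, 4)), which is the form used later in this chapter.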
Step 2 can be divided into parts, starting with the log transformation of trade and dist:
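The transformation is not printed above. A minimal sketch on a hypothetical toy data frame (the column names log_trade and log_dist match the ones used in the models later, the values are invented) could look like this:

```r
# Toy data; the first row has a zero trade flow on purpose
toy <- data.frame(
  trade = c(0, 50, 200),
  dist  = c(1200, 3400, 560)
)

# Step 2 (first part): log transformation of trade and dist
toy$log_trade <- log(toy$trade)
toy$log_dist  <- log(toy$dist)

# log(0) is -Inf, which is why zero flows have to be dropped before OLS
is.infinite(toy$log_trade[1])
```

The last line shows why step 4 matters for the OLS models: a zero flow has no finite logarithm.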
Continuing with step 2, we can now create the variables \(Y_{i,t}\) and \(E_{j,t}\) that appear in the OLS model equation:
ch1_application1_2 <- ch1_application1_2 %>%
# Create Yit
group_by(exporter, year) %>%
mutate(
y = sum(trade),
log_y = log(y)
) %>%
# Create Eit
group_by(importer, year) %>%
mutate(
e = sum(trade),
log_e = log(e)
)
The OLS model with remoteness indexes needs both an exporter index and an importer index, which can be created by grouping variables:
ch1_application1_2 <- ch1_application1_2 %>%
# Replicate total_e
group_by(exporter, year) %>%
mutate(total_e = sum(e)) %>%
group_by(year) %>%
mutate(total_e = max(total_e)) %>%
# Replicate rem_exp
group_by(exporter, year) %>%
mutate(
remoteness_exp = sum(dist * total_e / e),
log_remoteness_exp = log(remoteness_exp)
) %>%
# Replicate total_y
group_by(importer, year) %>%
mutate(total_y = sum(y)) %>%
group_by(year) %>%
mutate(total_y = max(total_y)) %>%
# Replicate rem_imp
group_by(importer, year) %>%
mutate(
remoteness_imp = sum(dist / (y / total_y)),
log_remoteness_imp = log(remoteness_imp)
)
To create the variables for the OLS model with fixed effects, we followed box #1 on page 44 of Yotov et al. (2016):
ch1_application1_2 <- ch1_application1_2 %>%
# This merges the columns exporter/importer with year
mutate(
exp_year = paste0(exporter, year),
imp_year = paste0(importer, year)
)
This concludes step 2.
Now we need to perform step 3:
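The code for this step is not printed here. As a minimal sketch, assuming a hypothetical toy data frame with the same exporter and importer columns, removing the rows where both countries coincide is a single comparison:

```r
# Toy data; the first row is an intra-national flow
toy <- data.frame(
  exporter = c("ARG", "ARG", "BRA"),
  importer = c("ARG", "BRA", "ARG"),
  trade = c(500, 20, 30),
  stringsAsFactors = FALSE
)

# Step 3: keep only the rows where exporter and importer differ
toy_step3 <- toy[toy$exporter != toy$importer, ]
```

In dplyr this would be filter(exporter != importer), the same condition that appears in the filter() calls of later sections.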
Step 4 is used in some cases and we will be explicit about it.
2.1.2 OLS estimation ignoring multilateral resistance terms
The general equation for this model is: \[ \begin{align} \log X_{ij,t} =& \:\beta_0 + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j} + \beta_5 \log Y_{i,t} +\\ \text{ }& \:\beta_6 \log E_{j,t} + \varepsilon_{ij,t} \end{align} \]
See page 41 of Yotov et al. (2016) for a full description of each variable.
The model is straightforward to fit; here we need to apply step 4 from the previous section to drop the cases where trade is zero:
fit_ols <- lm(
log_trade ~ log_dist + cntg + lang + clny + log_y + log_e,
data = ch1_application1_2 %>%
filter(trade > 0)
)
summary(fit_ols)
##
## Call:
## lm(formula = log_trade ~ log_dist + cntg + lang + clny + log_y +
## log_e, data = ch1_application1_2 %>% filter(trade > 0))
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.5421 -0.8281 0.1578 1.0476 7.6585
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -11.283080 0.151732 -74.36 < 2e-16 ***
## log_dist -1.001607 0.014159 -70.74 < 2e-16 ***
## cntg 0.573805 0.074427 7.71 1.31e-14 ***
## lang 0.801548 0.033748 23.75 < 2e-16 ***
## clny 0.734853 0.070387 10.44 < 2e-16 ***
## log_y 1.190236 0.005402 220.32 < 2e-16 ***
## log_e 0.907588 0.005577 162.73 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.743 on 25682 degrees of freedom
## Multiple R-squared: 0.7585, Adjusted R-squared: 0.7585
## F-statistic: 1.345e+04 on 6 and 25682 DF, p-value: < 2.2e-16
The model is almost ready. We only need to stick to the methodology of Yotov et al. (2016) and cluster the standard errors by country pair (see the note on page 42, it is extremely important). This is not straightforward and requires additional work.
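To give an idea of what clustering by country pair involves, here is a minimal sketch in base R on simulated data. It is not the implementation used inside yotover: the data, the pair_id column and the object names are hypothetical, and the estimator is the plain cluster-robust "sandwich" without any small-sample correction (the vcovCL() function from the sandwich package offers such corrections).

```r
set.seed(1)

# Simulated panel: 50 country pairs observed in 6 years (hypothetical data)
n_pairs <- 50
n_years <- 6
toy <- data.frame(pair_id = rep(seq_len(n_pairs), each = n_years))
toy$log_dist <- rep(rnorm(n_pairs, mean = 8, sd = 1), each = n_years)

# A shock shared by all observations of a pair makes errors correlated within pairs
pair_shock <- rnorm(n_pairs)[toy$pair_id]
toy$log_trade <- 10 - toy$log_dist + pair_shock + rnorm(nrow(toy), sd = 0.5)

fit <- lm(log_trade ~ log_dist, data = toy)

# Sandwich estimator: bread %*% meat %*% bread
X <- model.matrix(fit)
u <- residuals(fit)
bread <- solve(crossprod(X))

# Sum the scores x_i * u_i within each pair, then take their cross-product
scores <- rowsum(X * u, group = toy$pair_id)
meat <- crossprod(scores)

vcov_cl <- bread %*% meat %*% bread
se_cluster <- sqrt(diag(vcov_cl))

# Classical standard errors, for comparison
se_classic <- sqrt(diag(vcov(fit)))
```

The coefficients are the same under both approaches; only the standard errors, and therefore the test statistics and p-values, change.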
The yotover package provides a function to do this and more. Please read the documentation of the package and look at the yotov_model_summary() function, which summarises the model in the same way as reported in the book by providing:
- Clustered standard errors
- Number of observations
- \(R^2\) (if applicable)
- Presence (or absence) of exporter-time and importer-time fixed effects
- RESET test p-value
This is returned as a list to keep it simple.
Finally, here is the model as reported in the book:
yotov_model_summary(
formula = "log_trade ~ log_dist + cntg + lang + clny + log_y + log_e",
data = filter(ch1_application1_2, trade > 0),
method = "lm"
)
## $tidy_coefficients
## # A tibble: 7 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -11.3 0.296 -38.1 1.17e-309
## 2 log_dist -1.00 0.0273 -36.6 1.85e-286
## 3 cntg 0.574 0.185 3.11 1.90e- 3
## 4 lang 0.802 0.0821 9.76 1.78e- 22
## 5 clny 0.735 0.144 5.10 3.49e- 7
## 6 log_y 1.19 0.00946 126. 0.
## 7 log_e 0.908 0.00991 91.6 0.
##
## $nobs
## [1] 25689
##
## $rsquared
## [1] 0.7585251
##
## $etfe
## [1] FALSE
##
## $itfe
## [1] FALSE
##
## $reset_pval
## [1] 4.346285e-15
Please notice that the summary hides the exporter/importer fixed effects whenever a model includes them. This model has none, as the $etfe and $itfe entries show.
2.1.3 OLS estimation controlling for multilateral resistance terms with remoteness indexes
The remoteness model adds variables to the OLS model. The general equation for this model is: \[ \begin{align} \log X_{ij,t} =& \:\beta_0 + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j} + \beta_5 \log Y_{i,t} +\\ \text{ }& \beta_6 \log E_{j,t} + \beta_7 \log(REM\_EXP_{i,t}) + \beta_8 \log(REM\_IMP_{j,t}) + \varepsilon_{ij,t} \end{align} \]
Where \[ \log(REM\_EXP_{i,t}) = \log \left( \sum_j \frac{DIST_{i,j}}{E_{j,t} / Y_t} \right)\\ \log(REM\_IMP_{j,t}) = \log \left( \sum_i \frac{DIST_{i,j}}{Y_{i,t} / Y_t} \right) \]
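A small worked example may help to read the remoteness indexes: each one is a distance-weighted sum in which the weights are the inverse of the partner's share in the world total. The sketch below uses a hypothetical three-country toy dataset for a single year and, as a simplifying assumption, takes the world total to be the sum of all flows in the toy data; the chapter's own code builds the totals with grouped sums and maxima instead.

```r
# Toy data: three countries, one year, hypothetical distances and flows
toy <- data.frame(
  exporter = c("A", "A", "B", "B", "C", "C"),
  importer = c("B", "C", "A", "C", "A", "B"),
  dist  = c(100, 200, 100, 300, 200, 300),
  trade = c(10, 20, 30, 40, 50, 60)
)

# Output Y_i is the total exported by i; expenditure E_j is the total imported by j
toy$y <- ave(toy$trade, toy$exporter, FUN = sum)
toy$e <- ave(toy$trade, toy$importer, FUN = sum)

# World total (assumption: the sum of all flows in the toy year)
world <- sum(toy$trade)

# Exporter remoteness: sum over importers j of DIST_ij / (E_j / world)
toy$remoteness_exp <- ave(toy$dist / (toy$e / world), toy$exporter, FUN = sum)

# Importer remoteness: sum over exporters i of DIST_ij / (Y_i / world)
toy$remoteness_imp <- ave(toy$dist / (toy$y / world), toy$importer, FUN = sum)

toy$log_remoteness_exp <- log(toy$remoteness_exp)
toy$log_remoteness_imp <- log(toy$remoteness_imp)
```

For exporter A the index is 100 / (70 / 210) + 200 / (60 / 210) = 300 + 700 = 1000: partners that are far away and account for a small share of world expenditure make a country more remote.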
See page 43 of Yotov et al. (2016) for a full description of each variable.
We can start from the dataset for the OLS model and add the additional variables to it. Our approach follows box #1 on page 43 of Yotov et al. (2016); the remoteness variables were already created when we prepared the data.
Fitting the regression is straightforward: we just add more regressors to what we did in the last section, and we can create a list with a summary of the model:
yotov_model_summary(
formula = "log_trade ~ log_dist + cntg + lang + clny + log_y + log_e +
log_remoteness_exp + log_remoteness_imp",
data = filter(ch1_application1_2, trade > 0),
method = "lm"
)
## $tidy_coefficients
## # A tibble: 9 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -35.2 1.99 -17.7 6.28e- 70
## 2 log_dist -1.18 0.0313 -37.9 9.17e-306
## 3 cntg 0.247 0.177 1.39 1.63e- 1
## 4 lang 0.739 0.0784 9.43 4.48e- 21
## 5 clny 0.842 0.150 5.61 2.08e- 8
## 6 log_y 1.16 0.00948 123. 0.
## 7 log_e 0.903 0.00991 91.1 0.
## 8 log_remoteness_exp 0.972 0.0682 14.3 6.66e- 46
## 9 log_remoteness_imp 0.274 0.0598 4.58 4.71e- 6
##
## $nobs
## [1] 25689
##
## $rsquared
## [1] 0.765028
##
## $etfe
## [1] FALSE
##
## $itfe
## [1] FALSE
##
## $reset_pval
## [1] 7.672904e-14
2.1.4 OLS estimation controlling for multilateral resistance terms with fixed effects
The general equation for this model is: \[ \begin{align} \log X_{ij,t} =& \:\pi_{i,t} + \chi_{j,t} + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} +\\ \text{ }& \:\beta_4 CLNY_{i,j} + \varepsilon_{ij,t} \end{align} \]
Where the added terms, with respect to the OLS model, are \(\pi_{i,t}\) and \(\chi_{j,t}\), which account for exporter-time and importer-time fixed effects respectively. See page 44 of Yotov et al. (2016) for a full description of each variable.
We can start from the dataset for the OLS model, and add the additional variables to it. In this case we take the OLS dataset and combine both exporter and importer variables with the year in order to create the fixed effects variables.
Now we can easily generate a list as we did with the previous models:
yotov_model_summary(
formula = "log_trade ~ log_dist + cntg + lang + clny + exp_year + imp_year",
data = filter(ch1_application1_2, trade > 0),
method = "lm"
)
## $tidy_coefficients
## # A tibble: 5 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 13.1 0.508 25.7 4.44e-144
## 2 log_dist -1.22 0.0382 -31.8 3.77e-218
## 3 cntg 0.223 0.203 1.10 2.71e- 1
## 4 lang 0.661 0.0821 8.05 8.41e- 16
## 5 clny 0.670 0.149 4.49 7.24e- 6
##
## $nobs
## [1] 25689
##
## $rsquared
## [1] 0.8432398
##
## $etfe
## [1] TRUE
##
## $itfe
## [1] TRUE
##
## $reset_pval
## [1] 2.473022e-231
2.1.5 PPML estimation controlling for multilateral resistance terms with fixed effects
The general equation for this model is:
\[ \begin{align} X_{ij,t} =& \:\exp\left[\pi_{i,t} + \chi_{j,t} + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} +\right.\\ \text{ }& \:\left.\beta_3 LANG_{i,j} + \beta_4 CLNY_{i,j}\right] \times \varepsilon_{ij,t} \end{align} \]
The reason to estimate this model, even though it is slower to fit, is that PPML is the only estimator that is perfectly consistent with the theoretical gravity model. When we estimate with PPML, the fixed effects correspond exactly to their theoretical counterparts.
The data for this model is exactly the same as for the fixed effects model.
One option in R is to use the glm() function with a quasi-Poisson family to avoid overdispersion problems:
fit_ppml <- glm(trade ~ log_dist + cntg + lang + clny + exp_year + imp_year,
family = quasipoisson(link = "log"),
data = ch1_application1_2
)
If you decide to run this model and print the summary yourself, you’ll notice that it doesn’t report \(R^2\) and that it shows a large list of fixed effects. The \(R^2\) needs to be computed afterwards as a function of the correlation between the observed and predicted values. Please see Silva and Tenreyro (2006) for the details as well as for the RESET test for PPML (GLM) models.
Software such as Stata, unless dedicated functions are used, does not report a proper \(R^2\) for PPML models; what it reports is a pseudo-\(R^2\). To construct a proper \(R^2\), yotov_model_summary() takes the correlation between actual and predicted trade flows.
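As a minimal sketch of that idea, assuming that the \(R^2\) is the squared correlation between observed and fitted values (the exact formula inside yotov_model_summary() is not shown in this chapter), the computation on simulated count data looks like this. The data and object names are hypothetical.

```r
set.seed(42)

# Simulated flows that follow a log-linear model in distance (hypothetical data)
toy <- data.frame(log_dist = runif(200, min = 5, max = 9))
toy$trade <- rpois(200, lambda = exp(8 - 0.8 * toy$log_dist))

# PPML-style fit: the dependent variable is in levels, so zeros can stay in the data
fit <- glm(
  trade ~ log_dist,
  family = quasipoisson(link = "log"),
  data = toy
)

# R-squared as the squared correlation between observed and fitted flows
r2 <- cor(toy$trade, fitted(fit))^2
```

Because the dependent variable is in levels, rows with zero trade do not need to be dropped, which is consistent with the PPML summaries in this chapter reporting more observations than the OLS ones.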
We can obtain a detailed list as in the previous examples:
yotov_model_summary(
formula = "trade ~ log_dist + cntg + lang + clny + exp_year + imp_year",
data = ch1_application1_2,
method = "glm"
)
## $tidy_coefficients
## # A tibble: 5 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 10.4 0.459 22.7 5.51e-114
## 2 log_dist -0.841 0.0321 -26.2 5.66e-151
## 3 cntg 0.437 0.0844 5.18 2.19e- 7
## 4 lang 0.247 0.0777 3.19 1.44e- 3
## 5 clny -0.222 0.118 -1.89 5.93e- 2
##
## $nobs
## [1] 28152
##
## $rsquared
## [1] 0.5859927
##
## $etfe
## [1] TRUE
##
## $itfe
## [1] TRUE
##
## $reset_pval
## [1] 0.6415408
Please notice that the previous summary intentionally doesn’t show the exporter-time and importer-time fixed effects.
2.2 The “distance puzzle” resolved
2.2.1 Preparing the data
Please see the note on page 47 of Yotov et al. (2016). We need to proceed with steps similar to those in the previous section.
Unlike the previous section, we will create different tables for OLS, PPML and its variations, because the solution for the “distance puzzle” implies different transformations and filters for each case.
The solution to the distance puzzle uses this gravity specification:
\[
\begin{align}
X_{ij,t} =& \:\exp\left[\pi_{i,t} + \chi_{j,t} + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j}\right]\times\\
\text{ }& \:\exp\left[\beta_4 CLNY_{i,j} + \beta_5 \log(DIST\_INTRA_{i,i})\right] \times \varepsilon_{ij,t}
\end{align}
\]
The difference with respect to the last section is that now we need to separate the log_dist variable into multiple columns that account for discrete time effects. These columns correspond to the \(\beta_T\) terms, one distance coefficient per year T. Perhaps the easiest way to do this is to transform year into a text column and then use the spread() function.
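To see what the year-specific columns look like, here is a minimal sketch in base R on a hypothetical toy dataset with two pairs and two years. It builds the same kind of columns as spread(year, log_dist, fill = 0), but with an explicit loop so that each step is visible.

```r
# Toy data: two pairs observed in 1986 and 1990 (hypothetical values)
toy <- data.frame(
  pair = c("A-B", "A-B", "A-C", "A-C"),
  year = c(1986, 1990, 1986, 1990),
  log_dist = c(6.9, 6.9, 7.5, 7.5)
)

# One column per year: log_dist in the matching year, zero otherwise
for (yr in sort(unique(toy$year))) {
  toy[[paste0("log_dist_", yr)]] <- ifelse(toy$year == yr, toy$log_dist, 0)
}
```

Each row has a non-zero value in exactly one of the new columns, so a regression on log_dist_1986 and log_dist_1990 estimates a separate distance coefficient for each year.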
For the OLS model we need to remove cases where the exporter is the same as the importer and cases where trade is zero. For the PPML models we need to mark rows where the exporter and the importer are the same, and we need to create the smctry column, which is also required to transform the log_dist_* variables as shown in box #1 on page 48 of Yotov et al. (2016).
In order to avoid creating two datasets that are very similar, we shall create one dataset to cover both OLS and PPML:
ch1_application2_2 <- yotov_data("ch1_application2") %>%
# this filter covers both OLS and PPML
filter(year %in% seq(1986, 2006, 4)) %>%
mutate(
# variables for both OLS and PPML
exp_year = paste0(exporter, year),
imp_year = paste0(importer, year),
year = paste0("log_dist_", year),
log_trade = log(trade),
log_dist = log(dist),
# PPML specific variables
smctry = ifelse(importer != exporter, 0, 1),
log_dist_intra = log_dist * smctry,
intra_pair = ifelse(exporter == importer, exporter, "inter")
) %>%
spread(year, log_dist, fill = 0) %>%
mutate(across(log_dist_1986:log_dist_2006, ~ .x * (1 - smctry)))
Here the across() function is a shortcut to avoid writing something like:
ch1_application2_2 %>%
mutate(
log_dist_1986 = log_dist_1986 * (1 - smctry),
log_dist_1990 = log_dist_1990 * (1 - smctry),
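log_dist_1994 = log_dist_1994 * (1 - smctry),
log_dist_1998 = log_dist_1998 * (1 - smctry),
log_dist_2002 = log_dist_2002 * (1 - smctry),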
log_dist_2006 = log_dist_2006 * (1 - smctry)
)
Also notice that the OLS model will require filtering when we specify the model, because we have not yet removed the rows where trade is equal to zero or where the importer and the exporter are the same.
2.2.2 OLS solution for the “distance puzzle”
The gravity specification includes \(\pi_{i,t} + \chi_{j,t}\), so we need to do something very similar to what we did in the last section.
With the data from above, the model specification is straightforward:
yotov_model_summary2(
formula = "log_trade ~ 0 + log_dist_1986 + log_dist_1990 + log_dist_1994 +
log_dist_1998 + log_dist_2002 + log_dist_2006 + cntg +
lang + clny + exp_year + imp_year",
data = filter(ch1_application2_2, importer != exporter, trade > 0),
method = "lm"
)
## $tidy_coefficients
## # A tibble: 9 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 log_dist_1986 -1.17 0.0436 -26.8 9.33e-156
## 2 log_dist_1990 -1.16 0.0423 -27.3 1.10e-161
## 3 log_dist_1994 -1.21 0.0457 -26.5 1.07e-152
## 4 log_dist_1998 -1.25 0.0428 -29.2 4.14e-184
## 5 log_dist_2002 -1.24 0.0441 -28.1 1.34e-171
## 6 log_dist_2006 -1.26 0.0437 -28.9 4.02e-180
## 7 cntg 0.223 0.203 1.10 2.71e- 1
## 8 lang 0.661 0.0821 8.06 8.20e- 16
## 9 clny 0.670 0.149 4.49 7.25e- 6
##
## $nobs
## [1] 25689
##
## $pct_chg_log_dist
## [1] 7.950156
##
## $pcld_std_err
## [1] 3.75886
##
## $pcld_std_err_pval
## [1] 0.03442616
##
## $intr
## [1] FALSE
##
## $csfe
## [1] FALSE
Notice that, unlike the previous section, we used the notation y ~ 0 + ... in the formula. The zero tells R not to include a constant.
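The effect of the zero is easy to check on a small example. The sketch below uses simulated data with hypothetical names and compares the coefficients returned with and without the constant:

```r
set.seed(7)

# Simulated data (hypothetical)
toy <- data.frame(x = rnorm(30))
toy$y <- 2 + 3 * toy$x + rnorm(30)

# Default formula: R adds an intercept
fit_with <- lm(y ~ x, data = toy)

# Formula with "0 +": no intercept is estimated
fit_without <- lm(y ~ 0 + x, data = toy)

names(coef(fit_with))    # "(Intercept)" "x"
names(coef(fit_without)) # "x"
```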
2.2.3 PPML solution for the “distance puzzle”
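This model is very similar to the PPML model from the previous section. We can fit it directly, taking care to use method = "glm": with method = "lm" the call would fit a linear model on trade in levels and return distance coefficients in the hundreds or thousands rather than elasticities close to -1, which is the pattern in the printed output that follows, so that output should be regenerated before comparing it with the book:
yotov_model_summary2(
formula = "trade ~ 0 + log_dist_1986 + log_dist_1990 +
log_dist_1994 + log_dist_1998 + log_dist_2002 + log_dist_2006 +
cntg + lang + clny + exp_year + imp_year",
data = filter(ch1_application2_2, importer != exporter),
method = "glm"
)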
## $tidy_coefficients
## # A tibble: 9 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 log_dist_1986 245. 205. 1.19 2.34e- 1
## 2 log_dist_1990 -167. 168. -0.994 3.20e- 1
## 3 log_dist_1994 -423. 139. -3.03 2.41e- 3
## 4 log_dist_1998 -757. 140. -5.41 6.50e- 8
## 5 log_dist_2002 -1018. 182. -5.58 2.41e- 8
## 6 log_dist_2006 -2221. 333. -6.67 2.55e-11
## 7 cntg 6949. 2425. 2.87 4.17e- 3
## 8 lang -57.4 304. -0.189 8.50e- 1
## 9 clny -744. 696. -1.07 2.85e- 1
##
## $nobs
## [1] 28152
##
## $pct_chg_log_dist
## [1] -1008.222
##
## $pcld_std_err
## [1] 747.5644
##
## $pcld_std_err_pval
## [1] 0.1774412
##
## $intr
## [1] FALSE
##
## $csfe
## [1] FALSE
2.2.4 Internal distance solution for the “distance puzzle”
This model just requires us to add the log_dist_intra variable to the PPML model and not to filter out the rows where the exporter and the importer are the same:
yotov_model_summary2(
formula = "trade ~ 0 + log_dist_1986 + log_dist_1990 +
log_dist_1994 + log_dist_1998 + log_dist_2002 + log_dist_2006 +
cntg + lang + clny + exp_year + imp_year + log_dist_intra",
data = ch1_application2_2,
method = "glm"
)
## $tidy_coefficients
## # A tibble: 10 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 log_dist_1986 -0.980 0.0731 -13.4 5.66e-41
## 2 log_dist_1990 -0.940 0.0742 -12.7 8.96e-37
## 3 log_dist_1994 -0.915 0.0731 -12.5 6.06e-36
## 4 log_dist_1998 -0.887 0.0721 -12.3 9.16e-35
## 5 log_dist_2002 -0.884 0.0717 -12.3 6.12e-35
## 6 log_dist_2006 -0.872 0.0724 -12.1 1.85e-33
## 7 cntg 0.371 0.142 2.62 8.76e- 3
## 8 lang 0.337 0.171 1.98 4.81e- 2
## 9 clny 0.0192 0.159 0.121 9.04e- 1
## 10 log_dist_intra -0.488 0.102 -4.78 1.76e- 6
##
## $nobs
## [1] 28566
##
## $pct_chg_log_dist
## [1] -10.96483
##
## $pcld_std_err
## [1] 1.073772
##
## $pcld_std_err_pval
## [1] 8.805433e-25
##
## $intr
## [1] TRUE
##
## $csfe
## [1] FALSE
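The $pct_chg_log_dist entry is not defined in the text. Judging by the numbers, it appears to be the percentage change of the distance coefficient between 1986 and 2006; this reading is an assumption based on the reported values. A quick check with the rounded coefficients printed above:

```r
# Rounded coefficients copied from the summary above
b_1986 <- -0.980
b_2006 <- -0.872

# Percentage change in the distance coefficient between 1986 and 2006
pct_chg <- 100 * (b_2006 - b_1986) / b_1986
pct_chg
```

This gives roughly -11.02, close to the reported -10.96; the small gap is what one would expect from using coefficients rounded to three digits. The standard error of this ratio is not recoverable from the printed table, because it also depends on the covariance between the two coefficients.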
2.2.5 Internal distance and home bias solution for the “distance puzzle”
This model just requires us to add the smctry variable to the internal distance model and repeat the rest of the steps from the last section:
yotov_model_summary2(
formula = "trade ~ 0 + log_dist_1986 + log_dist_1990 +
log_dist_1994 + log_dist_1998 + log_dist_2002 + log_dist_2006 +
cntg + lang + clny + exp_year + imp_year + log_dist_intra + smctry",
data = ch1_application2_2,
method = "glm"
)
## $tidy_coefficients
## # A tibble: 11 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 log_dist_1986 -0.857 0.0636 -13.5 2.12e-41
## 2 log_dist_1990 -0.819 0.0642 -12.7 3.15e-37
## 3 log_dist_1994 -0.796 0.0644 -12.4 4.84e-35
## 4 log_dist_1998 -0.770 0.0639 -12.0 2.17e-33
## 5 log_dist_2002 -0.767 0.0636 -12.1 1.69e-33
## 6 log_dist_2006 -0.754 0.0631 -12.0 6.42e-33
## 7 cntg 0.574 0.157 3.65 2.67e- 4
## 8 lang 0.352 0.139 2.53 1.13e- 2
## 9 clny 0.0269 0.127 0.212 8.32e- 1
## 10 log_dist_intra -0.602 0.111 -5.44 5.41e- 8
## 11 smctry 1.69 0.582 2.90 3.72e- 3
##
## $nobs
## [1] 28566
##
## $pct_chg_log_dist
## [1] -11.96934
##
## $pcld_std_err
## [1] 1.190509
##
## $pcld_std_err_pval
## [1] 4.412162e-24
##
## $intr
## [1] TRUE
##
## $csfe
## [1] FALSE
2.2.6 Fixed effects solution for the “distance puzzle”
This model just requires us to remove the variables log_dist_intra and smctry from the last model and include the intra_pair variable to account for the intra-national fixed effects:
yotov_model_summary2(
formula = "trade ~ 0 + log_dist_1986 + log_dist_1990 +
log_dist_1994 + log_dist_1998 + log_dist_2002 + log_dist_2006 +
cntg + lang + clny + exp_year + imp_year + intra_pair",
data = ch1_application2_2,
method = "glm"
)
## $tidy_coefficients
## # A tibble: 9 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 log_dist_1986 -0.910 0.0328 -27.7 2.26e-169
## 2 log_dist_1990 -0.879 0.0326 -27.0 4.97e-160
## 3 log_dist_1994 -0.860 0.0324 -26.6 1.32e-155
## 4 log_dist_1998 -0.833 0.0322 -25.9 8.22e-148
## 5 log_dist_2002 -0.829 0.0325 -25.5 2.41e-143
## 6 log_dist_2006 -0.811 0.0325 -24.9 4.67e-137
## 7 cntg 0.442 0.0830 5.33 9.87e- 8
## 8 lang 0.241 0.0772 3.11 1.84e- 3
## 9 clny -0.220 0.118 -1.86 6.27e- 2
##
## $nobs
## [1] 28566
##
## $pct_chg_log_dist
## [1] -10.93109
##
## $pcld_std_err
## [1] 0.7811861
##
## $pcld_std_err_pval
## [1] 8.607967e-45
##
## $intr
## [1] TRUE
##
## $csfe
## [1] TRUE
2.3 Regional trade agreements effects
2.3.1 Preparing the data
This model specification includes the gravity covariates together with exporter-time and importer-time fixed effects:
\[
\begin{align}
X_{ij,t} =& \:\exp\left[\pi_{i,t} + \chi_{j,t} + \beta_1 \log(DIST)_{i,j} + \beta_2 CNTG_{i,j} + \beta_3 LANG_{i,j} +\right.\\
\text{ }& \:\left.\beta_4 CLNY_{i,j} + \beta_5 RTA_{ij,t}\right] \times \varepsilon_{ij,t}
\end{align}
\]
We need to create additional variables, in comparison to the previous examples, to include fixed effects that account for the observations where the exporter and the importer are the same. These variables are intl_brdr, pair_id_2 and the columns of the form intl_border_Y, where Y corresponds to the year.
The direct way of obtaining the desired variables is quite similar to what we did in the previous sections:
ch1_application3_2 <- yotov_data("ch1_application3") %>%
filter(year %in% seq(1986, 2006, 4)) %>%
mutate(
exp_year = paste0(exporter, year),
imp_year = paste0(importer, year),
year = paste0("intl_border_", year),
log_trade = log(trade),
log_dist = log(dist),
intl_brdr = ifelse(exporter == importer, pair_id, "inter"),
intl_brdr_2 = ifelse(exporter == importer, 0, 1),
pair_id_2 = ifelse(exporter == importer, "0-intra", pair_id)
) %>%
spread(year, intl_brdr_2, fill = 0)
Notice that we used 0-intra and not just intra. This is because the rest of the observations in pair_id_2 are the numbers 1,…,N, and R orders the unique values alphabetically when it builds a factor, so 0-intra comes first and becomes the reference level. This is the difference between getting the expected behavior and getting some arbitrary behavior in the next chapter.
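The ordering can be verified on a few hypothetical pair identifiers. In the sketch below, the label 0-intra sorts before every numeric identifier, whereas a plain intra label would sort after them and leave an arbitrary pair as the reference level:

```r
# Hypothetical identifiers stored as text, as in pair_id_2
ids_zero  <- factor(c("12", "3", "0-intra", "1"))
ids_plain <- factor(c("12", "3", "intra", "1"))

# The first level is the reference level in a regression
levels(ids_zero)[1]
levels(ids_plain)[1]
```

The first call returns "0-intra" and the second returns "1", so only the first labelling makes the intra-national observations the reference group.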
In addition, we need to create the variable sum_trade to filter out the cases where the sum of trade by pair_id is zero:
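The code for this step is not printed here. As a minimal sketch on a hypothetical toy data frame, the variable can be built as a grouped sum that is repeated on every row of the pair:

```r
# Toy data: pair 1 never trades, pair 2 trades in one year only (hypothetical values)
toy <- data.frame(
  pair_id = c(1, 1, 2, 2),
  year = c(1986, 1990, 1986, 1990),
  trade = c(0, 0, 5, 0)
)

# Total trade of each pair over all years, repeated on every row of the pair
toy$sum_trade <- ave(toy$trade, toy$pair_id, FUN = sum)

# Keep the pairs that trade at least once
toy_kept <- toy[toy$sum_trade > 0, ]
```

Note that the zero flow of pair 2 in 1990 is kept: only pairs that never trade are dropped. A dplyr version, under the same assumptions, would be group_by(pair_id) followed by mutate(sum_trade = sum(trade)).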
2.3.2 OLS standard RTA estimates with international trade only
The gravity specification includes \(\pi_{i,t} + \chi_{j,t}\), so we need to do something very similar to what we did in the last section.
With the data from above, the model specification is straightforward:
yotov_model_summary3(
formula = "log_trade ~ 0 + log_dist + cntg + lang + clny +
rta + exp_year + imp_year",
data = filter(ch1_application3_2, trade > 0, importer != exporter),
method = "lm"
)
## $tidy_coefficients
## # A tibble: 5 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 log_dist -1.22 0.0390 -31.2 1.92e-209
## 2 cntg 0.223 0.203 1.10 2.72e- 1
## 3 lang 0.661 0.0821 8.05 8.94e- 16
## 4 clny 0.670 0.149 4.49 7.24e- 6
## 5 rta -0.00439 0.0540 -0.0813 9.35e- 1
##
## $nobs
## [1] 25689
##
## $total_rta_effect
## [1] -0.004389628
##
## $trta_std_err
## [1] 0.05398407
##
## $trta_std_err_pval
## [1] 0.9351927
##
## $intr
## [1] FALSE
2.3.3 PPML standard RTA estimates with international trade only
The model specification is very similar to OLS: we use trade instead of log_trade as the dependent variable, keep the zero trade flows, and change the method from "lm" to "glm":
yotov_model_summary3(
formula = "trade ~ 0 + log_dist + cntg + lang + clny +
rta + exp_year + imp_year",
data = filter(ch1_application3_2, importer != exporter),
method = "glm"
)
## $tidy_coefficients
## # A tibble: 5 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 log_dist -0.822 0.0314 -26.1 1.80e-150
## 2 cntg 0.416 0.0840 4.94 7.63e- 7
## 3 lang 0.250 0.0778 3.21 1.32e- 3
## 4 clny -0.205 0.116 -1.77 7.66e- 2
## 5 rta 0.191 0.0668 2.86 4.29e- 3
##
## $nobs
## [1] 28152
##
## $total_rta_effect
## [1] 0.1907176
##
## $trta_std_err
## [1] 0.06678383
##
## $trta_std_err_pval
## [1] 0.004293603
##
## $intr
## [1] FALSE
2.3.4 Addressing potential domestic trade diversion
The model specification is nearly the same as the PPML one; we only need to add the variable intl_brdr and use the full dataset instead of removing the rows where the importer and the exporter are the same:
yotov_model_summary3(
formula = "trade ~ 0 + log_dist + cntg + lang + clny +
rta + exp_year + imp_year + intl_brdr",
data = ch1_application3_2,
method = "glm"
)
## $tidy_coefficients
## # A tibble: 5 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 log_dist -0.800 0.0307 -26.0 2.45e-149
## 2 cntg 0.393 0.0801 4.91 9.27e- 7
## 3 lang 0.244 0.0783 3.11 1.86e- 3
## 4 clny -0.182 0.115 -1.58 1.14e- 1
## 5 rta 0.409 0.0699 5.84 5.16e- 9
##
## $nobs
## [1] 28566
##
## $total_rta_effect
## [1] 0.4085219
##
## $trta_std_err
## [1] 0.06993087
##
## $trta_std_err_pval
## [1] 5.164081e-09
##
## $intr
## [1] TRUE
2.3.5 Addressing potential endogeneity of RTAs
The model specification consists of including the rta variable and the fixed effects exp_year, imp_year and pair_id_2 to account for domestic trade:
yotov_model_summary3(
formula = "trade ~ 0 + rta + exp_year + imp_year + pair_id_2",
data = filter(ch1_application3_2, sum_trade > 0),
method = "glm"
)
## $tidy_coefficients
## # A tibble: 1 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 rta 0.557 0.0143 39.0 1.20e-322
##
## $nobs
## [1] 28482
##
## $total_rta_effect
## [1] 0.5571853
##
## $trta_std_err
## [1] 0.1084293
##
## $trta_std_err_pval
## [1] 2.766506e-07
##
## $intr
## [1] TRUE
2.3.6 Testing for potential “reverse causality” between trade and RTAs
We need to modify the previous model to include the variable rta_lead4, again keeping only the rows where sum_trade is greater than zero:
yotov_model_summary3(
formula = "trade ~ 0 + rta + rta_lead4 + exp_year + imp_year + pair_id_2",
data = filter(ch1_application3_2, sum_trade > 0),
method = "glm"
)
## $tidy_coefficients
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 rta 0.520 0.0911 5.71 0.0000000113
## 2 rta_lead4 0.0774 0.0977 0.793 0.428
##
## $nobs
## [1] 28482
##
## $total_rta_effect
## [1] 0.5974906
##
## $trta_std_err
## [1] 0.1463918
##
## $trta_std_err_pval
## [1] 4.475609e-05
##
## $intr
## [1] TRUE
2.3.7 Addressing potential non-linear and phasing-in effects of RTAs
Instead of the lead of the rta variable used in the previous model, we now include the lagged variables rta_lagN:
yotov_model_summary3(
formula = "trade ~ 0 + rta + rta_lag4 + rta_lag8 + rta_lag12 +
exp_year + imp_year + pair_id_2",
data = filter(ch1_application3_2, sum_trade > 0),
method = "glm"
)
## $tidy_coefficients
## # A tibble: 4 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 rta 0.291 0.0946 3.08 0.00206
## 2 rta_lag4 0.414 0.0713 5.80 0.00000000671
## 3 rta_lag8 0.169 0.0458 3.69 0.000226
## 4 rta_lag12 0.119 0.0319 3.73 0.000192
##
## $nobs
## [1] 28482
##
## $total_rta_effect
## [1] 0.9926189
##
## $trta_std_err
## [1] 0.0999689
##
## $trta_std_err_pval
## [1] 1.552499e-23
##
## $intr
## [1] TRUE
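The $total_rta_effect entry is not defined in the text. Judging by the numbers, it appears to be the sum of the rta coefficient and its lags; this reading is an assumption based on the reported values. A quick check with the rounded coefficients printed above:

```r
# Rounded coefficients copied from the summary above
rta_terms <- c(rta = 0.291, rta_lag4 = 0.414, rta_lag8 = 0.169, rta_lag12 = 0.119)

# Total effect as the sum of the contemporaneous and lagged coefficients
total_rta_effect <- sum(rta_terms)
total_rta_effect
```

The sum is 0.993, in line with the reported 0.9926 up to rounding. The standard error of the sum cannot be rebuilt from the individual standard errors alone, since it also depends on the covariances between the coefficients.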
2.3.8 Addressing globalization effects
As an addition to the previous model, we include the intl_border_T variables alongside the rta_lagN variables:
yotov_model_summary3(
formula = "trade ~ 0 + rta + rta_lag4 + rta_lag8 + rta_lag12 +
intl_border_1986 + intl_border_1990 + intl_border_1994 +
intl_border_1998 + intl_border_2002 +
exp_year + imp_year + pair_id_2",
data = filter(ch1_application3_2, sum_trade > 0),
method = "glm"
)
## $tidy_coefficients
## # A tibble: 9 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 rta 0.116 0.0920 1.26 2.08e- 1
## 2 rta_lag4 0.288 0.0654 4.40 1.09e- 5
## 3 rta_lag8 0.0693 0.0511 1.36 1.75e- 1
## 4 rta_lag12 0.00236 0.0309 0.0763 9.39e- 1
## 5 intl_border_1986 -0.706 0.0507 -13.9 4.88e-44
## 6 intl_border_1990 -0.480 0.0456 -10.5 5.50e-26
## 7 intl_border_1994 -0.367 0.0355 -10.3 5.29e-25
## 8 intl_border_1998 -0.158 0.0247 -6.41 1.50e-10
## 9 intl_border_2002 -0.141 0.0179 -7.87 3.62e-15
##
## $nobs
## [1] 28482
##
## $total_rta_effect
## [1] 0.4750185
##
## $trta_std_err
## [1] 0.1161348
##
## $trta_std_err_pval
## [1] 4.309359e-05
##
## $intr
## [1] TRUE
References
Silva, JMC Santos, and Silvana Tenreyro. 2006. “The Log of Gravity.” The Review of Economics and Statistics 88 (4): 641–58.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.
Yotov, Yoto V, Roberta Piermartini, José-Antonio Monteiro, and Mario Larch. 2016. An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model. World Trade Organization Geneva.