# Estimation of Dynamic Causal Effects {#eodce}
It sometimes is of interest to know the size of the current and future reactions of $Y$ to a change in $X$. This is called the *dynamic causal effect* on $Y$ of a change in $X$. This chapter discusses how to estimate dynamic causal effects in `r ttcode("R")` applications, where we investigate the dynamic effect of cold weather in Florida on the price of orange juice concentrate.
The discussion covers:
+ estimation of distributed lag models
+ heteroskedasticity- and autocorrelation-consistent (HAC) standard errors
+ generalized least squares (GLS) estimation of ADL models
To reproduce code examples, install the `r ttcode("R")` packages listed below beforehand and make sure that the subsequent code chunk executes without any errors.
+ `r ttcode("AER")` [@R-AER]
+ `r ttcode("dynlm")` [@R-dynlm]
+ `r ttcode("nlme")` [@R-nlme]
+ `r ttcode("orcutt")` [@R-orcutt]
+ `r ttcode("quantmod")` [@R-quantmod]
+ `r ttcode("stargazer")` [@R-stargazer]
```{r, warning=FALSE, message=FALSE}
library(AER)
library(quantmod)
library(dynlm)
library(orcutt)
library(nlme)
library(stargazer)
```
## The Orange Juice Data
The largest cultivation region for oranges in the U.S. is located in Florida, which usually has an ideal climate for growing the fruit. It thus is the source of almost all frozen juice concentrate produced in the country. However, from time to time and depending on their severity, cold snaps destroy part of the harvest such that the supply of oranges decreases and consequently the price of frozen juice concentrate rises. The timing of the price increases is complicated: a cut in today's supply of concentrate influences not just today's price but also future prices because supply in future periods will decrease, too. Clearly, the magnitude of today's and future price increases due to a freeze is an empirical question that can be investigated using a distributed lag model --- a time series model that relates price changes to weather conditions.
To begin the analysis, we reproduce Figure 15.1 of the book, which displays plots of the price index for frozen concentrated orange juice, percentage changes in the price, and monthly freezing degree days in Orlando, the center of Florida's orange-growing region.
```{r}
# load the frozen orange juice data set
data("FrozenJuice")
# compute the price index for frozen concentrated juice
FOJCPI <- FrozenJuice[, "price"]/FrozenJuice[, "ppi"]
# compute monthly percentage changes in the price index
FOJC_pctc <- 100 * diff(log(FOJCPI))
# extract the series of monthly freezing degree days
FDD <- FrozenJuice[, "fdd"]
```
```{r, fig.align='center'}
# convert series to xts objects
FOJCPI_xts <- as.xts(FOJCPI)
FDD_xts <- as.xts(FrozenJuice[, 3])
# Plot orange juice price index
plot(as.zoo(FOJCPI),
col = "steelblue",
lwd = 2,
xlab = "Date",
ylab = "Price index",
main = "Frozen Concentrated Orange Juice")
```
```{r, fig.align='center', fig.height=7}
# divide plotting area
par(mfrow = c(2, 1))
# Plot percentage changes in prices
plot(as.zoo(FOJC_pctc),
col = "steelblue",
lwd = 2,
xlab = "Date",
ylab = "Percent",
main = "Monthly Changes in the Price of Frozen Conentrated Orange Juice")
# plot freezing degree days
plot(as.zoo(FDD),
col = "steelblue",
lwd = 2,
xlab = "Date",
ylab = "Freezing degree days",
main = "Monthly Freezing Degree Days in Orlando, FL")
```
Periods with a high number of freezing degree days are followed by large month-to-month price changes. These coinciding movements motivate a simple regression of price changes ($\%ChgOJC_t$) on freezing degree days ($FDD_t$) to estimate the effect of an additional freezing degree day on the price in the current month. For this, as for all other regressions in this chapter, we use $T=611$ observations (January 1950 to December 2000).
```{r}
# simple regression of percentage changes on freezing degree days
orange_SR <- dynlm(FOJC_pctc ~ FDD)
coeftest(orange_SR, vcov. = vcovHAC)
```
Notice that the standard errors are computed using a "HAC" estimator of the variance-covariance matrix; see Chapter \@ref(apatadlm) for a discussion of this estimator.
\begin{align*}
\widehat{\%ChgOJC_t} = -\underset{(0.19)}{0.42} + \underset{(0.13)}{0.47} FDD_t
\end{align*}
The estimated coefficient on $FDD_t$ has the following interpretation: an additional freezing degree day in month $t$ leads to a price increase of $0.47$ percentage points in the same month.
To consider the effects of cold snaps on the orange juice price over subsequent periods, we include lagged values of $FDD_t$ in our model, which leads to a *distributed lag regression model*. We estimate a specification using the contemporaneous value and six lags of $FDD_t$ as regressors.
```{r}
# distributed lag model with 6 lags of freezing degree days
orange_DLM <- dynlm(FOJC_pctc ~ FDD + L(FDD, 1:6))
coeftest(orange_DLM, vcov. = vcovHAC)
```
As a result, we obtain
\begin{align}
\begin{split}
\widehat{\%ChgOJC_t} =& -\underset{(0.21)}{0.69} + \underset{(0.14)}{0.47} FDD_t + \underset{(0.08)}{0.15} FDD_{t-1} + \underset{(0.06)}{0.06} FDD_{t-2} + \underset{(0.05)}{0.07} FDD_{t-3} \\ &+ \underset{(0.03)}{0.04} FDD_{t-4} + \underset{(0.03)}{0.05} FDD_{t-5} + \underset{(0.05)}{0.05} FDD_{t-6},
\end{split}
(\#eq:orangemod1)
\end{align}
where the coefficient on $FDD_{t-1}$ estimates the price increase in period $t$ caused by an additional freezing degree day in the preceding month, the coefficient on $FDD_{t-2}$ estimates the effect of an additional freezing degree day two months ago, and so on. Consequently, the coefficients in \@ref(eq:orangemod1) can be interpreted as price changes in current and future periods due to a unit increase in the current month's freezing degree days.
## Dynamic Causal Effects
This section of the book describes the general idea of a dynamic causal effect and how the concept of a randomized controlled experiment can be translated to time series applications, using several examples.
In general, for empirical attempts to measure a dynamic causal effect, the assumptions of stationarity (see Key Concept 14.5) and exogeneity must hold. In the time series applications considered so far, we have assumed that the model error term has conditional mean zero given current and past values of the regressors. For estimation of a dynamic causal effect using a distributed lag model, it may be useful to assume a stronger form termed *strict exogeneity*, which states that the error term has mean zero conditional on past, present *and future* values of the independent variables.
The two concepts of exogeneity and the distributed lag model are summarized in Key Concept 15.1.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC15.1">
<h3 class = "right"> Key Concept 15.1 </h3>
<h3 class = "left"> The Distributed Lag Model and Exogeneity </h3>
<p>
The general distributed lag model is
\\begin{align}
Y_t = \\beta_0 + \\beta_1 X_t + \\beta_2 X_{t-1} + \\beta_3 X_{t-2} + \\dots + \\beta_{r+1} X_{t-r} + u_t, (\\#eq:dlm)
\\end{align}
where it is assumed that
1. $X$ is an exogenous variable, $$E(u_t\\vert X_t, X_{t-1}, X_{t-2},\\dots) = 0.$$
2.
+ (a) $X_t,Y_t$ have a stationary distribution.
+ (b) $(Y_t,X_t)$ and $(Y_{t-j},X_{t-j})$ become independently distributed as $j$ gets large.
3. Large outliers are unlikely. In particular, we need that all the variables have more than eight nonzero and finite moments --- a stronger assumption than before (four finite nonzero moments) that is required for computation of the HAC covariance matrix estimator.
4. There is no perfect multicollinearity.
The distributed lag model may be extended to include contemporaneous and past values of additional regressors.
**On the assumption of exogeneity**
+ There is another form of exogeneity termed *strict exogeneity* which assumes $$E(u_t\\vert \\dots, X_{t+2},X_{t+1},X_t,X_{t-1},X_{t-2},\\dots)=0,$$ that is the error term has mean zero conditional on past, present and future values of $X$. Strict exogeneity implies exogeneity (as defined in 1. above) but not the other way around. From this point on we will therefore distinguish between exogeneity and strict exogeneity.
+ Exogeneity as in 1. suffices for the OLS estimators of the coefficients in distributed lag models to be consistent. However, if the assumption of strict exogeneity is valid, more efficient estimators can be applied.
</p>
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[The Distributed Lag Model and Exogeneity]{15.1}
The general distributed lag model is
\\begin{align}
Y_t = \\beta_0 + \\beta_1 X_t + \\beta_2 X_{t-1} + \\beta_3 X_{t-2} + \\dots + \\beta_{r+1} X_{t-r} + u_t, \\label{eq:dlm}
\\end{align}
where it is assumed that \\newline
\\begin{enumerate}
\\item $X$ is an exogenous variable, $$E(u_t\\vert X_t, X_{t-1}, X_{t-2},\\dots) = 0.$$
\\item \\vspace{0.5cm}
\\begin{itemize}
\\item[(a)] $X_t,Y_t$ have a stationary distribution.
\\item[(b)] $(Y_t,X_t)$ and $(Y_{t-j},X_{t-j})$ become independently distributed as $j$ gets large.
\\end{itemize}\\vspace{0.5cm}
\\item Large outliers are unlikely. In particular, we need that all the variables have more than eight finite moments --- a stronger assumption than before (four finite moments) that is required for computation of the HAC covariance matrix estimator.
\\item There is no perfect multicollinearity.\\vspace{0.5cm}
\\end{enumerate}\\vspace{0.5cm}
The distributed lag model may be extended to include contemporaneous and past values of additional regressors.\\vspace{0.5cm}
\\textbf{On the assumption of exogeneity}\\vspace{0.5cm}
\\begin{itemize}
\\item There is another form of exogeneity termed \\textit{strict exogeneity} which assumes $$E(u_t\\vert \\dots, X_{t+2},X_{t+1},X_t,X_{t-1},X_{t-2},\\dots)=0,$$ that is the error term has mean zero conditional on past, present and future values of $X$. Strict exogeneity implies exogeneity (as defined in 1. above) but not the other way around. From this point on we will therefore distinguish between exogeneity and strict exogeneity.\\vspace{0.5cm}
\\item Exogeneity as in 1. suffices for the OLS estimators of the coefficients in distributed lag models to be consistent. However, if the assumption of strict exogeneity is valid, more efficient estimators can be applied.
\\end{itemize}
\\end{keyconcepts}
')
```
## Dynamic Multipliers and Cumulative Dynamic Multipliers
The following terminology regarding the coefficients in the distributed lag model \@ref(eq:dlm) is useful for upcoming applications:
+ The dynamic causal effect is also called the *dynamic multiplier*. $\beta_{h+1}$ in \@ref(eq:dlm) is the $h$-period dynamic multiplier.
+ The contemporaneous effect of $X$ on $Y$, $\beta_1$, is termed the *impact effect*.
+ The $h$-period *cumulative dynamic multiplier* of a unit change in $X$ on $Y$ is defined as the cumulative sum of the dynamic multipliers. In particular, $\beta_1$ is the zero-period cumulative dynamic multiplier, $\beta_1 + \beta_2$ is the one-period cumulative dynamic multiplier and so forth.
The cumulative dynamic multipliers of the distributed lag model \@ref(eq:dlm) are the coefficients $\delta_1,\delta_2,\dots,\delta_r,\delta_{r+1}$ in the modified regression
\begin{align}
Y_t =& \, \delta_0 + \delta_1 \Delta X_t + \delta_2 \Delta X_{t-1} + \dots + \delta_r \Delta X_{t-r+1} + \delta_{r+1} X_{t-r} + u_t (\#eq:DCMreg)
\end{align} and thus can be directly estimated using OLS which makes it convenient to compute their HAC standard errors. $\delta_{r+1}$ is called the *long-run cumulative dynamic multiplier*.
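To see why, consider the case $r=1$: adding and subtracting $\beta_1 X_{t-1}$ in \@ref(eq:dlm) gives
\begin{align*}
Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + u_t = \beta_0 + \beta_1 \Delta X_t + (\beta_1 + \beta_2) X_{t-1} + u_t,
\end{align*}
so the coefficient on $\Delta X_t$ is the zero-period cumulative dynamic multiplier $\delta_1 = \beta_1$ and the coefficient on $X_{t-1}$ is the one-period (and here also long-run) cumulative dynamic multiplier $\delta_2 = \beta_1 + \beta_2$.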
It is straightforward to compute the cumulative dynamic multipliers for \@ref(eq:orangemod1), the estimated distributed lag regression of changes in orange juice concentrate prices on freezing degree days, using the corresponding model object `r ttcode("orange_DLM")` and the function `r ttcode("cumsum()")`.
```{r}
# compute cumulative multipliers
cum_mult <- cumsum(orange_DLM$coefficients[-1])
# rename entries
names(cum_mult) <- paste(0:6, sep = "-", "period CDM")
cum_mult
```
Translating the distributed lag model with six lags of $FDD$ to \@ref(eq:DCMreg), we see that the OLS coefficient estimates in this model coincide with the multipliers stored in `r ttcode("cum_mult")`.
```{r}
# estimate cumulative dynamic multipliers using the modified regression
cum_mult_reg <- dynlm(FOJC_pctc ~ d(FDD) + d(L(FDD, 1:5)) + L(FDD, 6))
coef(cum_mult_reg)[-1]
```
As noted above, using a model specification as in \@ref(eq:DCMreg) makes it easy to obtain standard errors for the estimated cumulative dynamic multipliers.
```{r}
# obtain coefficient summary that reports HAC standard errors
coeftest(cum_mult_reg, vcov. = vcovHAC)
```
## HAC Standard Errors
The error term $u_t$ in the distributed lag model \@ref(eq:dlm) may be serially correlated due to serially correlated determinants of $Y_t$ that are not included as regressors. When these factors are not correlated with the regressors included in the model, serially correlated errors do not violate the assumption of exogeneity such that the OLS estimator remains unbiased and consistent.
However, autocorrelated errors render the usual homoskedasticity-only *and* heteroskedasticity-robust standard errors invalid and may cause misleading inference. HAC standard errors are a remedy.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC15.2">
<h3 class = "right"> Key Concept 15.2 </h3>
<h3 class = "left"> HAC Standard errors </h3>
<p>
**Problem**:
If the error term $u_t$ in the distributed lag model \\@ref(eq:dlm) is serially correlated, statistical inference that rests on usual (heteroskedasticity-robust) standard errors can be strongly misleading.
**Solution**:
Heteroskedasticity- and autocorrelation-consistent (HAC) estimators of the variance-covariance matrix circumvent this issue. There are <tt>R</tt> functions like <tt>vcovHAC()</tt> from the package <tt>sandwich</tt> which are convenient for computation of such estimators.
The package <tt>sandwich</tt> also contains the function <tt>NeweyWest()</tt>, an implementation of the HAC variance-covariance estimator proposed by @newey1987.
</p>
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[HAC Standard errors]{15.2}
\\textbf{Problem}:\\newline
If the error term $u_t$ in the distributed lag model (\\ref{eq:dlm}) is serially correlated, statistical inference that rests on usual (heteroskedasticity-robust) standard errors can be strongly misleading.\\newline
\\textbf{Solution}:\\newline
Heteroskedasticity- and autocorrelation-consistent (HAC) estimators of the variance-covariance matrix circumvent this issue. There are \\texttt{R} functions like \\texttt{vcovHAC()} from the package \\texttt{sandwich} which are convenient for computation of such estimators.\\newline
The package \\texttt{sandwich} also contains the function \\texttt{NeweyWest()}, an implementation of the HAC variance-covariance estimator proposed by \\cite{newey1987}.
\\end{keyconcepts}
')
```
Consider the distributed lag regression model with no lags and a single regressor $X_t$,
\begin{align*}
Y_t = \beta_0 + \beta_1 X_t + u_t,
\end{align*}
with autocorrelated errors. A brief derivation of
\begin{align}
\overset{\sim}{\sigma}^2_{\widehat{\beta}_1} = \widehat{\sigma}^2_{\widehat{\beta}_1} \widehat{f}_t (\#eq:nwhac)
\end{align}
the so-called *Newey-West variance estimator* for the variance of the OLS estimator of $\beta_1$ is presented in Chapter 15.4 of the book. $\widehat{\sigma}^2_{\widehat{\beta}_1}$ in \@ref(eq:nwhac) is the heteroskedasticity-robust variance estimate of $\widehat{\beta}_1$ and
\begin{align}
\widehat{f}_t = 1 + 2 \sum_{j=1}^{m-1} \left(\frac{m-j}{m}\right) \overset{\sim}{\rho}_j (\#eq:nwhacf)
\end{align}
is a correction factor that adjusts for serially correlated errors and involves estimates of $m-1$ autocorrelation coefficients $\overset{\sim}{\rho}_j$. As it turns out, using the sample autocorrelation as implemented in `r ttcode("acf()")` to estimate the autocorrelation coefficients renders \@ref(eq:nwhac) inconsistent; see pp. 650-651 of the book for a detailed argument. Therefore, we use a somewhat different estimator. For a time series $X$ we have $$ \ \overset{\sim}{\rho}_j = \frac{\sum_{t=j+1}^T \hat v_t \hat v_{t-j}}{\sum_{t=1}^T \hat v_t^2}, \ \text{with} \ \hat v_t = (X_t-\overline{X}) \hat u_t. $$ We implement this estimator in the function `r ttcode("acf_c()")` below.
$m$ in \@ref(eq:nwhacf) is a truncation parameter to be chosen. A rule of thumb for choosing $m$ is
\begin{align}
m = \left \lceil{0.75 \cdot T^{1/3}}\right\rceil. (\#eq:hactruncrot)
\end{align}
We simulate a time series that, as stated above, follows a distributed lag model with autocorrelated errors and then show how to compute the Newey-West HAC estimate of $SE(\widehat{\beta}_1)$ using `r ttcode("R")`. This is done via two separate but, as we will see, identical approaches: at first we follow the derivation presented in the book step-by-step and compute the estimate "manually". We then show that the result is exactly the estimate obtained when using the function `r ttcode("NeweyWest()")`.
```{r}
# function that computes rho tilde
acf_c <- function(x, j) {
return(
t(x[-c(1:j)]) %*% na.omit(Lag(x, j)) / t(x) %*% x
)
}
# simulate time series with serially correlated errors
set.seed(1)
N <- 100
eps <- arima.sim(n = N, model = list(ma = 0.5))
X <- runif(N, 1, 10)
Y <- 0.5 * X + eps
# compute OLS residuals
res <- lm(Y ~ X)$res
# compute v
v <- (X - mean(X)) * res
# compute robust estimate of beta_1 variance
var_beta_hat <- 1/N * (1/(N-2) * sum((X - mean(X))^2 * res^2) ) /
(1/N * sum((X - mean(X))^2))^2
# rule of thumb truncation parameter
m <- ceiling(0.75 * N^(1/3))
# compute correction factor
f_hat_T <- 1 + 2 * sum(
(m - 1:(m-1))/m * sapply(1:(m - 1), function(i) acf_c(x = v, j = i))
)
# compute Newey-West HAC estimate of the standard error
sqrt(var_beta_hat * f_hat_T)
```
For the code to be reusable in other applications, we use `r ttcode("sapply()")` to estimate the $m-1$ autocorrelations $\overset{\sim}{\rho}_j$.
```{r}
# Using NeweyWest():
NW_VCOV <- NeweyWest(lm(Y ~ X),
lag = m - 1, prewhite = F,
adjust = T)
# compute standard error
sqrt(diag(NW_VCOV))[2]
```
By choosing `r ttcode("lag = m-1")` we ensure that the maximum order of autocorrelations used is $m-1$ --- just as in equation \@ref(eq:nwhacf). Notice that we set the arguments `r ttcode("prewhite = F")` and `r ttcode("adjust = T")` to ensure that the formula \@ref(eq:nwhac) is used and finite sample adjustments are made.
We find that the computed standard errors coincide. Of course, a variance-covariance matrix estimate as computed by `r ttcode("NeweyWest()")` can be supplied as the argument `r ttcode("vcov")` in `r ttcode("coeftest()")` such that HAC $t$-statistics and $p$-values are provided by the latter.
```{r}
example_mod <- lm(Y ~ X)
coeftest(example_mod, vcov = NW_VCOV)
```
## Estimation of Dynamic Causal Effects with Strictly Exogenous Regressors
In general, the errors in a distributed lag model are correlated, which necessitates the use of HAC standard errors for valid inference. If, however, the assumption of exogeneity (the first assumption stated in Key Concept 15.1) is replaced by strict exogeneity, that is, $$E(u_t\vert \dots, X_{t+1}, X_{t}, X_{t-1}, \dots) = 0,$$ more efficient approaches than OLS estimation of the coefficients are available. For a general distributed lag model with $r$ lags and AR($p$) errors, these approaches are summarized in Key Concept 15.4.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC15.4">
<h3 class = "right"> Key Concept 15.4 </h3>
<h3 class = "left"> Estimation of Dynamic Multipliers Under Strict Exogeneity </h3>
<p>
Consider the general distributed lag model with $r$ lags and errors following an AR($p$) process,
\\begin{align}
Y_t =& \\, \\beta_0 + \\beta_1 X_t + \\beta_2 X_{t-1} + \\dots + \\beta_{r+1} X_{t-r} + u_t (\\#eq:dlmar) \\\\
u_t =& \\, \\phi_1 u_{t-1} + \\phi_2 u_{t-2} + \\dots + \\phi_p u_{t-p} + \\overset{\\sim}{u}_t. (\\#eq:dlmarerrors)
\\end{align}
Under strict exogeneity of $X_t$, one may rewrite the above model in the ADL specification
\\begin{align*}
Y_t =& \\, \\alpha_0 + \\phi_1 Y_{t-1} + \\phi_2 Y_{t-2} + \\dots + \\phi_p Y_{t-p} \\\\
&+ \\, \\delta_0 X_t + \\delta_1 X_{t-1} + \\dots + \\delta_q X_{t-q} + \\overset{\\sim}{u}_t
\\end{align*}
where $q=r+p$ and compute estimates of the dynamic multipliers $\\beta_1, \\beta_2, \\dots, \\beta_{r+1}$ using OLS estimates of $\\phi_1, \\phi_2, \\dots, \\phi_p, \\delta_0, \\delta_1, \\dots, \\delta_q$.
An alternative is to estimate the dynamic multipliers using feasible GLS, that is to apply the OLS estimator to a quasi-difference specification of \\@ref(eq:dlmar). Under strict exogeneity, the feasible GLS approach is the BLUE estimator for the dynamic multipliers in large samples.
On the one hand, as demonstrated in Chapter 15.5 of the book, OLS estimation of the ADL representation can be beneficial for estimation of the dynamic multipliers in large distributed lag models because it allows for a more parsimonious model that may be a good approximation to the large model. On the other hand, the GLS approach is more efficient than the ADL estimator if the sample size is large.
</p>
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Estimation of Dynamic Multipliers Under Strict Exogeneity]{15.4}
Consider the general distributed lag model with $r$ lags and errors following an AR($p$) process,
\\begin{align}
Y_t =& \\, \\beta_0 + \\beta_1 X_t + \\beta_2 X_{t-1} + \\dots + \\beta_{r+1} X_{t-r} + u_t \\label{eq:dlmar} \\\\
u_t =& \\, \\phi_1 u_{t-1} + \\phi_2 u_{t-2} + \\dots + \\phi_p u_{t-p} + \\overset{\\sim}{u}_t. \\label{eq:dlmarerrors}
\\end{align}
Under strict exogeneity of $X_t$, one may rewrite the above model in the ADL specification
\\begin{align*}
Y_t =& \\, \\alpha_0 + \\phi_1 Y_{t-1} + \\phi_2 Y_{t-2} + \\dots + \\phi_p Y_{t-p} \\\\
&+ \\, \\delta_0 X_t + \\delta_1 X_{t-1} + \\dots + \\delta_q X_{t-q} + \\overset{\\sim}{u}_t
\\end{align*}
where $q=r+p$ and compute estimates of the dynamic multipliers $\\beta_1, \\beta_2, \\dots, \\beta_{r+1}$ using OLS estimates of $\\phi_1, \\phi_2, \\dots, \\phi_p, \\delta_0, \\delta_1, \\dots, \\delta_q$.\\newline
An alternative is to estimate the dynamic multipliers using feasible GLS, that is to apply the OLS estimator to a quasi-difference specification of (\\ref{eq:dlmar}). Under strict exogeneity, the feasible GLS approach is the BLUE estimator for the dynamic multipliers in large samples.\\newline
On the one hand, as demonstrated in Chapter 15.5 of the book, OLS estimation of the ADL representation can be beneficial for estimation of the dynamic multipliers in large distributed lag models because it allows for a more parsimonious model that may be a good approximation to the large model. On the other hand, the GLS approach is more efficient than the ADL estimator if the sample size is large.
\\end{keyconcepts}
')
```
We briefly review how the different representations of a small distributed lag model can be obtained and show how this specification can be estimated by OLS and GLS using `r ttcode("R")`.
The model is
\begin{align}
Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + u_t, (\#eq:dldynamic)
\end{align}
so a change in $X$ is modeled to affect $Y$ contemporaneously ($\beta_1$) and in the next period ($\beta_2$). The error term $u_t$ is assumed to follow an AR($1$) process, $$u_t = \phi_1 u_{t-1} + \overset{\sim}{u}_t,$$ where $\overset{\sim}{u}_t$ is serially uncorrelated.
One can show that the ADL representation of this model is
\begin{align}
Y_t = \alpha_0 + \phi_1 Y_{t-1} + \delta_0 X_t + \delta_1 X_{t-1} + \delta_2 X_{t-2} + \overset{\sim}{u}_t, (\#eq:adl21dynamic)
\end{align}
with the restrictions
\begin{align*}
\beta_1 =& \, \delta_0, \\
\beta_2 =& \, \delta_1 + \phi_1 \delta_0,
\end{align*}
see p. 657 of the book.
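The derivation is short: substituting $u_{t-1} = Y_{t-1} - \beta_0 - \beta_1 X_{t-1} - \beta_2 X_{t-2}$ into $u_t = \phi_1 u_{t-1} + \overset{\sim}{u}_t$ and plugging the result into \@ref(eq:dldynamic) yields
\begin{align*}
Y_t = \beta_0(1-\phi_1) + \phi_1 Y_{t-1} + \beta_1 X_t + (\beta_2 - \phi_1 \beta_1) X_{t-1} - \phi_1 \beta_2 X_{t-2} + \overset{\sim}{u}_t,
\end{align*}
so that $\alpha_0 = \beta_0(1-\phi_1)$, $\delta_0 = \beta_1$, $\delta_1 = \beta_2 - \phi_1 \beta_1$ and $\delta_2 = -\phi_1 \beta_2$, from which the restrictions above follow.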
#### Quasi-Differences {-}
Another way of writing the ADL($1$,$2$) representation \@ref(eq:adl21dynamic) is the *quasi-difference model*
\begin{align}
\overset{\sim}{Y}_t = \alpha_0 + \beta_1 \overset{\sim}{X}_t + \beta_2 \overset{\sim}{X}_{t-1} + \overset{\sim}{u}_t, (\#eq:qdm)
\end{align}
where $\overset{\sim}{Y}_t = Y_t - \phi_1 Y_{t-1}$ and $\overset{\sim}{X}_t = X_t - \phi_1 X_{t-1}$. Notice that the error term $\overset{\sim}{u}_t$ is uncorrelated in both models and, as shown in Chapter 15.5 of the book, $$E(u_t\vert X_{t+1}, X_t, X_{t-1}, \dots) = 0,$$ which is implied by the assumption of strict exogeneity.
We continue by simulating a time series of $500$ observations using the model \@ref(eq:dldynamic) with $\beta_1 = 0.1$, $\beta_2 = 0.25$, $\phi_1 = 0.5$ and $\overset{\sim}{u}_t \sim \mathcal{N}(0,1)$ and estimate the different representations, starting with the distributed lag model \@ref(eq:dldynamic).
```{r}
# set seed for reproducibility
set.seed(1)
# simulate a time series with serially correlated errors
obs <- 501
eps <- arima.sim(n = obs-1 , model = list(ar = 0.5))
X <- arima.sim(n = obs, model = list(ar = 0.25))
Y <- 0.1 * X[-1] + 0.25 * X[-obs] + eps
X <- ts(X[-1])
# estimate the distributed lag model
dlm <- dynlm(Y ~ X + L(X))
```
Let us check that the residuals of this model exhibit autocorrelation using `r ttcode("acf()")`.
```{r, fig.align='center'}
# check that the residuals are serially correlated
acf(residuals(dlm))
```
In particular, the pattern reveals that the residuals follow an autoregressive process: the sample autocorrelation function decays quickly for the first few lags and is approximately zero for higher lag orders. In any case, HAC standard errors should be used.
```{r}
# coefficient summary using the Newey-West SE estimates
coeftest(dlm, vcov = NeweyWest, prewhite = F, adjust = T)
```
#### OLS Estimation of the ADL Model {-}
Next, we estimate the ADL($1$,$2$) model \@ref(eq:adl21dynamic) using OLS. The errors are uncorrelated in this representation of the model. This statement is supported by a plot of the sample autocorrelation function of the residual series.
```{r, fig.align='center'}
# estimate the ADL(1,2) representation of the distributed lag model
adl21_dynamic <- dynlm(Y ~ L(Y) + X + L(X, 1:2))
# plot the sample autocorrelations of the residuals
acf(adl21_dynamic$residuals)
```
The estimated coefficients in `adl21_dynamic$coefficients` are *not* the dynamic multipliers we are interested in. However, the multipliers can be computed from them using the coefficient restrictions belonging to \@ref(eq:adl21dynamic), with the true coefficients replaced by the OLS estimates.
```{r}
# compute estimated dynamic effects using the coefficient restrictions
# of the ADL(1,2) representation ('est' avoids masking base::t())
est <- adl21_dynamic$coefficients
c("hat_beta_1" = est[3],
  "hat_beta_2" = est[4] + est[3] * est[2])
```
#### GLS Estimation {-}
Strict exogeneity allows for OLS estimation of the quasi-difference model \@ref(eq:qdm). The idea of applying the OLS estimator to a model where the variables are linearly transformed, such that the model errors are uncorrelated and homoskedastic, is called *generalized least squares* (GLS).
The OLS estimator in \@ref(eq:qdm) is called the *infeasible GLS* estimator because $\overset{\sim}{Y}$ and $\overset{\sim}{X}$ cannot be computed without knowing $\phi_1$, the autoregressive coefficient in the error AR($1$) model, which is generally unknown in practice.
Assume we knew that $\phi_1 = 0.5$. We then may obtain the infeasible GLS estimates of the dynamic multipliers in \@ref(eq:dldynamic) by applying OLS to the transformed data.
```{r}
# GLS: estimate quasi-differenced specification by OLS
iGLS_dynamic <- dynlm(I(Y - 0.5 * L(Y)) ~ I(X - 0.5 * L(X)) + I(L(X) - 0.5 * L(X, 2)))
summary(iGLS_dynamic)
```
The *feasible GLS* estimator uses preliminary estimation of the coefficients in the presumed error term model, computes the quasi-differenced data and then estimates the model using OLS. This idea was introduced by @cochrane1949 and can be extended by continuing this process iteratively. Such a procedure is implemented in the function `r ttcode("cochrane.orcutt()")` from the package `r ttcode("orcutt")`.
```{r}
# align the series: drop the first observations of X and Y
X_t <- c(X[-1])
Y_t <- c(Y[-1])
# create the first lag of X
X_l1 <- c(X[-500])
# iterated cochrane-orcutt procedure
summary(cochrane.orcutt(lm(Y_t ~ X_t + X_l1)))
```
More sophisticated methods for GLS estimation are provided by the package `r ttcode("nlme")`. The function `r ttcode("gls()")` can be used to fit linear models by maximum likelihood and allows the user to specify a correlation structure for the error term.
```{r}
# feasible GLS maximum likelihood estimation procedure
summary(gls(Y_t ~ X_t + X_l1, correlation = corAR1()))
```
Notice that in this example, the coefficient estimates produced by GLS are somewhat closer to their true values and that the standard errors are the smallest for the GLS estimator.
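As a quick check of this claim, we may collect the estimates of $\beta_1$ and $\beta_2$ side by side --- a minimal sketch that uses only the model objects `dlm` and `iGLS_dynamic` created above; the true values are $\beta_1 = 0.1$ and $\beta_2 = 0.25$.

```{r}
# compare estimates of beta_1 and beta_2 (true values: 0.1 and 0.25)
rbind("OLS (distributed lag model)" = coef(dlm)[2:3],
      "infeasible GLS" = coef(iGLS_dynamic)[2:3])
```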
## Orange Juice Prices and Cold Weather
This section investigates the following two questions using the time series regression methods discussed here:
+ How persistent is the effect of a single freeze on orange juice concentrate prices?
+ Has the effect been stable over the whole time span?
We start by estimating dynamic causal effects with a distributed lag model where $\%ChgOJC_t$ is regressed on $FDD_t$ and 18 lags. The second model specification considers a transformation of the distributed lag model which allows us to estimate the 19 cumulative dynamic multipliers using OLS. The third model adds 11 binary variables (one for each of the months from February to December) to adjust for a possible omitted variable bias arising from correlation of $FDD_t$ and seasons, by adding `r ttcode("season(FDD)")` to the right-hand side of the formula of the second model.
```{r}
# estimate distributed lag models of frozen orange juice price changes
FOJC_mod_DM <- dynlm(FOJC_pctc ~ L(FDD, 0:18))
FOJC_mod_CM1 <- dynlm(FOJC_pctc ~ L(d(FDD), 0:17) + L(FDD, 18))
FOJC_mod_CM2 <- dynlm(FOJC_pctc ~ L(d(FDD), 0:17) + L(FDD, 18) + season(FDD))
```
The above models include a large number of lags with default labels that correspond to the degree of differencing and the lag orders, which makes the output somewhat cumbersome to read. The regressor labels of a model object may be altered by overriding the attribute `r ttcode("names")` of the coefficient section using the function `r ttcode("attr()")`. Thus, for better readability, we use the lag orders as regressor labels.
```{r, echo=T}
# set lag orders as regressor labels
attr(FOJC_mod_DM$coefficients, "names")[1:20] <- c("(Intercept)", as.character(0:18))
attr(FOJC_mod_CM1$coefficients, "names")[1:20] <- c("(Intercept)", as.character(0:18))
attr(FOJC_mod_CM2$coefficients, "names")[1:20] <- c("(Intercept)", as.character(0:18))
```
Next, we compute HAC standard errors for each model using `r ttcode("NeweyWest()")` and gather the results in a list which is then supplied as the argument `r ttcode("se")` to the function `r ttcode("stargazer()")`, see below. The sample consists of 612 observations:
```{r}
length(FDD)
```
According to \@ref(eq:hactruncrot), the rule of thumb for choosing the HAC standard error truncation parameter $m$, we choose
$$m = \left\lceil0.75 \cdot 612^{1/3} \right\rceil = \lceil6.37\rceil = 7.$$
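This choice is easily verified in `r ttcode("R")`:

```{r}
# rule-of-thumb truncation parameter for T = 612
ceiling(0.75 * length(FDD)^(1/3))
```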
To check for sensitivity of the standard errors to different choices of the truncation parameter in the model that is used to estimate the cumulative multipliers, we also compute the Newey-West estimator for $m=14$.
```{r}
# gather HAC standard errors in a list
SEs <- list(sqrt(diag(NeweyWest(FOJC_mod_DM, lag = 7, prewhite = F))),
sqrt(diag(NeweyWest(FOJC_mod_CM1, lag = 7, prewhite = F))),
sqrt(diag(NeweyWest(FOJC_mod_CM1, lag = 14, prewhite = F))),
sqrt(diag(NeweyWest(FOJC_mod_CM2, lag = 7, prewhite = F))))
```
The results are then used to reproduce the outcomes presented in Table 15.1 of the book.
```{r, eval=F}
stargazer(FOJC_mod_DM , FOJC_mod_CM1, FOJC_mod_CM1, FOJC_mod_CM2,
title = "Dynamic Effects of a Freezing Degree Day on the Price of Orange Juice",
header = FALSE,
digits = 3,
column.labels = c("Dynamic Multipliers", rep("Dynamic Cumulative Multipliers", 3)),
dep.var.caption = "Dependent Variable: Monthly Percentage Change in Orange Juice Price",
dep.var.labels.include = FALSE,
covariate.labels = as.character(0:18),
omit = "season",
se = SEs,
no.space = T,
add.lines = list(c("Monthly indicators?","no", "no", "no", "yes"),
c("HAC truncation","7", "7", "14", "7")),
omit.stat = c("rsq", "f","ser"))
```
<!--html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="html"}
stargazer(FOJC_mod_DM , FOJC_mod_CM1, FOJC_mod_CM1, FOJC_mod_CM2,
header = FALSE,
type = "html",
digits = 3,
column.labels = c("Dynamic Multipliers", rep("Dynamic Cumulative Multipliers", 3)),
dep.var.caption = "Dependent Variable: Monthly Percentage Change in Orange Juice Price",
dep.var.labels.include = FALSE,
covariate.labels = as.character(0:18),
omit = "season",
se = SEs,
no.space = T,
add.lines = list(
c("Monthly indicators?","no", "no", "no", "yes"),
c("HAC truncation","7", "7", "14", "7")
),
omit.stat = c("rsq", "f","ser")
)
stargazer_html_title("Dynamic Effects of a Freezing Degree Day on the Price of Orange Juice", "deoafddotpooj")
```
<!--/html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="latex"}
stargazer(FOJC_mod_DM , FOJC_mod_CM1, FOJC_mod_CM1, FOJC_mod_CM2,
title = "\\label{tab:deoafddotpooj} Dynamic Effects of a Freezing Degree Day on the Price of Orange Juice",
header = FALSE,
type = "latex",
column.sep.width = "20pt",
float.env = "sidewaystable",
single.row = T,
digits = 3,
column.labels = c("Dyn. Mult.", rep("Dyn. Cum. Mult.", 3)),
dep.var.caption = "Dependent Variable: Monthly Percentage Change in Orange Juice Price",
dep.var.labels.include = FALSE,
covariate.labels = paste("lag",as.character(0:18)),
omit = "season",
se = SEs,
no.space = T,
add.lines = list(
c("Monthly indicators?","no", "no", "no", "yes"),
c("HAC truncation","7", "7", "14", "7")
),
omit.stat = c("rsq", "f","ser")
)
```
According to column (1) of Table \@ref(tab:deoafddotpooj), the contemporaneous effect of a freezing degree day is an increase of $0.5\%$ in orange juice prices. The estimated effect is only $0.17\%$ for the next month and close to zero for subsequent months. In fact, for all lags larger than 1, we cannot reject the null hypotheses that the respective coefficients are zero using individual $t$-tests. The model `r ttcode("FOJC_mod_DM")` explains only little of the variation in the dependent variable ($\bar{R}^2 = 0.11$).
Columns (2) and (3) present estimates of the dynamic cumulative multipliers of model `r ttcode("FOJC_mod_CM1")`. Apparently, it does not matter whether we choose $m=7$ or $m=14$ when computing HAC standard errors so we stick with $m=7$ and the standard errors reported in column (2).
If the demand for orange juice is higher in winter, $FDD_t$ would be correlated with the error term since freezes occur mostly in winter, so we would face omitted variable bias. The third model, `r ttcode("FOJC_mod_CM2")`, accounts for this possible issue by using an additional set of 11 monthly dummies. For brevity, estimates of the dummy coefficients are excluded from the output produced by stargazer (this is achieved by setting `r ttcode("omit = 'season'")`). We may check that the dummy for January was omitted to prevent perfect multicollinearity.
```{r}
# estimates on monthly dummies
FOJC_mod_CM2$coefficients[-c(1:20)]
```
A comparison of the estimates presented in columns (3) and (4) indicates that adding monthly dummies has a negligible effect. Further evidence for this comes from a joint test of the hypothesis that the 11 dummy coefficients are zero. Instead of using `r ttcode("linearHypothesis()")`, we use the function `r ttcode("waldtest()")` and supply two model objects instead: `r ttcode("unres_model")`, the unrestricted model object which is the same as `r ttcode("FOJC_mod_CM2")` (except for the coefficient names since we have modified them above) and `r ttcode("res_model")`, the model where the restriction that all dummy coefficients are zero is imposed. `r ttcode("res_model")` is conveniently obtained using the function `r ttcode("update()")`. It extracts the argument `r ttcode("formula")` of a model object, updates it as specified and then re-fits the model. By setting `r ttcode("formula = . ~ . - season(FDD)")` we impose that the monthly dummies do not enter the model.
```{r}
# test if coefficients on monthly dummies are zero
unres_model <- dynlm(FOJC_pctc ~ L(d(FDD), 0:17) + L(FDD, 18) + season(FDD))
res_model <- update(unres_model, formula = . ~ . - season(FDD))
waldtest(unres_model,
res_model,
vcov = NeweyWest(unres_model, lag = 7, prewhite = F))
```
The $p$-value is $0.47$ so we cannot reject the hypothesis that the coefficients on the monthly dummies are zero, even at the $10\%$ level. We conclude that the seasonal fluctuations in demand for orange juice do not pose a serious threat to internal validity of the model.
It is convenient to present dynamic multipliers and cumulative dynamic multipliers in plots. The following two code chunks reproduce Figures 15.2 (a) and 15.2 (b) of the book, which display point estimates of dynamic and cumulative multipliers along with upper and lower bounds of their $95\%$ confidence intervals computed using the HAC standard errors obtained above.
```{r, fig.align='center', fig.cap="Dynamic Multipliers", label = DM}
# 95% CI bounds
point_estimates <- FOJC_mod_DM$coefficients
CI_bounds <- cbind("lower" = point_estimates - 1.96 * SEs[[1]],
"upper" = point_estimates + 1.96 * SEs[[1]])[-1, ]
# plot the estimated dynamic multipliers
plot(0:18, point_estimates[-1],
type = "l",
lwd = 2,
col = "steelblue",
ylim = c(-0.4, 1),
xlab = "Lag",
ylab = "Dynamic multiplier",
main = "Dynamic Effect of FDD on Orange Juice Price")
# add a dashed line at 0
abline(h = 0, lty = 2)
# add CI bounds
lines(0:18, CI_bounds[,1], col = "darkred")
lines(0:18, CI_bounds[,2], col = "darkred")
```
The $95\%$ confidence intervals plotted in Figure \@ref(fig:DM) indeed include zero for lags larger than 1 such that the null of a zero multiplier cannot be rejected for these lags.
```{r, fig.align='center', fig.cap="Dynamic Cumulative Multipliers", label = DCM}
# 95% CI bounds
point_estimates <- FOJC_mod_CM1$coefficients
CI_bounds <- cbind("lower" = point_estimates - 1.96 * SEs[[2]],
"upper" = point_estimates + 1.96 * SEs[[2]])[-1,]
# plot estimated dynamic multipliers
plot(0:18, point_estimates[-1],
type = "l",
lwd = 2,
col = "steelblue",
ylim = c(-0.4, 1.6),
xlab = "Lag",
ylab = "Cumulative dynamic multiplier",
main = "Cumulative Dynamic Effect of FDD on Orange Juice Price")
# add dashed line at 0
abline(h = 0, lty = 2)
# add CI bounds
lines(0:18, CI_bounds[, 1], col = "darkred")
lines(0:18, CI_bounds[, 2], col = "darkred")
```
As can be seen from Figure \@ref(fig:DCM), the estimated cumulative dynamic multipliers grow until the seventh month, up to a price increase of about $0.91\%$, and then decrease slightly to the estimated long-run cumulative multiplier of $0.37\%$, which, however, is not significantly different from zero at the $5\%$ level.
Have the dynamic multipliers been stable over time? One way to check this is to estimate the multipliers for different subperiods of the sample. For example, consider the periods 1950 - 1966, 1967 - 1983 and 1984 - 2000. If the multipliers are the same in all three periods, the estimates should be close and thus the estimated cumulative multipliers should be similar, too. We investigate this by re-estimating `r ttcode("FOJC_mod_CM1")` for the three different time spans and then plotting the estimated cumulative dynamic multipliers for comparison.
```{r}
# estimate cumulative multipliers using different sample periods
FOJC_mod_CM1950 <- update(FOJC_mod_CM1, start = c(1950, 1), end = c(1966, 12))
FOJC_mod_CM1967 <- update(FOJC_mod_CM1, start = c(1967, 1), end = c(1983, 12))
FOJC_mod_CM1984 <- update(FOJC_mod_CM1, start = c(1984, 1), end = c(2000, 12))
```
```{r, fig.align='center'}
# plot estimated dynamic cumulative multipliers (1950-1966)
plot(0:18, FOJC_mod_CM1950$coefficients[-1],
type = "l",
lwd = 2,
col = "steelblue",
xlim = c(0, 20),
ylim = c(-0.5, 2),
xlab = "Lag",
ylab = "Cumulative dynamic multiplier",
main = "Cumulative Dynamic Effect for Different Sample Periods")
# plot estimated dynamic multipliers (1967-1983)
lines(0:18, FOJC_mod_CM1967$coefficients[-1], lwd = 2)
# plot estimated dynamic multipliers (1984-2000)
lines(0:18, FOJC_mod_CM1984$coefficients[-1], lwd = 2, col = "darkgreen")
# add dashed line at 0
abline(h = 0, lty = 2)
# add annotations
text(18, -0.24, "1984 - 2000")
text(18, 0.6, "1967 - 1983")
text(18, 1.2, "1950 - 1966")
```
Clearly, the cumulative dynamic multipliers have changed considerably over time. The effect of a freeze was stronger and more persistent in the 1950s and 1960s. For the 1970s the magnitude of the effect was lower but still highly persistent. We observe an even lower magnitude for the final third of the sample span (1984 - 2000) where the long-run effect is much less persistent and essentially zero after a year.
A QLR test for a break in the distributed lag regression of column (1) in Table \@ref(tab:deoafddotpooj) with $15\%$ trimming, using a HAC variance-covariance matrix estimate, supports the conjecture that the population regression coefficients have changed over time.
```{r, cache=TRUE}
# set up a range of possible break dates
tau <- c(window(time(FDD),
time(FDD)[round(612/100*15)],
time(FDD)[round(612/100*85)]))
# initialize the vector of F-statistics
Fstats <- numeric(length(tau))
# the restricted model
res_model <- dynlm(FOJC_pctc ~ L(FDD, 0:18))
# estimation, loop over break dates
for(i in 1:length(tau)) {
# set up dummy variable
D <- time(FOJC_pctc) > tau[i]
# estimate the distributed lag model with interactions
unres_model <- dynlm(FOJC_pctc ~ D * L(FDD, 0:18))
# compute and save F-statistic
Fstats[i] <- waldtest(res_model,
unres_model,
vcov = NeweyWest(unres_model, lag = 7, prewhite = F))$F[2]
}
```
Note that this code takes a couple of seconds to run since a total of `r length(tau)` regressions with $40$ model coefficients each are estimated.
```{r}
# QLR test statistic
max(Fstats)
```
The QLR statistic is $`r round(max(Fstats),2)`$. From Table 14.5 of the book we see that the $1\%$ critical value for the QLR test with $15\%$ trimming and $q=20$ restrictions is $2.43$. Since this is a right-sided test, the QLR statistic clearly lies in the rejection region, so we can reject the null hypothesis of no break in the population regression function.
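As an aside (this step is not part of the book's exposition), the period that maximizes the sequence of $F$-statistics is a natural candidate estimate of the break date:

```{r}
# candidate break date: the date that maximizes the F-statistic
tau[which.max(Fstats)]
```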
See Chapter 15.7 of the book for a discussion of empirical examples where it is questionable whether the assumption of (past and present) exogeneity of the regressors is plausible.
#### Summary {-}
+ We have seen how `r ttcode("R")` can be used to estimate the time path of the effect on $Y$ of a change in $X$ (the dynamic causal effect on $Y$ of a change in $X$) using time series data on both. The corresponding model is called the distributed lag model. Distributed lag models are conveniently estimated using the function `r ttcode("dynlm()")` from the package `r ttcode("dynlm")`.
+ The regression error in distributed lag models is often serially correlated such that standard errors which are robust to heteroskedasticity *and* autocorrelation should be used to obtain valid inference. The package `r ttcode("sandwich")` provides functions for computation of so-called HAC covariance matrix estimators, for example `r ttcode("vcovHAC()")` and `r ttcode("NeweyWest()")`.
+ When $X$ is *strictly exogenous*, more efficient estimates can be obtained using an ADL model or by GLS estimation. Feasible GLS algorithms can be found in the `r ttcode("R")` packages `r ttcode("orcutt")` and `r ttcode("nlme")`. Chapter 15.7 of the book emphasizes that the assumption of strict exogeneity is often implausible in empirical applications.