# Instrumental variables
This section is a work in progress and is subject to change. Special thanks to Yan Shuo Tan, who wrote most of this section’s content.
## Review and introduction
To briefly recap what we have learnt so far:

- We defined a superpopulation model, i.e. a distribution for $(Z, X, Y(0), Y(1))$: $X$ is the (binary) treatment decision, $Y(0)$ and $Y(1)$ are the potential outcomes in the universes where the unit wasn't/was treated, and $Z$ is a confounding variable (in other words, it has a causal effect on both $X$ and $Y$). So far, we haven't needed to make any assumptions about the distribution of these variables in general (only that it exists).
- We defined our quantity of interest, the average treatment effect (ATE), $\tau = E[Y(1) - Y(0)]$, which tells us the average effect of the treatment. We saw that this is impossible to estimate unless we make further assumptions.
- We saw that in a randomized experiment, the treatment decisions are random, and therefore independent of the potential outcomes: $X \perp (Y(0), Y(1))$. In other words, the ATE reduces to a difference of observable means:

  $$\tau = E[Y \mid X = 1] - E[Y \mid X = 0].$$
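The difference-in-means identity for a randomized experiment can be sanity-checked with a small simulation. This is a minimal sketch: the coefficients, distributions, and sample size below are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical superpopulation: potential outcomes depend on a confounder Z.
Z = rng.normal(size=n)
Y0 = 1.0 + 2.0 * Z + rng.normal(size=n)   # outcome if untreated
Y1 = Y0 + 3.0                             # outcome if treated (true ATE = 3)

# In a randomized experiment, X is assigned independently of (Y0, Y1).
X = rng.binomial(1, 0.5, size=n)
Y = np.where(X == 1, Y1, Y0)              # observed outcome

# Difference in means estimates the ATE: E[Y|X=1] - E[Y|X=0].
ate_hat = Y[X == 1].mean() - Y[X == 0].mean()
print(ate_hat)  # close to the true ATE of 3
```

Because $X$ is independent of the potential outcomes, the difference in means is unbiased even though $Z$ affects the outcome.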
In this section, we’ll investigate how we can estimate the ATE in situations where we have unknown confounding variables. We’ll rely on natural experiments to help us. Note that you’ve probably seen natural experiments before in Data 8, when learning about John Snow’s study of cholera.
## Linear structural model (LSM)
In some fields (such as economics), it is typical to work with structural models, which place some restrictions on the joint distribution of all the variables, and in doing so, make it easier to estimate the parameters of the model.
We will work with the linear structural model relating our outcome $Y$ to the treatment $X$ and the confounder $Z$:

$$Y = \alpha + \tau X + \beta Z + \epsilon,$$

where $\epsilon$ is zero-mean noise, independent of $X$ and $Z$.

Note: in general, we often add the further structural equation

$$X = \alpha_X + \delta Z + \epsilon_X,$$

which describes how the treatment itself is (partially) determined by the confounder.
This is not quite the same as the linear model that we have seen when we learned about GLMs, and that you’ve seen in previous classes! While it looks very similar, the linear model we worked with before is a statement about associations and predictions, while this linear structural model is a statement about intervention and action.
Specifically, this model assumes that if, for unit $i$, we were to intervene and set the treatment to $x$, the outcome would be $Y_i(x) = \alpha + \tau x + \beta Z_i + \epsilon_i$.

From this, we see that the average treatment effect in this model is $E[Y(1) - Y(0)] = \tau$.
In other words, the linear structural model is making an implicit assumption that the treatment effect is constant across all units.
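The constant-effect assumption can be made concrete with a tiny simulation from a hypothetical linear structural model (all coefficients below are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
alpha, tau, beta = 1.0, 3.0, 2.0   # hypothetical structural coefficients

Z = rng.normal(size=n)
eps = rng.normal(size=n)

# Under the linear structural model, intervening to set the treatment to x
# produces the potential outcome Y(x) = alpha + tau * x + beta * Z + eps.
Y0 = alpha + tau * 0 + beta * Z + eps
Y1 = alpha + tau * 1 + beta * Z + eps

# The unit-level treatment effect is the same constant tau for every unit,
# so the ATE is tau as well.
print(Y1 - Y0)  # every entry equals tau = 3.0
```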
## Causal graphs and LSMs
Apart from the causal effect of $X$ on $Y$ (captured by $\tau$), the linear structural model also encodes the causal effect of the confounder $Z$ on both $X$ and $Y$. We can represent these relationships with a causal graph.

*(Figure: causal graph with arrows $Z \to X$, $Z \to Y$, and $X \to Y$.)*

As a reminder, the arrows from $Z$ into $X$ and into $Y$ indicate that $Z$ has a causal effect on both, which is exactly what makes it a confounder.
## Confounding and omitted variable bias
In many scenarios, confounding is complicated and involves many different variables, and it may be impossible to collect, observe, or describe all of them. In that case, we must treat the confounder $Z$ as unobserved. Here are some examples of treatment/outcome pairs along with possible confounders:
| Treatment | Outcome | Possible confounder(s) |
|---|---|---|
| Health insurance | Health outcomes | Socioeconomic background |
| Military service | Salary | Socioeconomic background |
| Family size | Whether the mother is in the labor force | Socioeconomic background |
| Years of schooling | Salary | Socioeconomic background |
| Smoking | Lung cancer | Socioeconomic background |
Note that in most of these examples, socioeconomic background is a confounder. This is particularly common in economics and econometrics, where most of the methods in this section originated.
Let's be a bit more precise about quantifying the effect of confounding. Specifically, we'll assume the linear structural model above, and then see what happens when we naively try to fit a linear regression of $Y$ on $X$ alone.

Let $\hat\tau_{OLS}$ be the resulting least-squares coefficient on $X$. As the number of samples grows,

$$\hat\tau_{OLS} \to \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \tau + \beta \, \frac{\mathrm{Cov}(X, Z)}{\mathrm{Var}(X)}.$$

The second term is a bias in the estimate that does not go away as we collect more samples.
Remark: this phenomenon is known as omitted variable bias, since it arises from omitting the confounder $Z$ from the regression.
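We can watch omitted variable bias appear in a simulation. This sketch uses made-up coefficients; the point is that the naive slope converges to $\tau + \beta \, \mathrm{Cov}(X, Z)/\mathrm{Var}(X)$, not to $\tau$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
alpha, tau, beta = 1.0, 3.0, 2.0   # hypothetical structural coefficients

# The treatment is confounded: X depends on the unobserved Z.
Z = rng.normal(size=n)
X = 0.8 * Z + rng.normal(size=n)
Y = alpha + tau * X + beta * Z + rng.normal(size=n)

# Naive OLS slope from regressing Y on X alone (with an intercept).
slope = np.cov(X, Y)[0, 1] / np.var(X)

# The large-sample formula for the biased slope: tau + beta * Cov(X,Z)/Var(X).
predicted = tau + beta * np.cov(X, Z)[0, 1] / np.var(X)
print(slope, predicted)  # both are close to each other, and far from tau = 3
```

No matter how large `n` gets, the naive slope stays biased away from $\tau$.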
Why can't we just adjust for confounding? Having such confounders is problematic because, in order to avoid omitted variable bias, we need to have observed them and added them to our regression (collecting such data may not always be feasible, for a number of reasons). Furthermore, there could always be other confounders that we are unaware of, which leaves our causal conclusions under an inescapable cloud of doubt.
## Instrumental Variables
Is there a middle way between a randomized experiment and assuming unconfoundedness, which is sometimes unrealistic?
One way forward is when nature provides us with a “partial” natural experiment, i.e. we have a truly randomized “instrument” that injects an element of partial randomization into the treatment variable of interest. This is the idea of instrumental variables. We will first define the concept mathematically, and then illustrate what it means for a few examples.
Definition: Assume the linear structural model defined above. We further assume a variable $W$, called an instrumental variable (or instrument), satisfying two conditions:

- Relevance: $W$ has a causal effect on the treatment. Concretely, the structural equation for $X$ becomes

  $$X = \alpha_X + \gamma W + \delta Z + \epsilon_X, \qquad \gamma \neq 0.$$

- Exogeneity: $W$ is independent of $Z$, $\epsilon$, and $\epsilon_X$. In particular, $W$ affects $Y$ only through its effect on $X$.

Remark: the relevance condition replaces the earlier structural equation for $X$, which depended only on the confounder $Z$; the instrument now injects an element of randomness into the treatment.
Let us now see how to use $W$ to estimate $\tau$. Taking covariances with $W$ in the structural equation for $Y$ gives

$$\mathrm{Cov}(W, Y) = \tau \, \mathrm{Cov}(W, X) + \beta \, \mathrm{Cov}(W, Z) + \mathrm{Cov}(W, \epsilon) = \tau \, \mathrm{Cov}(W, X),$$

where the second equality follows from the exogeneity of $W$: since $W$ is independent of both $Z$ and $\epsilon$, the last two covariance terms vanish. Meanwhile, relevance ensures that $\mathrm{Cov}(W, X) = \gamma \, \mathrm{Var}(W) \neq 0$, so we may divide by it.

Putting everything together gives

$$\tau = \frac{\mathrm{Cov}(W, Y)}{\mathrm{Cov}(W, X)}.$$

In other words, the ATE can be written entirely in terms of quantities involving the observed variables $W$, $X$, and $Y$.

This motivates the instrumental variable estimator of the ATE in finite samples:

$$\hat\tau_{IV} = \frac{\widehat{\mathrm{Cov}}(W, Y)}{\widehat{\mathrm{Cov}}(W, X)} = \frac{\sum_{i=1}^n (W_i - \bar W)(Y_i - \bar Y)}{\sum_{i=1}^n (W_i - \bar W)(X_i - \bar X)},$$

where again, abusing notation, $\widehat{\mathrm{Cov}}$ denotes the sample covariance.
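Here is the covariance-ratio estimator in action on simulated data from a hypothetical linear structural model (all coefficients made up), compared against the biased naive OLS slope.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
alpha, tau, beta = 1.0, 3.0, 2.0   # hypothetical structural coefficients
gamma = 1.5                         # effect of the instrument on the treatment

Z = rng.normal(size=n)              # unobserved confounder
W = rng.normal(size=n)              # instrument: independent of Z and the noise
X = gamma * W + 0.8 * Z + rng.normal(size=n)
Y = alpha + tau * X + beta * Z + rng.normal(size=n)

# IV estimator: ratio of sample covariances.
tau_iv = np.cov(W, Y)[0, 1] / np.cov(W, X)[0, 1]

# Naive OLS slope of Y on X for comparison: biased by the confounder.
tau_ols = np.cov(X, Y)[0, 1] / np.cov(X, X)[0, 1]
print(tau_iv, tau_ols)  # tau_iv is close to 3; tau_ols is not
```

The estimator never observes $Z$, yet recovers $\tau$ thanks to the exogeneity and relevance of $W$.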
Further interpretation for binary $W$: when the instrument is binary, the identity above simplifies to

$$\tau = \frac{E[Y \mid W = 1] - E[Y \mid W = 0]}{E[X \mid W = 1] - E[X \mid W = 0]},$$

a ratio of two differences in means; the corresponding plug-in estimator is known as the Wald estimator.
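For a binary instrument, the ratio of differences in means can be checked numerically. In this sketch the instrument, treatment probabilities, and effect size are all hypothetical; the instrument raises the chance of treatment but is independent of the confounder.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000
tau = 3.0   # hypothetical true treatment effect

Z = rng.normal(size=n)                       # unobserved confounder
W = rng.binomial(1, 0.5, size=n)             # binary instrument (e.g. a lottery)

# Binary treatment: the instrument raises the probability of treatment.
p_treat = np.clip(0.3 + 0.4 * W + 0.1 * np.tanh(Z), 0.0, 1.0)
X = rng.binomial(1, p_treat)
Y = tau * X + 2.0 * Z + rng.normal(size=n)

# Wald estimator: ratio of two differences in means.
num = Y[W == 1].mean() - Y[W == 0].mean()
den = X[W == 1].mean() - X[W == 0].mean()
tau_wald = num / den
print(tau_wald)  # close to tau = 3
```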
## Causal graph for instrumental variables
The relationships between the instrument $W$, the treatment $X$, the outcome $Y$, and the confounder $Z$ can be represented using the following causal graph.

*(Figure: causal graph with arrows $W \to X$, $Z \to X$, $Z \to Y$, and $X \to Y$; the nodes $W$, $X$, and $Y$ are shaded, while $Z$ is unshaded.)*
How to read this graph:

- The arrow from $W$ into $X$ shows that $W$ has a causal effect on $X$.
- The absence of any arrow into $W$ means that $W$ is exogenous, i.e. no variable in the diagram causes $W$, and in particular $W$ is independent of $Z$.
- The absence of an arrow from $W$ into $Y$ means that the only effect of $W$ on $Y$ is through $X$.
- We shaded in $W$, $X$, and $Y$ because these nodes are observed, but $Z$ is unshaded because it is latent (unobserved).
Note that we do not need to know, or even be aware of, what $Z$ is: as long as $W$ satisfies the relevance and exogeneity conditions above, the instrumental variable estimator recovers $\tau$.
## Examples of instrumental variables
Let’s examine what we might use as instrumental variables for the five examples from the table in the previous section. The first four are taken from the econometrics literature:
Example 1 (health insurance → health outcomes): a randomized lottery that offers some people the chance to enroll in insurance, as in the Oregon health insurance experiment. Winning the lottery is random, it affects whether you end up insured, and it plausibly affects health only through insurance coverage.

Example 2 (military service → salary): the Vietnam-era draft lottery number (Angrist). Draft numbers were randomly assigned, affected the probability of serving, and should have no other effect on later earnings.

Example 3 (family size → whether the mother is in the labor force): the sex composition of a family's first two children (Angrist and Evans). Parents whose first two children are the same sex are more likely to have a third child, and sex composition is essentially random.

Example 4 (years of schooling → salary): quarter of birth (Angrist and Krueger). Compulsory-schooling laws tie school entry and the legal dropout age to birth dates, so quarter of birth slightly shifts completed years of schooling while being essentially random.

Example 5 (smoking → lung cancer): a candidate instrument is the local tax rate on cigarettes, which shifts how much people smoke; whether such taxes are truly independent of background factors is debatable, which illustrates how hard finding a convincing instrument can be.
As we see in these examples, sometimes you need to be quite ingenious to come up with an appropriate instrumental variable. Joshua Angrist (who appears in several of these examples), David Card, and Guido Imbens are phenomenally good at this: in fact, they won the Nobel Prize in economics for their collected body of work!
## Extensions
### Multiple treatments / instruments, and two-stage least squares
So far, we have considered a scalar treatment and a scalar instrument. What if we have multiple treatments $X \in \mathbb{R}^p$ and multiple instruments $W \in \mathbb{R}^q$ (with $q \geq p$)? The linear structural model becomes $Y = \alpha + \tau^T X + \beta^T Z + \epsilon$, with $W$ exogenous and relevant as before.

First define the conditional expectation $\tilde X = E[X \mid W]$: the part of the treatment that is "explained" by the instruments.

If we regress $Y$ on $\tilde X$, the population regression coefficient is

$$\mathrm{Cov}(\tilde X)^{-1} \, \mathrm{Cov}(\tilde X, Y) = \mathrm{Cov}(\tilde X)^{-1} \, \mathrm{Cov}(\tilde X, X) \, \tau = \tau.$$

Here, the 2nd equality holds because $\tilde X$ is a function of $W$, which is independent of $Z$ and $\epsilon$, so the covariance of $\tilde X$ with $\beta^T Z$ and with $\epsilon$ vanishes; the last equality holds because $\mathrm{Cov}(\tilde X, X) = \mathrm{Cov}(\tilde X)$ (the residual $X - \tilde X$ is uncorrelated with any function of $W$).
In finite samples, we thus arrive at the following algorithm:
Two-stage least squares algorithm (2SLS):

- Step 1: Regress $X$ on $W$ to get fitted values $\hat X$.
- Step 2: Regress $Y$ on $\hat X$ to get $\hat\tau_{2SLS}$.
For the scalar setting, it is easy to see that $\hat\tau_{2SLS}$ coincides exactly with the instrumental variable estimator $\hat\tau_{IV}$ defined above.
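The two regression stages can be sketched in a few lines of numpy. The data-generating process below is hypothetical (made-up coefficients, scalar case); the sketch also checks numerically that 2SLS matches the covariance-ratio IV estimator in the scalar setting.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
tau = 3.0   # hypothetical true treatment effect

Z = rng.normal(size=n)               # unobserved confounder
W = rng.normal(size=n)               # instrument
X = 1.5 * W + 0.8 * Z + rng.normal(size=n)
Y = 1.0 + tau * X + 2.0 * Z + rng.normal(size=n)

def ols(features, target):
    """Least squares with an intercept; returns (intercept, slope, ...)."""
    A = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef

# Step 1: regress X on W to get fitted values X_hat.
b0, b1 = ols(W, X)
X_hat = b0 + b1 * W

# Step 2: regress Y on X_hat; the slope is the 2SLS estimate of tau.
tau_2sls = ols(X_hat, Y)[1]

# In the scalar case this equals the covariance-ratio IV estimator.
tau_iv = np.cov(W, Y)[0, 1] / np.cov(W, X)[0, 1]
print(tau_2sls, tau_iv)  # both close to tau = 3, and equal to each other
```

The equality holds because the slope of $Y$ on $\hat X = b_0 + b_1 W$ is $\mathrm{Cov}(W, Y) / (b_1 \mathrm{Var}(W))$, and $b_1 = \mathrm{Cov}(W, X)/\mathrm{Var}(W)$.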
## (Optional) A non-parametric perspective on instrumental variables
In this notebook, we have introduced instrumental variables in the context of structural linear models. What if our model is nonlinear?
In an amazing coincidence, for binary treatment $X$ and binary instrument $W$, the ratio

$$\frac{E[Y \mid W = 1] - E[Y \mid W = 0]}{E[X \mid W = 1] - E[X \mid W = 0]}$$

has a meaning beyond the linear model setting: under weaker assumptions, it equals the local average treatment effect (LATE), the average treatment effect among "compliers" (units whose treatment status is changed by the instrument). This is the subject of a groundbreaking paper by Angrist and Imbens in 1996.