# When can variables be safely removed from a causal model?

1 October 2017

Much of our causal talk consists of sentences of the form “c caused e”, where both c and e are token, non-repeatable events or facts or what-have-you (there will be disagreement about what kinds of things ‘c’ and ‘e’ denote, but for now, I’ll just call them ‘events’). Let’s call the kinds of causal relations we’re talking about with sentences like those ‘singular causal relations’. The topic of causation is not exhausted by singular causal relations. There are other interesting causal notions which are clearly distinct from (though they may bear interesting relations to) singular causation. For instance, “Smoking causes cancer” is not a singular causal claim, but rather a general causal claim, relating not token events but rather general types of events.

Many have become convinced that the best way to theorize about singular causation is by understanding it within the context of some explicitly represented system of causal determination. Causal determination is a third causal notion, distinct from both singular and general causation. Even though it is incorrect to say that the power being on caused the light to not be illuminated, it is nevertheless true that whether the light is illuminated is causally determined by whether the power is on and the switch is up. And, though one could infer that the power is on from the fact that the light is illuminated, it would be incorrect to say that whether the power is on is causally determined by whether the light is illuminated. To say this would be to get the direction of causal determination the wrong way ‘round.

Those who think that systems of causal determination have an important role to play in a theory of singular causation typically think that these systems of causal determination may be represented with systems of structural equations. A system of structural equations is a particular kind of model of a network of causal determination. The model consists of a vector of variables (see section 1 of this post for more on how to think about variables), together with a vector of structural equations. For instance, we may introduce a variable $P$ for whether the power is on (at the relevant place and the relevant time). This variable takes on the value $1$ if the power is on, takes on the value $0$ if the power is off, and is undefined otherwise. We may similarly introduce a variable $L$ for whether the light is illuminated: it takes on the value $1$ if the light is illuminated, takes on the value $0$ if the light is not illuminated, and is undefined otherwise (if, e.g., the light doesn’t exist). And we may introduce a variable $S$ for whether the switch is up or not (again, $1$ if it’s up, $0$ if it’s down, undefined otherwise). A structural equation then tells us how the value of $L$ is causally determined by the values of $P$ and $S$. In particular, it tells us that
$$L := P \wedge S \label{1}\tag{1}$$

(Here, “$\wedge$” is just the truth-function ‘and’.) So, $L$ will take on the value $1$ iff both $P$ and $S$ take on the value $1$. If either $P$ or $S$ is $0$, then $L$ will take on the value $0$ as well. When we combine multiple structural equations, we get a system of structural equations. These systems of equations represent networks of causal determination out in the world. For instance, suppose that whether the power is on is structurally determined by whether the light switch is up. If the light switch is up, then the power is on, and if the light switch is down, then the power is off.

$$P := S \tag{2}\label{2}$$

Combining the structural equations \eqref{1} and \eqref{2} gives us a system of structural equations

$$\begin{aligned} L &:= P \wedge S \\ P &:= S \end{aligned}$$
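The two-equation system above can be evaluated mechanically. The following is only a minimal sketch: it assumes binary values `0`/`1`, and the function names `phi_P`, `phi_L`, and `solve` are illustrative, not part of the formalism.

```python
# Structural equations for the light example, written as Python functions.
# Binary values 0/1 stand in for the variable values in the text.
def phi_P(S):
    return S              # P := S

def phi_L(P, S):
    return P & S          # L := P ∧ S

def solve(S):
    """Compute the endogenous values from the exogenous value of S."""
    P = phi_P(S)          # P is computed first, since L depends on it
    L = phi_L(P, S)
    return {"S": S, "P": P, "L": L}

solutions = [solve(S) for S in (0, 1)]
for values in solutions:
    print(values)
# {'S': 0, 'P': 0, 'L': 0}
# {'S': 1, 'P': 1, 'L': 1}
```

Note that the code only computes values in the direction of determination: `S` is an input, and `P` and `L` are outputs.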

What makes this system of equations structural is that we are interpreting them causally. The equations don’t just say that there is a certain relationship between the values of $L, P,$ and $S$. They additionally say that the value of $L$ is causally determined by the values of $P$ and $S$; and that the value of $P$ is causally determined by the value of $S$. It is for this reason that I use the asymmetric relation “$:=$”, and not the symmetric relation “$=$”. For instance, it follows from the system of equations consisting of \eqref{1} and \eqref{2} that $S=1$ iff $L=1$; so the equation $$S = L$$ will be true if the system of structural equations (\eqref{1}, \eqref{2}) is. However, it will be false that $$S := L$$ For, even though the value of $S$ must match the value of $L$, it is not the case that the value of $S$ is causally determined by the value of $L$. It is this additional information which is conveyed by the symbol $:=$.

In a structural equation, there is exactly one dependent variable on the left-hand-side of the equation, and at least one independent variable on the right-hand-side. I’ll use “$\mathbf{PA}(V)$” to represent a vector of the independent variables on the right-hand-side of $V$'s structural equation. (It is common to refer to these variables as $V$'s causal parents.) Then, a structural equation is of the form $$V := \phi_V(\mathbf{PA}(V))$$ where $\phi_V$ is some function from the values of the variables in $\mathbf{PA}(V)$ to the values of $V$. I will insist, by the way, that $\phi_V$ be surjective if we are to interpret it causally. I will also use “$\phi_V$” to represent the entire structural equation $V := \phi_V(\mathbf{PA}(V))$. If a variable appears on the left-hand-side of a structural equation, then that variable is endogenous. Otherwise, it is exogenous.

What I will call a causal model, $\mathbb{M}$, consists of a vector of exogenous variables $\mathbb{U}$, a vector of endogenous variables $\mathbb{V}$, a vector of structural equations $\mathbb{E}$, and a context, $\vec{u}$, which is an assignment of values to the exogenous variables in $\mathbb{U}$.

Causal Model. A causal model $\mathbb{M}$ is a 4-tuple $$\mathbb{M} = \langle \mathbb{U}, \mathbb{V}, \mathbb{E}, \vec{u} \rangle$$ of

1. A (non-empty) vector $\mathbb{U}$ of exogenous variables, $( U_1, U_2, \dots, U_M )$.
2. A (non-empty) vector $\mathbb{V}$ of endogenous variables, $( V_1, V_2, \dots, V_N)$.
3. A vector $\mathbb{E}$ of structural equations, $( \phi_1, \phi_2, \dots, \phi_N)$, one for each endogenous variable $V_i \in \mathbb{V}$.
4. A context $\vec{u} = ( u_1, u_2, \dots, u_M )$, which assigns a value to each exogenous variable $U_i \in \mathbb{U}$.

(This is a slightly non-standard presentation. Normally, the context is not taken to be a part of the causal model.)
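The 4-tuple can be written down as a small data structure. This is a sketch under stated assumptions: the class name `CausalModel`, its field names, and the `solve` method are hypothetical, values are binary, and the equations are assumed to be acyclic (the definition itself does not say so).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# One structural equation: the parents PA(V) and the function phi_V.
Equation = Tuple[Tuple[str, ...], Callable[..., int]]

@dataclass
class CausalModel:
    exogenous: Tuple[str, ...]        # U
    endogenous: Tuple[str, ...]       # V
    equations: Dict[str, Equation]    # E, keyed by the endogenous variable
    context: Dict[str, int]           # u, a value for each exogenous variable

    def solve(self) -> Dict[str, int]:
        """Return the value of every variable, assuming the model is acyclic."""
        values = dict(self.context)
        pending = list(self.endogenous)
        while pending:
            progressed = False
            for var in list(pending):
                parents, phi = self.equations[var]
                # Evaluate an equation once all of its parents have values.
                if all(p in values for p in parents):
                    values[var] = phi(*(values[p] for p in parents))
                    pending.remove(var)
                    progressed = True
            if not progressed:
                raise ValueError("equations are cyclic or refer to unknown variables")
        return values

light = CausalModel(
    exogenous=("S",),
    endogenous=("L", "P"),
    equations={
        "L": (("P", "S"), lambda p, s: p & s),   # L := P ∧ S
        "P": (("S",), lambda s: s),              # P := S
    },
    context={"S": 1},
)
print(light.solve())   # {'S': 1, 'P': 1, 'L': 1}
```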

Given a causal model, we may generate a causal graph by creating a node for every variable and placing an arrow (a directed edge) between two variables $U$ and $V$, with its tail at $U$ and its head at $V$, $U \to V$, iff $U$ appears on the right-hand-side of $V$'s structural equation. For instance, the causal model of the light, the power, and the switch determines the causal graph with the arrows $S \to P$, $S \to L$, and $P \to L$.

For a more careful and thorough introduction to causal models, and a theory of when they are correct—that is, when they correctly represent relations of causal determination out in the world—see section 2 of this paper.
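The graph construction described above only needs to know which variables appear on the right-hand-side of each equation. A minimal sketch, where the dictionary `parents` and the function `causal_graph` are illustrative names:

```python
# Structural equations of the light model, recorded only by their parents:
# each endogenous variable maps to the variables on its right-hand side.
parents = {
    "L": ("P", "S"),   # L := P ∧ S
    "P": ("S",),       # P := S
}

def causal_graph(parents):
    """Return the directed edges (tail, head) of the causal graph."""
    return sorted((u, v) for v, pa in parents.items() for u in pa)

edges = causal_graph(parents)
print(edges)   # [('P', 'L'), ('S', 'L'), ('S', 'P')]
```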

# When Removing Exogenous Variables Preserves Correctness

Suppose that we have the causal model introduced above, with the context $S=1$ (the switch is actually up). It appears that we can excise the exogenous variable $S$ from this model entirely. We may simply take $S$'s actual value $1$ and plug it into all the structural equations in which the variable $S$ appeared. When we do this, the structural equation associated with $P$ no longer depends upon any variables, and simply says that $P := 1$. That is: the effect of removing the exogenous variable $S$ has been to render $P$ exogenous. And, when we remove the exogenous $S$, the structural equation associated with $L$ becomes $L := P \wedge 1$, or just $L := P$.

We therefore get the causal model $\mathbb{M}$ with the exogenous variable $S$ excised. This is $$\mathbb{M}_{S} = \langle (P), (L), (L := P), (1) \rangle.$$ That is: $\mathbb{M}_S$ consists of the vector of exogenous variables $\mathbb{U} = (P)$, the vector of endogenous variables $\mathbb{V}=(L)$, the vector of structural equations $\mathbb{E} = (L := P)$, and the exogenous assignment $\vec{u} = (1)$ to $P$.

I think that, if the original model $\mathbb{M}$ was correct, then so too is $\mathbb{M}_S$. This follows from a counterfactual understanding of what makes a causal model correct, since the counterfactuals entailed by the new model $\mathbb{M}_S$ are a proper subset of the counterfactuals entailed by the old model $\mathbb{M}$. Given some plausible assumptions, it also follows from my own preferred way of understanding what makes a causal model correct.

No causal model represents all of the features of reality which could potentially make a difference with respect to the values of the variables in the model. In every causal model, we will be taking for granted certain features of, or causal precursors to, the system being modeled. If I want to model the causal determinants of the forest fire, I needn’t explicitly include a variable for the presence of oxygen. So long as there is plenty of oxygen in the atmosphere, it may be true that whether there is a fire is causally determined by whether the lightning struck. Similarly, so long as the light switch is actually up, whether the light is illuminated is causally determined by whether the power is on or off.

In general, if $U$ is an exogenous variable in the causal model $\mathbb{M}$, we can define the $U$-reduction of $\mathbb{M}$ to be what you get when you remove $U$ from $\mathbb{U}$, put into $\mathbb{U}$ any variables in the model which were causally determined by $U$ alone (and remove those variables from $\mathbb{V}$), replace $U$ with its value in the context $\vec{u}$ within every equation in $\mathbb{E}$ (except, of course, for the equations of those endogenous variables $V$ which were causally determined by $U$ alone), and update the context $\vec{u}$ appropriately.

Exogenous $U$-Reduction. Given a causal model $\mathbb{M} = \langle \mathbb{U, V, E}, \vec{u} \rangle$, and some $U \in \mathbb{U}$, the $U$-reduction of $\mathbb{M}$, $\mathbb{M}_U$, is $\langle \mathbb{U}_U, \mathbb{V}_U, \mathbb{E}_U, \vec{u}_U \rangle$, where

1. $\mathbb{U}_U$ is the vector of previously exogenous variables, minus $U$, and plus any endogenous variables whose values were determined by $U$ alone.
2. $\mathbb{V}_U$ is the vector of previously endogenous variables, minus any whose values were determined by $U$ alone.
3. $\mathbb{E}_U$ is a vector of structural equations. For each endogenous variable $V$ in $\mathbb{V}_U$, there is exactly one structural equation, which is the result of taking $V$'s old structural equation in $\mathbb{E}$, and replacing the variable $U$ wherever it appears (if at all) with $U$'s value in the context $\vec{u}$.
4. $\vec{u}_U$ is an assignment of values to the variables in $\mathbb{U}_U$ which matches $\vec{u}$ for all exogenous variables previously in $\mathbb{U}$; for those newly exogenous variables, $V$, the assignment in $\vec{u}_U$ is the one determined by taking $V$'s old structural equation in $\mathbb{E}$ and replacing the variable $U$ with $U$'s value in the old context $\vec{u}$.
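The four clauses of this definition can be sketched as a function. Everything about the representation is an assumption made for illustration: the name `exogenous_reduction`, the use of plain lists and dictionaries for $\mathbb{U}$, $\mathbb{V}$, $\mathbb{E}$, and $\vec{u}$, and the encoding of each equation as a pair `(parents, phi)`.

```python
def exogenous_reduction(exogenous, endogenous, equations, context, U):
    """Sketch of the U-reduction of a model given as plain Python containers.

    equations maps each endogenous variable to (parents, phi), where phi
    takes the parents' values as positional arguments in the listed order.
    """
    u_value = context[U]
    new_exogenous = [X for X in exogenous if X != U]
    new_context = {X: context[X] for X in new_exogenous}
    new_endogenous, new_equations = [], {}

    for V in endogenous:
        parents, phi = equations[V]
        if parents == (U,):
            # V was determined by U alone: it becomes exogenous, with the
            # value its old equation assigns given U's value in the context.
            new_exogenous.append(V)
            new_context[V] = phi(u_value)
            continue
        kept = tuple(p for p in parents if p != U)

        def reduced(*args, _parents=parents, _kept=kept, _phi=phi):
            # Rebuild the old argument list, plugging in U's actual value.
            given = dict(zip(_kept, args))
            return _phi(*(u_value if p == U else given[p] for p in _parents))

        new_endogenous.append(V)
        new_equations[V] = (kept, reduced)

    return new_exogenous, new_endogenous, new_equations, new_context

equations = {
    "L": (("P", "S"), lambda p, s: p & s),   # L := P ∧ S
    "P": (("S",), lambda s: s),              # P := S
}
U_S, V_S, E_S, u_S = exogenous_reduction(["S"], ["L", "P"], equations, {"S": 1}, "S")
print(U_S, V_S, u_S)                                   # ['P'] ['L'] {'P': 1}
print(E_S["L"][0], [E_S["L"][1](p) for p in (0, 1)])   # ('P',) [0, 1]
```

Applied to the light model in the context $S=1$, the sketch reproduces $\mathbb{M}_S$: $P$ becomes exogenous with value $1$, and $L$'s equation behaves like $L := P$. The function does not check whether the reduction is valid.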

However, while we can safely remove the exogenous variable $S$ from $\mathbb{M}$ in the context $S=1$, we cannot remove $S$ in the context $S=0$. If we try to do so, we will end up with the structural equation $L := P \wedge 0$. But this equation tells us that $L$'s value does not depend upon $P$'s value. No matter what value $P$ takes on, $L$ will take on the value $0$. So the resulting model would say, falsely, that $P$ does not causally determine $L$.

The right way to think about this, I believe, is that some $U$-reductions will lead to models which violate necessary conditions on the correctness of causal models. In particular, in order for a structural equation $\phi_V$ to be correct, every value of the left-hand-side variable $V$ must be in the image of $\phi_V$. That is to say: only surjective functions may appear in correct structural equations. And, in order for a causal model to be correct, all of the structural equations it contains must be correct. In the context $S=0$, removing the exogenous variable $S$ yields the structural equation $L := P \wedge 0$, which is not surjective. Such $U$-reductions are not valid.

Similarly, in order for a causal model to be correct, the vector of endogenous variables $\mathbb{V}$ must be non-empty. Some $U$-reductions will violate this necessary condition on correctness. For instance, consider the $S$-reduced model discussed above. If we try to $U$-reduce this model by excising the exogenous variable $P$, the resulting model, $\mathbb{M}_{S, P}$, will have no endogenous variables. $U$-reductions like these are not valid, either.

In general, we may say that a $U$-reduction is valid iff (1) the resulting endogenous variable set is non-empty, and (2) the resulting structural equations are all surjective.
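Both conditions can be checked by brute force when the variables are binary. The sketch below assumes that every variable ranges over `(0, 1)`; the helper names `is_surjective` and `is_valid_exogenous_reduction` are hypothetical, and the check is applied to a model that has already been reduced.

```python
from itertools import product

def is_surjective(parents, phi, domain=(0, 1)):
    """True iff every value in `domain` is an output of phi for some inputs."""
    image = {phi(*args) for args in product(domain, repeat=len(parents))}
    return set(domain) <= image

def is_valid_exogenous_reduction(endogenous, equations):
    """Conditions (1) and (2), applied to an already reduced model."""
    return len(endogenous) > 0 and all(
        is_surjective(*equations[V]) for V in endogenous
    )

# L := P ∧ 1, from removing S in the context S = 1.
up = {"L": (("P",), lambda p: p & 1)}
# L := P ∧ 0, from removing S in the context S = 0.
down = {"L": (("P",), lambda p: p & 0)}

print(is_valid_exogenous_reduction(["L"], up))     # True
print(is_valid_exogenous_reduction(["L"], down))   # False
print(is_valid_exogenous_reduction([], {}))        # False: no endogenous variables
```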

If a $U$-reduction is valid, then the $U$-reduced model is correct if the original model was. Valid $U$-reduction preserves correctness.

Valid Exogenous Reduction Preserves Correctness. If $\mathbb{M}$ is a correct causal model, and $\mathbb{M}_U$ is a valid exogenous $U$-reduction of $\mathbb{M}$ (i.e., if $\mathbb{M}_U$ is both valid and a $U$-reduction of $\mathbb{M}$), then $\mathbb{M}_U$ is a correct causal model, too.

I have previously laid down conditions for the correctness of causal models. Valid Exogenous Reduction Preserves Correctness is not intended as a conjecture about those correctness conditions. I know that my account, as it stands now, violates this principle (the curious may consider the $H$-reduction of the causal model in figure 8 of that paper). Valid Exogenous Reduction Preserves Correctness is intended to supplement that account. The principle allows you to move from the correctness of one causal model to the correctness of a certain sub-model, even if the sub-model was not previously deemed correct on its own.

# When Removing Endogenous Variables Preserves Correctness

Go back to our original causal model of the light switch, the power, and the light,

$$\begin{aligned} L &:= P \wedge S \\ P &:= S \end{aligned}$$

Just as it appeared that we could excise the exogenous variable $S$ from this model, so too does it appear that we may excise the variable $P$ from this model. Since we know that the power turns on whenever the light switch is on; and since we know that, if both the power and the switch are on, the light will be illuminated, it appears that we may conclude straightaway that, if the switch is on, then the light will be illuminated. Moreover, the switch’s being on appears to causally determine the light’s being illuminated. So it seems that, if the original causal model was correct, then so too should be the model $$\mathbb{M}_P = \langle (S), (L), (L := S), (1) \rangle$$ This is the model containing the sole exogenous variable $S$, the sole endogenous variable $L$, the sole structural equation $L := S$, and the exogenous assignment $1$ to $S$. Call this model the endogenous $P$-reduction of $\mathbb{M}$. We got $\mathbb{M}_P$ from $\mathbb{M}$ by simply replacing the variable $P$ in $L$'s structural equation with the right-hand-side of $P$'s structural equation, giving $L := S \wedge S$. And this function is equivalent to $L := S$.

What’s more, it appears as though we can carry out this endogenous reduction of $\mathbb{M}$ whatever the value of $S$ happens to be. Even if $S = 0$, it will still be the case that $L$'s value will be causally determined to match $S$'s value.

In general, if $V$ is an endogenous variable in the causal model $\mathbb{M}$, we can define the $V$-reduction of $\mathbb{M}$ to be what you get when you remove $V$ from $\mathbb{V}$, and replace $V$, every time it appears on the right-hand-side of a structural equation, with the right-hand-side of $V$'s own structural equation.

Endogenous $V$-Reduction. Given a causal model $\mathbb{M} = \langle \mathbb{U, V, E}, \vec{u} \rangle$, and some $V \in \mathbb{V}$, the $V$-reduction of $\mathbb{M}$, $\mathbb{M}_V$, is $\langle \mathbb{U}, \mathbb{V}_V, \mathbb{E}_V, \vec{u} \rangle$, where

1. $\mathbb{V}_V$ is the original vector of endogenous variables $\mathbb{V}$, minus the variable $V$.
2. $\mathbb{E}_V$ is just like the original vector of structural equations, except that it is lacking $V$'s structural equation $V := \phi_V( \mathbf{PA}(V) )$, and every occurrence of $V$ on the right-hand-side of the remaining equations is replaced with $\phi_V( \mathbf{PA}(V) )$.
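The substitution step in this definition amounts to function composition. The following is a sketch only: the function name `endogenous_reduction` and the `(parents, phi)` encoding of equations are assumptions made for illustration. The exogenous variables and the context are left out of the function, since the definition leaves them unchanged.

```python
def endogenous_reduction(endogenous, equations, V):
    """Sketch of the V-reduction: drop V and inline its equation elsewhere."""
    v_parents, v_phi = equations[V]
    new_endogenous = [X for X in endogenous if X != V]
    new_equations = {}

    for X in new_endogenous:
        parents, phi = equations[X]
        if V not in parents:
            new_equations[X] = (parents, phi)
            continue
        # New parents: X's old parents without V, plus V's parents (no repeats).
        merged = tuple(dict.fromkeys(
            [p for p in parents if p != V] + list(v_parents)
        ))

        def composed(*args, _parents=parents, _merged=merged, _phi=phi):
            given = dict(zip(_merged, args))
            # Wherever V occurred, use the value V's own equation would give.
            given[V] = v_phi(*(given[p] for p in v_parents))
            return _phi(*(given[p] for p in _parents))

        new_equations[X] = (merged, composed)

    return new_endogenous, new_equations

equations = {
    "L": (("P", "S"), lambda p, s: p & s),   # L := P ∧ S
    "P": (("S",), lambda s: s),              # P := S
}
V_P, E_P = endogenous_reduction(["L", "P"], equations, "P")
print(V_P, E_P["L"][0])                   # ['L'] ('S',)
print([E_P["L"][1](s) for s in (0, 1)])   # [0, 1]
```

For the light model, the reduced equation for $L$ has the single parent $S$ and returns $S$'s value, matching $L := S \wedge S$. The function performs the substitution whether or not the reduction is valid.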

While we can safely remove the endogenous variable $P$ in our model of the light and the switch, we may not always do this. While some endogenous $V$-reductions are valid, others are not. For instance, consider the Lewisian neuron diagram shown below.

The neuron diagram displays a case of what’s known in the literature as early preemption. Neuron $A$'s firing would have caused $E$ to fire, but it was preempted by neuron $C$'s firing. As things actually shook out, it was $C$, and not $A$, that caused $E$ to fire. I’ll suppose that this neuron diagram may be represented with a causal model containing a binary variable for every neuron, where those variables take the value $1$ if the neuron fires at its designated time, and take the value $0$ if the neuron does not fire at its designated time. Then, we will end up with the following system of structural equations.

$$\begin{aligned} E &:= B \vee D \\ B &:= A \wedge \neg C \\ D &:= C \end{aligned}$$

The endogenous $D$-reduction of this causal model is

$$\begin{aligned} E &:= B \vee C \\ B &:= A \wedge \neg C \end{aligned}$$

And the endogenous $B$-reduction of this $D$-reduced model is

$$E := (A \wedge \neg C) \vee C$$

Or, equivalently,

$$E := A \vee C$$

But this model treats $A$ and $C$ symmetrically. And both $A$ and $C$ take on the value $1$. This means that any theory of singular causation which looks only at the patterns of counterfactual dependence in a causal model (including, perhaps, information about which variable values are default and which are deviant) will, when applied to this model, say that $A=1$ caused $E=1$ iff $C=1$ caused $E=1$. But this would be a disastrous result, for $A=1$ did not cause $E=1$, while $C=1$ did cause $E=1$.
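The equivalence of the two forms of $E$'s reduced equation, and the symmetry between $A$ and $C$, can be confirmed with a truth table. A brute-force sketch, using Python booleans in place of the values $0$ and $1$ (the function names are illustrative):

```python
from itertools import product

def reduced(a, c):
    return (a and not c) or c       # E := (A ∧ ¬C) ∨ C

def simplified(a, c):
    return a or c                   # E := A ∨ C

rows = list(product([False, True], repeat=2))

# The two equations agree on every assignment to A and C.
equivalent = all(reduced(a, c) == simplified(a, c) for a, c in rows)
# Swapping A and C never changes the value of E.
symmetric = all(simplified(a, c) == simplified(c, a) for a, c in rows)

print(equivalent, symmetric)   # True True
```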

Lesson: if we want to use correct causal models to uncover relations of singular causation, then we had better not think that endogenous reduction always preserves correctness.

A similar lesson follows when we look at cases of preemptive prevention like the one shown below.

Here, $B$'s firing prevents $E$ from firing. However, had $B$ not fired, $A$ would have prevented $E$ from firing. So, $B$'s firing preempted $A$'s prevention. We can represent this neuron diagram with the following system of equations (where the variables are given the natural interpretation, and take on the value $1$ if the associated neuron fires, and take on the value $0$ if the associated neuron does not fire).

$$\begin{aligned} E &:= C \wedge \neg (B \vee D ) \\ D &:= A \wedge \neg B \end{aligned}$$

The endogenous $D$-reduction of this causal model gives the sole structural equation $$E := C \wedge \neg (B \vee (A \wedge \neg B))$$ Or, equivalently, $$E := C \wedge \neg A \wedge \neg B$$ However, this reduced model treats $A$ and $B$ symmetrically, and both $A$ and $B$ take on the value $1$; any theory which looks only at patterns of counterfactual dependence in correct causal models will therefore say that $A=1$ prevented $E=1$ iff $B=1$ prevented $E=1$. But $B=1$ prevented $E=1$ while $A=1$ did not. So, again, if we want to use correct causal models to uncover relations of singular causation, then we had better not think that endogenous reduction always preserves correctness.
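The simplification of $E$'s reduced equation can be verified in three steps, using distributivity, the fact that $B \vee \neg B$ always takes the value $1$, and De Morgan's law:

$$\begin{aligned} C \wedge \neg (B \vee (A \wedge \neg B)) &\equiv C \wedge \neg ((B \vee A) \wedge (B \vee \neg B)) \\ &\equiv C \wedge \neg (B \vee A) \\ &\equiv C \wedge \neg A \wedge \neg B \end{aligned}$$

The final form is unchanged when $A$ and $B$ are swapped, which is the symmetry the argument relies on.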

I’d like to suggest that precisely the same thing goes wrong in both of the foregoing cases of endogenous variable reduction. In the first case, the case of preemption, the endogenous $B$-reduction took us to a model in which $E$'s value is determined directly by both $A$ and $C$. In the associated causal graph of the $B$-reduced model, there is one arrow leading from $A$ to $E$, and another arrow leading from $C$ to $E$. The model presents these causal pathways as autonomous, with both $A$ and $C$ determining $E$'s value in a way that is independent of the other’s influence. However, $A$'s determination of $E$'s value is not autonomous of $C$'s. In fact, both $A$ and $C$ determine $E$'s value by way of a common variable, $B$.

Similarly, in the case of preemptive prevention, endogenous $D$-reduction brought us to a model in which $E$'s value is determined directly and autonomously by both $A$ and $B$. But the way that $A$ determines $E$'s value is not autonomous of the way that $B$ determines $E$'s value. In fact, both $A$ and $B$ determine $E$'s value by way of a common variable, $D$.

In the original causal models, variables like $B$ (in Preemption) and $D$ (in Preemptive Prevention) are called colliders. What makes a variable in a causal model a collider is that there are two distinct arrows leading into that variable. Equivalently, a variable is a collider iff it has more than one causal parent. (Note: “collider” is usually defined to be a path-relative notion; as I’m using the notion here, a variable is a collider iff it is a collider along some path or other.)

Reflection on cases like the foregoing leads to the following constraint on valid endogenous $V$-reduction: if the endogenous variable $V$ is a collider, then $V$-reduction is not valid. Colliders may not be removed in the manner specified in Endogenous $V$-Reduction. Now, I believe that this is the only constraint on valid endogenous reduction. So long as $V$ is not a collider, $V$ may be excised from the causal graph in the manner specified in Endogenous $V$-Reduction.

So we may say that, in general, a $V$-reduction is valid iff $V$ is a non-collider.

If a $V$-reduction is valid, then the $V$-reduced model is correct if the original model was. Valid $V$-reduction preserves correctness.

Valid Endogenous Reduction Preserves Correctness. If $\mathbb{M}$ is a correct causal model, and $\mathbb{M}_V$ is a valid $V$-reduction of $\mathbb{M}$ (i.e., if $V$ is not a collider in $\mathbb{M}$), then $\mathbb{M}_V$ is a correct causal model, too.
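Since the validity condition depends only on how many causal parents a variable has, it can be checked from the parent lists alone. A minimal sketch, in which the function names and the dictionaries encoding the three models from this post are illustrative:

```python
def is_collider(parents, V):
    """A variable counts as a collider iff it has more than one causal parent."""
    return len(set(parents.get(V, ()))) > 1

def is_valid_endogenous_reduction(parents, V):
    """On the proposal in the text, V-reduction is valid iff V is a non-collider."""
    return not is_collider(parents, V)

# Each endogenous variable mapped to the right-hand side of its equation.
light = {"L": ("P", "S"), "P": ("S",)}
preemption = {"E": ("B", "D"), "B": ("A", "C"), "D": ("C",)}
preemptive_prevention = {"E": ("C", "B", "D"), "D": ("A", "B")}

print(is_valid_endogenous_reduction(light, "P"))                   # True
print(is_valid_endogenous_reduction(preemption, "D"))              # True
print(is_valid_endogenous_reduction(preemption, "B"))              # False
print(is_valid_endogenous_reduction(preemptive_prevention, "D"))   # False
```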

In the case of exogenous $U$-reduction, the corresponding principle (Valid Exogenous Reduction Preserves Correctness) carried with it a genuine extension of the conditions for correctness of causal models which I endorsed previously. In the case of endogenous $V$-reduction, reflection on cases like preemption and preemptive prevention calls for a corresponding constriction in those conditions. There are causal models, like the $D$-reduced model of figure 3, which my previous account deems correct but which are not correct. Valid Endogenous Reduction Preserves Correctness does not yet rule out models like those. To do so, we should additionally endorse:

Invalid Endogenous Reduction Destroys Correctness. If there is some correct causal model $\mathbb{M} = \langle \mathbb{U, V, E}, \vec{u} \rangle$, with a collider $V \in \mathbb{V}$, then the $V$-reduction of $\mathbb{M}$, $\mathbb{M}_V$, is not correct.