The Role of Interventions in Causal Decision Theory

Causal Bayes Nets provide a nice formal representation of the world’s causal and probabilistic structure, and so it is natural to want to formulate causal decision theory (CDT) in terms of causal Bayes nets. Lots of good work has been done on this front; see, in particular, Meek and Glymour (1994), Pearl (2000, chapter 4), Hitchcock (2016), and Stern (2017). Central to these formulations of CDT is the distinction between a probability function which has been conditioned on an act A’s performance and a probability function which has been updated on an intervention bringing about A’s performance. However, in the work of Meek and Glymour, there is some confusion about how a causal decision theorist ought to understand these intervention probabilities. We could either understand intervention probabilities as the result of conditioning your probability function on the performance of acts which are genuinely available to you, or as a formal tool for ignoring whatever information your act stands to give you about factors outside of your control. Meek and Glymour take the former understanding of interventions, and this leads to confusion about the content of causal decision theory and its relationship to Newcomb’s Problem. The point of today’s post is to explain why I think the Meek and Glymour understanding of intervention probabilities gets CDT wrong.

In part, this is a terminological dispute about how to use the label ‘causal decision theory’. But getting clear about this terminology is important, since, as we’ll see, using Meek & Glymour’s terminology can obscure what’s really at issue in the debate between EDT and CDT—and what’s really at issue in the debate between one-boxers and two-boxers in Newcomb’s Problem.

1. Newcomb’s Problem

We can specify a decision problem by the available acts, $A_1, A_2, \dots, A_N$, the potential states of the world, $S_1, S_2, \dots, S_M$, the value you attach to performing each act $A_i$ in each state $S_j$, $V(A_i S_j)$, and a probability function defined over these act-state conjunctions. I’ll assume that the states are specified finely enough so that knowing which act you performed in which state settles the outcome of everything you value. In such a decision problem, EDT says that you should choose whichever act maximizes expected value, $V(A)$, where $$ V(A) := \sum_S \Pr(S \mid A) \cdot V(SA) $$ The canonical counterexample to EDT is Newcomb’s problem.


Newcomb’s Problem. You are on a game show playing for charity. Before you are two boxes, box #1 and box #2. Box #2 is transparent, and you may see that it contains 10,000 dollars. Box #1 is opaque; it either contains 1,000,000 dollars or nothing. Normally, contestants on this game show have to choose whether to settle for the guaranteed 10,000 dollars or to take a chance on getting 1,000,000 dollars with box #1. However, since you are playing for charity, you’ve been given the opportunity to just take both boxes—all the money in front of you—for your chosen charity. Incidentally, if it was predicted that you would take only box #1, then 1,000,000 dollars was placed there. If, however, it was predicted that you would take both boxes, then nothing was left inside box #1. These predictions aren’t particularly reliable. Given that you take both boxes, there’s a 51% probability that it was predicted that you would take both boxes. Given that you take only box #1, there’s a 51% probability that it was predicted that you would take only box #1.


In Newcomb’s Problem, EDT recommends leaving money behind. For, if we use ‘$O$’ for the proposition that you take only box #1, ‘$B$’ for the proposition that you take both boxes, and ‘$M$’ for the proposition that there is a million dollars in box #1, then \begin{aligned} V(O) &= \Pr(M \mid O) \cdot 1,000,000 \,\,+\,\, \Pr(\neg M \mid O) \cdot 0 \\
&= 0.51 \cdot 1,000,000 \,\,+\,\, 0.49 \cdot 0 \\
&= 510,000 \end{aligned} (I’ve assumed that your values are linear in dollars.) Whereas, \begin{aligned} V(B) &= \Pr(M \mid B) \cdot 1,010,000 \,\,+\,\, \Pr(\neg M \mid B) \cdot 10,000 \\
&= 0.49 \cdot 1,010,000 \,\,+\,\, 0.51 \cdot 10,000 \\
&= 500,000 \end{aligned}
Since the expected value of taking one box exceeds the expected value of taking both boxes, EDT recommends leaving behind 10,000 dollars which you could have easily given to charity.
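
As a quick check on the arithmetic, the two expected values can be recomputed in a few lines of Python. This is only a sketch: the function name `evidential_value` and the dictionary layout are illustrative choices rather than anything from the literature, and the 51% figure is the reliability stipulated in the case.

```python
from fractions import Fraction

# Payoffs in dollars, indexed by (act, whether box #1 holds the million).
PAYOFF = {
    ("O", True): 1_000_000,   # take only box #1, million present
    ("O", False): 0,          # take only box #1, box #1 empty
    ("B", True): 1_010_000,   # take both boxes, million present
    ("B", False): 10_000,     # take both boxes, box #1 empty
}

# Pr(M | act): the prediction matches the act with probability 0.51.
PR_M_GIVEN = {"O": Fraction(51, 100), "B": Fraction(49, 100)}


def evidential_value(act):
    """V(act) = sum over states of Pr(state | act) * V(state & act)."""
    p = PR_M_GIVEN[act]
    return p * PAYOFF[(act, True)] + (1 - p) * PAYOFF[(act, False)]


print(evidential_value("O"))  # 510000
print(evidential_value("B"))  # 500000
```

Exact fractions are used instead of floats so that the comparison between the two values is not affected by rounding.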

Leaving money behind gives you good news about the world. It gives you good evidence that there is 1,000,000 dollars waiting in box #1. But, no matter what prediction was made, it does less to improve the world than taking both boxes. If there is 1,000,000 dollars in box #1, then the world is a good one, and taking both boxes improves the world more than just taking one. If there is nothing in box #1, then the world is not a good one. But, even so, taking both boxes improves the world more than just taking one.

So-called ‘two-boxers’ see Newcomb’s Problem as a counterexample to EDT. But many, the so-called ‘one-boxers’, have remained unconvinced. They see nothing irrational in acting so as to give yourself good news about the world, even when it does less to improve the world than the alternatives.

2. Causal Decision Theory

Many two-boxers have thought that EDT errs in conflating two kinds of news that your act is in a position to give you—on the one hand, your act could give you good news about factors outside of your control; on the other hand, your act could give you good news about the downstream causal consequences of your act. If we take each state $S$ and split it into those factors which are causally downstream of your act, $C$, and those factors which are not causally downstream of your act, $K$, then EDT says that you should choose an act $A$ which maximizes: \begin{align} V(A) &= \sum_S \Pr(S \mid A) \cdot V(SA) \\
&= \sum_K \sum_C \Pr(KC \mid A) \cdot V(KCA) \\
&= \sum_K \Pr(K \mid A) \cdot \sum_C \Pr(C \mid KA) \cdot V(KCA) \end{align} In the difference it makes to the term $\Pr(K \mid A)$, conditionalizing on $A$ provides information about states which are outside of your control. In the difference it makes to the term $\Pr(C \mid KA)$, on the other hand, conditionalizing on $A$ provides information about the good which performing $A$ stands to causally promote. While this second sort of information is relevant to your decision, the first sort of information is not.

Two-boxers who favor this diagnosis of where EDT has gone wrong suggest factoring out the first kind of information by replacing $\Pr(K \mid A)$ in the above with $\Pr(K)$. Call the resulting quantity the utility of an act $A$, $$ U(A) = \sum_K \Pr(K) \cdot \sum_C \Pr(C \mid KA) \cdot V(KCA) $$ The causal decision theorist thinks that acts with higher utility are to be preferred to acts with lower utility. (This is Skyrms (1982)’s formulation of CDT; there are alternatives, but the differences aren’t important for my purposes here. See Joyce (1999) for further discussion.)
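
To see what replacing $\Pr(K \mid A)$ with $\Pr(K)$ does in Newcomb’s Problem, here is a minimal sketch in Python. It assumes that the only relevant factor outside of your control is whether the million is in box #1, and that the money going to charity is fixed by that factor together with your act, so the inner sum over $C$ collapses to a single term. The name `causal_utility` and the sampled values of $\Pr(M)$ are illustrative assumptions.

```python
from fractions import Fraction

# Payoffs in dollars, indexed by (act, whether box #1 holds the million).
PAYOFF = {
    ("O", True): 1_000_000,
    ("O", False): 0,
    ("B", True): 1_010_000,
    ("B", False): 10_000,
}


def causal_utility(act, pr_m):
    """U(act) = Pr(M) * V(M & act) + Pr(not M) * V(not M & act).

    pr_m is the unconditional probability of M; it is the same for every act.
    """
    return pr_m * PAYOFF[(act, True)] + (1 - pr_m) * PAYOFF[(act, False)]


for pr_m in (Fraction(0), Fraction(1, 2), Fraction(1)):
    gap = causal_utility("B", pr_m) - causal_utility("O", pr_m)
    print(pr_m, gap)  # the gap is 10000 whatever pr_m is
```

Because the same $\Pr(M)$ weights both acts, taking both boxes comes out ahead by exactly 10,000 dollars no matter how likely you think it is that the million is there.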

Note that, in using the quantity $U(A)$ to evaluate acts, the causal decision theorist does not deny that your act can be correlated with factors outside of your control. That is, they do not deny that it is possible for $\Pr(K \mid A)$ to be different than $\Pr(K)$. If they thought this, then there would be no difference between the value of an act $V(A)$ and the utility of the act $U(A)$. This is a very important point, and so I want to really emphasize it.


Very Important Point. There is a difference between the verdicts of EDT and CDT only if, for some act $A$ and some factor $K$, outside of your control, $$ \Pr(K \mid A) \neq \Pr(K) $$ That is: there is a difference between EDT and CDT only if your act is correlated with some state of the world which is not causally downstream of that act.


If it were impossible for an act to be correlated with factors outside of your control (that is, if $\Pr(K \mid A)$ were always identical to $\Pr(K)$), then we would think that cases like Newcomb’s Problem are impossible. And then, the causal decision theorist would have no objection to EDT.

3. Causal Decision Theory with Interventions

Notice that we could define a causal probability function over states as follows: $$ \Pr(S \mid\mid A) := \Pr(K) \cdot \Pr(C \mid KA) $$ (where $K$ are the factors in state $S$ which are not causally downstream of your act and $C$ are the factors in state $S$ which are causally downstream of your act). Then, the $U$-value of an act $A$ will be $$ U(A) = \sum_S \Pr(S \mid\mid A) \cdot V(SA) $$ Interventionist decision theorists endorse a decision theory with just this form, though they use the formal tools of causal Bayes nets to define their causal probability function, $\Pr(S \mid\mid A)$.

3.1 Causal Bayes Nets

A causal Bayes net is a pair of a directed acyclic graph (DAG) and a probability function which we require to satisfy a constraint known as the Markov Condition, relative to that graph.

A directed acyclic graph (DAG) is a pair $<\mathbb{V}, \mathbb{E}>$ of a set of variables $\mathbb{V}$ and a set of directed edges $\mathbb{E}$ between those variables, though not just any set of directed edges will do. In order for this to be a directed acyclic graph, we require that there be no loops or cycles in the directed edges. If we write ‘$U \to V$’ to indicate that there is a directed edge from $U$ to $V$, then the acyclicity requirement is that there is no sequence of variables $V_1, V_2, \dots, V_N \in \mathbb{V}$ such that \begin{align} V_1 \to V_2 \to \dots \to V_N \to V_1 \end{align}

Here is a DAG representing the causal structure of Newcomb’s Problem:

Figure 1: A Directed Acyclic Graph of Newcomb’s Problem.

There is a common cause, $K_1$, of both your action, $A$, and the prediction which has been made about your action, $K_2$. Your action and the prediction, $K_2$, will causally determine the amount of money which goes to charity, $C$. The variable $A$ can take on two values, $1$ and $2$. If $A=1$, then you take one box and leave money behind. If $A=2$, then you take both boxes and leave nothing behind. Let’s similarly say that $K_1$ and $K_2$ can take on the values $1$ and $2$ (with $K_2 = 1$ meaning that it was predicted that you would take one box, and $K_2 = 2$ meaning that it was predicted that you would take both), and the variable $C$ can take on the values $1,010,000$, $1,000,000$, $10,000$, and $0$, depending upon how much money goes to charity.

One bit of terminology we’ll need: given a DAG, let’s say that the parents of a variable $V$, $\mathbf{PA}(V)$, are those variables which have a directed edge leading from them to $V$. Thus, in the DAG above, the parents of $C$, $\mathbf{PA}(\,C\,)$, are $A$ and $K_2$. Both $K_2$ and $A$ have a single parent, $K_1$. And $K_1$ doesn’t have any parents.
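
The graph-theoretic notions above can be made concrete with a short sketch. It encodes the four edges described in the text, reads off parents, and checks acyclicity with Kahn's algorithm (a standard technique swapped in here for illustration; nothing in the argument depends on it). The string labels and function names are assumptions of the sketch.

```python
from collections import deque

# Variables and directed edges of the Newcomb DAG as described in the text.
VARIABLES = ["K1", "K2", "A", "C"]
EDGES = [("K1", "A"), ("K1", "K2"), ("A", "C"), ("K2", "C")]


def parents(variable, edges):
    """PA(variable): every variable with a directed edge into `variable`."""
    return {u for u, v in edges if v == variable}


def is_acyclic(variables, edges):
    """Kahn's algorithm: repeatedly remove variables with no remaining parents."""
    indegree = {v: len(parents(v, edges)) for v in variables}
    queue = deque(v for v in variables if indegree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for source, target in edges:
            if source == u:
                indegree[target] -= 1
                if indegree[target] == 0:
                    queue.append(target)
    # A cycle would leave some variables that never reach indegree zero.
    return removed == len(variables)


print(sorted(parents("C", EDGES)))                   # ['A', 'K2']
print(is_acyclic(VARIABLES, EDGES))                  # True
print(is_acyclic(VARIABLES, EDGES + [("C", "K1")]))  # False
```

The last line adds a hypothetical edge from $C$ back to $K_1$, which creates the cycle $K_1 \to A \to C \to K_1$ and so fails the acyclicity requirement.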

As I’ll use the term here, a causal Bayes net is a triple $< \mathbb{V}, \mathbb{E}, \Pr >$ consisting of a DAG $<\mathbb{V}, \mathbb{E}>$ and a probability function $\Pr$ defined over the values of the variables $V \in \mathbb{V}$, such that the probability function $\Pr$ satisfies the Markov Condition with respect to that DAG.


Markov Condition. A probability function $\Pr$ over the variables in $\mathbb{V} = \{ V_1, V_2, \dots, V_N \}$ satisfies the Markov Condition, relative to a given DAG $<\mathbb{V}, \mathbb{E}>$ if and only if $$ \Pr(V_1, V_2, \dots, V_N) = \prod_i \Pr(V_i \mid \mathbf{PA}(V_i)) $$


For instance: if you want to know the probability that $K_1$ takes on the value $k_1$, $K_2$ takes on the value $k_2$, $A$ takes on the value $a$, and $C$ takes on the value $c$, $$ \Pr(K_1 = k_1, K_2 = k_2, A=a, C=c) $$ all you need to calculate this are the probability $\Pr(K_1 = k_1)$ (since $K_1$ has no parents) and the conditional probabilities $\Pr(C=c \mid K_2=k_2, A=a)$, $\Pr(K_2=k_2 \mid K_1=k_1)$, and $\Pr(A=a \mid K_1=k_1)$. Multiply them together, and you get $\Pr(K_1=k_1, K_2=k_2,A=a,C=c)$. In this way, the Markov Condition allows you to construct the entire joint probability function over $K_1, K_2, A,$ and $C$ from just knowledge of the conditional probabilities $\Pr(V \mid \mathbf{PA}(V))$. In Newcomb’s Problem, let’s say that those conditional probabilities are as follows (the entries not listed follow from these, since each conditional distribution must sum to 1): \begin{align} \Pr(K_1 = 1) &= 0.5 &\Pr(K_1 = 2)&= 0.5 \\
\Pr(A=1 \mid K_1 = 1) &= 0.51 &\qquad \Pr(K_2 = 1 \mid K_1 = 1) &= 1 \\
\Pr(A=2 \mid K_1 = 2) &= 0.51 &\qquad \Pr(K_2 = 2 \mid K_1 = 2) &= 1 \\
\Pr(C=1,000,000 \mid A=1, K_2 = 1) &= 1 &\Pr(C=0 \mid A=1, K_2 = 2) &= 1 \\
\Pr(C=1,010,000 \mid A=2, K_2 = 1) &= 1 &\Pr(C=10,000 \mid A=2, K_2 = 2) &= 1 \end{align}
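
The factorization can be checked mechanically. The sketch below builds the joint distribution over $K_1, K_2, A,$ and $C$ by multiplying the listed probabilities, confirms that it sums to 1, and recovers the 51% reliability of the prediction via the ratio formula. The helper names (`pr_k1`, `payout`, `prob`, and so on) are illustrative, and the unlisted table entries are filled in by complement.

```python
from fractions import Fraction
from itertools import product

P51 = Fraction(51, 100)
C_VALUES = (0, 10_000, 1_000_000, 1_010_000)


def pr_k1(k1):
    return Fraction(1, 2)


def pr_a(a, k1):
    # The act matches the common cause with probability 0.51.
    return P51 if a == k1 else 1 - P51


def pr_k2(k2, k1):
    # The prediction is fixed by the common cause.
    return Fraction(int(k2 == k1))


def payout(a, k2):
    # k2 == 1: one-boxing was predicted, so the million is in box #1.
    return (1_000_000 if k2 == 1 else 0) + (10_000 if a == 2 else 0)


def pr_c(c, a, k2):
    return Fraction(int(c == payout(a, k2)))


# Markov factorization: the product of Pr(V | PA(V)) over all four variables.
joint = {
    (k1, k2, a, c): pr_k1(k1) * pr_k2(k2, k1) * pr_a(a, k1) * pr_c(c, a, k2)
    for k1, k2, a, c in product((1, 2), (1, 2), (1, 2), C_VALUES)
}


def prob(event):
    """Probability of the set of assignments satisfying `event`."""
    return sum(p for (k1, k2, a, c), p in joint.items() if event(k1, k2, a, c))


total = prob(lambda k1, k2, a, c: True)
pr_k2_given_a = prob(lambda k1, k2, a, c: k2 == 1 and a == 1) / prob(
    lambda k1, k2, a, c: a == 1
)
print(total, pr_k2_given_a)  # 1 51/100
```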

There is another, equivalent, formulation of the Markov Condition which will be useful later on, so let me mention it briefly here:


Markov Condition (v2). A probability function $\Pr$ over the variables in $\mathbb{V} = \{ V_1, V_2, \dots, V_N \}$, satisfies the Markov Condition, relative to a given DAG $<\mathbb{V}, \mathbb{E}>$ if and only if, for any variable $V \in \mathbb{V}$, and any variable ${K}$ which is not causally downstream of $V$, $V$ and ${K}$ are probabilistically independent, conditional on $\mathbf{PA}(V)$ $$ \Pr(K \mid V, \mathbf{PA}(V)) = \Pr(K \mid \mathbf{PA}(V)) $$


Given the joint probability distribution, conditional probabilities can be calculated in the usual way via the ratio formula, $\Pr(X \mid Y) := \Pr(XY)/\Pr(Y)$ (so long as $\Pr(Y) \neq 0$). Conditionalization on $A$ will allow $A$ to provide information about $K_1$, which, in turn, will provide information about $K_2$. And it is exactly this kind of ‘backtracking’ information which the causal decision theorist wishes to exclude from their evaluation of an act. So, if we’re causal decision theorists, we won’t want to evaluate acts with a probability function conditioned on the act. With a DAG in hand, we are in a position to define a causal probability function which filters out this kind of ‘backtracking’ information.


Causal Probability. Given a causal Bayes net $<\mathbb{V}, \mathbb{E}, \Pr>$, define the causal probability for $A=a$ as $$ \Pr(V_1, V_2, \dots, V_N \mid\mid A=a) := \Pr(A \mid\mid A=a) \cdot \prod _{V \neq A} \Pr(V \mid \mathbf{PA}(V)) $$ and we stipulate that $\Pr(A=a \mid\mid A=a) = 1$.


That is: to get the causal probability for an act $A=a$, you take the factorization of $\Pr$ provided by the Markov Condition, and you simply replace the term for $\Pr(A \mid \mathbf{PA}(A))$ with a new term, $\Pr(A \mid\mid A=a)$, where we stipulate that the new term $\Pr(A \mid\mid A=a)$ takes the value $1$ if $A=a$ and otherwise takes the value $0$.
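
Here is a sketch of that recipe applied to the Newcomb net. It drops the factor for $\Pr(A \mid K_1)$, puts all the probability on the chosen value of $A$, and then reads off both the causal probability of the prediction and the resulting $U$-values. Function names are illustrative, and values are assumed to be linear in dollars, as before.

```python
from fractions import Fraction
from itertools import product

C_VALUES = (0, 10_000, 1_000_000, 1_010_000)


def pr_k1(k1):
    return Fraction(1, 2)


def pr_k2(k2, k1):
    return Fraction(int(k2 == k1))


def payout(a, k2):
    # k2 == 1: one-boxing was predicted, so the million is in box #1.
    return (1_000_000 if k2 == 1 else 0) + (10_000 if a == 2 else 0)


def pr_c(c, a, k2):
    return Fraction(int(c == payout(a, k2)))


def causal_joint(a_set):
    """Pr(K1, K2, A, C || A = a_set): the factor Pr(A | K1) is replaced
    by a term that is 1 when A = a_set and 0 otherwise."""
    return {
        (k1, k2, a, c): Fraction(int(a == a_set))
        * pr_k1(k1) * pr_k2(k2, k1) * pr_c(c, a, k2)
        for k1, k2, a, c in product((1, 2), (1, 2), (1, 2), C_VALUES)
    }


def causal_pr_k2_is_1(a_set):
    return sum(p for (k1, k2, a, c), p in causal_joint(a_set).items() if k2 == 1)


def utility(a_set):
    """U(a) = sum over states of Pr(state || A = a) * money to charity."""
    return sum(p * c for (k1, k2, a, c), p in causal_joint(a_set).items())


print(causal_pr_k2_is_1(1), causal_pr_k2_is_1(2))  # 1/2 1/2
print(utility(1), utility(2))                      # 500000 510000
```

Under the causal probability, the prediction is equally likely whichever act is set, so the ‘backtracking’ information is gone and taking both boxes comes out 10,000 dollars ahead.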

This causal probability will not obey the Markov Condition relative to our original DAG. However, it will obey the Markov Condition relative to an updated DAG in which you’ve removed the directed edges going into the variable $A$—call this the post-intervention DAG. For instance, given the original DAG from Newcomb’s Problem, the post-intervention DAG will be the one where we have removed the arrow leading from $K_1$ to $A$.

Figure 2: A post-intervention DAG for Newcomb’s Problem.

An intervention is a way of setting the value of a variable like $A$ which severs any causal influence between $A$ and its parents in the DAG. If you had intervened upon the variable $A$ to set its value to $a$, then the DAG shown in figure 2 would be the correct representation of the post-intervention causal structure. Moreover, relative to that post-intervention causal structure, the probability function $\Pr(-\mid\mid A=a)$ would satisfy the Markov Condition.

To think about these kinds of interventions more carefully, let’s include in our DAG of Newcomb’s Problem an explicit variable for whether the intervention on $A$ has taken place.

Figure 3: A DAG for Newcomb’s Problem which includes an intervention variable, $I_A$.

The new variable, $I_A$, can take on the values $0, 1,$ and $2$. If $I_A = 0$, then no intervention takes place, and $A$ depends on $K_1$ exactly as it did in the original DAG: $A$ takes on the same value as $K_1$ with probability $0.51$. If, however, $I_A = 1$, then $A$ will take on the value $1$, no matter what value $K_1$ takes on. And similarly, if $I_A = 2$, then $A$ will take on the value $2$, no matter what value $K_1$ takes on.

That is: the probability function over this new DAG has the same conditional probabilities for $K_1, K_2,$ and $C$ as the previous DAG, but the conditional probabilities for $A$ are now this: \begin{align} \Pr(A=1 \mid I_A = 0, K_1 = 1) &= 0.51 &\qquad \Pr(A=2 \mid I_A = 0, K_1 = 2) &= 0.51 \\
\Pr(A=1 \mid I_A = 1, K_1 = 1) &= 1 &\qquad \Pr(A=1 \mid I_A = 1, K_1 = 2) &= 1 \\
\Pr(A=2 \mid I_A = 2, K_1 = 1) &= 1 &\qquad \Pr(A=2 \mid I_A = 2, K_1 = 2) &= 1 \end{align}

Then, the probability function from our original DAG will just be this new probability function conditionalized upon $I_A=0$. And the probability function from the post-intervention DAG, $\Pr(- \mid\mid A=a)$, will just be this new probability function conditionalized upon $I_A = a$.
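
Both claims can be checked numerically, under two assumptions which are labeled here because they go beyond what is stated above. First, the sketch assumes that when $I_A = 0$ the act matches $K_1$ with the 51% probability from the original net, which is what it takes for conditioning on $I_A = 0$ to recover the original probability function. Second, it needs some prior over $I_A$; the one used is purely hypothetical, and any prior giving each value positive probability yields the same conditional distributions. The variable $C$ is left out for brevity, since it is fixed by $A$ and $K_2$.

```python
from fractions import Fraction
from itertools import product

P51 = Fraction(51, 100)
VALUES = (1, 2)


def pr_k1(k1):
    return Fraction(1, 2)


def pr_k2(k2, k1):
    return Fraction(int(k2 == k1))


def pr_a_original(a, k1):
    return P51 if a == k1 else 1 - P51


def pr_a_extended(a, i_a, k1):
    # ASSUMPTION: with no intervention (i_a == 0), A behaves as in the original net.
    if i_a == 0:
        return pr_a_original(a, k1)
    # With an intervention, A is set to i_a whatever K1 is.
    return Fraction(int(a == i_a))


# HYPOTHETICAL prior over the intervention variable; not from the text.
PRIOR_I = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

extended = {
    (i_a, k1, k2, a): PRIOR_I[i_a] * pr_k1(k1) * pr_k2(k2, k1)
    * pr_a_extended(a, i_a, k1)
    for i_a, k1, k2, a in product((0, 1, 2), VALUES, VALUES, VALUES)
}


def condition_on_i(i_a):
    """Ratio formula: Pr(K1, K2, A | I_A = i_a)."""
    norm = sum(p for key, p in extended.items() if key[0] == i_a)
    return {key[1:]: p / norm for key, p in extended.items() if key[0] == i_a}


original = {
    (k1, k2, a): pr_k1(k1) * pr_k2(k2, k1) * pr_a_original(a, k1)
    for k1, k2, a in product(VALUES, VALUES, VALUES)
}


def post_intervention(a_set):
    return {
        (k1, k2, a): pr_k1(k1) * pr_k2(k2, k1) * Fraction(int(a == a_set))
        for k1, k2, a in product(VALUES, VALUES, VALUES)
    }


print(condition_on_i(0) == original)              # True
print(condition_on_i(1) == post_intervention(1))  # True
print(condition_on_i(2) == post_intervention(2))  # True
```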

4. Interventions as Acts

Meek and Glymour use the tools of causal Bayes nets to formulate a version of causal decision theory which says that you should choose an act with maximal $U$-value, where $$ U(A) = \sum_S \Pr(S \mid\mid A) \cdot V(SA) $$ states are understood as assignments of values to the variables in $\mathbb{V}$, and the causal probabilities $\Pr(S \mid\mid A)$ are understood as the post-intervention probabilities defined in the previous section.

This is a fine theory for a causal decision theorist to accept. I don’t have any objection to Meek and Glymour attributing this theory to causal decision theorists; but I do have an objection to Meek and Glymour’s interpretation of this theory. For, on Meek and Glymour’s interpretation, the theory becomes a version of evidential decision theory.

First, Meek and Glymour object to the possibility of cases like Newcomb’s Problem, on the grounds that the case, as originally described, is not a true decision problem.

We find something odd in questions about what an agent, no matter whether oneself or another, ought to do when one knows the agent’s action, whatever it is, will be necessitated by circumstances that cannot be influenced by any response to the question. Both advise and deliberation then seem pointless, and their benefits illusory. (p. 1007)

Two comments on these remarks: firstly, in order to construct cases like Newcomb’s Problem, it needn’t be the case that your action is necessitated by past causal factors. For instance, in our original causal Bayes net of the case from the previous section, whether you take one box or two was not necessitated by the factor $K_1$ (since $\Pr(A=a \mid K_1 = a)$ was only 51%, not 100%), though $K_1$ did still causally influence your act. Secondly, and more importantly, in these remarks, Meek and Glymour are rejecting compatibilism about free will, saying that an agent cannot freely deliberate if their deliberation is itself caused. Perhaps this view is correct (well, it’s not, but let’s not dwell upon that right now), but if we accept this view, and if we accept what the Markov Condition has to say about the relationship between the world’s causal structure and the probability function $\Pr$, then we shouldn’t think that our acts can ever be probabilistically correlated with states of the world which are outside of our control. (For, to be clear, given the Markov Condition, the only way for an act to be correlated with a factor which isn’t causally downstream of it is for the act and the factor to have a common cause, which requires that the act be caused.)

That’s all to say: if you think cases like Newcomb’s Problem are genuine decision problems (decision problems in which you are free to undertake any act), then you had better think that our acts can be both free and caused. Which is to say: you had better be a compatibilist about free will.

Meek and Glymour continue:

Even so, two possible reasons to deliberate or advise suggest themselves…You may wish to know…what someone otherwise like you but free to choose among alternative actions would rationally choose to do. Alternatively, one may view decisions…as the result of the action of a dual system with a default part and an extraordinary part—the default part subject to causes that may also influence the outcome through another mechanism, but the extraordinary part not so influenced and having the power to intervene and displace or modify the production of the default part. For brevity we will describe the extraordinary part as the Will.

That is: Meek and Glymour suppose that the causal structure of Newcomb’s Problem is the one shown in the DAG from section 3.1 which includes the intervention variable $I_A$, where that variable is the ‘extraordinary’ part, the Will, which is capable of overriding the ‘default’ behavior caused by $K_1$.

At this point, the case Meek and Glymour are considering is emphatically not the original Newcomb’s Problem (it is rather what Joyce (2018) calls a pseudo-Newcomb problem, as it fails to meet Joyce’s conditions $NP_1$ and $NP_4$, which any genuine Newcomb problem must meet). But let’s put that point to the side. Meek and Glymour are not considering Newcomb’s Problem, but this is an intentional choice, as they do not think that the case as usually understood constitutes a genuine decision problem. This is a philosophical disagreement between causal decision theorists and Meek and Glymour about whether free deliberation is compatible with the knowledge that one’s acts are caused—though it’s not a philosophical disagreement that I want to dwell upon here.

What I want to dwell upon here is instead the view that Meek and Glymour attribute to causal decision theorists when they write that:

The causal decision theorist conditions on the event of an intervention, [while the] ‘evidential’ decision theorist conditions on an event…that is not an intervention. The difference in the two recommendations does not turn on any difference in normative principles, but rather on a substantive disagreement about the causal principles at work in the context of decision-making—the causal decision theorist thinks that when someone decides [how to act], an intervention occurs, and the ‘evidential’ decision theorist thinks otherwise. (p. 1009)

This is not an accurate characterization of the causal decision theorist’s position. The causal decision theorist does not think that, whenever you decide how to act, an intervention occurs.

One thing that is surely true of causal decision theorists is that they think the original Newcomb’s Problem is a counterexample to EDT. This case is the motivation for their theory. If it was already handled by EDT, there would be no need for CDT. But recall the Very Important Point from section 2: there is a difference between EDT and CDT only if there is a correlation between some factor outside of your control, $K$, and your act—that is, there is a difference between EDT and CDT only if, for some $K$, $$ \Pr(K \mid A) \neq \Pr(K) $$ But suppose that, whenever you choose an act, $A$, an intervention occurs. Since an intervention has occurred, $A$ does not have any causal parents (recall the post-intervention DAG from figure 2). So the Markov Condition (v2) tells us that your act must be probabilistically independent of all factors which are not causally downstream of it—and in particular, your act must be probabilistically independent of the prediction in Newcomb’s Problem. Thus, for any factor $K$ which is not causally downstream of your act—and, in particular, the prediction—it must be that $$ \Pr(K \mid A) = \Pr(K) $$ But then, EDT and CDT must give the same advice, and Newcomb’s Problem could not present a counterexample to EDT, unless it also presented a counterexample to CDT.

Meek and Glymour’s distinction between the default system and the Will suggests that perhaps they are thinking that a correlation exists between your choice and the prediction because of the default system, though, once you intervene with an act of the Will, such correlations will go away. In that case, Meek and Glymour may be interpreting the causal decision theorist as saying that you ought to intervene, and not allow your default system to choose for you. (Thoughts along these lines show up in Hitchcock (2016).)

This is also not an accurate characterization of the causal decision theorist’s position. The causal decision theorist attaches absolutely no value to interventions as such. Suppose that your default system is on a course to take both boxes. According to the causal decision theorist, what is the utility of letting the predictable default system run its course, $D$? It is (using the same notation from section 1) \begin{align} U(D) &= \Pr(M) \cdot V(MD) \,\,+\,\, \Pr(\neg M) \cdot V(\neg M D) \\
&= 1,010,000 \cdot \Pr(M) \,\,+\,\, 10,000 \cdot \Pr(\neg M) \end{align} And what, according to the causal decision theorist, is the utility of intervening so as to bring it about that you take both boxes in a way which could not be predicted, $I$? It is also \begin{align} U(I) &= \Pr(M) \cdot V(MI) \,\,+\,\, \Pr(\neg M) \cdot V(\neg M I) \\
&= 1,010,000 \cdot \Pr(M) \,\,+\,\, 10,000 \cdot \Pr(\neg M) \end{align} $\Pr(M \mid I)$ is higher than $\Pr(M \mid D)$—but, in evaluating acts, the causal decision theorist cares not at all about the probabilities of states of nature conditional on acts, $\Pr(K \mid A)$. Evaluating acts in this way leads to the irrational policy of ‘managing the news’ which the causal decision theorist seeks to avoid. They avoid irrationally managing the news by utilizing the unconditional probabilities of states of nature, $\Pr(K)$, when evaluating acts. The causal decision theorist is therefore indifferent between selecting two boxes in a predictable way and selecting two boxes in an unpredictable way. Being predictable does nothing to change the amount of money in front of you; it does nothing to improve the world; and therefore has nothing to speak in its favor, according to the causal decision theorist.

Evidential decision theorists, on the other hand, may attach quite a lot of value to interventions. For EDT, interventions are often far more choiceworthy than the default system. If your default system is on a course to take two boxes, so that $\Pr(M \mid D) = 0.49$, and intervening so as to take two boxes is less predictable, $\Pr(M \mid I) = 0.5$, then $V(D)$ will be $500,000$ as before, but \begin{align} V(I) &= \Pr(M \mid I) \cdot V(MI) \,\,+\,\, \Pr(\neg M \mid I) \cdot V(\neg MI) \\
&= 0.5 \cdot 1,010,000 \,\,+\,\, 0.5 \cdot 10,000 \\
&= 510,000 \end{align} So the evidential decision theorist will advise you to intervene, so as to be less predictable.

Suppose that intervening with an act of the Will is to some extent dispreferred—you’d rather not think about what you’re doing, so intervening costs some small amount of value, $\epsilon$. Then, the view which Meek & Glymour call “causal decision theory” will advise you to intervene rather than let the default system choose two boxes, even though this act is causally dominated by allowing the default system to choose for you. For there are two possible states of nature: either there is a million dollars in box #1, $M$, or there is not, $\neg M$. If there is a million dollars in box #1, then allowing the default system to choose two boxes, $D$, has a higher value than intervening so as to take two boxes, $I$, \begin{align} V(M D) &= 1,010,000 &\qquad V(M I) &= 1,010,000 - \epsilon \end{align} And, if there is not a million dollars in box #1, then allowing the default system to choose two boxes has a higher value than intervening so as to take two boxes, \begin{align} V(\neg M D) &= 10,000 &\qquad V(\neg M I) &= 10,000 - \epsilon \end{align} Causal decision theorists accept a principle of causal dominance which says that causally dominated acts are irrational. So they do not recommend intervening in this case. So the view Meek & Glymour are calling ‘causal decision theory’ is not causal decision theory. (It is, instead, evidential decision theory.)
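
The contrast can be put in numbers with a small sketch. The same weighted sum is used throughout; the only thing that changes is whether it is fed the act-conditional probabilities $\Pr(M \mid D)$ and $\Pr(M \mid I)$ or a single unconditional $\Pr(M)$. The cost of one dollar for $\epsilon$ and the value $0.5$ for $\Pr(M)$ are hypothetical choices made for illustration.

```python
from fractions import Fraction


def payoff(million_present, cost):
    """Dollar value of taking both boxes, less any cost of intervening."""
    return (1_010_000 if million_present else 10_000) - cost


def weighted_value(pr_m, cost):
    """pr_m * V(M & act) + (1 - pr_m) * V(not M & act)."""
    return pr_m * payoff(True, cost) + (1 - pr_m) * payoff(False, cost)


EPSILON = Fraction(1)           # HYPOTHETICAL cost of intervening, in dollars
PR_M_GIVEN_D = Fraction(49, 100)
PR_M_GIVEN_I = Fraction(1, 2)
PR_M = Fraction(1, 2)           # HYPOTHETICAL unconditional probability of M

# Act-conditional probabilities: intervening looks better despite its cost.
v_d = weighted_value(PR_M_GIVEN_D, 0)
v_i = weighted_value(PR_M_GIVEN_I, EPSILON)
print(v_d, v_i)  # 500000 509999

# Unconditional probabilities: intervening is worse by exactly EPSILON.
u_d = weighted_value(PR_M, 0)
u_i = weighted_value(PR_M, EPSILON)
print(u_d, u_i)  # 510000 509999
```

With act-conditional probabilities, intervening is favored whenever $\epsilon$ is below 10,000 dollars; with unconditional probabilities, the default act is ahead by $\epsilon$ whatever value $\Pr(M)$ takes, which is just what causal dominance requires.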

In sum: the position which Meek & Glymour attribute to causal decision theorists is a form of libertarianism about free will—the view that acts constitute interventions. Far from being central to CDT, this is actually a thesis which, if true, would render CDT unnecessary, since, if this thesis were true, CDT and EDT would never disagree. Nor does the causal decision theorist think that we ought to intervene, if we can. Intervention is worthless to the causal decision theorist. The view which evaluates acts differently, depending upon whether they are interventions or not, is evidential decision theory.

5. Interventions as Measures of Efficacy

As I said before, to some extent this is a terminological dispute about what is properly called “causal decision theory”. (It is equally, of course, an interpretive question about how to understand authors like Gibbard & Harper, Skyrms, Lewis, and Joyce—and, here, the barest modicum of charity militates against the Meek & Glymour interpretation.) But, more importantly, using the label “causal decision theory” for a version of evidential decision theory which utilizes causal Bayes nets obscures what is at issue in the debate between EDT and CDT, and what is at issue between one-boxers and two-boxers.

There is, of course, a very interesting philosophical debate about whether we have libertarian free will—whether our acts constitute interventions—and whether it makes sense to deliberate about what to do while knowing that your decision is caused by factors outside of your control. But this is not the debate between one-boxers and two-boxers. Both one-boxers and two-boxers agree that Newcomb’s Problem is a genuine decision problem. What they disagree about is how to act in that decision problem. But if deliberation is impossible or pointless when you believe your acts are caused, or if all acts are interventions, then Newcomb’s Problem would not be a genuine decision problem.

Both one-boxers and two-boxers accept that there may be correlations between your choice and states over which you exercise no control, and that you may freely deliberate in such circumstances. So both one-boxers and two-boxers accept that not all acts are interventions, and that deliberation does not lose its point when this is so. What is at issue between them is whether correlations between your act and good states outside of your control speak in favor of that act. The one-boxer says “yes”, while the two-boxer says “no”. The two-boxer thinks that you should choose acts which do the most possible to improve the world in which you find yourself, whatever bad news those acts may carry with them.

So the two-boxer wishes to evaluate acts by looking at the good they stand to promote, and not considering the good those acts stand to merely indicate. And it is here that the causal probabilities supplied by interventions will be of interest to the causal decision theorist. They may want to use the causal probabilities $\Pr(S \mid\mid A)$ as a measure, not of the probability of $S$, given that $A$ is performed, but rather as a measure of $A$’s ability to bring $S$ about—that is, as a measure of $A$’s efficacy in promoting the state $S$. What the causal decision theorist says is that you should evaluate an act $A$ by using causal probabilities, whether the act $A$ is an intervention or not.

Let’s distinguish acts which are the results of interventions by the Will (unpredictable acts) from those which are not (predictable acts). If $A$ is a predictable act, let ‘$I_A$’ be an intervention which has the same causal consequences as $A$, but which is (because an intervention) unpredictable. The evidential decision theorist will treat $A$ and $I_A$ differently.


Evidential Decision Theory. The choiceworthiness of $A$ is given by $$ V(A) = \sum_S \Pr(S \mid A) \cdot V(SA) $$ Whereas the choiceworthiness of $I_A$ is given by \begin{align} V(I_A) &= \sum_S \Pr(S \mid I_A) \cdot V(S I_A) \\
&= \sum_S \Pr(S \mid\mid A) \cdot V(SA) \end{align}


(In the above, think of $S$ as an assignment of values to the variables in the DAG. Also note that I assume you don’t take the intervention $I_A$ to be intrinsically valuable, so that $V(S I_A) = V(S A)$.) On the other hand, since the causal decision theorist evaluates all acts using the causal probabilities $\Pr(S \mid\mid A)$ as measures of causal strength, CDT treats $A$ and $I_A$ in precisely the same way.


Causal Decision Theory. The choiceworthiness of $A$ is given by $$ U(A) = \sum_S \Pr(S \mid\mid A) \cdot V(SA) $$ And the choiceworthiness of $I_A$ is given by \begin{align} U(I_A) &= \sum_S \Pr(S \mid\mid I_A) \cdot V(S I_A) \\
&= \sum_S \Pr(S \mid\mid A) \cdot V(SA) \end{align}


In sum, Meek and Glymour present causal decision theory as though it treats $A$ and $I_A$ differently. But it emphatically does not. The view they are discussing is evidential decision theory. It is a version of EDT which makes use of tools from causal modeling—in particular, the notion of an intervention and the Markov Condition—to specify the available acts and the relevant probability function. But, once the available acts are specified and probability function given, the view they are calling “causal decision theory” gives precisely the same advice which EDT would give with that choice of acts and that probability function.

This is to some extent a question about which view is most deserving of the name “causal decision theory” (and, on this score, I would have hoped it obvious that the view introduced and defended by authors like Gibbard & Harper, Skyrms, Lewis, and Joyce under the name “causal decision theory” is more deserving of the name than the view those authors were attempting to argue against), but there is also a deeper issue with using terminology in this way. Because the view called “causal decision theory” by Meek & Glymour evaluates acts by the goods they indicate, this terminological choice serves to obscure the central lesson which causal decision theorists draw from Newcomb’s Problem—namely, that acts should be evaluated by the good they promote, and not the good they merely indicate.