2019, Dec 12

Chances of a death foretold

In Gibbard and Harper’s ‘Death in Damascus’, you must choose to travel to either Damascus or Aleppo; you are rather confident that you will meet Death in whichever city you actually choose, and that traveling to the city you don’t actually choose would save your life. In the standard version of this case, that’s because Death has made a quite reliable prediction about which city you will choose. Today’s post isn’t about ‘Death in Damascus’. It’s about a superficially similar case in which Death does not predict which city you will choose. Instead, Death simply flips a coin to decide where to go. But before you make up your mind, a reliable oracle tells you that you’ll meet Death. What’s interesting about this version of the case is that, for orthodox CDT, which choice is permissible depends upon when the coin flip takes place.

As I’ll be understanding it here, causal decision theory is formulated with the aid of an imaging function, which maps a world $w$ and a proposition $A$ to a probability function, $w_A$, such that $w_A(A) = 1$. The interpretation of this imaging function is that, if $A$ is an act, then $w_A(x)$ is the chance that world $x$ would obtain, were you to choose $A$ at world $w$. Then, as I’ll understand it here, causal decision theory (CDT) says to select the act, $A$, which maximizes $\mathcal{U}(A)$, where

$$ \mathcal{U}(A) \stackrel{\text{df}}{=} \sum_w \Pr(w) \cdot \sum_x w_A(x) \cdot V(x) $$

and $V(x)$ is the degree to which you desire that world $x$ is actual. The inner sum $\sum_x w_A(x) \cdot V(x)$ is how good you would expect $A$ to make things, were you to choose it at world $w$. $\mathcal{U}(A)$ is your expectation of this quantity, so it measures how good you would expect $A$ to make things, were you to choose it. CDT says to choose the act which you would expect to make things best, were you to choose it.

If there aren’t any chances to speak of, then $w_A$ will put all of its probability on a single world, which we can write just ‘$w_A$’. $w_A$ is the world which would have obtained, had you performed $A$ in $w$. If there are no chances to speak of, then $\mathcal{U}(A) = \sum_w \Pr(w) \cdot V(w_A)$.
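To fix ideas, here is a minimal sketch of this calculation in Python (the encoding of worlds, credences, values, and the imaging function is mine, purely for illustration):

```python
# A minimal sketch of the CDT utility formula above (illustrative encoding).
# Worlds are hashable labels; Pr maps worlds to credences; V maps worlds to
# desirabilities; image(w, A) returns the distribution w_A as a dict from
# worlds to chances.

def cdt_utility(Pr, V, image, A):
    """U(A) = sum over w of Pr(w) * sum over x of w_A(x) * V(x)."""
    return sum(
        p * sum(chance * V[x] for x, chance in image(w, A).items())
        for w, p in Pr.items()
    )
```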

CDT disagrees with its rivals only when there is a correlation between your choice and a state which is causally independent of your choice (a ‘state of nature’). This can happen in two different ways. Firstly, there could be a common cause, $CC$, of your choice, $A$, and the state of nature, $K$.

In this case, so long as the value of the common cause $CC$ is not known, there may be a correlation between $K$ and $A$ (though, if the value of $CC$ is known, then $A$ and $K$ will be probabilistically independent.)

For instance, consider:


Death Predicted Based on knowledge of your brain chemistry, Death made a prediction about whether you would go to Aleppo or Damascus. He awaits in whichever city he predicted. Given that you go to Aleppo, you are 80% confident that Death will await there. And given that you go to Damascus, you are 60% confident that Death will await there.


Your brain chemistry is the common cause of Death’s prediction and your choice. It explains the correlation between you and Death’s choice of city.

In this case, the recommendations of CDT depend upon how confident you are that you’ll end up going to Aleppo. I’ll suppose that avoiding Death is the only thing you care about, and that $V($Death$) = 0$, while $V($Life$) = 1$. Let ‘$A$’ be the proposition that you go to Aleppo, and let ‘$D$’ be the proposition that you go to Damascus. Let $a$ be your probability that you’ll go to Aleppo. Then your probability that Death awaits in Aleppo is $0.8a + 0.4(1-a) = 0.4 + 0.4a$, and your probability that he awaits in Damascus is $0.6 - 0.4a$. Since Death’s location is causally independent of your choice, these probabilities fix the utilities:

$$ \mathcal{U}(A) = 0.6 - 0.4 a \qquad \text{ and } \qquad \mathcal{U}(D) = 0.4 + 0.4 a $$

If $a > 0.25$, then $\mathcal{U}(D) > \mathcal{U}(A)$. If $a < 0.25$, then $\mathcal{U}(D) < \mathcal{U}(A)$. And if $a = 0.25$, then $\mathcal{U}(D) = \mathcal{U}(A)$. So, if you are likely to go to Aleppo, then CDT recommends that you go to Damascus. If you begin to take this advice to heart, and learn that you have, so that you end up likely to go to Damascus, then CDT changes its mind, and advises you to go to Aleppo. If you follow this advice, and learn that you have, then CDT will change course again, advising you to go to Damascus. And so on.

Deliberational Causal Decision Theorists like Brian Skyrms, James Joyce, and Brad Armendt advise you to vacillate back and forth in this way until you end up exactly 25% likely to choose Aleppo and 75% likely to choose Damascus. At that point, both options have equal utility, and so both options are permissible. Skyrms advises you to perform a mixed act of choosing Aleppo with 25% probability and Damascus with 75% probability, whereas Joyce and Armendt say simply that you are permitted to pick either destination, but none will conclude that you’ve chosen irrationally from the fact that you end up in Aleppo. (My official position is that this is a mistake. Given that you’re more likely to face Death in Aleppo than Damascus, Aleppo is an irrational choice.)
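(If you’d like to see this vacillation and its resting point, here is a toy deliberational simulation in Python. The updating rule below, which shifts credence toward the option with the higher utility in proportion to that utility, is my own simple stand-in, not Skyrms’s official dynamics.)

```python
# A toy deliberational dynamics for Death Predicted (illustrative only).
# V(Death) = 0, V(Life) = 1; a is your credence that you'll go to Aleppo.

def utilities(a):
    U_A = 0.6 - 0.4 * a   # U(Aleppo)
    U_D = 0.4 + 0.4 * a   # U(Damascus)
    return U_A, U_D

a = 0.9   # start off likely to go to Aleppo
for step in range(50):
    U_A, U_D = utilities(a)
    # Shift credence toward the option with the higher utility, in
    # proportion to utility (both utilities are non-negative here).
    a = a * U_A / (a * U_A + (1 - a) * U_D)

print(round(a, 3))   # converges to 0.25, where U(A) = U(D) = 0.5
```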

Unknown common causes aren’t the only way of introducing a correlation between your choice, $A$, and a state of nature, $K$. There could be a correlation because there is a common effect of $A$ and $K$, $CE$, whose value is known.

In this case, there could be a correlation between $A$ and $K$, even when they are causally independent, and they have no common causes.

For instance, consider:


Death Foretold Earlier today, Death flipped an indeterministic coin twice to decide whether to go to Aleppo or Damascus. If it landed heads both times, then he decided to go to Damascus. Otherwise, he decided to go to Aleppo. Now you must choose where to go. Before you make your choice, an oracle informs you that you will meet Death tomorrow.


Whether you meet Death is a common effect of your choice and the coin flips. And the oracle’s prophecy allows you to know the value of this common effect. So in Death Foretold, as in Death Predicted, there is a correlation between your choice and Death’s destination.

There are four relevant possibilities: $w_A^A$, in which you and Death both go to Aleppo; $w_A^D$, in which you go to Aleppo while Death goes to Damascus; $w_D^A$, in which you go to Damascus while Death goes to Aleppo; and $w_D^D$, in which you and Death both go to Damascus. (Subscripts mark your destination; superscripts mark Death’s.)

Suppose that the oracle’s prophecies are perfectly reliable—you’re certain that she speaks the truth. In that case, the correlation is perfect, and you give positive probability to only the possibilities $w_A^A$ and $w_D^D$. And your probability that $w_A^A$ is actual is just your probability that you choose Aleppo, $a$.

Since the coin has already been flipped, there are no chances to speak of. At $w_A^A$, if you were to choose to go to Damascus, you’d be at the world $w_D^A$ (if you were to go to Aleppo, you’d be at $w_A^A$, since you in fact choose Aleppo at $w_A^A$). And at $w_D^D$, if you were to choose to go to Aleppo, you’d be at the world $w_A^D$ (if you were to go to Damascus, you’d be at $w_D^D$, since you in fact choose Damascus at $w_D^D$).

Again let $a$ be your probability that you’ll go to Aleppo. Then,

$$ \mathcal{U}(A) = 1-a \qquad \text{ and } \qquad \mathcal{U}(D) = a $$

So, in Death Foretold, CDT leads to exactly the same kind of instability as in Death Predicted. So long as $a > 0.5$, $\mathcal{U}(D) > \mathcal{U}(A)$. If $a < 0.5$, then $\mathcal{U}(D) < \mathcal{U}(A)$. And if $a = 0.5$, then $\mathcal{U}(D) = \mathcal{U}(A)$.
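To double-check this, here is the calculation in a toy Python encoding (mine, purely for illustration): worlds are pairs of (your city, Death’s city), and, since the coin has already landed, imaging simply moves you to the chosen city while holding Death’s location fixed.

```python
# Death Foretold: the coin has already landed, so imaging is deterministic.
# Worlds are (your city, Death's city); V = 1 iff you survive.

V = {('A', 'A'): 0, ('A', 'D'): 1, ('D', 'A'): 1, ('D', 'D'): 0}

def image(w, act):
    # Death's location is causally independent of your act and already fixed.
    _, death = w
    return {(act, death): 1.0}

def U(act, a):
    Pr = {('A', 'A'): a, ('D', 'D'): 1 - a}   # the oracle rules out the rest
    return sum(p * sum(ch * V[x] for x, ch in image(w, act).items())
               for w, p in Pr.items())

a = 0.4
print(round(U('A', a), 3), round(U('D', a), 3))   # 1 - a = 0.6 and a = 0.4
```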

As in Death Predicted, deliberational causal decision theorists will say that either destination is permissible. My own judgment is that this is the correct verdict. But I’m not interested in defending this judgment here. Instead, I want to call attention to the fact that CDT’s verdicts are different if Death flips his coin a bit later in the day.


Death Foretold (v2) Later today, Death will flip an indeterministic coin twice to decide whether to go to Aleppo or Damascus. If it lands heads both times, then he will decide to go to Damascus. Otherwise, he will decide to go to Aleppo. Now you must choose where to go. Before you make your choice, an oracle informs you that you will meet Death tomorrow.


In this case, there are chances to speak of. At $w_A^A$, if you were to choose to go to Damascus, there’s a 25% chance that you’d be at the world $w_D^D$, and there’s a 75% chance that you’d be at the world $w_D^A$ (since there’s a 25% chance that Death’s coin lands heads twice). Similarly, at $w_A^A$, if you were to choose to go to Aleppo, there’s a 25% chance that you’d be at the world $w_A^D$, and there’s a 75% chance that you’d be at the world $w_A^A$. At $w_D^D$, if you were to choose to go to Damascus, there’s a 25% chance that you’d be at the world $w_D^D$ and a 75% chance that you’d be at $w_D^A$. And, at $w_D^D$, if you were to go to Aleppo, there’s a 25% chance you’d be at $w_A^D$ and a 75% chance you’d be at $w_A^A$.

This makes a difference to the values of $\mathcal{U}(A)$ and $\mathcal{U}(D)$. Now that the coin flip has moved later in the day,

$$ \begin{aligned} \mathcal{U}(A) &= \sum_w \Pr(w) \cdot \sum_x w_A(x) \cdot V(x) \\\
&= \Pr(w_A^A) \cdot [ 0.25 \cdot V(w_A^D) + 0.75 \cdot V(w_A^A) ] + \Pr(w_D^D) \cdot [0.25 \cdot V(w_A^D) + 0.75 \cdot V(w_A^A)] \\\
&= a \cdot [ 0.25 \cdot 1 + 0.75 \cdot 0 ] + (1-a) \cdot [0.25 \cdot 1 + 0.75 \cdot 0] \\\
&= 0.25 \end{aligned} $$

and

$$ \begin{aligned} \mathcal{U}(D) &= \sum_w \Pr(w) \cdot \sum_x w_D(x) \cdot V(x) \\\
&= \Pr(w_A^A) \cdot [ 0.25 \cdot V(w_D^D) + 0.75 \cdot V(w_D^A) ] + \Pr(w_D^D) \cdot [0.25 \cdot V(w_D^D) + 0.75 \cdot V(w_D^A)] \\\
&= \Pr(w_A^A) \cdot [ 0.25 \cdot 0 + 0.75 \cdot 1 ] + \Pr(w_D^D) \cdot [0.25 \cdot 0 + 0.75 \cdot 1] \\\
&= 0.75 \end{aligned} $$

So now $\mathcal{U}(D)$ is greater than $\mathcal{U}(A)$, no matter how likely you are to go to Aleppo or Damascus, and deliberational causal decision theorists will say that it is impermissible to go to Aleppo. This seems like the wrong verdict to me, but what seems worse is that deliberational CDT treats Death Foretold differently from Death Foretold (v2). Death’s coin flips are causally independent of your choice; whether the coin is flipped in the morning or the evening shouldn’t make a difference with respect to whether it is permissible to go to Aleppo.
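For concreteness, here is the v2 calculation in the same toy encoding as before (again a sketch; the chancy imaging function below just encodes the 25% chance that Death heads to Damascus):

```python
# Death Foretold (v2): the coin is still to be flipped, so imaging is
# chancy and the same at every world.

V = {('A', 'A'): 0, ('A', 'D'): 1, ('D', 'A'): 1, ('D', 'D'): 0}

def image_v2(w, act):
    # 25% chance the coin lands heads both times and Death goes to Damascus.
    return {(act, 'D'): 0.25, (act, 'A'): 0.75}

def U(act, a):
    Pr = {('A', 'A'): a, ('D', 'D'): 1 - a}
    return sum(p * sum(ch * V[x] for x, ch in image_v2(w, act).items())
               for w, p in Pr.items())

for a in (0.1, 0.5, 0.9):
    print(round(U('A', a), 3), round(U('D', a), 3))   # always 0.25 and 0.75
```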

Causal decision theorists could try to treat these two cases similarly by using a Rabinowicz-ian (strongly) centered imaging function. This imaging function is like the one I used above, except that, in Death Foretold (v2), it says that, at $w_A^A$, were you to go to Aleppo, there’s a 100% chance that you’d end up at world $w_A^A$; and, at $w_D^D$, were you to go to Damascus, there’s a 100% chance that you’d end up at world $w_D^D$.

Rabinowicz’s theory helps somewhat, but not enough. It still treats Death Foretold (v2) differently from Death Foretold. In the second version of the case, where Death’s coin flips take place later in the day, it says that

$$ \mathcal{U}(A) = 0.25 (1-a) \qquad \text{ and } \qquad \mathcal{U}(D) = 0.75 a $$

Whereas, when Death’s coin flips take place earlier in the day, it says that $\mathcal{U}(A) = 1-a$ and $\mathcal{U}(D) = a$ (since, in that case, there are no chances to speak of, so Rabinowicz and the orthodox view will agree). Suppose that your initial probability for $A$ is 0.4. Then, in Death Foretold, Rabinowicz’s theory says (at least, at the beginning of deliberation) that you must go to Aleppo. However, in Death Foretold (v2), with the same initial probability for $A$, Rabinowicz’s theory says (at least, at first) that you must go to Damascus.
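Here is the strongly centered imaging function in the same toy encoding (again, the encoding is mine; this is just a sketch of Rabinowicz’s proposal as I’ve described it):

```python
# Rabinowicz-style strongly centered imaging for Death Foretold (v2):
# if you in fact choose `act` at w, imaging on `act` keeps you at w for
# certain; otherwise it falls back to the chances.

V = {('A', 'A'): 0, ('A', 'D'): 1, ('D', 'A'): 1, ('D', 'D'): 0}

def image_centered(w, act):
    you, death = w
    if act == you:                 # the act you actually perform at w
        return {w: 1.0}
    return {(act, 'D'): 0.25, (act, 'A'): 0.75}

def U(act, a):
    Pr = {('A', 'A'): a, ('D', 'D'): 1 - a}
    return sum(p * sum(ch * V[x] for x, ch in image_centered(w, act).items())
               for w, p in Pr.items())

a = 0.4
print(round(U('A', a), 3), round(U('D', a), 3))   # 0.25*(1-a) = 0.15, 0.75*a = 0.3
```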


2019, Jul 9

CDT violates the IIA

As I’ll use the name here, the independence of irrelevant alternatives (IIA) says that adding an additional option to the menu can’t transform an impermissible choice into a permissible one. An old story from Sidney Morgenbesser illustrates the seeming irrationality of violating this principle: asked to decide between steak and chicken, a man says “I’d rather have the steak”. The waiter tells him that they also have fish, to which he responds: “Oh, in that case, I’ll have the chicken”. This behavior looks irrational, and a principle like IIA explains why. I recently realized that causal decision theory (CDT) doesn’t abide by the IIA. Not just orthodox CDT, but basically any theory worth calling ‘causalist’ will end up violating the principle. The point of today’s post is to explain why.

1. Stage Setting

Orthodox CDT measures the choiceworthiness of acts with their utility, $\mathcal{U}$, where $$ \mathcal{U}(A) \stackrel{\text{df}}{=} \sum_K \Pr(K) \cdot \mathcal{D}(KA) $$ (In the above, the $K$s are a partition of states of nature, and $\mathcal{D}(KA)$ says how strongly you desire that you perform $A$ in the state of nature $K$.) Orthodox CDT says that an act is permissible iff it maximizes utility.

I’ll use ‘$\mathbf{M}$’ for a menu of options, and ‘$\mathcal{P}(\mathbf{M})$’ for the permissible options on the menu $\mathbf{M}$. Then, orthodox CDT says that an act is permissible iff its utility is no less than that of any alternative act, $$ \mathcal{P}(\mathbf{M}) = \{ A \in \mathbf{M} \mid (\forall B \in \mathbf{M}) \mathcal{U}(A) \geqslant \mathcal{U}(B) \} $$

One noteworthy feature of the measure $\mathcal{U}$ is that its values can depend upon how likely you think you are to select each act. Let’s write “$\mathcal{U}_A(B)$” for the utility you would assign to the act $B$, were you to learn only that you had performed the act $A$: $$ \mathcal{U}_A(B) \stackrel{\text{df}}{=} \sum_K \Pr(K \mid A) \cdot \mathcal{D}(KB) $$ Then, in a choice between two options, $A$ and $B$, both of the following situations are possible:

Self-Undermining Choice Once chosen, every act is worse than the alternative. $$ \mathcal{U}_A(B) > \mathcal{U}_A(A) \qquad \text{ and } \qquad \mathcal{U}_B(A) > \mathcal{U}_B(B) $$

Self-Reinforcing Choice Once chosen, every act is better than the alternative. $$ \mathcal{U}_A(A) > \mathcal{U}_A(B) \qquad \text{ and } \qquad \mathcal{U}_B(B) > \mathcal{U}_B(A) $$

This can lead CDT’s verdicts to change as you make up your mind about what to do. In a self-undermining choice, once you follow CDT’s advice and intend to do the act it called rational, it will change its mind and begin to call you irrational. In a self-reinforcing choice, if you disregard its advice and do what it said was irrational, CDT will change its mind and call you rational for doing so.

I’ve come to think that this is a reason to doubt orthodox CDT. But this feature won’t be relevant to anything I’m saying today. All I will need to appeal to here is the following minimal commitment of CDT:

Minimal CDT. In a choice between two options, $A$ and $B$, if the utility of $A$ exceeds the utility of $B$, and it would continue to do so whether you choose $A$ or $B$, then $B$ is impermissible. $$ \left( \mathcal{U}_A(A) > \mathcal{U}_A(B) \text{ and } \mathcal{U}_B(A) > \mathcal{U}_B(B) \right) \Rightarrow B \notin \mathcal{P}(\{ A, B \}) $$

For instance, in Newcomb’s Problem, whether you one-box or two-box, the utility of two-boxing will exceed the utility of one-boxing. So Minimal CDT says that one-boxing is impermissible.

2. CDT violates IIA

What I’ll show here is that the following three principles are jointly inconsistent.


Independence of Irrelevant Alternatives (IIA)

Given any two menus of options where the first is a subset of the second, and $A$ appears on both, if it is not permissible to choose $A$ from the smaller menu, then it is not permissible to choose $A$ from the larger menu. $$ A \in \mathbf{M} \subseteq \mathbf{M}^+ \Rightarrow \left[ A \notin \mathcal{P}(\mathbf{M}) \Rightarrow A \notin \mathcal{P}(\mathbf{M}^+) \right] $$

Minimal CDT

In a choice between two options, $A$ and $B$, if the utility of $A$ exceeds the utility of $B$, and it would continue to do so whether you choose $A$ or $B$, then $B$ is impermissible. $$ \left( \mathcal{U}_A(A) > \mathcal{U}_A(B) \text{ and } \mathcal{U}_B(A) > \mathcal{U}_B(B) \right) \Rightarrow B \notin \mathcal{P}(\{ A, B \}) $$

No Dilemmas

Given any menu of options, some option is permissible to choose from that menu. $$ \forall \mathbf{M} \quad \mathcal{P}(\mathbf{M}) \neq \varnothing $$


Since CDT is clearly committed to Minimal CDT and No Dilemmas, it must reject the IIA.

To see why these three principles are jointly inconsistent, consider the following decision:

You must choose between three boxes, labeled ‘$A$’, ‘$B$’, and ‘$C$’. You can take one, and only one, of the boxes. Yesterday, a reliable predictor made a prediction about how you would choose. Their predictions are 80% reliable—so, conditional on you taking box $X$, you’re 80% sure that this is what they predicted you’d do. If they predicted that you would take $A$, then they left nothing in $A$, 10 dollars in $C$, and a bill for 10 dollars in $B$. If they predicted that you would take $B$, then they left nothing in $B$, 10 dollars in $A$, and a bill for 10 dollars in $C$. If they predicted that you would take $C$, then they left nothing in $C$, 10 dollars in $B$, and a bill for 10 dollars in $A$.

If we use “$K_X$” for the state of nature in which it was predicted that you would take box $X$, then the desirabilities and probabilities for this decision are shown in the matrices below.

$$ \begin{array}{r | c c c} \mathcal{D}(\text{Row Col}) & K_A & K_B & K_C \\\hline A & 0 & 10 & -10 \\\
B & -10 & 0 & 10 \\\
C & 10 & -10 & 0
\end{array} \qquad \begin{array}{r | c c c} \Pr(\text{Row} \mid \text{Col}) & A & B & C \\\hline K_A & 0.8 & 0.1 & 0.1 \\\
K_B & 0.1 & 0.8 & 0.1 \\\
K_C & 0.1 & 0.1 & 0.8
\end{array}
$$

Multiplying the left-hand-side matrix by the right-hand-side matrix gives us the following matrix of the utility you would assign to the row act, were you to learn only that you had performed the column act:

$$ \begin{array}{r | c c c} \mathcal{U}_{\text{Col}}(\text{Row}) & A & B & C \\\hline A & 0 & 7 & -7 \\\
B & -7 & 0 & 7 \\\
C & 7 & -7 & 0
\end{array} $$
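(To check the arithmetic, here is the multiplication in Python, using numpy; the matrices are just the ones in the tables above.)

```python
# Reproducing the utility matrix by matrix multiplication (a sketch).
import numpy as np

# Rows = acts A, B, C; columns = states K_A, K_B, K_C.
D = np.array([[  0,  10, -10],
              [-10,   0,  10],
              [ 10, -10,   0]])

# Rows = states K_A, K_B, K_C; columns = acts; entry = Pr(K_row | act_col).
P = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

# (D @ P)[i, j] = sum_K D(act_i, K) * Pr(K | act_j) = U_{act_j}(act_i).
print(D @ P)
```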

Now, suppose that, instead of being given a choice between $A$, $B$, and $C$, you were instead offered a choice between just $A$ and $B$—$C$ is taken off of the menu (though there’s still a 10% chance that the predictor falsely predicted that you would take $C$). In that case, notice that both $\mathcal{U}_A(A) > \mathcal{U}_A(B)$ and $\mathcal{U}_B(A) > \mathcal{U}_B(B)$. So Minimal CDT says that, in a decision between $A$ and $B$, $B$ is an impermissible choice, $B \notin \mathcal{P}(\{ A, B \})$. Also notice that a choice between $B$ and $C$ is exactly like a choice between $A$ and $B$. As is a choice between $C$ and $A$.

$$ \begin{array}{r | c c} \mathcal{U}_{\text{Col}}(\text{Row}) & A & B \\\hline A & 0 & 7 \\\
B & -7 & 0 \end{array} $$

$$ \begin{array}{r | c c} \mathcal{U}_{\text{Col}}(\text{Row}) & B & C \\\hline B & 0 & 7 \\\
C & -7 & 0 \end{array} $$

$$ \begin{array}{r | c c} \mathcal{U}_{\text{Col}}(\text{Row}) & C & A \\\hline C & 0 & 7 \\\
A & -7 & 0 \end{array} $$

That is: both $\mathcal{U}_B(B) > \mathcal{U}_B( C )$ and $\mathcal{U}_C(B) > \mathcal{U}_C( C )$. So Minimal CDT says that, in a decision between $B$ and $C$, $C$ is an impermissible choice, $C \notin \mathcal{P}(\{ B, C \})$. And both $\mathcal{U}_C( C ) > \mathcal{U}_C(A)$ and $\mathcal{U}_A( C ) > \mathcal{U}_A(A)$. So Minimal CDT says that, in a decision between $C$ and $A$, $A$ is an impermissible choice, $A \notin \mathcal{P}(\{ C, A \})$.
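(Here is a quick Python check of all three pairwise verdicts at once; it just applies the Minimal CDT test to the utility matrix above.)

```python
# Checking Minimal CDT's verdict in each two-option menu (a sketch).
import numpy as np

D = np.array([[0, 10, -10], [-10, 0, 10], [10, -10, 0]])
P = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
U = D @ P                # U[row, col] = utility of row-act, given col-act
acts = ['A', 'B', 'C']

for i, j in [(0, 1), (1, 2), (2, 0)]:   # menus {A,B}, {B,C}, {C,A}
    # Minimal CDT: j is impermissible if i beats j whichever of the two
    # you choose, i.e. U_i(i) > U_i(j) and U_j(i) > U_j(j).
    if U[i, i] > U[j, i] and U[i, j] > U[j, j]:
        print(f"In {{{acts[i]}, {acts[j]}}}: {acts[j]} is impermissible")
```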

Now, consider what it is permissible to choose from the full menu of options, $\{ A, B, C \}$. By No Dilemmas, some option on this menu is permissible. Suppose it is permissible to choose $A$, $A \in \mathcal{P}(\{ A, B, C \})$. Then, $A \notin \mathcal{P}(\{ C, A \})$ and $A \in \mathcal{P}(\{ C, A, B \})$. This violates IIA. Suppose, on the other hand, that it is permissible to choose $B$, $B \in \mathcal{P}(\{ A, B, C \})$. Then, $B \notin \mathcal{P}(\{ A, B \})$ and $B \in \mathcal{P}(\{ A, B, C \})$. And this violates IIA. Suppose, finally, that it is permissible to choose $C$, $C \in \mathcal{P}(\{ A, B, C \})$. Then, $C \notin \mathcal{P}(\{ B, C \})$ and $C \in \mathcal{P}(\{ B, C, A \})$. Again, this violates IIA. So: whichever of $A, B,$ and $C$ is permissible, there will be a violation of IIA.

So: if we assume Minimal CDT and No Dilemmas, then we will have violations of IIA.


2019, Mar 28

Teaching Arrow’s Impossibility Theorem

I regularly teach undergrads about Arrow’s impossibility theorem. In previous years, I’ve simply presented a statement of the theorem and provided a proof in the optional readings. Arrow’s proof is rather complicated, and while there are several simpler presentations of the proof, they are still too complicated for me to cover with philosophy undergraduates.

Preparing for class this year, I realized that, if Arrow’s theorem is slightly weakened, we can give a proof that is much easier to follow—the kind of proof I’m comfortable presenting to undergraduate philosophy majors. The point of the post today is to present that proof.

1. Stage Setting

Suppose that we have three voters, and they are voting on three options: $A, B,$ and $C$. The first voter prefers $A$ to $B$ to $C$. The second prefers $B$ to $C$ to $A$. The third prefers $C$ to $A$ to $B$. We can represent this with the following table.

$$ \begin{array}{l | c c c} \text{Voter #} & 1 & 2 & 3 \\\hline 1st & A & B & C \\\
2nd & B & C & A \\\
3rd & C & A & B \end{array} $$

This table gives us a voter profile. In general, a voter profile is an indexed set of preference orderings, which I’ll denote with ‘$ \succeq_i$’. (By the way, I’ll assume that, once we have a weak preference ordering $X \succeq_i Y$—read as “$Y$ is not preferred to $X$”—we can define up a strong preference ordering $X \succ_i Y$—read as “$X$ is preferred to $Y$”—and an indifference relation $X \sim_i Y$—read as “$X$ and $Y$ are preferred equally”. We can accomplish this with the following stipulative definitions: $X \succ_i Y := X \succeq_i Y \wedge Y \not\succeq_i X$ and $X \sim_i Y := X \succeq_i Y \wedge Y \succeq_i X$.)
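(For concreteness, here are those stipulative definitions as a small Python sketch; representing an ordering by a rank dictionary is my own illustrative choice.)

```python
# The stipulative definitions, stated as code. A weak preference ordering
# is represented as a function weakly_prefers(x, y) -> bool.

def strictly_prefers(weakly_prefers, x, y):
    # x > y  :=  x >= y and not y >= x
    return weakly_prefers(x, y) and not weakly_prefers(y, x)

def indifferent(weakly_prefers, x, y):
    # x ~ y  :=  x >= y and y >= x
    return weakly_prefers(x, y) and weakly_prefers(y, x)

# Example: voter 1's ordering A > B > C, encoded by rank (lower = better).
rank = {'A': 1, 'B': 2, 'C': 3}
weak = lambda x, y: rank[x] <= rank[y]
print(strictly_prefers(weak, 'A', 'B'), indifferent(weak, 'A', 'A'))  # True True
```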

A social welfare function is a function from a voter profile, $\succeq_i$, to a group preference ordering, which I’ll denote with ‘$\succeq$’.

There are several ways of interpreting a social welfare function. If you think that an individual’s well-being is a function of how well satisfied their preferences are, and you think that how good things are overall is just a question of aggregating the well-being of all the individuals (this thesis is called welfarism), then you could think of the social welfare function as providing you with a betterness ordering. Alternatively, you could understand the social welfare function as a voting rule which tells you how to select between options, given the preferences of the voters. (For ease of exposition, I’ll run with this second interpretation throughout, though nothing hangs on this choice.)

Here are some features you might want a social welfare function to have: firstly, you don’t want it to privilege any option over any other. It should be the preferences of the voters which determine which option comes out on top, and not the way those options happen to be labeled. So, if we were to re-label the options (holding fixed their position in every voter’s preference ordering), the group preference ordering determined by the social welfare function should be exactly the same—except, of course, that the options have now been re-labeled. Call this feature “Neutrality”.

Neutrality Re-labeling options does not affect where options end up in the group preference ordering.

Similarly, we don’t want the social welfare function to privilege any particular voter over any other. All voters should be treated equally. So, if we were to re-label the voters (holding fixed their preferences), this shouldn’t make any difference with respect to the group preference ordering. Let’s call this feature “Anonymity”.

Anonymity Re-labeling voters does not affect the group preference ordering.

Next: if all voters have exactly the same preference ordering, then this should become the group preference ordering. Let’s call this feature “Unanimity”.

Unanimity If all voters share the same preference ordering, then this is the group preference ordering.

And: if the only change to a voter profile is that one person has raised an option, $X$, in their individual preference ordering, this should not lead to $X$ being lowered in the group preference ordering. Let’s call this feature “Monotonicity”.

Monotonicity If one voter raises $X$ in their preference ordering, and nothing else about the voter profile changes, then $X$ is not lowered in the group preference ordering.

Finally, it would be nice if, in order to determine whether $X \succeq Y$, the social welfare function only had to consider each voter’s preferences between $X$ and $Y$. It shouldn’t have to consider where they rank options other than $X$ and $Y$—when it comes to deciding the group preference between $X$ and $Y$, those other options are irrelevant alternatives. Call this principle, then, the “Independence of Irrelevant Alternatives”, or just “IIA”.

Independence of Irrelevant Alternatives (IIA) How the group ranks $X$ and $Y$—i.e., whether $X \succeq Y$ and $Y \succeq X$—is determined entirely by each individual voter’s preferences between $X$ and $Y$. Changes in voters’ preferences which do not affect whether $X \succeq_i Y$ or $Y \succeq_i X$ do not affect whether $X \succeq Y$ or $Y \succeq X$.

What Arrow showed was that there is no social welfare function which satisfies all of these criteria. Actually, Arrow showed something slightly stronger—namely that there’s no social welfare function which satisfies Unanimity, Monotonicity, and IIA other than a dictatorial social welfare function. A dictatorial social welfare function just takes some voter’s preferences and makes them the group’s preferences, no matter the preferences of the other voters. Any dictatorial social welfare function will violate Anonymity, so our weaker impossibility result follows from Arrow’s. While this result is slightly weaker, Anonymity and Neutrality are still incredibly weak principles, and this result is much easier to prove.

2. The Proof

Here’s the general shape of the proof: we will assume that there is some social welfare function which satisfies Anonymity, Neutrality, Unanimity, and IIA, and, by reasoning about what this function must say about particular voter profiles, we will show that it must violate Monotonicity. This will show us that there is no social welfare function which satisfies all of these criteria.

Let’s begin with the voter profile from above: $$ \begin{array}{l | c c c} \text{Voter #} & 1 & 2 & 3 \\\hline 1st & A & B & C \\\
2nd & B & C & A \\\
3rd & C & A & B \end{array} $$ Notice that the three options, $A$, $B$, and $C$, are perfectly symmetric in this voter profile. By re-labeling voters, we could have $C$ appear wherever $A$ does, $B$ appear wherever $C$ does, and $A$ appear wherever $B$ does. For instance: re-label voter 1 “voter 2”, re-label voter 2 “voter 3”, and re-label voter 3 “voter 1”, and you get the following voter profile, in which $A$ has taken the place of $B$, $B$ has taken the place of $C$, and $C$ has taken the place of $A$. $$ \begin{array}{l | c c c} \text{Voter #} & 1 & 2 & 3 \\\hline 1st & C & A & B \\\
2nd & A & B & C \\\
3rd & B & C & A \end{array} $$ By Anonymity, this makes no difference with respect to the group ordering. Note also that we may view this new voter profile as the result of re-labeling, not the voters, but rather the options (replacing $A$ with $C$, $B$ with $A$, and $C$ with $B$). Then, by Neutrality, after this re-labeling, $A$ must occupy the place of $B$ in the old group ordering, $B$ must occupy the place of $C$ in the old group ordering, and $C$ must occupy the place of $A$. Since the group ordering must also be unchanged (because of Anonymity), this means that the group ordering must be: $$ A \sim B \sim C $$ That is: the group must be indifferent between $A$, $B$, and $C$. (Call this “result #1”) This is exactly what we should expect, given the symmetry of the voter profile. There’s nothing that any option has to raise it above the others.

Now, suppose that, in our original voter profile, voters 1 and 3 change their minds, and they raise $B$ above $A$ in their preference ordering. And suppose that voter 2 raises $C$ above $B$ in their preference ordering. Then, the voter profile would change as shown: $$ \begin{array}{l | c c c} \text{Voter #} & 1 & 2 & 3 \\\hline 1st & A & B & C \\\
2nd & B & C & A \\\
3rd & C & A & B \end{array} \qquad \Longrightarrow \qquad \begin{array}{l | c c c} \text{Voter #} & 1 & 2 & 3 \\\hline 1st & B & C & C \\\
2nd & A & B & B \\\
3rd & C & A & A \end{array} $$ Notice first that these changes didn’t affect any voter’s ranking between $A$ and $C$. Voter 1 prefers $A$ to $C$ both before and after the changes. And voters 2 and 3 prefer $C$ to $A$ both before and after the changes. Since $A \sim C$ before the changes (by result #1), IIA tells us that, after the changes, it is still the case that $A \sim C$. (Call this “result #2”.)

Notice also that everybody now ranks $B$ above $A$. So, from this voter profile, we could reach a unanimous voter profile in which everybody ranks $B$ above $A$ above $C$, by just having voters 2 and 3 lower $C$ to the bottom of their preference ranking. $$ \begin{array}{l | c c c} \text{Voter #} & 1 & 2 & 3 \\\hline 1st & B & C & C \\\
2nd & A & B & B \\\
3rd & C & A & A \end{array} \qquad \Longrightarrow \qquad \begin{array}{l | c c c} \text{Voter #} & 1 & 2 & 3 \\\hline 1st & B & B & B \\\
2nd & A & A & A \\\
3rd & C & C & C \end{array} $$ By Unanimity, in the voter profile on the right, $B \succ A$. But, in moving from the voter profile on the left to the one on the right, we didn’t change anybody’s ranking of $A$ and $B$, so, by IIA, $B \succ A$ in the voter profile on the left, too. (Call this “result #3”)

Putting together result #2 and result #3, we have that, in this voter profile, $$ \begin{array}{l | c c c} \text{Voter #} & 1 & 2 & 3 \\\hline 1st & B & C & C \\\
2nd & A & B & B \\\
3rd & C & A & A \end{array} $$ $B \succ A \sim C$. Therefore, in this voter profile, $B \succ C$. Call this “result #4”.

Suppose that we begin with the voter profile immediately above, and voters 1 and 3 change their minds, raising $A$ above $B$, and leaving everything else unchanged. This gives us the voter profile on the right. $$ \begin{array}{l | c c c} \text{Voter #} & 1 & 2 & 3 \\\hline 1st & B & C & C \\\
2nd & A & B & B \\\
3rd & C & A & A \end{array} \qquad \Longrightarrow \qquad \begin{array}{l | c c c} \text{Voter #} & 1 & 2 & 3 \\\hline 1st & A & C & C \\\
2nd & B & B & A \\\
3rd & C & A & B \end{array} $$ This change does not affect anyone’s ranking of $B$ and $C$. Voter 1 prefers $B$ to $C$ both before and after the change. And voters 2 and 3 prefer $C$ to $B$ both before and after the change. Since $B \succ C$, given the voter profile on the left (this was result #4), we must have $B \succ C$ on the right, too. Call this “result #5”.

Now, watch this: consider these two voter profiles.
$$ \begin{array}{l | c c c} \text{Voter #} & 1 & 2 & 3 \\\hline 1st & A & B & C \\\
2nd & B & C & A \\\
3rd & C & A & B \end{array} \qquad \Longrightarrow \qquad \begin{array}{l | c c c} \text{Voter #} & 1 & 2 & 3 \\\hline 1st & A & C & C \\\
2nd & B & B & A \\\
3rd & C & A & B \end{array} $$ The voter profile on the left is just our original voter profile. On the right is the voter profile from the right-hand-side of the paragraph immediately above. Result #1 tells us that, on the left, $B \sim C$. Result #5 tell us that, on the right, $B \succ C$. But notice that the only difference between the voter profile on the left and the one on the right is that voter 2 has raised $C$ in their preference ordering. Monotonicity tells us that this shouldn’t lower $C$ in the group preference ordering. So result #1 and result #5 together contradict Monotonicity.

So: any social welfare function which satisfies Anonymity, Neutrality, Unanimity, and IIA will end up violating Monotonicity. So there is no social welfare function which satisfies all of these criteria.