2020, Mar 25

Deference and updating

Bas van Fraassen’s principle of Reflection tells you to defer to your future credences. A natural generalization of Christensen’s principle of Rational Reflection tells you to defer to whichever future credence will be rational. Elga’s principle New Rational Reflection is like Christensen’s principle, except that it allows that the rational credences may not be certain that they are rational.

Each of these deference principles is equivalent to a claim about updating—a claim about how your credences should be disposed to change when you learn that some proposition, e, is true. Reflection is equivalent to the claim that you should be disposed to update by conditioning on the proposition that your credences have been updated on e. Rational Reflection is equivalent to the claim that you should be disposed to update by conditioning on the proposition that e is your total evidence. And New Rational Reflection is equivalent to the claim that you should be disposed to update with a Jeffrey shift on the partition of propositions about what your total evidence may be.

1. Learning Dispositions

To set the stage, I’m going to suppose that you’ve got some (prior) credence function $C$, and that there’s some (finite) set of propositions, $\mathscr{E} = \{e, f, g, \dots \}$, such that exactly one of the members of the set $\mathscr{E}$ will be your total evidence. That is: for each $e \in \mathscr{E}$, $e$ might be your total evidence. And your total evidence must be some member of $\mathscr{E}$. Write ‘$\mathbf{T}e$’ for ‘your total evidence is $e$’. Then, note that, since you must learn exactly one of the propositions in the set $\mathscr{E} = \{e, f, g, \dots \}$ the set $\mathbf{T}\mathscr{E} = \{ \mathbf{T}e, \mathbf{T}f, \mathbf{T} g, \dots \}$ will form a partition.

Your learning dispositions are dispositions to respond to each of these possible conditions: the condition of having total evidence $e$, $\mathbf{T}e$, the condition of having total evidence $f$, $\mathbf{T}f$, and so on. I’ll write ‘$D_e$’ for the credence function that you’re disposed to adopt in the condition $\mathbf{T}e$, for each $e \in \mathscr{E}$. If you take your learning dispositions to be perfectly attuned to your potential evidence, then you’ll foresee no possibility of failing to adopt $D_e$ in the condition $\mathbf{T}e$. In that case, we needn’t distinguish the condition of you having $e$ as your total evidence from the condition of you updating on the proposition $e$.

But what if you do foresee the possibility of not recognizing that $e$ is your total evidence in the condition $\mathbf{T}e$, or of not responding to that total evidence by updating appropriately? What if you don’t take your learning dispositions to be perfectly attuned to your evidence? In that case, we should distinguish the condition of having $e$ as your total evidence from the condition of updating on the total evidence $e$. I’ll use ‘$\mathbf{U}e$’ (read: you update on $e$) to stand for the condition of you taking $e$ to be your total evidence, and responding accordingly. If $D_e$ is the credence function you’re disposed to adopt when your total evidence is $e$, then $\mathbf{U}e$ says that you have taken your evidence to be $e$ and adopted the credence function $D_e$ in response. If you think that your learning dispositions are not perfectly attuned to your potential evidence, then you’ll think that, for some $e$, $\mathbf{U}e$ could be true even when $\mathbf{T}e$ is not. Equivalently: you’ll think that, for some $e \neq f$, you might update on $f$ even when your total evidence is $e$.

2. Deference and Updating

2.1 Deference

When you defer to some other, expert, credence function, you use its probabilities to determine your own. If you’re certain of what credence function the expert has, then deference is simple: simply adopt the known expert credence function as your own. Sometimes, however, you don’t know precisely what the expert function’s credences are. In that case, you should use your credences about the expert function to determine your own credences.

Principles of expert deference tell you exactly how your credences about the expert function should determine your own credences. The simplest such expert deference principle says that, conditional on a probability function, $E$, being the expert, your credences should agree with $E$'s. That is:

Immodest Expert Deference You defer to an (immodest) expert iff, for every proposition $p$ and every probability function $E$, $$ C(p \mid E \text{ is the expert }) = E(p), \text{ if defined} $$

(Note: If $E$ is certain to not be the expert, $C(E \text{ is the expert })= 0$, then the conditional probability on the left-hand-side may not be defined; in that case, the principle will impose no constraint. That’s why I’ve written ‘if defined’ above; I’ll leave this proviso implicit in what follows.)

I’ve called this principle Immodest Expert Deference because it implies that the expert is certain to be certain that it is the expert. In the jargon, it entails that the expert is immodest. To see that this follows from the principle, just let $p$ be the proposition that $E$ is the expert. Then, the principle tells us that, for every $E$, $$ C(E \text{ is the expert } \mid E \text{ is the expert }) = E( E \text{ is the expert }) $$ If $C$ is a probability function, then the left-hand-side must equal 1, so, if you are able to defer to the expert in the way this principle advises, then it must be that, for every $E$ which might be the expert, $E$ is certain that it is the expert. So you are certain that $E$ is certain that it is the expert.

Suppose you wish to defer to a modest expert: one who isn’t certain that it is the expert. The simplest principle governing deference to such an expert says that, conditional on a probability function, $E$, being the expert, your credences should agree with $E$'s, once $E$ is conditioned on the proposition that it is the expert.

Expert Deference You defer to an expert iff, for every proposition $p$ and every probability function $E$, $$ C(p \mid E \text{ is the expert }) = E(p \mid E \text{ is the expert }) $$

This principle subsumes Immodest Expert Deference as a special case: whenever $E$ is immodest, $E(p \mid E \text{ is the expert })$ will be equal to $E(p)$. However, it applies even when $E$ may not be certain that it is the expert. If $E$ isn’t certain that it is the expert, then, when you condition your credence function on the proposition that $E$ is the expert, you’re taking something for granted that $E$ itself has not taken for granted. The solution is to have $E$ also take for granted that it is the expert (by conditioning it on the proposition that it is the expert), and only then align your credences with it. That is: the principle tells you to align your conditional credences with the expert’s conditional credences, where the condition is that it is the expert.
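To see that the modest principle is satisfiable, here is a small numerical sketch in Python. Everything in it is made up for illustration: worlds are pairs $(i, s)$, where $i$ says which of two candidate functions is the expert and $s$ is some other fact, and each candidate expert is modest. We build a prior $C$ by choosing credences for the two expert hypotheses and, within each hypothesis, copying the expert’s self-conditioned credences.

```python
# A minimal sketch of Expert Deference to a modest expert.
# Worlds are pairs (i, s): i says which function is the expert, s is some
# other fact. E1 and E2 are each modest: neither is certain it is the expert.
# All numbers are invented for illustration.
worlds = [(1, "a"), (1, "b"), (2, "a"), (2, "b")]

def prob(cred, prop):
    """Probability of a proposition (a set of worlds)."""
    return sum(p for w, p in cred.items() if w in prop)

def condition(cred, prop):
    """Condition a credence function on a proposition."""
    z = prob(cred, prop)
    return {w: (p / z if w in prop else 0.0) for w, p in cred.items()}

E = {
    1: {(1, "a"): 0.4, (1, "b"): 0.2, (2, "a"): 0.3, (2, "b"): 0.1},
    2: {(1, "a"): 0.1, (1, "b"): 0.3, (2, "a"): 0.2, (2, "b"): 0.4},
}
is_expert = {i: {w for w in worlds if w[0] == i} for i in (1, 2)}

# Build C to satisfy Expert Deference: pick any credences for the two
# expert hypotheses, then distribute within each per E_i(. | E_i is expert).
C = {}
for i, weight in ((1, 0.7), (2, 0.3)):
    posterior = condition(E[i], is_expert[i])
    for w in is_expert[i]:
        C[w] = weight * posterior[w]

# Check: C(p | E_i is expert) = E_i(p | E_i is expert), for every proposition.
from itertools import combinations
props = [set(c) for r in range(len(worlds) + 1)
         for c in combinations(worlds, r)]
for i in (1, 2):
    for p in props:
        lhs = prob(condition(C, is_expert[i]), p)
        rhs = prob(condition(E[i], is_expert[i]), p)
        assert abs(lhs - rhs) < 1e-9
```

The construction makes clear why the principle never forces immodesty: within each hypothesis about who the expert is, you only borrow the expert’s credences *given* that hypothesis, so the expert’s own uncertainty about its status drops out.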

(By the way, there are several other options for how to defer to experts. At least 12 different formulations have been floated in the literature, and there are often subtle, unexpected differences between them (see this blog post, for instance). But, in the interests of simplicity, I’m just going to focus on these two here.)

2.2 Reflection

Bas van Fraassen’s principle of reflection says that you should treat your future, posterior credence function as an (immodest) expert. That is: for each proposition $p$, given that you update your credences to the posterior function $D$, your credence that $p$ should be $D(p)$.

Reflection Conditional on $D$ being your updated credence function, your credence that $p$ should be $D(p)$ $$ C(p \mid D \text{ is your updated credence }) = D(p) $$

Suppose that you know that your updated credence function will be one of $D_e, D_f, D_g, \dots$. That is, suppose that you know your updated credence function will be one of the members of the set $\{ D_e \mid e \in \mathscr{E} \}$. Then, the claim that $D_e$ is your updated credence is just the claim that you have updated on the proposition $e$, $\mathbf{U}e$. In that case, we can re-write Reflection like this: for every proposition $p$, and every $e \in \mathscr{E}$, $$ C(p \mid \mathbf{U} e) = D_e(p) $$

Reflection has a straightforward corollary for updating: it entails that you should be disposed to update on $e$ by conditioning on the proposition $\mathbf{U}e$. To see this, instead of thinking of Reflection as a constraint on your prior credence function, $C$, think of it as a constraint on your learning dispositions, $D$. Then, it tells you that, for each $e \in \mathscr{E}$, you should be disposed, upon learning $e$, to adopt a new credence which is equal to your old credence, conditional on $\mathbf{U}e$. $$ D_e(p) = C(p \mid \mathbf{U} e) $$ (To emphasize this shift in perspective, from thinking of Reflection as a constraint on $C$ to thinking of it as a constraint on $D$, I’ve just switched the left- and right-hand sides.) Obviously, this entailment goes both ways; so Reflection is equivalent to the claim that, upon learning $e$, you should be disposed to condition on $\mathbf{U}e$. (In my paper Updating for Externalists, I called this claim ‘update conditionalization’, and I showed that, given some assumptions about accuracy, update conditionalization maximizes evidentially expected accuracy. That is: if you’re an evidential decision theorist, and you want your credences to be as accurate as possible, then you should have the learning dispositions which update conditionalization recommends.)
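The equivalence can be illustrated with a toy model (the worlds and numbers are all invented for the purpose). If each world settles both some worldly fact and which proposition you will update on, and your dispositions are to condition on $\mathbf{U}e$, then Reflection holds by construction:

```python
# A toy sketch of update conditionalization. Each world settles the
# weather and which proposition you will update on; numbers are made up.
worlds = [("rain", "e"), ("sun", "e"), ("rain", "f"), ("sun", "f")]
C = {("rain", "e"): 0.3, ("sun", "e"): 0.2,
     ("rain", "f"): 0.1, ("sun", "f"): 0.4}

def prob(cred, prop):
    """Probability of a proposition (a set of worlds)."""
    return sum(p for w, p in cred.items() if w in prop)

def condition(cred, prop):
    """Condition a credence function on a proposition."""
    z = prob(cred, prop)
    return {w: (p / z if w in prop else 0.0) for w, p in cred.items()}

# Ux: the proposition that you update on x.
U = {x: {w for w in worlds if w[1] == x} for x in ("e", "f")}

# Update conditionalization: be disposed, on learning x, to condition on Ux.
D = {x: condition(C, U[x]) for x in ("e", "f")}

# Reflection then holds: conditional on D_x being your updated credence
# function, your prior credence in any proposition matches D_x's.
rain = {("rain", "e"), ("rain", "f")}
for x in ("e", "f"):
    assert abs(prob(condition(C, U[x]), rain) - prob(D[x], rain)) < 1e-9
```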

2.3 Rational Reflection

Christensen’s Rational Reflection principle says that you should treat your current rational credence as an (immodest) expert. That is, conditional on $R$ being the rational credences for you to hold now, your credence in every proposition should be the same as $R$'s credence in that proposition.

Rational Reflection (present) Conditional on $R$ being the rational credence for you to hold, your credence in $p$ should be $R(p)$, for every proposition $p$. $$ C(p \mid R \text{ is rational } ) = R(p) $$

This formulation of Rational Reflection only says that you should defer to your currently rational credences. However, it is meant to apply at any time. So, in particular, we can think about your posterior credences, after you’ve learnt which $e \in \mathscr{E}$ is true. In that case, since you are certain that your total evidence was one of the propositions in $\mathscr{E}$, and you’re certain that $D_e$ is the rational credence function iff $e$ is your total evidence, Rational Reflection (present) says that, for each $e, f \in \mathscr{E}$, \begin{aligned} D_f(p \mid D_e \text{ is rational }) &= D_e(p) \\\
D_f(p \mid \mathbf{T}e) &= D_e(p) \end{aligned}

If you think that you should defer to your currently rational credences, you should also think that you should defer to your future rational credences. So you should also accept the following principle, governing your prior credences, before you learn which proposition in $\mathscr{E}$ is true: conditional on $D_e$ being the rational credence function for you to hold after learning which $e \in \mathscr{E}$ is true, your credence in $p$ should be $D_e(p)$, for every proposition $p$. Since $D_e$ will be the rational credence function for you to hold iff $e$ is your total evidence, this means that, conditional on $\mathbf{T}e$, your prior credence that $p$ should be $D_e(p)$, for every proposition $p$.

Rational Reflection (future) Conditional on $e$ being your total evidence, your credence that $p$ should be $D_e(p)$. \begin{aligned} C(p \mid D_e \text{ will be rational }) &= D_e(p) \\\
C(p \mid \mathbf{T}e) &= D_e(p) \end{aligned}

Once again, this deference principle implies a corollary about updating. Instead of seeing Rational Reflection (future) as a constraint on your prior credences $C$, think of it as a constraint on your learning dispositions, $D$. Then, it says that, upon learning $e$, you should be disposed to condition on the proposition $\mathbf{T}e$, $$ D_e(p) = C(p \mid \mathbf{T}e) $$ This update rule has been defended by Matthias Hild and Miriam Schoenfield. In Updating for Externalists, I called it ‘Schoenfield conditionalization’. Again, the reverse entailment also goes through, so the two claims are equivalent: Rational Reflection is equivalent to Schoenfield conditionalization. In Updating for Externalists, I showed that (given some assumptions about accuracy) Schoenfield conditionalization maximizes causal expected accuracy whenever you are certain that you’ll update on a proposition iff that proposition is your total evidence. That is: if you’re certain that $\mathbf{U}e \leftrightarrow \mathbf{T}e$, for each $e \in \mathscr{E}$, you’re a causal decision theorist, and you want your credences to be as accurate as possible, then you should have the learning dispositions which Schoenfield conditionalization recommends.
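Here is a toy numerical sketch (worlds and numbers made up) of how Schoenfield conditionalization and update conditionalization can come apart, once there is a world in which your total evidence and what you update on diverge:

```python
# A sketch contrasting conditioning on Te with conditioning on Ue, in a
# toy model where total evidence and what you update on can come apart.
# Worlds are triples (weather, total evidence, what you update on);
# all numbers are invented for illustration.
worlds = [
    ("rain", "e", "e"), ("sun", "e", "e"),
    ("rain", "e", "f"),                     # evidence e, but you update on f
    ("rain", "f", "f"), ("sun", "f", "f"),
]
C = {worlds[0]: 0.25, worlds[1]: 0.25, worlds[2]: 0.1,
     worlds[3]: 0.2, worlds[4]: 0.2}

def prob(cred, prop):
    """Probability of a proposition (a set of worlds)."""
    return sum(p for w, p in cred.items() if w in prop)

def condition(cred, prop):
    """Condition a credence function on a proposition."""
    z = prob(cred, prop)
    return {w: (p / z if w in prop else 0.0) for w, p in cred.items()}

T = {x: {w for w in worlds if w[1] == x} for x in ("e", "f")}
U = {x: {w for w in worlds if w[2] == x} for x in ("e", "f")}

rain = {w for w in worlds if w[0] == "rain"}
# Schoenfield conditionalization: D_e = C(. | Te).
schoenfield = prob(condition(C, T["e"]), rain)
# Update conditionalization: D_e = C(. | Ue).
update = prob(condition(C, U["e"]), rain)
# Because Te and Ue come apart at the third world, the two rules disagree.
assert abs(schoenfield - update) > 1e-9
```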

2.4 New Rational Reflection

Because Rational Reflection tells you to treat your rational credence as an immodest expert, it entails that your rational credence is certain to be immodest. Suppose that you deny this. Suppose you’ve been persuaded by authors like Williamson that you can end up rationally uncertain about what your total evidence is. In that case, since which credence function is rational is a function of what your total evidence is, it follows that you can end up rationally uncertain about whether your credences are in fact rational. That is: it can be rational to be less than certain that your credences are rational, even when they in fact are.

In that case, you cannot treat your rational credences as an immodest expert. So Elga recommends that you treat them as a potentially modest expert. That is: he advises that, conditional on $R$ being the rational function for you to adopt, you match your credences to $R$'s, once $R$ is conditioned on the proposition that it is the rational credence function.

New Rational Reflection (present) Conditional on $R$ being the rational credence for you to hold, your credence in $p$ should be $R$'s credence in $p$, after $R$ is conditioned on the proposition that $R$ is the rational credence function for you to hold. $$ C(p \mid R \text{ is rational } ) = R(p \mid R \text{ is rational }) $$

This formulation of New Rational Reflection says only that you should defer in this way to your currently rational credences. If we apply it to you after you’ve learnt which $e \in \mathscr{E}$ is true, then—since you’re certain that $D_e$ is the rational credence function for you to hold iff $\mathbf{T}e$ is true—New Rational Reflection (present) says that, for each $e, f \in \mathscr{E}$, \begin{aligned} D_f(p \mid D_e \text{ is rational }) &= D_e(p \mid D_e \text{ is rational }) \\\
D_f(p \mid \mathbf{T}e) &= D_e(p \mid \mathbf{T}e)
\end{aligned}

As with Rational Reflection, there’s no need to restrict the principle to your currently rational credences. If you should treat your current rational self as an expert, then so too should you treat your future rational self as an expert. So, in particular, before you learn which proposition in $\mathscr{E}$ is true, you should satisfy the following constraint: for each $e \in \mathscr{E}$, conditional on $D_e$ being the rational posterior credence function, your credence in $p$ should be $D_e(p \mid D_e \text{ will be rational })$. Since $D_e$ will be rational iff $e$ is your total evidence, this means that, conditional on $\mathbf{T}e$, your credence that $p$ should be $D_e(p \mid \mathbf{T}e)$.

New Rational Reflection (future) Conditional on $e$ being your total evidence, your credence that $p$ should be $D_e(p \mid \mathbf{T}e)$. \begin{aligned} C(p \mid D_e \text{ will be rational }) &= D_e(p \mid D_e \text{ is rational }) \\\
C(p \mid \mathbf{T}e) &= D_e(p \mid \mathbf{T}e) \end{aligned}

Henceforth, I’ll just call the conjunction of New Rational Reflection (present) and New Rational Reflection (future) ‘New Rational Reflection’.

Again, this deference principle is equivalent to a claim about updating. In this case, the claim is that you should be disposed to update with a Jeffrey shift on the partition $\mathbf{T}\mathscr{E} = \{ \mathbf{T}e, \mathbf{T}f, \mathbf{T}g, \dots \}$. What it is to be disposed to update with a Jeffrey shift on the partition $Q = \{ q_1, q_2, \dots, q_N \}$ is for the following to be true of your learning dispositions: for every $e \in \mathscr{E}$, there is some collection of weights $\lambda_1, \lambda_2, \dots, \lambda_N$ such that $\sum_i \lambda_i = 1$ and, for every proposition $p$, $$ D_e(p) = \sum_i C(p \mid q_i) \cdot \lambda_i $$ Or, equivalently: you are disposed to update with a Jeffrey shift on the partition $Q = \{ q_1, q_2, \dots, q_N \}$ iff, for every proposition $p$, each $e \in \mathscr{E}$, and each $q_i \in Q$, $D_e(p \mid q_i) = C(p \mid q_i)$ (if defined, of course).

Jeffrey Shift You are disposed to update with a Jeffrey shift on the partition $Q = \{ q_1, q_2, \dots, q_N \}$ iff, for every proposition $p$, each $e \in \mathscr{E}$, and each $q_i \in Q$, $$ D_e(p \mid q_i) = C(p \mid q_i) $$
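The equivalence of the two formulations can be checked numerically. In this sketch (partition, weights, and prior all made up), we build $D_e$ from the weights and confirm both that it preserves the prior’s credences conditional on each cell, and that the weights are just $D_e$’s credences in the cells:

```python
# A numerical check that the two formulations of a Jeffrey shift agree.
# The prior, partition, and weights below are all invented for illustration.
worlds = list(range(6))
C = {0: 0.1, 1: 0.2, 2: 0.15, 3: 0.15, 4: 0.25, 5: 0.15}
Q = [{0, 1}, {2, 3}, {4, 5}]           # a partition of the worlds
lam = [0.5, 0.3, 0.2]                  # arbitrary weights summing to 1

def prob(cred, prop):
    """Probability of a proposition (a set of worlds)."""
    return sum(p for w, p in cred.items() if w in prop)

def condition(cred, prop):
    """Condition a credence function on a proposition."""
    z = prob(cred, prop)
    return {w: (p / z if w in prop else 0.0) for w, p in cred.items()}

# First formulation: D_e(p) = sum_i C(p | q_i) * lambda_i.
D_e = {w: sum(condition(C, q)[w] * l for q, l in zip(Q, lam)) for w in worlds}

# Second formulation: D_e(p | q_i) = C(p | q_i), for every cell q_i.
from itertools import combinations
props = [set(c) for r in range(7) for c in combinations(worlds, r)]
for q in Q:
    for p in props:
        assert abs(prob(condition(D_e, q), p)
                   - prob(condition(C, q), p)) < 1e-9

# And the weights are just D_e's credences in the cells: D_e(q_i) = lambda_i.
for q, l in zip(Q, lam):
    assert abs(prob(D_e, q) - l) < 1e-9
```

Conditioning is the special case in which one weight is 1 and the rest are 0.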

Therefore, to show that New Rational Reflection requires you to update with a Jeffrey shift on the partition $\mathbf{T}\mathscr{E} = \{\mathbf{T}e \mid e \in \mathscr{E} \}$, we would have to show that it requires that, for each $e, f \in \mathscr{E}$, $$ D_e(p \mid \mathbf{T}f) = C(p \mid \mathbf{T}f) $$

We can show this easily. New Rational Reflection (present) tells us that
$$ D_e(p \mid \mathbf{T}f) = D_f(p \mid \mathbf{T}f)
$$ And New Rational Reflection (future) tells us that $$ C(p \mid \mathbf{T}f) = D_f(p \mid \mathbf{T}f) $$ Putting these two identities together gives us that $$ D_e(p \mid \mathbf{T}f) = C(p \mid \mathbf{T}f) $$ So: New Rational Reflection entails that you should be disposed to update with a Jeffrey Shift on the partition $\mathbf{T}\mathscr{E}$.

In fact, New Rational Reflection is equivalent to the claim that you should be disposed to update with a Jeffrey Shift on the partition $\mathbf{T}\mathscr{E}$. Assume that you are disposed to update with a Jeffrey shift on $\mathbf{T}\mathscr{E}$. Then, for any $e, f \in \mathscr{E}$, $$ (\star) \qquad \qquad D_e(p \mid \mathbf{T}f) = C(p \mid \mathbf{T}f) $$ If we let $f = e$ in ($\star$), then we get New Rational Reflection (future): $$ C(p \mid \mathbf{T}e) = D_e(p \mid \mathbf{T}e) $$ If we instead swap $e$ and $f$ in ($\star$), we get: $$ D_f(p \mid \mathbf{T}e) = C(p \mid \mathbf{T}e) $$ And putting these two identities together gives us New Rational Reflection (present): $$ D_f(p \mid \mathbf{T}e) = D_e(p \mid \mathbf{T}e) $$ So: New Rational Reflection is equivalent to the claim that you should be disposed to update with a Jeffrey shift on $\mathbf{T}\mathscr{E}$.

In my paper Updating for Externalists, I pointed out that Schoenfield conditionalization (according to which $D_e(p)$ should be $C(p \mid \mathbf{T}e)$) requires a kind of immodesty that externalists should want to reject. For that rule requires you to always end up certain about what your total evidence is. Being certain about what your total evidence is means being certain about which credence function is rational. But externalists should want to say that you could be less than certain about what your total evidence is, and less than certain about which credence function is the rational one for you to adopt. Since Schoenfield conditionalization is equivalent to Rational Reflection, this was really just a re-hashing of Elga’s argument that externalists should reject Rational Reflection. And, in fact, the alternative update I recommended for externalists is closely related to Elga’s New Rational Reflection.

The alternative rule I recommended for externalists, called ‘externalist conditionalization’, says: $$ D_e(p) = \sum_f C(p \mid \mathbf{T}f) \cdot C(\mathbf{T}f \mid \mathbf{U}e) $$ (Here, I’m summing over the $f \in \mathscr{E}$.) This is a Jeffrey shift on the partition $\mathbf{T}\mathscr{E}$. That is, it is a rule of the form $$ D_e(p) = \sum_f C(p \mid \mathbf{T}f) \cdot \lambda_f $$ where, in the case of externalist conditionalization, $\lambda_f = C(\mathbf{T}f \mid \mathbf{U}e)$.

Another, equivalent, presentation of externalist conditionalization is this: your learning dispositions should be such that:

  1. You are disposed to update with a Jeffrey shift on $\mathbf{T}\mathscr{E}$: that is, for every $e, f \in \mathscr{E}$, $$ D_e(p \mid \mathbf{T}f) = C(p \mid \mathbf{T}f) $$ and
  2. For each $e, f \in \mathscr{E}$, upon learning that $e$, you are disposed to think $\mathbf{T}f$ is as likely as you currently think it is, conditional on your updating on $e$: $$ D_e(\mathbf{T}f) = C(\mathbf{T}f \mid \mathbf{U}e) $$
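The rule and both clauses of this equivalent presentation can be illustrated with a toy numerical sketch (worlds and numbers all made up; worlds are triples of the weather, your total evidence, and what you update on):

```python
# A sketch of externalist conditionalization. Worlds are triples
# (weather, total evidence, what you update on); numbers are invented.
worlds = [
    ("rain", "e", "e"), ("sun", "e", "e"), ("rain", "e", "f"),
    ("rain", "f", "f"), ("sun", "f", "f"), ("sun", "f", "e"),
]
C = {worlds[0]: 0.2, worlds[1]: 0.2, worlds[2]: 0.1,
     worlds[3]: 0.2, worlds[4]: 0.2, worlds[5]: 0.1}

def prob(cred, prop):
    """Probability of a proposition (a set of worlds)."""
    return sum(p for w, p in cred.items() if w in prop)

def condition(cred, prop):
    """Condition a credence function on a proposition."""
    z = prob(cred, prop)
    return {w: (p / z if w in prop else 0.0) for w, p in cred.items()}

T = {x: {w for w in worlds if w[1] == x} for x in ("e", "f")}
U = {x: {w for w in worlds if w[2] == x} for x in ("e", "f")}

def externalist(e):
    """D_e(p) = sum_f C(p | Tf) * C(Tf | Ue): a Jeffrey shift on {Te, Tf}
    whose weights are your prior credences in each Tf, given Ue."""
    D = {w: 0.0 for w in worlds}
    for f in ("e", "f"):
        weight = prob(condition(C, U[e]), T[f])
        posterior = condition(C, T[f])
        for w in worlds:
            D[w] += posterior[w] * weight
    return D

D_e = externalist("e")
# Clause 2: D_e(Tf) = C(Tf | Ue), for each f.
for f in ("e", "f"):
    assert abs(prob(D_e, T[f]) - prob(condition(C, U["e"]), T[f])) < 1e-9
# Clause 1 (the Jeffrey-shift clause): D_e(p | Tf) = C(p | Tf).
rain = {w for w in worlds if w[0] == "rain"}
for f in ("e", "f"):
    assert abs(prob(condition(D_e, T[f]), rain)
               - prob(condition(C, T[f]), rain)) < 1e-9
```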

What we’ve just seen is that (1) is equivalent to New Rational Reflection. So a third, equivalent presentation of externalist conditionalization is this:

Externalist Conditionalization Your learning dispositions should be such that:

  1. They satisfy New Rational Reflection; and
  2. For each $e, f \in \mathscr{E}$, upon learning that $e$, you are disposed to think $\mathbf{T}f$ is as likely as you currently think it is, conditional on your updating on $e$: $$ D_e(\mathbf{T}f) = C(\mathbf{T}f \mid \mathbf{U}e) $$

2017, May 31

Local and Global experts

Contemporary epistemology is replete with principles of expert deference. Epistemologists have claimed that you should treat the chances, your future selves, your rational self, and your epistemic peers as experts. What this means is that you should try to align your credences with theirs.

There are lots of ways you might try to align your credences with those of some expert function. (That expert function could be the chances, or it could be your future credences, or something else altogether. The particular function won’t matter, so I’ll just call the expert function, whatever it is, ‘$\mathscr{E}$’.) My focus here will be on just two ways of aligning your credences with $\mathscr{E}$'s: 1) by treating it as a local expert; and 2) by treating it as a global expert.


Local Expert

You treat $\mathscr{E}$ as a local expert iff, for all propositions $a$, and all numbers $n \in [0, 1]$, $$ C(a \mid \langle \mathscr{E}(a) = n \rangle) = n, \,\, \text{if defined} $$

Global Expert

You treat $\mathscr{E}$ as a global expert iff, for all propositions $a$, and all potential credence functions $E$, $$ C(a \mid \langle \mathscr{E} = E \rangle) = E(a), \,\, \text{ if defined} $$


In these definitions, $C$ is your own credence function. You should read ‘$\mathscr{E}$’ as a definite description, along the lines of ‘the credence function of the expert’. This definite description may refer to different credence functions at different worlds. And I am using the angle brackets ‘$\langle \cdot \rangle$’ to denote propositions. Thus, ‘$\langle \mathscr{E}(a) = n \rangle$’ is the proposition that the expert’s credence that $a$ is $n$. It is true at those worlds where $\mathscr{E}$'s credence in the proposition $a$ is $n$. And $\langle \mathscr{E} = E \rangle$ is the proposition that $E$ is the expert’s entire credence function, true at those worlds $w$ such that $\mathscr{E}_w = E$ (‘$\mathscr{E}_w$’ is $\mathscr{E}$'s credence function at world $w$).

It’s not immediately obvious what the relationship is between these two different ways of treating a function as an expert. You might think that they are equivalent, in the sense that you will treat $\mathscr{E}$ as a local expert if and only if you treat them as a global expert. In fact, they are not equivalent. Treating $\mathscr{E}$ as a global expert entails treating $\mathscr{E}$ as a local expert, but the converse is not true. (Throughout, by the way, I’m assuming probabilism and I’m assuming that your credences are defined over a finite number of worlds).


Proposition 1

If you treat $\mathscr{E}$ as a global expert, then you treat them as a local expert as well. However, you may treat $\mathscr{E}$ as a local expert without treating them as a global expert.


Proof. Note that $\{ \langle \mathscr{E} = E \rangle \mid E(a) = n \}$ is a partition of $\langle \mathscr{E}(a) = n \rangle$. If you treat $\mathscr{E}$ as a global expert, then for each $E$ such that $E(a) = n$, $C(a \mid \langle \mathscr{E} = E \rangle) = n$. It then follows from conglomerability (which follows from the probability axioms when the number of worlds is finite) that $C(a \mid \langle \mathscr{E}(a) = n \rangle) = n$.
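This half of the proof can be illustrated numerically. In the sketch below (an immodest expert and prior with made-up numbers), worlds are pairs $(i, s)$, the expert at any world in cell $i$ is $E_i$, and each $E_i$ is certain of its own cell. Once $C$ defers globally, the local condition follows by total probability across the cells sharing a value:

```python
# A numerical illustration of the proof that global deference entails
# local deference. Worlds are pairs (i, s); the expert at any world in
# cell i is E_i, and each E_i is immodest (certain of its own cell).
# All numbers are invented for illustration.
worlds = [(1, "a"), (1, "b"), (2, "a"), (2, "b")]
cells = {1: {(1, "a"), (1, "b")}, 2: {(2, "a"), (2, "b")}}
E = {
    1: {(1, "a"): 0.6, (1, "b"): 0.4, (2, "a"): 0.0, (2, "b"): 0.0},
    2: {(1, "a"): 0.0, (1, "b"): 0.0, (2, "a"): 0.6, (2, "b"): 0.4},
}

def prob(cred, prop):
    """Probability of a proposition (a set of worlds)."""
    return sum(p for w, p in cred.items() if w in prop)

def condition(cred, prop):
    """Condition a credence function on a proposition."""
    z = prob(cred, prop)
    return {w: (p / z if w in prop else 0.0) for w, p in cred.items()}

# Treat E as a global expert: within each cell, C is proportional to E_i.
C = {w: 0.5 * E[i][w] for i in (1, 2) for w in cells[i]}

# Local deference follows: <E(a)=n> is a union of cells, on each of which
# C(a | cell) = n; so C(a | <E(a)=n>) = n by total probability.
from itertools import combinations
props = [set(c) for r in range(5) for c in combinations(worlds, r)]
for a in props:
    values = {i: round(prob(E[i], a), 9) for i in (1, 2)}
    for n in set(values.values()):
        union = set().union(*(cells[i] for i, v in values.items() if v == n))
        assert abs(prob(condition(C, union), a) - n) < 1e-9
```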

To see that you may treat $\mathscr{E}$ as a local expert without treating them as a global expert, suppose that there are three possible worlds, $w_1$, $w_2$, and $w_3$, and that the expert’s credence function at each of those worlds is as shown below (the example originates from Gaifman’s 1988 article “A Theory of Higher-Order Probabilities”). (In the matrix, by the way, the $i$th row gives $\mathscr{E}$'s credence distribution over $w_1, w_2$ and $w_3$ at the world $w_i$.)

Figure 1

And suppose that your own credence distribution over $w_1, w_2,$ and $w_3$ is such that $C(\{w_i\}) =$ 1/3, for $i = 1, 2, 3$. Then, for every proposition $a$ and every number $n$, $C(a \mid \langle \mathscr{E}(a) = n \rangle) = n$. For instance, if $a = \{ w_1, w_2 \}$ and $n = 0.5$, then

$$ \begin{align} C(\{ w_1, w_2 \} \mid \langle \mathscr{E}(\{ w_1, w_2 \}) = 0.5 \rangle) &= C(\{ w_1, w_2 \} \mid \{ w \mid \mathscr{E}_w(\{ w_1, w_2 \})=0.5 \} ) \\\
&= C(\{ w_1, w_2 \} \mid \{ w_2, w_3 \}) \\\
&= 0.5 \end{align} $$

And the same is true for every other choice of $a$ and $n$, as you may check for yourself. Nevertheless, it is impossible to treat $\mathscr{E}$ as a global expert, since, so long as $C$ is a probability function,

$$ C(\{ w_1 \} \mid \langle \mathscr{E} = \mathscr{E}_{w_1} \rangle) =C(\{ w_1 \} \mid \{ w_1 \}) = 1 $$

But $\mathscr{E}_{w_1}(\{ w_1 \}) = 0.5 \neq 1$. QED.
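The whole counterexample can be checked by brute force. In this sketch the matrix from figure 1 is reconstructed from the surrounding description: a uniform cyclic function on the cycle $w_1 \to w_2 \to w_3 \to w_1$, with each $\mathscr{E}_{w}$ giving 1/2 to $w$ and 1/2 to the next world in the cycle.

```python
# A brute-force check of Gaifman's example: a uniform cyclic expert over
# three worlds (matrix reconstructed from the description in the text).
worlds = [1, 2, 3]
E = {1: {1: 0.5, 2: 0.5, 3: 0.0},
     2: {1: 0.0, 2: 0.5, 3: 0.5},
     3: {1: 0.5, 2: 0.0, 3: 0.5}}
C = {w: 1 / 3 for w in worlds}       # your uniform credences

def prob(cred, prop):
    """Probability of a proposition (a set of worlds)."""
    return sum(p for w, p in cred.items() if w in prop)

def condition(cred, prop):
    """Condition a credence function on a proposition."""
    z = prob(cred, prop)
    return {w: (p / z if w in prop else 0.0) for w, p in cred.items()}

from itertools import combinations
props = [set(c) for r in range(1, 4) for c in combinations(worlds, r)]

# Local deference holds: C(a | <E(a)=n>) = n, whenever defined.
for a in props:
    for n in {round(prob(E[w], a), 9) for w in worlds}:
        En = {w for w in worlds if round(prob(E[w], a), 9) == n}
        assert abs(prob(condition(C, En), a) - n) < 1e-9

# Global deference fails: <E = E_{w1}> = {w1}, so any probabilistic C has
# C({w1} | <E = E_{w1}>) = 1, yet E_{w1}({w1}) = 0.5.
assert {w for w in worlds if E[w] == E[1]} == {1}
assert prob(condition(C, {1}), {1}) == 1.0
assert E[1][1] == 0.5
```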

So a principle of local deference is strictly weaker than a principle of global deference. Or, perhaps a better way of thinking about it: there are strictly more functions which can be treated as local experts than there are functions which can be treated as global experts.

This is a prima facie exciting observation, since a common objection to principles of global deference is that it is possible to treat $\mathscr{E}$ as a global expert if and only if $\mathscr{E}$ is certain of what their own credences are (because the focus is usually on certain ideal credence functions, certainty about your own credences is generally called immodesty). That is to say, it is possible to treat $\mathscr{E}$ as a global expert if and only if they are immodest—if and only if, for every world $w$, $\mathscr{E}_w(\langle \mathscr{E} = \mathscr{E}_w \rangle) = 1$. For suppose that $\mathscr{E}$ were modest—that is, suppose that, for some world $w$, $\mathscr{E}_w(\langle \mathscr{E} = \mathscr{E}_w \rangle) \neq 1$. And suppose that you treat $\mathscr{E}$ as a global expert. Then, substituting $\langle \mathscr{E} = \mathscr{E}_w \rangle$ in for $a$ and $\mathscr{E}_w$ in for $E$ in the definition of Global Expert, we have

$$ C(\langle \mathscr{E} = \mathscr{E}_w \rangle \mid \langle \mathscr{E} = \mathscr{E}_w \rangle ) = \mathscr{E}_w(\langle \mathscr{E} = \mathscr{E}_w \rangle ) \neq 1 $$

But the probability axioms require $C(a \mid a)$ to be 1 (or undefined) for all $a$.

So: if you think functions which aren’t certain of their own values should nevertheless be treated as experts, then you will think that we need a characterization of “treating a function as an expert” which goes beyond Global Expert. A common suggestion is to treat $\mathscr{E}$ as a modest expert.


Modest Expert

You treat $\mathscr{E}$ as a modest expert if and only if, for all propositions $a$ and all potential credence functions $E$, $$ C(a \mid \langle \mathscr{E} = E \rangle) = E(a \mid \langle \mathscr{E} = E \rangle) $$


But perhaps the move to such principles is too hasty. Perhaps we can get by just with principles of local deference. For note that the expert shown in figure 1 is modest; yet they can be treated as a local expert. So there are at least some modest functions which can be treated as local experts. And perhaps these are all the modest experts we need.

For this reason, the relationship between local and global experts is dialectically important to some debates in epistemology. For instance, Christensen endorses the claim that you should treat your currently rational self as a local expert. Elga criticizes this position on the grounds that it requires certainty that you are rational—however, in order to argue for this conclusion, he must first re-present Christensen’s principle as the claim that you should treat your rational self as a global expert (note: Elga recognizes that the second principle is stronger than the first). Perhaps, in the face of these criticisms, Christensen should hold tight to his original principle; perhaps it affords all the modesty we need.

No such luck, I’m afraid. Although there are some functions which can be treated as local experts but not global experts, these functions are incredibly singular. In fact, there is a good sense in which the function shown in figure 1 is the only kind of function which can be treated as a local, but not global, expert.

Given a function $\mathscr{E}$, from worlds to probability distributions over those same worlds, we can generate a Kripke frame $\langle \mathscr{W}, R \rangle$ from $\mathscr{E}$ as follows: $\mathscr{E}_w(\{ x \}) \neq 0$ if and only if $w$ bears the relation $R$ to $x$ (or, as I shall say, if and only if $w$ sees $x$).

Let’s say that a Kripke frame $\langle \mathscr{W}, R \rangle$ is cyclic iff

  1. Every world $w \in \mathscr{W}$ sees itself and exactly one other world.
  2. Every world $w \in \mathscr{W}$ is seen by exactly one distinct world.
  3. There are no two worlds $w, x$ such that $w$ sees $x$ and $x$ sees $w$.

A sample cyclic frame is shown below.

Figure 2

Note that the function from figure 1 will generate a cyclic frame in which, for each $w \in \mathscr{W}$, $\mathscr{E}_w(\{ w \}) =$ 1/2. Let’s call any function like this a uniform cyclic function (‘uniform’ because at every world $\mathscr{E}$ gives equal probability to its actual world and the one other possible world it sees).


Uniform Cyclicity

A function $\mathscr{E}$ is uniform cyclic if and only if $\mathscr{E}$ generates a cyclic frame and, for every $w \in \mathscr{W}$, $\mathscr{E}_w(\{ w \}) =$ 1/2.


Now, it turns out that the functions which may be treated as local experts, but which may not be treated as global experts, are precisely the uniform cyclic ones. If a function is uniform cyclic, then you may treat it as a local expert, but not as a global expert. And if a function $\mathscr{E}$ is not uniform cyclic, then you can treat $\mathscr{E}$ as a local expert if and only if you can treat it as a global expert.


Proposition 2

It is possible to treat a function $\mathscr{E}$ as a local expert but not possible to treat them as a global expert when and only when $\mathscr{E}$ is uniform cyclic.

The only credences which treat such a function as a local expert are those which are uniform over the worlds in each cycle.


The proof of this proposition is quite long and tedious, so I’m putting it in a separate document here.

What Proposition 2 means, I think, is that we don’t have to fret about the difference between the local and global formulations of various principles of expert deference. For what the proposition tells us is that nobody should endorse a principle of local deference without thereby endorsing a principle of global deference. To endorse a principle of local deference without endorsing a principle of global deference is to say that uniformly cyclic functions are deserving of epistemic deference, but no other modest function is. This strikes me as entirely unmotivated.

If we think that you should treat the probability function which generates the cyclic frame in figure 2 as an expert, then we should also think that you should treat the probability function which generates the frame shown in figure 3 as an expert.

Figure 3


After all, the only difference between the frame in figure 2 and the frame in figure 3 is that we have taken the single possibility $w_1$ in figure 2 and divided it into two sub-possibilities $w_1$ and $w_1'$ in figure 3. We could suppose that, at all worlds in figure 3, $\mathscr{E}$ gives the proposition $\{ w_1, w_1' \}$ precisely the same probability it gave the singleton proposition $\{ w_1 \}$ in figure 2. If that’s so, then say that $\mathscr{E}$ reduces to uniform cyclicity. After all, if we just collapse the possibilities $w_1$ and $w_1'$, then we get back a uniform cyclic function. The difference between a uniform cyclic function and a function which merely reduces to uniform cyclicity ought not to make any difference with respect to whether some supposed expert is deserving of epistemic deference, nor to how that deference ought to be shown. However, Proposition 2 assures us that such minor changes in representation do make a difference with respect to whether we can treat the function as a local expert. So, if we’re in for treating some function as a local expert, we shouldn’t demur from treating it as a global expert as well.

So I think that Christensen, e.g., has effectively committed himself to the view that your rational self must be immodest. While his claim that you should treat your rational self as a local expert does not on its own entail this conclusion, it follows given the rather weak assumption that, if a uniform cyclic function is deserving of epistemic deference, then so too is a function which merely reduces to uniform cyclicity. Unless Christensen believes that 1) our rational selves could be uniform cyclic, but 2) they could not merely reduce to uniform cyclicity, he should also think that you should treat your rational self as a global expert. And this entails that your rational self is immodest.

TL;DR: you might have thought that principles of local deference are equivalent to principles of global deference. They’re not. Principles of local deference are weaker than principles of global deference. But they’re really not much weaker—just slightly. And there’s really no good reason to treat any function as a local expert but not a global expert. So, while they’re ever-so-slightly different, really, you shouldn’t ever worry about the differences.

2017, May 3

The Brier Measure is not strictly proper (as epistemologists have come to use that term)

In recent years, formal epistemologists have gotten interested in measures of the accuracy of a credence function. One famous measure of accuracy is the one suggested by Glenn Brier. Given a (finite) set $\Omega =$ { $\omega_1, \omega_2, \dots, \omega_N$ } of possible states of the world, the Brier measure of the accuracy of a credence function $c$ at the state $\omega_i$ is

$$ \mathfrak{B}(c, \omega_i) = - (1-c({ \omega_i }))^2 - \sum_{j \neq i} c({ \omega_j })^2 $$

And formal epistemologists usually say that a measure of accuracy $\mathfrak{A}$ is strictly proper iff every probability function expects itself (and only itself) to have the highest $\mathfrak{A}$-value.


Strict Propriety A measure of accuracy $\mathfrak{A}$ is strictly proper iff, for every probability function $p$ and every credence function $c \neq p$, the $p$-expectation of $p$'s $\frak{A}$-accuracy is strictly greater than the $p$-expectation of $c$'s $\frak{A}$-accuracy. That is: for every probability $p$ and every credence $c \neq p$,

$$ \sum_{i = 1}^N p({ \omega_i }) \cdot \mathfrak{A}(p, \omega_i) > \sum_{i = 1}^N p({ \omega_i }) \cdot \mathfrak{A}(c, \omega_i) $$


(‘Weak propriety’ is the property you get when you swap out ‘$>$’ for ‘$\geq$’.)

The point of today’s post is that, contrary to what I once thought (and perhaps contrary to what some others thought as well—though this could be a confusion localized to my own brain), the Brier score is not strictly proper.

First, a bit of background: Given a (finite) set $\Omega =$ { $\omega_1, \omega_2, \dots, \omega_N$ } of possible states of the world, we can call any set of states in $\Omega$ a ‘proposition’. And I’ll call a set of propositions, $\mathscr{F}$, a ‘field’. Given a pair $(\Omega, \mathscr{F})$, with $\mathscr{F} \subseteq \wp(\Omega)$, a credence function, $c$, is just any function from $\mathscr{F}$ to the unit interval, $[0, 1]$.

A credence function $c$ is a probability function if it additionally satisfies the following two constraints:

  1. $c(\Omega) = 1$.
  2. For all $A, B \in \mathscr{F}$ such that $A \cap B = \emptyset$, $c(A \cup B) = c(A) + c(B)$.
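For concreteness, here is a minimal sketch of checking these two constraints, with propositions represented as frozensets of states (the representation is my own illustration, not anything from the post):

```python
def is_probability(c, omega):
    """True iff c(Omega) = 1 and c is additive on disjoint propositions
    whose union is also in the field (the field is the set of keys of c)."""
    field = list(c)
    # Constraint 1: the whole space gets credence 1.
    if c[frozenset(omega)] != 1:
        return False
    # Constraint 2: finite additivity on disjoint propositions.
    for A in field:
        for B in field:
            if A & B == frozenset() and (A | B) in c:
                if c[A | B] != c[A] + c[B]:
                    return False
    return True


omega = {1, 2}
p = {frozenset(): 0.0, frozenset({1}): 0.5, frozenset({2}): 0.5, frozenset({1, 2}): 1.0}
c = {frozenset(): 1.0, frozenset({1}): 0.5, frozenset({2}): 0.5, frozenset({1, 2}): 0.0}
print(is_probability(p, omega))  # True
print(is_probability(c, omega))  # False: c(emptyset) = 1 and c(Omega) = 0
```

(A fussier implementation would compare floats with a tolerance; exact halves make equality safe here.)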

To see that the Brier measure $\mathfrak{B}$ is not strictly proper, consider the set of states $\Omega =$ { $\omega_1, \omega_2$ } and the field $\mathscr{F} =$ { $\emptyset,$ { $\omega_1$ }, { $\omega_2$ }, $\Omega$ }. Then, consider the probability function $p$ and the non-probabilistic credence function $c$, both defined over the field $\mathscr{F}$.

$$ \begin{array}{r | c c} A \in \mathscr{F} & p(A) & c(A) \\\hline
\varnothing & 0 & 1 \\\
\{\omega_1\} & 1/2 & 1/2 \\\
\{\omega_2\} & 1/2 & 1/2 \\\
\Omega & 1 & 0
\end{array} $$

The $p$-expected Brier accuracy of $p$ is

$$ \begin{aligned} \mathbb{E}_p \left[ \mathfrak{B}(p) \right] &= p({ \omega_1 }) \cdot \mathfrak{B}(p, \omega_1) + p({ \omega_2 }) \cdot \mathfrak{B}(p, \omega_2) \\\
&= 1/2 \cdot \left[ -(1-1/2)^2 - (1/2)^2 \right] + 1/2 \cdot \left[ -(1-1/2)^2 - (1/2)^2 \right] \\\
&= - 1/2 \end{aligned} $$

And the $p$-expected Brier accuracy of $c$ is likewise

$$ \begin{aligned} \mathbb{E}_p \left[ \mathfrak{B}(c) \right] &= p({ \omega_1 }) \cdot \mathfrak{B}(c, \omega_1) + p({ \omega_2 }) \cdot \mathfrak{B}(c, \omega_2) \\\
&= 1/2 \cdot \left[ -(1-1/2)^2 - (1/2)^2 \right] + 1/2 \cdot \left[ -(1-1/2)^2 - (1/2)^2 \right] \\\
&= - 1/2 \end{aligned}$$

So there is a probabilistic $p$ and a credence function $c \neq p$ such that $p$ expects $c$ to be just as Brier accurate as $p$ is itself. So the Brier measure of accuracy is not strictly proper.
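The computation is easy to verify numerically. The sketch below (my own illustration) works from the singleton credences alone, because that is all the Brier measure ever looks at, and that is exactly why the disagreement between $p$ and $c$ over $\varnothing$ and $\Omega$ never registers:

```python
def brier(c_states, i):
    """Brier accuracy of the singleton credences c_states = [c({w1}), ..., c({wN})]
    at state i (0-indexed), as defined above."""
    return -(1 - c_states[i]) ** 2 - sum(
        x ** 2 for j, x in enumerate(c_states) if j != i
    )


def expected(p_states, c_states):
    """p-expectation of c's Brier accuracy."""
    return sum(p_states[i] * brier(c_states, i) for i in range(len(p_states)))


p = [0.5, 0.5]  # p({w1}), p({w2}); p also sets p(emptyset) = 0, p(Omega) = 1
c = [0.5, 0.5]  # c agrees on the singletons but sets c(emptyset) = 1, c(Omega) = 0

print(expected(p, p))  # -0.5
print(expected(p, c))  # -0.5
```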

Some have used the term ‘strict propriety’ differently than I defined it above. In the first place, Brier himself did not intend his measure to apply to credence functions, which are functions from arbitrary propositions to the unit interval, but rather forecasts, which he treated as assignments of real numbers from the unit interval to each individual state $\omega_i \in \Omega$. (Brier even required these numbers to sum to 1.) If you are in a context where you are evaluating, not credence functions, but forecasts, then you might want to define the notion of strict propriety like this:


Strict Propriety for Forecasts

A measure of accuracy $\mathfrak{A}$ is strictly proper for forecasts iff, for every probabilistic forecast $p$ and every forecast $f \neq p$, the $p$-expectation of $p$'s $\frak{A}$-accuracy is strictly greater than the $p$-expectation of $f$'s $\frak{A}$-accuracy. That is: for every probabilistic forecast $p$ and every forecast $f \neq p$,


$$\sum_{i = 1}^N p({\omega_i}) \cdot \mathfrak{A}(p, \omega_i) > \sum_{i = 1}^N p({\omega_i}) \cdot \mathfrak{A}(f, \omega_i) $$

And the Brier measure is strictly proper for forecasts. It’s just not strictly proper as epistemologists have been using that term, applied to arbitrary credence functions.

What is a strictly proper measure of accuracy for credence functions is this quadratic measure, which is also sometimes called the Brier measure (though it’s not the measure Brier himself explicitly endorsed):

$$\mathfrak{Q}(c, \omega) = - \sum_{A \in \mathscr{F}} ( \chi_A(\omega) - c(A) )^2 $$

(Here, ‘$\chi_A(\omega)$’ is the characteristic function for the proposition $A$, which maps a state $\omega$ to $1$ if $A$ is true in that state and $0$ otherwise.)
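By contrast, the quadratic measure sums over every proposition in the field, so it does register $c$’s deviant values on $\varnothing$ and $\Omega$. Running the earlier two-state example through it (again, my own sketch) shows $p$ expecting itself to be strictly more $\mathfrak{Q}$-accurate than $c$:

```python
# The field over states {0, 1}: emptyset, {w1}, {w2}, Omega.
FIELD = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]


def quadratic(c, state):
    """Q(c, w) = -sum over A in the field of (chi_A(w) - c(A))^2,
    where c is a dict from propositions in FIELD to credences."""
    return -sum((int(state in A) - c[A]) ** 2 for A in FIELD)


def expected_q(p_singletons, c):
    """p-expectation of c's quadratic accuracy, given p's singleton credences."""
    return sum(p_singletons[s] * quadratic(c, s) for s in (0, 1))


p = {FIELD[0]: 0.0, FIELD[1]: 0.5, FIELD[2]: 0.5, FIELD[3]: 1.0}
c = {FIELD[0]: 1.0, FIELD[1]: 0.5, FIELD[2]: 0.5, FIELD[3]: 0.0}

print(expected_q({0: 0.5, 1: 0.5}, p))  # -0.5
print(expected_q({0: 0.5, 1: 0.5}, c))  # -2.5
```

The gap comes entirely from the $(\chi_\varnothing(\omega) - 1)^2$ and $(\chi_\Omega(\omega) - 0)^2$ terms, which the Brier measure never sees.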