## Notes to Jury Theorems

1.
A *probability space* is a mathematical representation of what
could happen, and with what probability. Formally, it is a structure
\((\Omega,\mathcal{E},P)\) with the following components. Firstly,
\(\Omega\) is a non-empty set of *(possible) worlds*. Secondly,
\(\mathcal{E}\) is a set of *events* \(E\subseteq\Omega\),
representing propositions (such as *the defendant is guilty* or
*individual* \(3\) *votes correctly* or *the majority
votes correctly in the group of size* \(9\)). \(\mathcal{E}\)
might contain *all* subsets of \(\Omega\); in practice it
contains at least those subsets which represent interesting
propositions. Technically, \(\mathcal{E}\) must be a
\(\sigma\)-algebra: it contains the tautology
(\(\Omega\in\mathcal{E}\)), is closed under negation
(\(E\in\mathcal{E}\Rightarrow\Omega\backslash E\in\mathcal{E}\)), and
is closed under countable disjunction
(\(E_{1},E_{2},\ldots\in\mathcal{E}\Rightarrow E_{1}\cup
E_{2}\cup\dots\in\mathcal{E}\)). Thirdly, \(P\) is a probability
measure on \(\mathcal{E}\), i.e., a function assigning a probability
\(P(E)\in\lbrack0,1]\) to each event \(E\in\mathcal{E}\), subject to
the probability axioms: \(P(\Omega)=1\) and, for all disjoint events
\(E_{1},E_{2},\ldots\in\mathcal{E}\), \(P(E_{1}\cup
E_{2}\cup\dots)=P(E_{1})+P(E_{2})+\cdots\).
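
As a minimal sketch of this definition, the following Python snippet builds a small finite probability space and checks the listed conditions. The two-voter worlds and their weights are hypothetical values chosen only for illustration; in a finite space, closure under countable disjunction and countable additivity reduce to their pairwise versions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical worlds: each records whether voters 1 and 2 are correct (c) or wrong (w).
worlds = ["cc", "cw", "wc", "ww"]
weight = {"cc": Fraction(9, 25), "cw": Fraction(6, 25),
          "wc": Fraction(6, 25), "ww": Fraction(4, 25)}

omega = frozenset(worlds)
# Here the set of events contains all subsets of omega.
events = {frozenset(c) for r in range(len(worlds) + 1)
          for c in combinations(worlds, r)}

def P(event):
    """Probability of an event: the sum of the weights of its worlds."""
    return sum((weight[w] for w in event), Fraction(0))

# sigma-algebra conditions (finite case)
assert omega in events
assert all(omega - E in events for E in events)
assert all(E1 | E2 in events for E1 in events for E2 in events)

# probability axioms (finite case)
assert P(omega) == 1
assert all(P(E1 | E2) == P(E1) + P(E2)
           for E1 in events for E2 in events if not (E1 & E2))

# The event "individual 1 votes correctly".
R1 = frozenset(w for w in worlds if w[0] == "c")
print(P(R1))
```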

2.
Formally, \(\Maj_{n}=\bigcup_{I\subseteq \{1,\ldots,n\}:\left\vert
I\right\vert >n/2}\bigcap_{i\in I}R_{i}\), the event that, for
*some* subset \(I\) containing a majority, *all* members
of \(I\) are correct.
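
The definition can be checked by brute force for small groups. The sketch below represents a world by a correctness profile (one truth value per voter, an illustrative encoding not used in the text) and compares the union-of-intersections construction with the simpler condition that more than \(n/2\) voters are correct.

```python
from itertools import combinations, product

def maj_event_by_definition(n):
    """Maj_n as the union, over subsets I with |I| > n/2,
    of the intersection of the events R_i for i in I."""
    profiles = list(product([True, False], repeat=n))  # w[i] is True iff voter i+1 is correct
    event = set()
    for size in range(n // 2 + 1, n + 1):              # subset sizes exceeding n/2
        for I in combinations(range(n), size):
            event |= {w for w in profiles if all(w[i] for i in I)}
    return event

def maj_event_by_counting(n):
    """Maj_n as the set of profiles in which more than n/2 voters are correct."""
    return {w for w in product([True, False], repeat=n) if sum(w) > n / 2}

for n in range(1, 7):
    assert maj_event_by_definition(n) == maj_event_by_counting(n)
```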

3. In the literature, the term “Condorcet jury theorem” sometimes refers to slightly different theorems, or to a vague type of theorem. The most basic rendition is chosen here.

4.
Formal models often define the state as an alternative (the
“correct” one), so that the votes and the state all take
values in the same set (the set of alternatives) and \(R_{i}\) is the
event that \(i\)’s vote equals the state. This makes the model
more parsimonious. But philosophically it lumps together different
objects. In a highly explicit but unparsimonious model of the
court’s decision between *convict* and *acquit*,
the state is neither the correct alternative, nor *guilty* or
*innocent*, but a complex fact about the defendant’s
actions. Here the state takes values in a large set of possible
complex facts, and the function mapping each possible state value to
the corresponding correct alternative is many-to-one.

5.
The growing-reliability conclusion is restricted to bodies of odd
size: the probability of majority correctness \(P(\Maj_{n})\)
increases as group size \(n\) moves from 1 to 3, to 5, etc. Excluding
even-sized groups is necessary because there can be ties in such
groups, reducing the probability of a correct majority. For instance,
\(P(\Maj_{n})\) *falls* from \(n=1\) to \(n=2\), because
majority correctness in the 1-member group only requires individual 1
to be correct (the event \(R_{1}\)) whereas majority correctness in
the 2-member group requires two individuals to be correct (the event
\(R_{1}\cap R_{2}\)). The restriction to odd \(n\) could be lifted if
ties were broken by an independent toss of a fair coin.
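
A small computation illustrates the point. The sketch assumes independent voters with a common competence \(p\); the value \(p=3/5\) is hypothetical and chosen only for illustration. The optional flag adds half the probability of a tie, which models the fair coin toss.

```python
from fractions import Fraction
from math import comb

def p_majority(n, p, coin_tie_break=False):
    """Probability that more than n/2 of n independent voters are correct,
    each with competence p; optionally a fair coin decides ties."""
    total = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                for k in range(n // 2 + 1, n + 1))
    if coin_tie_break and n % 2 == 0:
        k = n // 2
        total += Fraction(1, 2) * comb(n, k) * p**k * (1 - p)**(n - k)
    return total

p = Fraction(3, 5)  # hypothetical individual competence
values = {n: p_majority(n, p) for n in (1, 2, 3)}
with_coin = p_majority(2, p, coin_tie_break=True)
print(values, with_coin)
```

In this example the probability falls from 3/5 at \(n=1\) to 9/25 at \(n=2\) and rises to 81/125 at \(n=3\); with the coin toss, the two-member value is 3/5, equal to the one-member value.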

6.
For instance, in Condorcet’s Jury Theorem, \(P(\Maj_{n})\)
grows strictly *except* if individuals are infallible, i.e., if
\(P(R_{i})=1\).

7.
For general \(\mathbf{x}\), the probability of an event \(A\)
conditional on \(\mathbf{x}\), \(P(A|\mathbf{x})\), remains definable
(see standard probability textbooks). But \(P(A|\mathbf{x})\) is a
*non-unique* function of \(\mathbf{x}\) if some or all values
\(x\) of \(\mathbf{x}\) have probability \(P(x)=0\). Still any two
versions of \(P(A|\mathbf{x})\) are essentially identical: they
coincide outside a zero-probability event; equivalently, they coincide
except if \(\mathbf{x}\) takes a value in some zero-probability set of
values. To restate CI and CC generally, after “any value \(x\)
of the facts \(\mathbf{x}\)” add “except possibly from a
zero-probability set of values”. Later in
Section 2.5,
to restate TC generally, after “is the same for all individuals
\(i\)” add “almost surely, i.e., except possibly in a
zero-probability event” (the generalized definition of
“tending to exceed \(\frac{1}{2}\)” is given in Dietrich
& Spiekermann 2013a).

8. The remark in footnote 5 applies analogously.

9. The remark in footnote 5 applies analogously.

10. Formally,

\[\lim_{n\rightarrow\infty}P(\Maj_{n})=P\left( \mathbf{p}>\frac{1}{2}\right) +\frac{1}{2}P\left( \mathbf{p}=\frac{1}{2}\right),\]
where \(\mathbf{p}\) denotes specific competence, \(P(R_{i}|\mathbf{x})\), which does not depend on \(i\) by TC.
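
The limit formula can be illustrated numerically. The sketch below assumes a hypothetical three-point distribution of specific competence and assumes that, conditional on the facts, votes are independent with that common competence. It is restricted to odd \(n\), so that ties cannot occur.

```python
from math import comb

# Hypothetical distribution: specific competence value -> probability.
SPECIFIC_COMPETENCE_DIST = {0.7: 0.5, 0.5: 0.2, 0.3: 0.3}

def p_majority_given(n, p):
    """Majority correctness probability for odd n, given specific competence p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def p_majority(n):
    """Unconditional majority correctness: average over the competence distribution."""
    return sum(w * p_majority_given(n, p)
               for p, w in SPECIFIC_COMPETENCE_DIST.items())

# Right-hand side of the limit formula.
limit = (sum(w for p, w in SPECIFIC_COMPETENCE_DIST.items() if p > 0.5)
         + 0.5 * sum(w for p, w in SPECIFIC_COMPETENCE_DIST.items() if p == 0.5))

for n in (1, 11, 51, 201):
    print(n, round(p_majority(n), 6), "limit:", limit)
```

Under these assumptions the limit is 0.6, well below one: majority correctness approaches 1 only on the facts for which specific competence exceeds \(\frac{1}{2}\).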

11.
Asymptotic infallibility follows by the “unless” clause.
Recall that “unless CC holds” means “if *and only
if* CC does not hold”.

12.
That is, *infinite average competence* exceeds
\(\frac{1}{2}\). Infinite average competence is the limit of finite
average competence,

\[\lim_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^n P(R_{i}),\]

or in full generality (since this limit need not exist) the “limit inferior” or “limiting lower bound” of finite average competence,

\[\lim_{m\rightarrow\infty}\inf_{n\geq m}\frac{1}{n}\sum_{i=1}^n P(R_{i}).\]

A limit inferior always exists, and coincides with the ordinary limit when the latter exists.

13. To model this, consider a large finite population \(\mathcal{I}\) of potential members, e.g., all living scientists in the case of a scientific group. For each possible group size \(n\) (at most the population size \(\left\vert \mathcal{I}\right\vert\)), the \(n\)-member group is drawn randomly from the set of possible \(n\)-member groups \(\{N\subseteq\mathcal{I}:\left\vert N\right\vert =n\}\) following a uniform probability distribution. So decisions are doubly random: member judgments and member identity are both random. This restores the growing-reliability conclusion, despite competence heterogeneity (Berend & Sapir 2005). The finiteness of the population \(\mathcal{I}\) reflects reality (and makes the mentioned uniform distribution well-defined); but it sets an upper bound on group size, thereby excluding asymptotic considerations.
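
The doubly random model can be sketched for a very small population. The three competence values below are hypothetical, and the sketch assumes that members judge independently of one another; it illustrates the construction and does not establish the general result.

```python
from fractions import Fraction
from itertools import combinations, product

def p_majority_fixed(competences):
    """Exact majority correctness probability for a fixed group of
    independent voters with the given (possibly different) competences."""
    n = len(competences)
    total = Fraction(0)
    for profile in product([True, False], repeat=n):
        if sum(profile) > n / 2:
            prob = Fraction(1)
            for correct, p in zip(profile, competences):
                prob *= p if correct else 1 - p
            total += prob
    return total

def p_majority_random_group(population, n):
    """Average of the majority correctness probability over all n-member
    groups, i.e. under a uniform draw of the group from the population."""
    groups = list(combinations(population, n))
    return sum(p_majority_fixed(g) for g in groups) / len(groups)

# Hypothetical population of three potential members with unequal competence.
population = [Fraction(4, 5), Fraction(3, 5), Fraction(3, 5)]
print(p_majority_random_group(population, 1))  # equals the mean competence
print(p_majority_random_group(population, 3))
```

In this example the randomly drawn one-member group is correct with probability 2/3, and the three-member group with probability 93/125.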

14.
In a more restricted sense, the “state” consists only of
the *unknown* determinants of correctness.

15.
More precisely, assume all individual judgments are functions of a
variable \(\mathbf{t}\), “total influences”, interpretable
as the vector of all private or shared causes of judgments.
Underdetermination is the event that \(\mathbf{t}\) takes a value
conditional on which more than one state has non-zero probability.
*Possible Underdetermination* means that underdetermination has
positive probability (“possible” thus stands for
probabilistic, not just logical, possibility).

16. The infinite sequence of private evidence determines the truth with probability one, by the law of large numbers.

17. As equation (1) shows, an individual \(i\)’s general competence \(P(R_{i})\) is the probability-weighted average of their specific competence \(P(R_{i}|x)\) across facts \(x\). This average can exceed \(\frac{1}{2}\) even if \(P(R_{i}|x)<\frac{1}{2}\) for some \(x\) of sufficiently low probability.
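
A two-fact example makes the arithmetic concrete. The labels and numbers below are hypothetical: the facts are usually "easy" and occasionally "misleading", and only in the misleading case is specific competence below \(\frac{1}{2}\).

```python
from fractions import Fraction

# Hypothetical probabilities of the facts x and specific competences P(R_i | x).
prob_of_facts = {"easy": Fraction(9, 10), "misleading": Fraction(1, 10)}
specific_competence = {"easy": Fraction(7, 10), "misleading": Fraction(3, 10)}

# General competence as the probability-weighted average of specific competence.
general_competence = sum(prob_of_facts[x] * specific_competence[x]
                         for x in prob_of_facts)
print(general_competence)  # 33/50
```

Here general competence is 33/50, above \(\frac{1}{2}\), although specific competence is only 3/10 under the misleading facts.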

18.
Deliberation lets individuals causally affect one another. Despite
causation between individuals, there is normally no causation between
*votes*, at least if voting is silent (cf.
Section 2.2).
This is why deliberation introduces common causes of, not causation
between, votes.

19.
A group that has deliberated usually has more common causes of votes
than a group that has not. So one may want to conditionalize on more
facts than one would have in the absence of deliberation. This points
towards a general issue: which sort of facts variable \(\mathbf{x}\)
makes the independence premise CI of a jury theorem plausible depends
on several concrete circumstances, including whether and which
deliberation happens prior to voting. Fortunately, the Conditional and
Competence-Sensitive Jury Theorems hold for *any* facts
variable \(\mathbf{x}\).

20.
One has to replace the majority threshold by an acceptance threshold
such that pivotal voters prefer to tip the decision in the direction
of their private belief, rather than ignoring private information. So,
clever institutional design might prevent strategic voting. Yet there
are obstacles. One obstacle is that the particular acceptance
threshold that induces truthful voting is highly sensitive to
parameters such as voters’ prior correctness probabilities and
strength of private information. Without knowing these parameters, the
threshold cannot be set accordingly. Worse, the acceptance threshold
that induces truthful voting may vary from voter to voter (for
instance due to different prior beliefs), so that *no*
acceptance threshold simultaneously prevents all voters from
strategising.

21. Formally, \(F\) is a function on \(\bigcup_{n=1,2,\ldots} \mathcal{A}^{n}\).

22. Ties, in which both alternatives occur \(n/2\) times among \(a_{1},\ldots,a_{n}\) (for even \(n\)), could be handled in different ways. Either some tie-breaking rule selects a winning alternative. Or one allows outcomes to be sets, in which case either both alternatives are winners (\(F(a_{1},\ldots,a_{n})=\mathcal{A}\)) or no alternative is a winner (\(F(a_{1} ,\ldots,a_{n})=\varnothing\)).
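
The three ways of handling ties can be sketched as options of a single function. The function name, the `on_tie` parameter and the alternative labels are illustrative choices, not notation from the text; a fixed alternative stands in for "some tie-breaking rule".

```python
def majority_rule(votes, alternatives, on_tie="both"):
    """Majority rule over two alternatives, returning a set of winners.

    on_tie="both": a tie yields both alternatives as winners;
    on_tie="none": a tie yields the empty set;
    on_tie=<an alternative>: a tie is broken in favour of that alternative.
    Assumes no alternative is itself labelled "both" or "none".
    """
    a, b = alternatives
    count_a, count_b = votes.count(a), votes.count(b)
    if count_a > count_b:
        return {a}
    if count_b > count_a:
        return {b}
    if on_tie == "both":
        return {a, b}
    if on_tie == "none":
        return set()
    if on_tie in alternatives:
        return {on_tie}
    raise ValueError("unknown tie option")

print(majority_rule(["convict", "acquit"], ("convict", "acquit"), on_tie="acquit"))
```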

23. For instance, voters might submit more complex inputs than the collective output, e.g., rankings or “approval sets” over possible outputs.

24. This is because expected decision value reduces to correctness probability if \(\mathcal{A}=\mathcal{S}\) and moreover the value \(V(a,s)\) is 1 for correct decisions (\(a=s\)) and 0 for incorrect decisions (\(a\neq s\)).
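
The reduction can be verified on a toy example. The joint distribution of decision and state below is hypothetical; with the 0/1 value function, the expected decision value and the correctness probability are the same sum.

```python
from fractions import Fraction

# Hypothetical joint distribution over (decision a, state s), with A = S = {0, 1}.
joint = {(0, 0): Fraction(3, 10), (1, 1): Fraction(2, 5),
         (0, 1): Fraction(1, 5), (1, 0): Fraction(1, 10)}

def V(a, s):
    """Simple standard of correctness: value 1 if the decision matches the state, else 0."""
    return 1 if a == s else 0

expected_value = sum(p * V(a, s) for (a, s), p in joint.items())
correctness_probability = sum(p for (a, s), p in joint.items() if a == s)
print(expected_value, correctness_probability)
```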

25. Plurality rule selects the alternative receiving maximally many votes. Ties could be broken in different ways, as for majority rule (see footnote 22).
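
A minimal sketch of plurality rule, here with the set-valued treatment of ties in which all top-scoring alternatives count as winners. The function name is illustrative, and at least one vote is assumed.

```python
from collections import Counter

def plurality_winners(votes):
    """Return the set of alternatives receiving maximally many votes.
    Assumes a non-empty list of votes."""
    counts = Counter(votes)
    top = max(counts.values())
    return {alternative for alternative, c in counts.items() if c == top}

print(plurality_winners(["x", "y", "z", "x"]))  # {'x'}
```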

26. I.e., each \(\mathbf{v}_{i}\) has a conditional expectation equal to \(\mathbf{s}\).

27. More precisely, the assumptions (i)–(iii) guarantee state-conditional convergence of \(F(\mathbf{v}_{1},\ldots,\mathbf{v}_{n})\) to \(\mathbf{s}\) by applying the Law of Large Numbers state-by-state, which then implies unconditional convergence by taking the expectation across states.
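
The state-by-state argument can be illustrated by simulation. The sketch assumes that \(F\) is the arithmetic mean and that each estimate equals the state plus independent uniform noise on \([-1,1]\); both are illustrative assumptions, as are the two state values and the fixed random seed.

```python
import random

def average_estimate(state, n, rng):
    """Arithmetic mean of n estimates v_i = state + noise, where the noise
    is uniform on [-1, 1], so each v_i has conditional expectation `state`."""
    return sum(state + rng.uniform(-1, 1) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed, for reproducibility only
errors = {s: abs(average_estimate(s, 100_000, rng) - s) for s in (2.0, 5.0)}
print(errors)  # small for each state separately
```

Since the error is small conditional on each state, it is also small on average across states, whatever the probabilities of the states.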

28. More precisely, the expected collective decision value is asymptotically maximal, under various plausible (non-simple) standards of correctness, i.e., under various plausible definitions of decision value. For instance, decision value could be negative absolute distance to the truth: \(V(a,s)=-\left\vert a-s\right\vert\). The simple standard of correctness (cf. Section 5.1) would be inappropriate here, because it treats all deviations from the truth, small or large, as equally bad. Indeed, for continuously distributed input variables \(\mathbf{v}_{1},\mathbf{v}_{2},\ldots\), the mere probability of (exact) collective correctness is zero, however large the group becomes.

29. This follows from other statements of the Law of Large Numbers.

30. This holds conditional on any state, and thus unconditionally.

31. It is set aside here what happens in the absence of any implication.

32. A purely conclusion-oriented standard of correctness can be captured by a value function \(V\) assigning to a judgment-set/state pair \((J,s)\) the value 1 or 0 depending on whether or not the judgment on the conclusion proposition \(q\) is correct, i.e., whether or not \(q\in J\Leftrightarrow q\in s\). A refined standard of correctness that also cares about correctness on the premise propositions \(p\) and \(p\rightarrow q\) can be captured by a value function \(V\) that is also sensitive to correctness on premise propositions. For instance, \(V(J,s)\) could be defined as \(\left\vert J\cap s\right\vert\), the number of true judgments; this correctness standard cares equally about correctness on the conclusion and correctness on each premise proposition.
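
The two standards can be compared on a small example. In this sketch, judgment sets and states are encoded as sets of strings containing each proposition or its negation; this encoding and the example sets are illustrative assumptions.

```python
def v_conclusion(J, s, q="q"):
    """Conclusion-oriented standard: 1 if the judgment on q matches the state, else 0."""
    return 1 if (q in J) == (q in s) else 0

def v_count(J, s):
    """Refined standard: the number of true judgments, |J ∩ s|."""
    return len(J & s)

state = {"p", "p->q", "q"}                          # all three propositions are true
J1 = {"not p", "p->q", "q"}                         # right conclusion, one wrong premise
J2 = {"p", "not (p->q)", "not q"}                   # wrong conclusion, one right premise

print(v_conclusion(J1, state), v_count(J1, state))  # 1 2
print(v_conclusion(J2, state), v_count(J2, state))  # 0 1
```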

33. Here an alternative is collectively correct if and only if the number of group members for whom it is individually correct is maximal. The collectively correct alternative need not be unique: just imagine two alternatives are each individually correct for exactly \(n/2\) members, which can happen for even group size \(n\).

34.
Consider a distributional choice between keeping the status-quo
distribution and taking all goods from the poorest citizen and
redistributing them equally between everyone else. The redistribution
seems incorrect according to egalitarian and utilitarian standards;
but it benefits all but one individual, and hence counts as
collectively correct if collective correctness means correctness for
*most* individuals (and if moreover individual correctness is
given by personal interest). The problem is the neglect of *how
much* someone benefits. Degrees matter.

35. Unconventionally, the inputs of aggregation are not alternatives \(a_{1},\ldots,a_{n}\), i.e., not individual estimates of what is individually best, but entire value functions \(W_{1},\ldots,W_{n}\), i.e., individual estimates of the true individual value functions \(V_{1},\ldots,V_{n}\), respectively. So, everyone submits their judgment of how individually valuable or correct each alternative is. Let the submitted estimates be aggregated additively; the result \(W_{1}+\cdots+W_{n}\) is an estimate of the true collective value function \(V_{1}+\cdots+V_{n}\). The group then chooses an alternative maximising \(W_{1}+\cdots+W_{n}\). Under some probabilistic assumptions about the quality and independence of individual estimates (and a finiteness assumption on \(\mathcal{A}\)), the probability that the collective choice is optimal, i.e., maximises true collective value, tends to one as \(n\) grows (Pivato 2016).
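
The additive procedure can be sketched as follows. The three submitted value functions are hypothetical numbers chosen for illustration, and the probabilistic assumptions behind the convergence result are not modelled.

```python
def collective_choice(estimates, alternatives):
    """Sum the submitted value functions W_1, ..., W_n and return the set of
    alternatives maximising the sum, together with the summed values."""
    totals = {a: sum(W[a] for W in estimates) for a in alternatives}
    best = max(totals.values())
    return {a for a in alternatives if totals[a] == best}, totals

# Hypothetical submitted estimates over two alternatives.
W1 = {"x": 3, "y": 1}
W2 = {"x": 0, "y": 5}
W3 = {"x": 2, "y": 1}

winners, totals = collective_choice([W1, W2, W3], ["x", "y"])
print(winners, totals)  # {'y'} {'x': 5, 'y': 7}
```

In this example two of the three submitted functions rank x above y, yet y is chosen because the sum is sensitive to how much each alternative is valued.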

36. The class of solutions may be empty—the case of impossibility theorems like Arrow’s Theorem. Here axioms are mutually inconsistent and must be relaxed. The class may be singleton—the ideal case. Here a single procedure emerges as admissible, and the theorist has finished her work. Or the class may contain many procedures—the underdetermination case. Here stronger axioms are needed for choosing a procedure.