Notes to Jury Theorems

1. A probability space is a mathematical representation of what could happen, and with what probability. Formally, it is a structure \((\Omega,\mathcal{E},P)\) with the following components. Firstly, \(\Omega\) is a non-empty set of (possible) worlds. Secondly, \(\mathcal{E}\) is a set of events \(E\subseteq\Omega\), representing propositions (such as “the defendant is guilty”, “individual \(3\) votes correctly” or “the majority votes correctly in the group of size \(9\)”). \(\mathcal{E}\) might contain all subsets of \(\Omega\); in practice it contains at least those subsets which represent interesting propositions. Technically, \(\mathcal{E}\) must be a \(\sigma\)-algebra: it contains the tautology (\(\Omega\in\mathcal{E}\)), is closed under negation (\(E\in\mathcal{E}\Rightarrow\Omega\backslash E\in\mathcal{E}\)), and is closed under countable disjunction (\(E_{1},E_{2},\ldots\in\mathcal{E}\Rightarrow E_{1}\cup E_{2}\cup\dots\in\mathcal{E}\)). Thirdly, \(P\) is a probability measure on \(\mathcal{E}\), i.e., a function assigning a probability \(P(E)\in\lbrack0,1]\) to each event \(E\in\mathcal{E}\), subject to the probability axioms: \(P(\Omega)=1\) and, for all pairwise disjoint events \(E_{1},E_{2},\ldots\in\mathcal{E}\), \(P(E_{1}\cup E_{2}\cup\dots)=P(E_{1})+P(E_{2})+\cdots\).
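A finite probability space can be written down explicitly. The sketch below is only an illustration under assumptions not made in the note: the worlds are the correctness profiles of two individuals, \(\mathcal{E}\) is the full power set of \(\Omega\), and \(P\) treats the two individuals as independently correct with a hypothetical probability \(0.7\). On a finite \(\Omega\), closure under countable unions reduces to closure under pairwise unions (and countable additivity to pairwise additivity), so the checks are exhaustive.

```python
from itertools import chain, combinations, product

# Worlds: correctness profiles of two individuals (illustrative choice);
# (True, False) means individual 1 is correct and individual 2 is not.
OMEGA = frozenset(product([True, False], repeat=2))


def powerset(s):
    """All subsets of s as frozensets: the largest sigma-algebra on a finite set."""
    items = list(s)
    return {
        frozenset(c)
        for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    }


EVENTS = powerset(OMEGA)
COMPETENCE = 0.7  # hypothetical probability that each individual is correct


def prob_world(world):
    """Probability of one world, assuming independent correctness (an assumption)."""
    result = 1.0
    for correct in world:
        result *= COMPETENCE if correct else 1 - COMPETENCE
    return result


def P(event):
    """Probability measure on EVENTS: sum of the probabilities of the worlds in the event."""
    return sum(prob_world(w) for w in event)


def is_sigma_algebra(events, omega):
    """Contains omega, closed under complement and (on a finite set) under pairwise union."""
    return (
        omega in events
        and all(omega - e in events for e in events)
        and all(a | b in events for a in events for b in events)
    )


R1 = frozenset(w for w in OMEGA if w[0])  # the event "individual 1 votes correctly"

print(is_sigma_algebra(EVENTS, OMEGA), P(OMEGA), P(R1))
```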

2. Formally, \(\Maj_{n}=\cup_{I\subseteq \{1,\ldots,n\}:\left\vert I\right\vert >n/2}\cap_{i\in I}R_{i}\), the event that, for some subset \(I\) containing a majority, all members of \(I\) are correct.
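The definition can be checked mechanically on small groups. The sketch below builds \(\Maj_{n}\) literally as the union, over all index sets \(I\) with \(\left\vert I\right\vert >n/2\), of the intersections of the events \(R_{i}\), and confirms that it coincides with the set of worlds in which more than \(n/2\) individuals are correct. Representing worlds as correctness profiles, and the function names, are illustrative assumptions.

```python
from itertools import combinations, product


def majority_event(n):
    """Return (worlds, Maj_n), with Maj_n built as a union of intersections of R_i."""
    worlds = set(product([True, False], repeat=n))  # each world: who is correct
    R = [{w for w in worlds if w[i]} for i in range(n)]  # R[i]: individual i+1 is correct
    maj = set()
    for size in range(n // 2 + 1, n + 1):  # exactly the sizes with |I| > n/2
        for I in combinations(range(n), size):
            maj |= set.intersection(*(R[i] for i in I))
    return worlds, maj


def counted_majority(n):
    """Worlds in which more than n/2 individuals are correct, by direct counting."""
    return {w for w in product([True, False], repeat=n) if sum(w) > n / 2}


def prob(event, p):
    """Probability of an event if individuals are independently correct with probability p."""
    return sum(p ** sum(w) * (1 - p) ** (len(w) - sum(w)) for w in event)


for n in range(1, 7):
    _, maj = majority_event(n)
    assert maj == counted_majority(n)

_, maj3 = majority_event(3)
print(prob(maj3, 0.7))
```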

3. In the literature, the term “Condorcet jury theorem” sometimes refers to slightly different theorems, or to a vague type of theorem. The most basic rendition is chosen here.

4. Formal models often define the state as an alternative (the “correct” one), so that the votes and the state all take values in the same set (the set of alternatives) and \(R_{i}\) is the event that \(i\)’s vote equals the state. This makes the model more parsimonious. But philosophically it lumps together different objects. In a highly explicit but unparsimonious model of the court’s decision between convict and acquit, the state is neither the correct alternative, nor guilty or innocent, but a complex fact about the defendant’s actions. Here the state takes values in a large set of possible complex facts, and the function mapping each possible state value to the corresponding correct alternative is many-to-one.

5. The growing-reliability conclusion is restricted to bodies of odd size: the probability of majority correctness \(P(\Maj_{n})\) increases as group size \(n\) moves from 1 to 3, to 5, etc. Excluding even-sized groups is necessary because there can be ties in such groups, reducing the probability of a correct majority. For instance, \(P(\Maj_{n})\) falls from \(n=1\) to \(n=2\), because majority correctness in the 1-member group only requires individual 1 to be correct (the event \(R_{1}\)) whereas majority correctness in the 2-member group requires two individuals to be correct (the event \(R_{1}\cap R_{2}\)). The restriction to odd \(n\) could be lifted if ties are broken by an independent toss of a fair coin.
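Under Condorcet’s assumptions (independent votes, common competence \(p\)), \(P(\Maj_{n})\) is a binomial tail, so the drop from \(n=1\) to \(n=2\) and the effect of fair-coin tie-breaking can be computed directly. The sketch assumes a hypothetical competence \(p=0.6\). With coin tie-breaking, a group of even size \(2m\) turns out exactly as reliable as one of odd size \(2m-1\), so reliability no longer falls at even sizes (though it does not grow there either).

```python
from math import comb


def p_majority(n, p):
    """P(Maj_n): probability that strictly more than n/2 of n independent voters are correct."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1) if k > n / 2)


def p_correct_with_coin(n, p):
    """Probability of a correct decision if ties (even n) are broken by a fair coin."""
    tie = comb(n, n // 2) * p ** (n // 2) * (1 - p) ** (n // 2) if n % 2 == 0 else 0.0
    return p_majority(n, p) + 0.5 * tie


p = 0.6  # hypothetical individual competence
print([round(p_majority(n, p), 4) for n in range(1, 8)])           # falls at every even n
print([round(p_correct_with_coin(n, p), 4) for n in range(1, 8)])  # never falls
```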

6. For instance, in Condorcet’s Jury Theorem, \(P(\Maj_{n})\) grows strictly except if individuals are infallible, i.e., if \(P(R_{i})=1\).

7. For general \(\mathbf{x}\), the probability of an event \(A\) conditional on \(\mathbf{x}\), \(P(A|\mathbf{x})\), remains definable (see standard probability textbooks). But \(P(A|\mathbf{x})\) is a non-unique function of \(\mathbf{x}\) if some or all values \(x\) of \(\mathbf{x}\) have probability \(P(x)=0\). Still, any two versions of \(P(A|\mathbf{x})\) are essentially identical: they coincide outside a zero-probability event; equivalently, they coincide except if \(\mathbf{x}\) takes a value in some zero-probability set of values. To restate CI and CC generally, after “any value \(x\) of the facts \(\mathbf{x}\)” add “except possibly for values in a zero-probability set”. Later in Section 2.5, to restate TC generally, after “is the same for all individuals \(i\)” add “almost surely, i.e., except possibly in a zero-probability event” (the generalized definition of “tending to exceed \(\frac{1}{2}\)” is given in Dietrich & Spiekermann 2013a).

8. The remark in footnote 5 applies analogously.

9. The remark in footnote 5 applies analogously.

10. Formally,

\[\lim_{n\rightarrow\infty}P(\Maj_{n})=P\left( \mathbf{p}>\frac{1}{2}\right) +\frac{1}{2}P\left( \mathbf{p}=\frac{1}{2}\right),\]

where \(\mathbf{p}\) denotes specific competence \(P(R_{i}|\mathbf{x})\), which does not depend on \(i\) by TC.
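The limit formula can be checked numerically when specific competence \(\mathbf{p}\) takes finitely many values. The sketch assumes, for illustration only, that \(\mathbf{p}\) equals \(0.4\), \(0.5\) or \(0.7\) with probabilities \(0.3\), \(0.2\) and \(0.5\), and that votes are independent given \(\mathbf{x}\) with common competence \(\mathbf{p}\) (as under CI and TC). Then \(P(\Maj_{n})\) is a probability-weighted mixture of binomial tails, and for large odd \(n\) it approaches \(0.5+\frac{1}{2}\cdot0.2=0.6\).

```python
from math import comb

# Hypothetical distribution of specific competence p: value -> probability.
P_DIST = {0.4: 0.3, 0.5: 0.2, 0.7: 0.5}


def p_majority(n, p):
    """Binomial tail: P(more than n/2 of n votes correct | specific competence p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


def p_majority_mixture(n):
    """Unconditional P(Maj_n): average the conditional tails over the distribution of p."""
    return sum(weight * p_majority(n, p) for p, weight in P_DIST.items())


# Right-hand side of the limit formula: P(p > 1/2) + (1/2) P(p = 1/2).
limit = sum(w for p, w in P_DIST.items() if p > 0.5) + 0.5 * sum(
    w for p, w in P_DIST.items() if p == 0.5
)

for n in (1, 11, 101, 401):  # odd group sizes
    print(n, round(p_majority_mixture(n), 5), limit)
```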

11. Asymptotic infallibility follows by the “unless” clause. Recall that “unless CC holds” means “if and only if CC does not hold”.

12. That is, infinite average competence exceeds \(\frac{1}{2}\). Infinite average competence is the limit of finite average competence,

\[\lim_{n\rightarrow \infty}\frac{1}{n}\sum_{i=1}^n P(R_{i}),\]

or, in full generality (since this limit need not exist), the “limit inferior” or “limiting lower bound” of finite average competence,

\[\lim_{m\rightarrow\infty}\inf_{n\geq m}\frac{1}{n}\sum_{i=1}^n P(R_{i}).\]

A limit inferior always exists, and it coincides with the ordinary limit whenever the latter exists.
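Finite average competence need not converge, which is why the note falls back on the limit inferior. In the hypothetical sequence below, competence is \(0.9\) on the blocks of indices \(2^{k},\ldots,2^{k+1}-1\) with \(k\) even and \(0.6\) on those with \(k\) odd. The running average keeps oscillating (roughly between \(0.7\) and \(0.8\)), so the ordinary limit does not exist, while the limit inferior is \(0.7>\frac{1}{2}\). A computer can only inspect finitely many \(n\), so the tail infimum printed here approximates the limit inferior rather than computing it.

```python
def competence(i):
    """Hypothetical competence of individual i (i >= 1): 0.9 on even blocks, 0.6 on odd blocks."""
    block = i.bit_length() - 1  # block k contains the indices 2**k, ..., 2**(k+1) - 1
    return 0.9 if block % 2 == 0 else 0.6


def running_averages(n_max):
    """Finite average competence (1/n) * sum_{i<=n} P(R_i) for n = 1, ..., n_max."""
    averages, total = [], 0.0
    for n in range(1, n_max + 1):
        total += competence(n)
        averages.append(total / n)
    return averages


N_MAX = 2**16 - 1  # the last index of block 15
averages = running_averages(N_MAX)
m = 1000
tail = averages[m - 1:]  # finite averages for n = m, ..., N_MAX
print("tail infimum:", min(tail), "tail supremum:", max(tail))
```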

13. To model this, consider a large finite population \(\mathcal{I}\) of potential members, e.g., all living scientists in the case of a scientific group. For each possible group size \(n\) (at most the population size \(\left\vert \mathcal{I}\right\vert\)), the \(n\)-member group is drawn randomly from the set of possible \(n\)-member groups \(\{N\subseteq\mathcal{I}:\left\vert N\right\vert =n\}\) following a uniform probability distribution. So decisions are doubly random: member judgments and member identity are both random. This restores the growing-reliability conclusion, despite competence heterogeneity (Berend & Sapir 2005). The finiteness of the population \(\mathcal{I}\) reflects reality (and makes the mentioned uniform distribution well-defined); but it sets an upper bound on group size, thereby excluding asymptotic considerations.
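The doubly random model can be made concrete for a small population by averaging over all \(n\)-member groups, each equally likely. The sketch uses a hypothetical population of seven heterogeneous competences and assumes that judgments are independent within any drawn group. It only computes the probabilities; the growth claim itself is the result of Berend & Sapir (2005), not something the code proves.

```python
from itertools import combinations

# Hypothetical finite population: competence P(R_i) of each potential member.
POPULATION = [0.95, 0.8, 0.65, 0.6, 0.55, 0.52, 0.51]


def p_majority(competences):
    """P(Maj) for a fixed group with independent, possibly unequal competences."""
    dist = [1.0]  # dist[k]: probability that exactly k of the members so far are correct
    for p in competences:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)  # this member is wrong
            new[k + 1] += q * p    # this member is correct
        dist = new
    n = len(competences)
    return sum(q for k, q in enumerate(dist) if k > n / 2)


def p_majority_random_group(population, n):
    """P(Maj_n) when the n-member group is drawn uniformly from all n-subsets of the population."""
    groups = list(combinations(population, n))
    return sum(p_majority(g) for g in groups) / len(groups)


for n in range(1, len(POPULATION) + 1, 2):  # odd group sizes up to |I|
    print(n, round(p_majority_random_group(POPULATION, n), 4))
```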

14. In a more restricted sense, the “state” consists only of the unknown determinants of correctness.

15. More precisely, assume all individual judgments are functions of a variable \(\mathbf{t}\), “total influences”, interpretable as the vector of all private or shared causes of judgments. Underdetermination is the event that \(\mathbf{t}\) takes a value conditional on which more than one state has non-zero probability. Possible Underdetermination means that underdetermination has positive probability (“possible” thus stands for probabilistic, not just logical, possibility).
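For discrete variables, the probability of underdetermination can be read off a joint distribution of total influences \(\mathbf{t}\) and the state: conditional on a value \(t\) of positive probability, a state \(s\) has non-zero probability exactly when the pair \((t,s)\) does. The sketch uses a hypothetical courtroom example with made-up numbers, in which the value “weak” of \(\mathbf{t}\) leaves both states open.

```python
from collections import defaultdict

# Hypothetical joint distribution P(t, s) of total influences t and the state s.
JOINT = {
    ("strong", "guilty"): 0.4,
    ("weak", "guilty"): 0.15,
    ("weak", "innocent"): 0.15,
    ("exculpatory", "innocent"): 0.3,
}


def prob_underdetermination(joint):
    """P(t takes a value conditional on which more than one state has non-zero probability)."""
    states_given_t = defaultdict(set)
    p_t = defaultdict(float)
    for (t, s), q in joint.items():
        if q > 0:
            states_given_t[t].add(s)  # P(s | t) > 0 iff P(t, s) > 0
            p_t[t] += q
    return sum(p_t[t] for t, states in states_given_t.items() if len(states) > 1)


p_under = prob_underdetermination(JOINT)
print(p_under, "Possible Underdetermination:", p_under > 0)
```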

16. The infinite sequence of private evidence determines the truth with probability one, by the law of large numbers.

17. As equation (1) shows, an individual \(i\)’s general competence \(P(R_{i})\) is the probability-weighted average of their specific competence \(P(R_{i}|x)\) across facts \(x\). This average can exceed \(\frac{1}{2}\) even if \(P(R_{i}|x)<\frac{1}{2}\) for some \(x\) of sufficiently low probability.
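A two-fact illustration of equation (1), with made-up numbers: a common fact \(x_{1}\) with probability \(0.9\) and specific competence \(0.7\), and a rare fact \(x_{2}\) with probability \(0.1\) and specific competence \(0.2<\frac{1}{2}\). General competence still comes out above \(\frac{1}{2}\).

```python
# Hypothetical facts x: (P(x), specific competence P(R_i | x)).
FACTS = {"x1": (0.9, 0.7), "x2": (0.1, 0.2)}

# Equation (1): general competence is the probability-weighted average of specific competence.
general_competence = sum(p_x * competence for p_x, competence in FACTS.values())

below_half = [x for x, (_, competence) in FACTS.items() if competence < 0.5]
print(round(general_competence, 4), below_half)
```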

18. Deliberation lets individuals causally affect one another. Despite causation between individuals, there is normally no causation between votes, at least if voting is silent (cf. Section 2.2). This is why deliberation introduces common causes of, not causation between, votes.

19. A group that has deliberated usually has more common causes of votes than a group that has not. So one may want to conditionalize on more facts than one would in the absence of deliberation. This points towards a general issue: which sort of facts variable \(\mathbf{x}\) makes the independence premise CI of a jury theorem plausible depends on several concrete circumstances, including whether and which deliberation happens prior to voting. Fortunately, the Conditional and Competence-Sensitive Jury Theorems hold for any facts variable \(\mathbf{x}\).

20. One has to replace the majority threshold by an acceptance threshold such that pivotal voters prefer to tip the decision in the direction of their private belief, rather than ignoring private information. So, clever institutional design might prevent strategic voting. Yet there are obstacles. One obstacle is that the particular acceptance threshold that induces truthful voting is highly sensitive to parameters such as voters’ prior correctness probabilities and strength of private information. Without knowing these parameters, the threshold cannot be set accordingly. Worse, the acceptance threshold that induces truthful voting may vary from voter to voter (for instance due to different prior beliefs), so that no acceptance threshold simultaneously prevents all voters from strategising.

21. Formally, \(F\) is a function on \(\cup_{n=1,2,\ldots} \mathcal{A}^{n}\).

22. Ties, in which both alternatives occur \(n/2\) times among \(a_{1},\ldots,a_{n}\) (for even \(n\)), could be handled in different ways. Either some tie-breaking rule selects a winning alternative. Or one allows outcomes to be sets, in which case either both alternatives are winners (\(F(a_{1},\ldots,a_{n})=\mathcal{A}\)) or no alternative is a winner (\(F(a_{1} ,\ldots,a_{n})=\varnothing\)).
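The three ways of handling ties can be written as options of a single majority function. In this sketch, which is illustrative rather than a standard implementation, outcomes are always returned as sets, so a single winner appears as a one-element set; the parameter names and the tie-breaking callback are hypothetical.

```python
from collections import Counter


def majority_rule(votes, alternatives, tie="both", tie_breaker=None):
    """Majority rule on two alternatives, returning the set of winners.

    tie="both":  both alternatives win a tie (F = A)
    tie="none":  no alternative wins a tie (F = empty set)
    tie="break": tie_breaker(a, b) selects a single winner
    """
    a, b = alternatives
    counts = Counter(votes)
    if counts[a] > counts[b]:
        return {a}
    if counts[b] > counts[a]:
        return {b}
    # Tie: if every vote is a or b, both occur n/2 times (even n).
    if tie == "both":
        return {a, b}
    if tie == "none":
        return set()
    if tie == "break":
        return {tie_breaker(a, b)}
    raise ValueError(f"unknown tie option: {tie}")


A = ("convict", "acquit")
print(majority_rule(["convict", "acquit", "convict"], A))
print(majority_rule(["convict", "acquit"], A, tie="none"))
```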

23. For instance, voters might submit more complex inputs than the collective output, e.g., rankings or “approval sets” over possible outputs.

24. This is because expected decision value reduces to correctness probability if \(\mathcal{A}=\mathcal{S}\) and moreover the value \(V(a,s)\) is 1 for correct decisions (\(a=s\)) and 0 for incorrect decisions (\(a\neq s\)).
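Writing \(P(a,s)\) for the joint probability that the decision is \(a\) and the state is \(s\), and taking \(V(a,s)=1\) if \(a=s\) and \(0\) otherwise, expected decision value is \(\sum_{a,s}P(a,s)V(a,s)=\sum_{a}P(a,a)\), the probability that the decision matches the state. The sketch checks this on a made-up joint distribution with \(\mathcal{A}=\mathcal{S}=\{\)guilty, innocent\(\}\).

```python
# Hypothetical joint distribution P(decision, state), with A = S = {"guilty", "innocent"}.
JOINT = {
    ("guilty", "guilty"): 0.45,
    ("innocent", "guilty"): 0.05,
    ("guilty", "innocent"): 0.10,
    ("innocent", "innocent"): 0.40,
}


def simple_value(a, s):
    """Simple standard of correctness: 1 for a correct decision, 0 otherwise."""
    return 1.0 if a == s else 0.0


expected_value = sum(q * simple_value(a, s) for (a, s), q in JOINT.items())
p_correct = sum(q for (a, s), q in JOINT.items() if a == s)
print(expected_value, p_correct)
```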

25. Plurality rule selects the alternative receiving maximally many votes. Ties could be broken in different ways, as for majority rule (see footnote 22).
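A set-valued version of plurality rule returns every alternative receiving maximally many votes, in the same spirit as the set-valued tie option for majority rule. The sketch is illustrative; a tie-breaking rule could be layered on top by picking one element of the returned set.

```python
from collections import Counter


def plurality(votes):
    """Return the set of alternatives receiving maximally many votes (ties kept)."""
    counts = Counter(votes)
    if not counts:
        return set()  # no votes, no winner
    top = max(counts.values())
    return {a for a, c in counts.items() if c == top}


print(plurality(["x", "y", "x", "z"]))
print(plurality(["x", "y", "z", "z", "y"]))
```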

26. I.e., the conditional expectation of each \(\mathbf{v}_{i}\) given the state \(\mathbf{s}\) equals \(\mathbf{s}\).

27. More precisely, the assumptions (i)–(iii) guarantee state-conditional convergence of \(F(\mathbf{v}_{1},\ldots,\mathbf{v}_{n})\) to \(\mathbf{s}\) by applying the Law of Large Numbers state-by-state, which then implies unconditional convergence by taking the expectation across states.
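State-by-state convergence can be illustrated by simulation. The sketch makes assumptions of its own, which may be stronger than the assumptions (i)–(iii) referred to in the note: conditional on the state \(s\), the inputs \(\mathbf{v}_{1},\mathbf{v}_{2},\ldots\) are independent normal draws with mean \(s\) (so each has conditional expectation \(s\)), and \(F\) is the arithmetic mean. For each of a few hypothetical state values, the average of many inputs lands close to the state.

```python
import random
import statistics


def aggregate(inputs):
    """F: here assumed to be the arithmetic mean of the individual inputs."""
    return statistics.fmean(inputs)


def simulate(state, n, rng, noise_sd=1.0):
    """Draw n inputs independently, each with conditional expectation equal to the state."""
    return [rng.gauss(state, noise_sd) for _ in range(n)]


rng = random.Random(0)  # fixed seed for reproducibility
STATES = [-1.0, 0.0, 2.5]  # hypothetical state values
for s in STATES:
    for n in (10, 20000):
        estimate = aggregate(simulate(s, n, rng))
        print(f"state={s}, n={n}, |F - s| = {abs(estimate - s):.4f}")
```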

28. More precisely, the expected collective decision value is asymptotically maximal, under various plausible (non-simple) standards of correctness, i.e., under various plausible definitions of decision value. For instance, decision value could be negative absolute distance to the truth: \(V(a,s)=-\left\vert a-s\right\vert\). The simple standard of correctness (cf. Section 5.1) would be inappropriate here, because it treats all deviations from the truth—small or large—as equally bad. Indeed, for continuously distributed input variables \(\mathbf{v}_{1},\mathbf{v}_{2},\ldots\), the mere probability of (exact) collective correctness is zero, however much the group is increased.
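Under the same kind of illustrative assumptions (normal inputs centred on the state, aggregation by the arithmetic mean, both hypothetical), expected decision value under \(V(a,s)=-\left\vert a-s\right\vert\) can be estimated by simulation. The estimate moves towards its maximum of \(0\) as the group grows, while the share of trials in which the collective output exactly equals the state stays at zero.

```python
import random
import statistics


def value(a, s):
    """Decision value: negative absolute distance to the truth."""
    return -abs(a - s)


def estimate(n, state=2.0, trials=300, seed=1):
    """Monte Carlo estimates of expected value and of P(exact correctness) for group size n."""
    rng = random.Random(seed)
    total_value, exact = 0.0, 0
    for _ in range(trials):
        a = statistics.fmean(rng.gauss(state, 1.0) for _ in range(n))
        total_value += value(a, state)
        exact += int(a == state)
    return total_value / trials, exact / trials


for n in (5, 50, 1000):
    print(n, estimate(n))
```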

29. This follows from other statements of the Law of Large Numbers.

30. This holds conditional on any state, and thus unconditionally.

31. It is set aside here what happens in the absence of any implication.

32. A purely conclusion-oriented standard of correctness can be captured by a value function \(V\) assigning to a judgment-set/state pair \((J,s)\) the value 1 or 0 depending on whether or not the judgment on the conclusion proposition \(q\) is correct, i.e., whether or not \(q\in J\Leftrightarrow q\in s\). A refined standard of correctness that also cares about correctness on the premise propositions \(p\) and \(p\rightarrow q\) can be captured by a value function \(V\) that is also sensitive to correctness on premise propositions. For instance, \(V(J,s)\) could be defined as \(\left\vert J\cap s\right\vert\), the number of true judgments; this correctness standard cares equally about correctness on the conclusion and correctness on each premise proposition.
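The two standards can be written as value functions on judgment sets. The sketch is illustrative: judgment sets \(J\) and states \(s\) are represented as sets of sentence labels, with negations written as separate labels, and each judgment set is assumed to contain exactly one of each proposition and its negation. The two example judgment sets are consistent and hypothetical.

```python
# States and judgment sets as sets of sentence labels (an illustrative encoding).
STATE = {"p", "p→q", "q"}  # hypothetical true state: p, p→q and q all hold


def conclusion_value(J, s):
    """Purely conclusion-oriented standard: 1 iff the judgment on q is correct."""
    return 1 if ("q" in J) == ("q" in s) else 0


def count_value(J, s):
    """Refined standard: the number of true judgments, |J ∩ s|."""
    return len(J & s)


J_premise_error = {"¬p", "p→q", "q"}     # wrong on premise p, right on the conclusion q
J_conclusion_error = {"¬p", "p→q", "¬q"}  # also wrong on the conclusion q
for J in (STATE, J_premise_error, J_conclusion_error):
    print(sorted(J), conclusion_value(J, STATE), count_value(J, STATE))
```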

33. Here an alternative is collectively correct if and only if the number of group members for whom it is individually correct is maximal. The collectively correct alternative need not be unique: just imagine two alternatives are each individually correct for exactly \(n/2\) members, which can happen for even group size \(n\).

34. Consider a distributional choice between keeping the status-quo distribution and taking all goods from the poorest citizen and redistributing them equally between everyone else. The redistribution seems incorrect according to egalitarian and utilitarian standards; but it benefits all but one individual, and hence counts as collectively correct if collective correctness means correctness for most individuals (and if moreover individual correctness is given by personal interest). The problem is the neglect of how much someone benefits. Degrees matter.
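The example can be made numerical with hypothetical figures: four citizens hold \(1,5,5,5\) units, and the poorest citizen’s unit is split equally among the other three. Three of four individuals gain, yet the minimum falls from \(1\) to \(0\) and, assuming for illustration that well-being is the square root of holdings, total well-being falls too. With linear well-being the total would be unchanged, so this utilitarian verdict relies on the assumed diminishing marginal well-being.

```python
from math import sqrt

before = [1.0, 5.0, 5.0, 5.0]  # hypothetical holdings; citizen 0 is the poorest
poorest = before.index(min(before))
share = before[poorest] / (len(before) - 1)
after = [0.0 if i == poorest else x + share for i, x in enumerate(before)]

gainers = sum(1 for x, y in zip(before, after) if y > x)
print("gainers:", gainers, "of", len(before))   # a majority gains
print("minimum:", min(before), "->", min(after))  # egalitarian standard
print("sum of sqrt:", sum(map(sqrt, before)), "->", sum(map(sqrt, after)))  # concave well-being
print("sum of holdings:", sum(before), "->", sum(after))  # linear: unchanged up to rounding
```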

35. Unconventionally, the inputs of aggregation are not alternatives \(a_{1},\ldots,a_{n}\), i.e., not individual estimates of what is individually best, but entire value functions \(W_{1},\ldots,W_{n}\), i.e., individual estimates of the true individual value functions \(V_{1},\ldots,V_{n}\), respectively. So, everyone submits their judgment of how individually valuable or correct each alternative is. Let the submitted estimates be aggregated additively; the result \(W_{1}+\cdots+W_{n}\) is an estimate of the true collective value function \(V_{1}+\cdots+V_{n}\). The group then chooses an alternative maximising \(W_{1}+\cdots+W_{n}\). Under some probabilistic assumptions about the quality and independence of individual estimates (and a finiteness assumption on \(\mathcal{A}\)), the probability that the collective choice is optimal, i.e., maximises true collective value, tends to one as \(n\) grows (Pivato 2016).
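A simulation sketch of additive aggregation, under assumptions chosen only for illustration (they are not Pivato’s exact conditions): there are three alternatives; each individual’s true values \(V_{i}\) are drawn independently around hypothetical means; and each submitted estimate \(W_{i}\) adds independent, unbiased normal noise to \(V_{i}\). The group picks an alternative maximising \(W_{1}+\cdots+W_{n}\), and the code records how often that choice also maximises \(V_{1}+\cdots+V_{n}\).

```python
import random

ALTERNATIVES = ("a", "b", "c")
MEAN_VALUE = {"a": 0.6, "b": 0.5, "c": 0.4}  # hypothetical average individual values


def trial(n, rng, spread=0.2, noise=1.0):
    """One draw: does maximising W_1 + ... + W_n also maximise V_1 + ... + V_n?"""
    true_total = dict.fromkeys(ALTERNATIVES, 0.0)
    estimated_total = dict.fromkeys(ALTERNATIVES, 0.0)
    for _ in range(n):
        for x in ALTERNATIVES:
            v = rng.gauss(MEAN_VALUE[x], spread)  # true individual value V_i(x)
            w = v + rng.gauss(0.0, noise)         # submitted estimate W_i(x)
            true_total[x] += v
            estimated_total[x] += w
    choice = max(ALTERNATIVES, key=estimated_total.get)
    return true_total[choice] == max(true_total.values())


def optimal_rate(n, trials=100, seed=0):
    """Share of trials in which the collective choice is optimal."""
    rng = random.Random(seed)
    return sum(trial(n, rng) for _ in range(trials)) / trials


for n in (1, 10, 1000):
    print(n, optimal_rate(n))
```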

36. The class of solutions may be empty—the case of impossibility theorems like Arrow’s Theorem. Here axioms are mutually inconsistent and must be relaxed. The class may be singleton—the ideal case. Here a single procedure emerges as admissible, and the theorist has finished her work. Or the class may contain many procedures—the underdetermination case. Here stronger axioms are needed for choosing a procedure.

Copyright © 2021 by
Franz Dietrich <fd@franzdietrich.net>
Kai Spiekermann <k.spiekermann@lse.ac.uk>
