Supplement to Inductive Logic
Proof of the Probabilistic Refutation Theorem
The proof of Convergence Theorem 2 requires the introduction of one more concept, that of the variance in the quality of information for a sequence of experiments or observations, \(\VQI[c^n \pmid h_i /h_j \pmid b]\). The quality of the information QI from a specific outcome sequence \(e^n\) may vary somewhat from the expected quality of information for conditions \(c^n\). A common statistical measure of how widely individual values tend to vary from an expected value is given by the expected squared distance from the expected value, which is called the variance.
For \(h_j\) outcome-compatible with \(h_i\) on \(c_k\), define \[ \begin{multline} \VQI[c_k \pmid h_i /h_j \pmid b] =\\ \sum_u (\QI[o_{ku} \pmid h_i /h_j \pmid b\cdot c_k] - \EQI[c_k \pmid h_i /h_j \pmid b])^2 \\ \times P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}]. \end{multline} \]
For a sequence \(c^n\) of observations on which \(h_j\) is outcome-compatible with \(h_i\), define
\[ \begin{multline} \VQI[c^n \pmid h_i /h_j \pmid b] =\\ \sum_{\{e^n\}} (\QI[e^n \pmid h_i /h_j \pmid b\cdot c^n] - \EQI[c^n \pmid h_i /h_j \pmid b])^2 \\ \times P[e^n \pmid h_{i}\cdot b\cdot c^{n}]. \end{multline} \]Clearly VQI is always non-negative, and it will be positive unless \(h_i\) and \(h_j\) agree on the likelihoods of all possible outcome sequences in the evidence stream, in which case both \(\EQI[c^n \pmid h_i /h_j \pmid b]\) and \(\VQI[c^n \pmid h_i /h_j \pmid b]\) equal 0.
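These definitions can be illustrated computationally. The following Python sketch (not part of the original proof; the likelihood values are invented for illustration) computes QI, EQI, and VQI for a single experiment, and exhibits the fact just noted: VQI is positive when the hypotheses disagree on likelihoods and 0 when they agree.

```python
import math

def qi(p_i, p_j):
    # QI[o | h_i/h_j | b·c]: the log of the likelihood ratio for outcome o
    return math.log(p_i / p_j)

def eqi(like_i, like_j):
    # EQI[c | h_i/h_j | b]: the expected value of QI, weighted by h_i's likelihoods
    return sum(p_i * qi(p_i, p_j) for p_i, p_j in zip(like_i, like_j))

def vqi(like_i, like_j):
    # VQI[c | h_i/h_j | b]: expected squared deviation of QI from EQI, under h_i
    e = eqi(like_i, like_j)
    return sum(p_i * (qi(p_i, p_j) - e) ** 2 for p_i, p_j in zip(like_i, like_j))

# hypothetical likelihoods for a two-outcome experiment
like_i = [0.7, 0.3]   # P[o_ku | h_i·b·c_k] for each outcome
like_j = [0.4, 0.6]   # P[o_ku | h_j·b·c_k] for each outcome

print(vqi(like_i, like_j) > 0)    # True: the hypotheses disagree on likelihoods
print(vqi(like_i, like_i))        # 0.0: agreeing likelihoods give VQI = 0
```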
When both Independent Evidence Conditions hold, \(\VQI[c^n \pmid h_i /h_j \pmid b]\) decomposes into the sum of the VQI values for the individual experiments or observations \(c_k\).
Theorem: The VQI Decomposition Theorem for Independent
Evidence on Each Hypothesis:
Suppose both condition-independence and
result-independence hold. Then
\[ \VQI[c^n \pmid h_i /h_j \pmid b] = \sum^{n}_{k = 1} \VQI[c_k \pmid h_i /h_j \pmid b]. \]
For the Proof, we employ the following abbreviations:
\[ \begin{align} \Q[e_k] &= \QI[e_k \pmid h_i /h_j \pmid b\cdot c_k] \\ \Q[e^k] &= \QI[e^k \pmid h_i /h_j \pmid b\cdot c^k] \\ \E[c_k] &= \EQI[c_k \pmid h_i /h_j \pmid b] \\ \E[c^k] &= \EQI[c^k \pmid h_i /h_j \pmid b] \\ \V[c_k] &= \VQI[c_k \pmid h_i /h_j \pmid b] \\ \V[c^k] &= \VQI[c^k \pmid h_i /h_j \pmid b] \\ \end{align} \]The equation stated by the theorem may be derived as follows:
\( \begin{align} \V[c^n] &= \sum_{\{e^n\}} (\Q[e^n] - \E[c^n])^2 \times P[e^n \pmid h_{i}\cdot b\cdot c^{n}] \\ \end{align}\)
\(\begin{align}\phantom{\V[c^n]} &= \sum_{\{e^n\}} ((\Q[e_n]+\Q[e^{n-1}]) - (\E[c_n]+\E[c^{n-1}]))^2 \\ & \qquad\qquad \times P[e_n \pmid h_{i}\cdot b\cdot c_n]\\ & \qquad\qquad \times P[e^{n-1} \pmid h_i\cdot b\cdot c^{n-1}] \\[2ex] \end{align}\)
\(\begin{align}\phantom{\V[c^n]} &= \sum_{\{e^{n-1}\} } \sum_{\{e_n\}} ((\Q[e_n]-\E[c_n]) + (\Q[e^{n-1}]-\E[c^{n-1}]))^2\\ & \qquad\qquad \times P[e_n \pmid h_{i}\cdot b\cdot c_n]\times P[e^{n-1} \pmid h_i\cdot b\cdot c^{n-1}] \\[2ex] \end{align}\)
\(\begin{align}\phantom{\V[c^n]} &= \sum_{\{e^{n-1}\} } \sum_{\{e_n\}} \left(\begin{split} &(\Q[e_n]-\E[c_n])^2 + (\Q[e^{n-1}]-\E[c^{n-1}])^2 \\ &+ 2 \times(\Q[e_n]-\E[c_n])\times(\Q[e^{n-1}]-\E[c^{n-1}]) \end{split} \right)\\ &\qquad\qquad \times P[e_n \pmid h_{i}\cdot b\cdot c_n]\times P[e^{n-1} \pmid h_i\cdot b\cdot c^{n-1}] \\[2ex] \end{align}\)
\(\begin{align}\phantom{\V[c^n]} &= \sum_{\{e^{n-1}\} } \sum_{\{e_n\}} (\Q[e_n]-\E[c_n])^2 \times P[e_n \pmid h_{i}\cdot b\cdot c_n]\\ & \qquad\quad\qquad\times P[e^{n-1} \pmid h_i\cdot b\cdot c^{n-1}]\\ & \qquad\quad + \sum_{\{e^{n-1}\} } \sum_{\{e_n\}}(\Q[e^{n-1}]-\E[c^{n-1}])^2 \times P[e_n \pmid h_{i}\cdot b\cdot c_n]\\ & \qquad\quad\qquad \times P[e^{n-1} \pmid h_i\cdot b\cdot c^{n-1}] \\ & \qquad\quad + \sum_{\{e^{n-1}\} } \sum_{\{e_n\} } 2\times(\Q[e_n]-\E[c_n])\cdot(\Q[e^{n-1}]-\E[c^{n-1}]) \\ & \qquad\quad\qquad \times P[e_n \pmid h_{i}\cdot b\cdot c_n] \times P[e^{n-1} \pmid h_i\cdot b\cdot c^{n-1}] \\[2ex] \end{align}\)
\(\begin{align}\phantom{\V[c^n]} &= \V[c_n] + \V[c^{n-1}] \\ & \qquad\qquad + 2\times \sum_{\{e^{n-1}\} } \sum_{\{e_n\}} \left(\begin{aligned} \Q[e_n]\times\Q[e^{n-1}] \\ - \Q[e_n]\times\E[c^{n-1}] \\ - \E[c_n]\times\Q[e^{n-1}] \\ + \E[c_n]\times\E[c^{n-1}] \end{aligned}\right) \\ & \qquad\qquad \times P[e_n \pmid h_{i}\cdot b\cdot c_n]\times P[e^{n-1} \pmid h_i\cdot b\cdot c^{n-1}] \\[2ex] \end{align}\)
\(\begin{align}\phantom{\V[c^n]} &= \V[c_n] + \V[c^{n-1}] \\ & \qquad + 2\times \left( \begin{aligned} \sum_{\{e^{n-1}\} } \sum_{\{e_n\}} \Q[e_n] & \times\Q[e^{n-1}]\\ &\times P[e_n \pmid h_{i}\cdot b\cdot c_n]\\ &\times P[e^{n-1} \pmid h_i\cdot b\cdot c^{n-1}] \\ - \sum_{\{e^{n-1}\} } \sum_{\{e_n\}} \Q[e_n] & \times\E[c^{n-1}]\\ & \times P[e_n \pmid h_{i}\cdot b\cdot c_n]\\ & \times P[e^{n-1} \pmid h_i\cdot b\cdot c^{n-1}] \\ -\sum_{\{e^{n-1}\} } \sum_{\{e_n\}} \E[c_n] & \times\Q[e^{n-1}]\\ & \times P[e_n \pmid h_{i}\cdot b\cdot c_n]\\ & \times P[e^{n-1} \pmid h_i\cdot b\cdot c^{n-1}] \\ +\sum_{\{e^{n-1}\} } \sum_{\{e_n\}} \E[c_n] & \times\E[c^{n-1}] \\ & \times P[e_n \pmid h_{i}\cdot b\cdot c_n]\\ & \times P[e^{n-1} \pmid h_i\cdot b\cdot c^{n-1}] \end{aligned}\right) \\ \end{align}\)
\(\begin{align}\phantom{\V[c^n]} &= \V[c_n] + \V[c^{n-1}] \\ & \qquad\qquad +2 \times \left(\begin{aligned} \E[c_n]\times\E[c^{n-1}] \\ - \E[c_n]\times\E[c^{n-1}] \\ - \E[c_n]\times\E[c^{n-1}] \\ + \E[c_n]\times\E[c^{n-1}] \end{aligned}\right) \\[2ex] \end{align}\)
\(\begin{align}\phantom{\V[c^n]} &= \V[c_n] + \V[c^{n-1}] \\ &= \ldots\\ &= \sum^{n}_{k = 1} \VQI[c_k \pmid h_i /h_j \pmid b]. \end{align} \)
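The decomposition just derived can be checked numerically. The Python sketch below (an illustration with invented likelihood values, not part of the original text) builds the joint likelihoods for two independent experiments, as the Independent Evidence Conditions permit, and confirms that VQI for the pair equals the sum of the individual VQIs.

```python
import math
from itertools import product

def eqi(like_i, like_j):
    # expected log-likelihood-ratio, weighted by h_i's likelihoods
    return sum(p * math.log(p / q) for p, q in zip(like_i, like_j))

def vqi(like_i, like_j):
    # variance of the log-likelihood-ratio under h_i
    e = eqi(like_i, like_j)
    return sum(p * (math.log(p / q) - e) ** 2 for p, q in zip(like_i, like_j))

# hypothetical likelihoods for two independent experiments c_1 and c_2
c1_i, c1_j = [0.7, 0.3], [0.4, 0.6]
c2_i, c2_j = [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]

# joint likelihoods for outcome sequences; independence lets them factor
seq_i = [p1 * p2 for p1, p2 in product(c1_i, c2_i)]
seq_j = [q1 * q2 for q1, q2 in product(c1_j, c2_j)]

total = vqi(seq_i, seq_j)
parts = vqi(c1_i, c1_j) + vqi(c2_i, c2_j)
print(abs(total - parts) < 1e-9)   # True: VQI[c^2] = VQI[c_1] + VQI[c_2]
```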
By averaging the values of \(\VQI[c^n \pmid h_i /h_j \pmid b]\) over the number of observations n we obtain a measure of the average variance in the quality of the information due to \(c^n\). We represent this average by overlining ‘VQI’.
Definition: The Average Variance in the Quality of Information
\[ \bVQI[c^n \pmid h_i /h_j \pmid b] = \frac{\VQI[c^n \pmid h_i /h_j \pmid b]}{n}. \]We are now in a position to state a very general version of the second part of the Likelihood Ratio Convergence Theorem. It applies to all evidence streams not containing possibly falsifying outcomes for \(h_j\). That is, it applies to all evidence streams for which \(h_j\) is fully outcome-compatible with \(h_i\) on each \(c_k\) in the evidence stream. This theorem is essentially a specialized version of Chebyshev’s Theorem (i.e., Chebyshev’s inequality), the result standardly used to prove Weak Laws of Large Numbers.
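For instance, when the same experiment is repeated independently n times, \(\bVQI\) is just the single-experiment VQI, and so is automatically bounded above. The following Python sketch (illustrative likelihood values only, not part of the original text) exhibits this:

```python
import math
from itertools import product

def vqi(like_i, like_j):
    # variance of the log-likelihood-ratio under h_i
    e = sum(p * math.log(p / q) for p, q in zip(like_i, like_j))
    return sum(p * (math.log(p / q) - e) ** 2 for p, q in zip(like_i, like_j))

li, lj = [0.7, 0.3], [0.4, 0.6]   # hypothetical per-experiment likelihoods

averages = []
for n in (1, 2, 3, 4):
    # joint likelihoods for every outcome sequence of length n
    seq_i = [math.prod(li[o] for o in s) for s in product([0, 1], repeat=n)]
    seq_j = [math.prod(lj[o] for o in s) for s in product([0, 1], repeat=n)]
    averages.append(vqi(seq_i, seq_j) / n)   # bVQI for the length-n stream

# the average variance is the same for every n: the one-experiment VQI
print(all(abs(a - averages[0]) < 1e-9 for a in averages))   # True
```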
Likelihood Ratio Convergence Theorem 2*—The
Probabilistic Refutation Theorem.
Suppose the evidence stream \(c^n\) contains only experiments or
observations on which \(h_j\) is fully outcome-compatible
with \(h_i\)—i.e., suppose that for each condition \(c_k\) in
sequence \(c^n\), for each of its possible outcomes
\(o_{ku}\), either \(P[o_{ku}\pmid h_{i}\cdot b\cdot c_{k}] = 0\) or
\(P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] \gt 0\). And suppose that
the Independent Evidence Conditions hold for evidence stream
\(c^n\) with respect to each of these hypotheses. Now, choose any
positive \(\varepsilon \lt 1\), as small as you like, but large enough
(for the number of observations n being contemplated) that the
value of \(\bEQI[c^n \pmid h_i /h_j \pmid b] \gt -\frac{(\log
\varepsilon)}{n}\). Then:
\[ \begin{multline} P\left[\vee \left\{e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^n]}{P[e^n \pmid h_i\cdot b\cdot c^n]} \lt \varepsilon \right\} \pmid h_i\cdot b\cdot c^{n}\right]\\ \ge 1 - \frac{\bVQI[c^n \pmid h_i /h_j \pmid b]}{n\times\left(\bEQI[c^n \pmid h_i /h_j \pmid b] + \frac{\log \varepsilon}{n}\right)^2}. \end{multline} \]
Thus, provided that the average expected quality of the information, \(\bEQI[c^n \pmid h_i /h_j \pmid b]\), for the stream of experiments and observations \(c^n\) doesn’t get too small (as n increases), and provided that the average variance, \(\bVQI[c^n \pmid h_i /h_j \pmid b]\), doesn’t blow up (e.g., it is bounded above), hypothesis \(h_i\) (together with \(b\cdot c^n\)) says it is highly likely that the outcomes of \(c^n\) will make the likelihood ratio \(P[e^n \pmid h_j\cdot b\cdot c^n]/P[e^n \pmid h_i\cdot b\cdot c^n]\) of \(h_j\) as compared to \(h_i\) as small as you like, as n increases.
Proof: Let
\[ \begin{align} \V & = \VQI[c^n \pmid h_i /h_j \pmid b] \\ \E & = \EQI[c^n \pmid h_i /h_j \pmid b] \\ \Q[e^n] & = \QI[e^n \pmid h_i /h_j \pmid b\cdot c^n] \\ & = \log\left(\frac{P[e^n \pmid h_{i}\cdot b\cdot c^n]}{P[e^n \pmid h_j\cdot b\cdot c^{n}]}\right) \end{align} \]Choose any small \(\varepsilon \gt 0\), and suppose (for n large enough) that \(\E \gt -\log \varepsilon\) (i.e., that \(\bEQI[c^n \pmid h_i /h_j \pmid b] = \E/n \gt -(\log \varepsilon)/n\), as the theorem supposes). Then we have
\[ \begin{align} \V & = \sum_{\{e^n :\, P[e^n \pmid h_{j}\cdot b\cdot c^{n}] \gt 0\}} (\Q[e^n] - \E)^2 \times P[e^n \pmid h_{i}\cdot b\cdot c^{n}]\\ & \ge \sum_{\substack{\{e^n :\, P[e^n \pmid h_{j}\cdot b\cdot c^{n}] \gt 0 \\ \text{and}\ \Q[e^n] \le -\log \varepsilon\}}} (\Q[e^n] - \E)^2 \times P[e^n \pmid h_{i}\cdot b\cdot c^{n}]\\ & \ge (\E + \log \varepsilon)^2 \times \sum_{\substack{\{e^n :\, P[e^n \pmid h_{j}\cdot b\cdot c^{n}] \gt 0 \\ \text{and}\ \Q[e^n] \le -\log \varepsilon\}}} P[e^n \pmid h_{i}\cdot b\cdot c^{n}]\\ & = (\E + \log \varepsilon)^2 \times P\left[\vee \left\{e^n : \begin{aligned} & P[e^n \pmid h_{j}\cdot b\cdot c^{n}] \gt 0 \\ & \text{and}\ \Q[e^n] \le \log(1/\varepsilon) \end{aligned} \right\} \pmid h_{i}\cdot b\cdot c^{n} \right]\\ & = (\E + \log \varepsilon)^2 \times P\left[\vee \left\{e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^n]}{P[e^n \pmid h_i\cdot b\cdot c^n]} \ge \varepsilon \right\} \pmid h_i\cdot b\cdot c^{n}\right] \end{align} \]So,
\[ \begin{multline} \frac{\bV}{n\times\left(\bE + \frac{\log \varepsilon}{n}\right)^2} = \frac{V}{(\E + (\log \varepsilon))^2}\\ \ge P\left[\vee \left\{e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^n]}{P[e^n \pmid h_i\cdot b\cdot c^n]} \ge \varepsilon \right\} \pmid h_i\cdot b\cdot c^{n}\right]\\ = 1 - P\left[\vee \left\{e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^n]}{P[e^n \pmid h_i\cdot b\cdot c^n]} \lt \varepsilon \right\} \pmid h_i\cdot b\cdot c^{n}\right] \end{multline} \]Thus, for any small \(\varepsilon \gt 0\),
\[ \begin{multline} P\left[\vee \left\{e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^n]}{P[e^n \pmid h_i\cdot b\cdot c^n]} \lt \varepsilon \right\} \pmid h_i\cdot b\cdot c^{n}\right]\\ \ge 1 - \frac{\bV}{n\times\left(\bE + \frac{(\log \varepsilon)}{n}\right)^2} \end{multline} \](End of Proof)
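As a numerical sanity check on the bound just derived (a sketch with invented likelihoods, not part of the original proof), the Python code below enumerates all outcome sequences for a short stream of independent, identical binary experiments and compares the exact probability on the left of the final inequality with the lower bound on the right.

```python
import math
from itertools import product

li, lj = [0.9, 0.1], [0.2, 0.8]   # hypothetical per-experiment likelihoods
n, eps = 16, 0.01

# exact value of P[∨{e^n : P[e^n|h_j·b·c^n]/P[e^n|h_i·b·c^n] < eps} | h_i·b·c^n]
lhs = 0.0
for seq in product([0, 1], repeat=n):
    pi = math.prod(li[o] for o in seq)
    pj = math.prod(lj[o] for o in seq)
    if pj / pi < eps:
        lhs += pi

# per-experiment EQI and VQI; for an IID stream these equal bEQI and bVQI
e1 = sum(p * math.log(p / q) for p, q in zip(li, lj))
v1 = sum(p * (math.log(p / q) - e1) ** 2 for p, q in zip(li, lj))

assert e1 > -math.log(eps) / n     # the theorem's condition on bEQI holds
bound = 1 - v1 / (n * (e1 + math.log(eps) / n) ** 2)
print(lhs >= bound)                # True: the refutation bound is respected
```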
This theorem shows that when \(\bVQI\) is bounded above and \(\bEQI\) has a positive lower bound, a sufficiently long stream of evidence will very likely result in the refutation of false competitors of a true hypothesis. We can show that \(\bVQI\) will indeed be bounded above when a very simple condition is satisfied. This gives us the version of the theorem stated in the main text.
Likelihood Ratio Convergence Theorem 2—The Probabilistic
Refutation Theorem.
Suppose the evidence stream \(c^n\) contains only experiments or
observations on which \(h_j\) is fully outcome-compatible
with \(h_i\)—i.e., suppose that for each condition \(c_k\) in
sequence \(c^n\), for each of its possible outcomes
\(o_{ku}\), either \(P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] = 0\) or
\(P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] \gt 0\). In addition (as a
slight strengthening of the previous supposition), for some positive
\(\gamma \lt 1/e^2\) (\(\approx .135\), where ‘e’ is
the base of the natural logarithm), suppose
that for each possible outcome \(o_{ku}\) of each observation
condition \(c_k\) in \(c^n\), either \(P[o_{ku} \pmid h_{i}\cdot
b\cdot c_{k}] = 0\) or \(P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] /
P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] \ge \gamma\). And suppose that
the Independent Evidence Conditions hold for evidence stream
\(c^n\) with respect to each of these hypotheses. Now, choose any
positive \(\varepsilon \lt 1\), as small as you like, but large enough
(for the number of observations n being contemplated) that the
value of \(\bEQI[c^n \pmid h_i /h_j \pmid b] \gt -(\log
\varepsilon)/n\). Then:
\[ \begin{multline} P\left[\vee \left\{e^n : \frac{P[e^n \pmid h_{j}\cdot b\cdot c^n]}{P[e^n \pmid h_i\cdot b\cdot c^n]} \lt \varepsilon \right\} \pmid h_i\cdot b\cdot c^{n}\right]\\ \ge 1 - \frac{(\log \gamma)^2}{n\times\left(\bEQI[c^n \pmid h_i /h_j \pmid b] + \frac{\log \varepsilon}{n}\right)^2}. \end{multline} \]
Proof: This follows from Theorem 2* together with the following observation:
If for each \(c_k\) in \(c^n\), for each of its possible outcomes \(o_{ku}\), either \(P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] = 0\) or \(P[o_{ku} \pmid h_{j}\cdot b\cdot c_k]/P[o_{ku} \pmid h_i\cdot b\cdot c_{k}] \ge \gamma \gt 0\), for some lower bound \(\gamma \lt 1/e^2\) (\(\approx .135\), where ‘e’ is the base of the natural logarithm), then \(\bV = \bVQI[c^n \pmid h_i /h_j \pmid b] \le (\log \gamma)^2\).
To see that this observation holds, assume its antecedent.
- First notice that when \(0 \lt P[e_k \pmid h_{j}\cdot b\cdot c_k]
\lt P[e_k \pmid h_i\cdot b\cdot c_{k}]\), the supposition gives
\(P[e_k \pmid h_{j}\cdot b\cdot c_k]/P[e_k \pmid h_i\cdot b\cdot
c_{k}] \ge \gamma\), so \(1 \lt P[e_k \pmid h_{i}\cdot b\cdot
c_k]/P[e_k \pmid h_j\cdot b\cdot c_k] \le 1/\gamma\); hence we have
\[
\begin{multline}
\left(\log\left[\frac{P[e_k \pmid h_{i}\cdot b\cdot c_k]}{P[e_k \pmid h_j\cdot b\cdot c_k]}\right]\right)^2 \times P[e_k \pmid h_i\cdot b\cdot c_{k}]\\
\le (\log \gamma)^2 \times P[e_k \pmid h_{i}\cdot b\cdot c_{k}].
\end{multline}
\]
So we only need establish that when \(P[e_k \pmid h_{j}\cdot b\cdot c_k] \gt P[e_k \pmid h_i\cdot b\cdot c_{k}] \gt 0\), a corresponding bound holds with the \(h_j\)-based likelihood on the right-hand side, i.e., we will also have
\[ \begin{multline} \left(\log\left[\frac{P[e_k \pmid h_{i}\cdot b\cdot c_k]}{P[e_k \pmid h_j\cdot b\cdot c_k]}\right]\right)^2 \times P[e_k \pmid h_i\cdot b\cdot c_{k}]\\ \le (\log \gamma)^2 \times P[e_k \pmid h_{j}\cdot b\cdot c_{k}]. \end{multline} \](Then it will follow easily that \(\bVQI[c^n \pmid h_i /h_j \pmid b] \le(\log \gamma)^2\), and we’ll be done.)
- To establish the needed relationship, suppose that \(P[e_k \pmid
h_{j}\cdot b\cdot c_{k}] \gt P[e_k \pmid h_{i}\cdot b\cdot c_{k}] \gt
0\). Notice that for all \(p \le q\), p and q between 0
and 1, the function \(g(p) = (\log(p/q))^2 \times p\) has a minimum at
\(p = q\), where \(g(p) = 0\), and (for \(p \lt q\)) has a maximum
value at \(p = q/e^2\)—i.e., at \(p/q = 1/e^2\). (To get this, note
that the derivative \(g'(p) = \log(p/q)\times(\log(p/q) + 2)\), which
equals 0 just when \(p = q\) or \(p = q/e^2\); the latter yields the
maximum of \(g(p)\) for \(p \lt q\).)
So, for \(0 \lt P[e_k \pmid h_{i}\cdot b\cdot c_k] \lt P[e_k \pmid h_j\cdot b\cdot c_{k}]\) we have
\[ \begin{multline} \left(\log\left(\frac{P[e_k \pmid h_{i}\cdot b\cdot c_k]}{P[e_k \pmid h_j\cdot b\cdot c_k]}\right)\right)^2 \times P[e_k \pmid h_i\cdot b\cdot c_{k}]\\ \le (\log(1/e^2))^2 \times P[e_k \pmid h_{j}\cdot b\cdot c_{k}] \\ \le(\log \gamma)^2 \times P[e_k \pmid h_{j}\cdot b\cdot c_{k}] \end{multline} \](since, for \(\gamma \le 1/e^2\) we have \(\log \gamma \le \log(1/e^2) \lt 0\); so \((\log \gamma)^2 \ge(\log(1/e^2))^2 \gt 0)\).
- Now (assuming the antecedent of the observation), for each \(c_k\), abbreviating \(\QI[o_{ku}] = \QI[o_{ku} \pmid h_i /h_j \pmid b\cdot c_k]\) and \(\EQI[c_k] = \EQI[c_k \pmid h_i /h_j \pmid b]\), \[ \begin{align} \VQI[c_k \pmid h_i /h_j \pmid b] & = \sum_{\{o_{ku}: P[o_{ku} \pmid h_j\cdot b\cdot c_k ] \gt 0\}} (\QI[o_{ku}] - \EQI[c_k])^2 \\ &\quad\qquad \times P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] \\[1ex] & = \sum_{\{o_{ku}: P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] \gt 0\}} \left(\begin{aligned} & \EQI[c_k]^2 \\ &- 2\times\QI[o_{ku}]\times\EQI[c_k]\\ & + \QI[o_{ku}]^2 \end{aligned}\right) \\ &\quad\qquad \times P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] \\[1ex] & = \sum_{\{o_{ku}: P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] \gt 0\}} \EQI[c_k]^2\times P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] \\ & \quad - 2\times\EQI[c_k] \\ &\quad\ \times \sum_{\{o_{ku}: P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] \gt 0\}} \left(\begin{aligned} &\QI[o_{ku}]\\ &\times P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] \end{aligned}\right) \\ & \quad+\sum_{\{o_{ku}: P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] \gt 0\}} \left(\begin{aligned} &\QI[o_{ku}]^2 \\ &\times P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] \\ \end{aligned}\right) \\ & = \sum_{\{o_{ku}: P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] \gt 0\}} \QI[o_{ku}]^2 \times P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}]\\ &\quad - \EQI[c_k]^2 \\ & \le \sum_{\{o_{ku}: P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] \gt 0\}} \QI[o_{ku}]^2 \times P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] \\ & \le \sum_{\{o_{ku}: P[o_{ku} \pmid h_{j}\cdot b\cdot c_{k}] \gt 0\}} (\log \gamma)^2 \times P[o_{ku} \pmid h_{i}\cdot b\cdot c_{k}] \\ & \le (\log \gamma)^2. \\ \end{align} \]
So,
\[ \bVQI[c^n \pmid h_i /h_j \pmid b] = (1/n) \times \sum^{n}_{k = 1} \VQI[c_k \pmid h_i /h_j \pmid b] \le (\log \gamma)^2. \]
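Both key facts used above can be checked numerically. The Python sketch below (illustrative values only, not part of the original text) confirms that \(g(p) = (\log(p/q))^2 \times p\) peaks at \(p = q/e^2\) on \((0, q)\), and that VQI stays below \((\log \gamma)^2\) for a hypothetical pair of likelihoods satisfying the \(\gamma\) condition.

```python
import math

# g(p) = (log(p/q))^2 * p has its maximum (for p < q) at p = q/e^2
q = 0.5
g = lambda p: math.log(p / q) ** 2 * p
grid = [q * k / 100000 for k in range(1, 100000)]   # fine grid over (0, q)
p_star = max(grid, key=g)
print(abs(p_star - q / math.e ** 2) < 1e-4)         # True

# VQI[c_k] ≤ (log γ)^2 when every ratio P[o|h_j..]/P[o|h_i..] is ≥ γ
li, lj = [0.9, 0.1], [0.2, 0.8]    # hypothetical likelihoods
gamma = 0.1                        # a lower bound on the ratios, below 1/e^2 ≈ .135
assert all(pj / pi >= gamma for pi, pj in zip(li, lj))

e = sum(p * math.log(p / q2) for p, q2 in zip(li, lj))
v = sum(p * (math.log(p / q2) - e) ** 2 for p, q2 in zip(li, lj))
print(v <= math.log(gamma) ** 2)   # True
```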