Supplement to Chance versus Randomness
C. Proofs of Selected Theorems
Proof of Theorem 1. This follows from Church’s observation by a result of Doob (1936) or a similar result of Wald (1938). Doob’s result is that the measure of the set of infinite sequences which can result from the application of a admissible place selection \(\phi\) on a member of some subset \(S\) of the Cantor space is the same as the measure of \(S\) (i.e., \(\mu(\phi^{-1}(S)) = \mu(S))\) (see also Feller 1950: 203). Since the measure of the set of Borel abnormal sequences \(A\) is zero, Doob’s theorem shows that the measure of \(\phi^{-1}(A)\) is zero—i.e., that the set of sequences such that an admissible place selection gives a biased subsequence has measure zero. Since for von Mises only a non-random sequence has biased admissible subsequences, the set of non-random sequences is the set of all sequences such that for some place selection, the result is a biased subsequence, so is the union of \(\phi^{-1}(A)\) for all \(\phi\). If there are only countably many \(\phi\) (as there are on Church’s conception of place selections), since the measure of the union of countably many sets is less than or equal to the sum of their measures, the measure of the non-random sequences is zero, and the von Mises-random sequences therefore exist and form a measure one subset of all infinite binary sequences. See van Lambalgen (1987a: §2.5.2).
Proof of Corollary 1. (This proof is due to Chris Porter.) To begin, suppose that the place selection \(f_{\sigma}(x_1\ldots x_{i-1}) = 1\) iff there exists a \(j \lt i\) such that \(\sigma = x_j \ldots x_i\). That is, on input some initial segment of sequence \(x\) of length \(i-1\), this place selection relative to \(\sigma\) selects the \(i\)-th digit of \(x\) iff \(\sigma\) is a suffix of the input. It is easy to see that, for each finite binary sequence \(\sigma , f_{\sigma}\) will be effectively computable, so it is a plausible assumption that \(\{f_{\sigma}: \sigma \in X^{\lt \omega}\}\) is a subset of the set of admissible place selections. It can then be shown (by induction over the length of \(\sigma)\) that any sequence which satisfies the property of large numbers, and all of its admissible subsequences do too, will be Borel normal.
- The base case, for the empty string \(\varnothing\), is guaranteed by the requirement that the sequence \(x\) satisfies the property of large numbers, as every outcome place is preceded by an initial subsequence which has \(\varnothing\) as a suffix.
- Next suppose that every string of length \(n\) appears in \(x\) with limit frequency \(2^{-n}\). Then for each \(\sigma\) of length \(n\), \(f_{\sigma}\) will select from \(x\) a subsequence that satisfies the property of large numbers, as roughly half the time, when \(\sigma\) appears in \(x\), it will be followed by a 0, and the other half of the time it will be followed by a 1. Since this holds for any \(\sigma\) of length \(n\), it follows that \(\sigma 0\) and \(\sigma 1\), the possible strings of length \(n+1\), all occur with limit frequency \(\frac{1}{2} \cdot 2^{-n} = 2^{-(n+1)}\), as required.
Proof of Theorem 3. Every rejected sequence is rejected at some level. So every rejected sequence has some initial sequence which is rejected at that level. To be rejected at a level \(2^{-m}\) is to fall into a set of sequences \(U_m\) which has measure less than the size of the level, by definition. A test is just a collection of such sets \(U_m\); it is clear that if a sequence is rejected at a level, it is rejected at every smaller level, so \(U_m \supseteq U_{m+1}\). It is effectively determinable whether a given sequence is rejected; since the set of pairs \(\langle m,n\rangle\) is effectively enumerable (using Cantor’s pairing function), if a sequence should be rejected, there is an effective procedure which establishes that. These two results show that the set of rejected sequences is effective measure zero: each test is a sequence of effective open sets of sequences. To be rejected is to be a member of some such set. So the set of all rejected sequences is in the intersection of all the test sets \(U_m\). By construction, the rejected sequences, by a particular test, are thus of effective measure zero.
Martin-Löf then shows that the set of all tests is itself effectively enumerable. Using the fact that a test is an enumerable set of sets, and Martin-Löf explicitly constructs a function that assigns a test to a given triple of numbers (which, in effect, encodes the test), and shows that the set \(T\) of such triples is effectively enumerable. The universal test \(U\) is obtained by fixing on some constant, \(c\), for each test \(V\), and letting the universal test be obtained from the set \(T\), using the triple which is the level of significance, the length of the sequence, and the constant (which may as well be the Gödel number of the test \(V)\). The construction ensures that for any test \(V, V_{m+c} \subseteq U_m\); that is, the universal test is one that rejects any sequence at a given level iff there is some test that rejects it at a related level (which level depending on the test). The details of the construction (Martin-Löf 1966: 606, 609–10) show that the universal test also rejects only an effective measure zero set of sequences, so the complement of the rejected sequences—the set of ML-random sequences—is effective measure one by means of a single universal property.
Proof of Theorem 4. The proof is a fairly immediate corollary of the existence of a enumeration of the partial recursive functions. Each function \(g\) has an associated code number (Gödel number) \(\gamma\). Let \(\alpha^{[\gamma]}\) represent the string consisting of a block of \(\gamma\) instances of the digit \(\alpha\). We now choose an \(f\) that, on input \(1^{[\gamma]}0\delta\), produces \(g(\delta)\). Such functions exist, because of the existence of universal Turing machines (let \(\gamma\) be the Gödel number of \(g\) under some standard coding, then a universal Turing machine given this input may emulate the operation of \(g\) on input \(\delta)\). It is obvious that they are near-superior to any \(g\), for \(c_f (\delta) = c_g (\delta) + (\gamma +1)\). Any universal function will be able to emulate any other, which also shows that they are all complexity equivalent to one another.
Proof of Theorem 5. Suppose there was an algorithm \(h\) that on input \(i\) yields a random sequence \(\varrho\) such that \(\lvert\varrho\rvert = i\). \(\varrho\) is random, so \(C(\varrho) \approx i\); yet \(h\) is near-inferior to some universal function, so that \(C(\varrho) \le C_h (\varrho) + k\), for some \(k\) depending on \(h\). Let \(\lvert\varrho\rvert\) increase, so \(k\) becomes negligible with respect to \(i\); then \(i \le C_h (\varrho) + k \approx C_h (\varrho)\). But \(C_h (\varrho) \approx \log_2 i\) (because that is the length of the binary representation of \(i)\); it follows that \(i \lesssim \log_2 i\), contradiction. So there can be no such algorithm \(h\) (van Lambalgen 1995: 11).
Proof of Theorem 6. We show that, for fixed \(k\). If \(\mu\) is sufficiently long then there is an initial segment \(\sigma\) of \(\mu\) such that \(C(\sigma) \lt \lvert\sigma\rvert - k\).
Suppose that we have some sequence \(\mu\), with initial segment \(\nu\), such that \(\nu\) is the \(n\)-th string in the length-lexicographic ordering of the finite binary strings. Let \(\varrho\) be the next \(n\) digits of \(\mu\), and let \(\sigma = \nu\unicode{x2040}\varrho\). We can generate \(\sigma\) from \(\varrho\) alone, because the length of \(\varrho\) will give us \(\nu\) using the standard ordering via some constant decoder. So we know that \(C(\sigma) \approx C(\varrho) \le \lvert\varrho\rvert + c\). We also know, however, that \(\lvert\sigma\rvert = \lvert\nu\rvert + \lvert\varrho\rvert\). If \(\nu\) is chosen to be longer than the sum of the constants (i.e., \(\lvert\nu\rvert \gt c + k)\), it’s obvious that \(C(\sigma) \lt \lvert\sigma\rvert - k\).
Proof of Theorem 7. Only if: The set \(U_k\) consists of those sequences which have finite initial subsequences which are compressible by \(k\) bits. That is, \(U_k = \cup_{\sigma \in \overline{R}} N(\sigma)\), where \(\sigma \in \overline{R}\) iff \(K(\sigma) \le \lvert\sigma\rvert - k\). The set of \(k\)-compressible strings is recursively enumerable; on input \(n \in \mathbb{N}\), take the universal prefix-free decompression function \(u\), and check if \(\lvert u(n)\rvert - \lvert n\rvert \lt k\). So each \(U_k\) is effective open; indeed, this algorithm shows the \(U_k\) to be uniformly effective open (supplement C).
Recall that as a proportion of all finite strings, there are at most \(1/2^k\) compressible strings (§2.2.1). Since there are fewer prefix-free \(k\)-compressible sequences than plain \(k\)-compressible sequences, the upper bound still holds for prefix-free compressibility. As this holds independently of the length of sequences, we can use it to establish a measure for \(U_k : \mu(U_k) = 2^{-k}\). So \(\mu(U_k) \le 2^{-k}\). So \(\cap_{k}U_{k}\) is effective measure zero, and is a Martin-Löf test. Suppose \(x\) is ML-random; thus \(x \not\in \cap_{k}U_{k}\). Therefore there is a \(k\) such that \(x\) has no initial sequence which is compressible by \(k\) bits, and is therefore prefix-free Kolmogorov random.
If: The ‘if’ direction is a little trickier, and depends in part on the details of how the sequential significance tests for ML-randomness are defined. For details, see Li and Vitányi (2008: 221–3) (as well as Chaitin 1987). The general idea is to show that for a sequence \(\omega\) which is not ML-random, and fails some sequential test, then there is a way of constructing a Turing machine that will compress initial subsequences of \(\omega\) arbitrarily much as the length of the initial subsequence increases (i.e., the difference between \(n\) and the prefix-free complexity of the length \(n\) initial subsequence of \(\omega\) is positively unbounded).