Notes to Zermelo’s Axiomatization of Set Theory

1. Page number specifications on their own will refer to Zermelo (1908b). The translations of Zermelo are all taken from Zermelo (2010), where the original pagination is preserved and where the original German can be found.

2. As Russell admits:

…the question arises: which propositional functions define classes which are single terms as well as many, and which do not? And with this question our real difficulties begin. (Russell 1903: 103)

3. This is taken from the official English translation.

4. Zermelo uses ‘ε’ to denote set membership rather than the now-standard ‘∈’.

5. Various things should be noted here. Firstly, Zermelo's argument goes back to 1902 (see Felgner 2010: 167); he had discovered the Russell paradox independently as early as 1900. Secondly, Zermelo's analysis is already present, in effect, in Russell (1903: §100) where ‘the Contradiction’ is discussed explicitly. Thirdly, the Burali-Forti paradox was treated by Jourdain as a reductio, but it wasn't at all clear which premise in its derivation was under attack (see Hallett 1981).

6. See, for example, the principles set out at the beginning of Dedekind 1888, and later the paper published by Harward in 1905 (Harward 1905; Moore 1976).

7. For a review of these principles, see Felgner 2010: 174–175. The relevant correspondence of Cantor can be found in Cantor 1991, with English translations in Ewald 1996: volume 2.

8. See Ebbinghaus 2007, in particular 36–47. See also Peckhaus 1990. Also helpful are the chronology of Zermelo's career in Zermelo 2010: 42–51 and the introductory section there on Zermelo's life and work.

9. Note that, while the notion of consistency itself is stated in syntactic terms, the proofs given in the geometrical work are relative consistency proofs effected by semantic arguments. The demonstration of consistency is in fact by the exhibition [‘Aufweisung’] of what we would now call a model.

10. See also Hilbert's famous paper on mathematical problems from 1900 (Hilbert 1900a: 265–266).

11. Hilbert (1918: 411; 152 of reprint, 1112 of English translation) cites Zermelo's work as an example of solving specific problems by means of the axiomatic method. He repeats this in some lectures from 1920:

The axiomatic method cannot be replaced by any other. It is and remains the indispensable instrument of exact research in every field, the one suited to our mind; it is fruitful and logically unassailable. In order to investigate a particular branch of science, one seeks to reduce it to as small a number as possible of principles that are as simple and intuitive as possible. These principles are called axioms. Only by collecting the axioms of a discipline and setting them up as such does one gain a clear overview of the whole discipline, as well as the possibility of its further development.

The most brilliant example of a fully realized development of the axiomatic method is offered by the axiom system of Zermelo that we have been considering. (Hilbert 1920: 33–34.)

12. For an analysis of Hilbert's work of this kind, see Hallett 2008.

13. Two sets M and N are said to have the same size or power or cardinality if they can be put into one-to-one onto correspondence. In this case we say that M ~ N. If M has the same power as a subset of N, we say that M ≼ N; if in addition M ~ N fails, we say that M ≺ N, and that N has greater power than M. A set is said to be countable if it can be put into one-to-one onto correspondence with the set of natural numbers.
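For finite sets the comparisons defined in this note can be checked by brute force. The Python sketch below is only an illustration under that finiteness assumption, and its function names (`injections`, `equipotent`, `embeds`, `smaller`) are hypothetical: it searches for one-to-one maps of M into N and then asks whether some such map is also onto N. For finite sets this collapses to comparing numbers of elements; the interest of Cantor's definitions lies in the infinite case, where, for example, the natural numbers have the same power as the even numbers (a proper subset), and no exhaustive search of this kind is possible.

```python
from itertools import permutations

def injections(M, N):
    """Yield every one-to-one map of the finite set M into the finite set N, as a dict."""
    Ml = list(M)
    # Each permutation picks distinct images in N, so each map is one-to-one.
    for image in permutations(list(N), len(Ml)):
        yield dict(zip(Ml, image))

def is_onto(f, N):
    """True if the map f hits every element of N."""
    return set(f.values()) == set(N)

def equipotent(M, N):
    """Same power: some one-to-one map of M into N is also onto N."""
    return any(is_onto(f, N) for f in injections(M, N))

def embeds(M, N):
    """M has the same power as some subset of N: a one-to-one map exists."""
    return next(injections(M, N), None) is not None

def smaller(M, N):
    """N has greater power than M: M embeds in N, but M and N are not equipotent."""
    return embeds(M, N) and not equipotent(M, N)
```

Here `smaller({1, 2}, {'a', 'b', 'c'})` is true, while `smaller({1, 2, 3}, {'a', 'b', 'c'})` is false because a one-to-one onto correspondence exists.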

14. However, for a survey of other, quite different, attempts to count with infinite numbers, both before Cantor and after, see Mancosu 2009.

15. Two ordered sets A₁ with ordering R₁ and A₂ with ordering R₂ are said to be order-isomorphic if there is a function f which puts them into one-to-one onto correspondence and in addition is such that, for any two elements a, b in A₁, a R₁ b if and only if f(a) R₂ f(b). Thus, f shows that the ordering of A₁ is reflected in the ordering of A₂.
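The definition of order-isomorphism can likewise be made concrete for finite orderings. In this minimal Python sketch, the function name and the representation of an ordering as a set of pairs `(a, b)` standing for a R b are assumptions made for illustration; f is given as a dictionary, and the check tests both that f is one-to-one onto and that a R₁ b holds exactly when f(a) R₂ f(b) does.

```python
def is_order_isomorphism(f, A1, R1, A2, R2):
    """Check that the dict f: A1 -> A2 is an order-isomorphism.

    A1, A2 are finite sets; R1, R2 are sets of pairs (a, b) meaning a R b.
    """
    A1, A2 = set(A1), set(A2)
    # f must be defined on all of A1 and map it one-to-one onto A2
    # (onto plus equal sizes forces one-to-one for finite sets).
    if set(f) != A1 or set(f.values()) != A2 or len(A1) != len(A2):
        return False
    # a R1 b must hold exactly when f(a) R2 f(b) holds.
    return all(((a, b) in R1) == ((f[a], f[b]) in R2) for a in A1 for b in A1)
```

With A₁ = {1, 2, 3} under < and A₂ = {'x', 'y', 'z'} under alphabetical order, f = {1: 'x', 2: 'y', 3: 'z'} passes, whereas swapping the images of 1 and 2 fails, since 1 R₁ 2 holds but 'y' does not precede 'x'.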

16. Recall that the phrase ‘laws of thought [Gesetze des Denkens]’ is used by Frege (1879, Vorwort) to refer to logical laws, laws which ‘transcend all particulars [über allen Besonderheiten erhaben]’.

17. The letters can all be found in Cantor 1991, with English translations, and an important and informative introduction, in Ewald 1996: Volume 2, 923–940. For commentary, see Hallett 1984.

18. Hardy (1904: §2) and Jourdain (1905a, among other places) later endorsed a similar ‘selection’ method to show, respectively, that the continuum has a subset of the power of the second number-class and that every cardinal number is an aleph.

19. ‘Component’ was Cantor's original term for a subset.

20. One could in consequence say that Zermelo's criticism reveals a circularity similar to that revealed in many of the “proofs” of the Euclidean Parallel Postulate.

21. Following Cantor, we would put ‘ℵ’ in place of Russell's ‘α’.

22. For an account of König's lecture, its aftermath and Zermelo's rôle and comments, see Ebbinghaus 2007: 50–53.

23. Zermelo attributes to Erhard Schmidt the idea that each covering gives rise to a well-ordering; see Zermelo 1904: 516.

24. For comments on how to move from the informal ‘successive selection’ argument to Zermelo's proof of the WOT, see Hausdorff 1914: 134–136.

25. In later work on the set axioms, however, Zermelo again calls AC a ‘logical principle’. See, for example, Zermelo 1930: 31.

26. The enumeration of such problems is clearly associated with Zermelo's remark in the 1904 paper that the choice principle cannot be reduced to anything simpler, but ‘is applied without hesitation everywhere in mathematical deduction [wird aber in der mathematischen Deduktion überall unbedenklich angewendet]’ (Zermelo 1904: 516).

27. For a more detailed account of Poincaré's position, and a brief account of the Cauchy proof, see Hallett 2010b: 109–112, and the further references given there.

28. Of course, the proof itself avoids the psychological casting which this explanation has given it.

29. This is shown in Zermelo's proof by the demonstration that there is in fact a one-one correspondence between the ‘remainders’ and the set of singletons determined by M, i.e., {{x} : x ∈ M}. Using ‘initial segments’ (and unions instead of intersections) with the inclusion relation, not reverse inclusion, would also work just as well, but taking remainders has the advantage of being able to ‘build down’ from the whole of M instead of ‘up’ from ∅.
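The ‘building down’ of remainders can be illustrated on a finite set. The Python sketch below is not Zermelo's proof, which must handle arbitrary, possibly infinite, M and so cannot simply iterate step by step; it assumes a choice function, here the hypothetical parameter `choose`, and starting from the whole of M repeatedly deletes the chosen element. Each nonempty remainder A is paired with the singleton {choose(A)}, the remainders strictly decrease under inclusion, and reading off the chosen elements in turn gives the induced well-ordering.

```python
def remainders(M, choose):
    """Build down from M: each step deletes the element picked by `choose`.

    Returns the nonempty remainders in order, largest first. `choose` is a
    hypothetical choice function mapping a nonempty frozenset to one of its members.
    """
    chain = []
    A = frozenset(M)
    while A:  # stop once the remainder is empty
        chain.append(A)
        A = A - {choose(A)}
    return chain

def induced_order(chain, choose):
    """The element chosen from each remainder, in turn: the induced well-ordering."""
    return [choose(A) for A in chain]
```

With `min` as the choice function on {1, 2, 3} the chain is {1, 2, 3} ⊋ {2, 3} ⊋ {3}, and the induced ordering is 1, 2, 3; using `max` instead yields 3, 2, 1, so the well-ordering obtained depends on the choice function.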

30. For an illuminating account of Zermelo's second proof, see Hausdorff 1914: 136–139. For a discussion of Kuratowski's (and other) fundamental work in relation to Zermelo's, see Hallett 2010b: §3.3, or (in more detailed fashion) Hallett 1984: chapter 7, §3(b), 256–266.

31. See Ebbinghaus 2007: 285. The central points were reported in Fraenkel 1922: 232–233.

32. As Mancosu et al. (2009: 349) point out, Skolem knew of Weyl's work, since he had reviewed Weyl (1910) in 1912.

33. For a much fuller account, see Ebbinghaus 2010. For further subtleties, see Hallett 1984.

34. Before the general acceptance of the von Neumann ordinals in the later 1920s and the consequent ‘reduction’ of the full theory of the ordinals to ‘pure’ set theory, there was a vigorous debate about the independent importance of the transfinite numbers. We cannot go into this here.

35. For a more elaborate account of Kuratowski's results, see Hallett (1984, 2010b).

36. Von Neumann's formulation will be examined in a later article for this Encyclopedia.

37. These lectures can be found in Chapter 2 of Ewald et al. 2013; see also the Introduction to this chapter, §4.2.

38. For an account of the Axiom of Replacement, and in particular von Neumann's work with it, see Kanamori 2012.

Copyright © 2013 by
Michael Hallett
