Introduction to Logic and Recursion Theory
This is a transcription of relevant notes from the class 18.511 taught by Prof. Sacks in the Spring of 1998, organized and reinterpreted. Homework problems starting with problem 9 are solved in vitro. Notation is indecipherable.
- Edward Boyden
Propositional Calculus
Propositional calculus is an example of a formal system. One must specify atomic symbols, which consist of proposition letters An and connectives, such as negation !, conjunction &, disjunction |, the arrow (implication) →, and parentheses (). The arrow (f → g) should be read as (!f | g). An expression is a finite sequence of atomic symbols. The set of well-formed formulas (WFFs) is defined recursively as follows: (An) is a WFF, and if f, g are WFFs, so are (!f), (f&g), (f|g), (f → g). This lets us build up new propositions from old ones. They are associative, etc. in the commonly held sense of these notions.
A truth valuation v is a function from the atomic letters into { T, F }. There is a unique extension vc to the set of all WFFs, given by defining it recursively in the obvious fashion. Two WFFs are semantically equivalent if each v gives them the same truth value. For all f there exists g such that g is semantically equivalent to f and g uses only !, |, and &; in fact g can be put into disjunctive normal form. One can go farther and show that the set { !, & } is semantically complete. It is obvious that we cannot discard the ! symbol, but one can combine the two to make the NAND operator (the Sheffer stroke), which is, all by itself, semantically complete. In quantum computing there is an analogous fact: CNOT together with arbitrary single-qubit unitaries is universal.
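A small brute-force check of these semantic claims (a sketch, not from the notes; the WFF encoding and function names are my own):

    # Evaluate WFFs over all truth valuations and check semantic equivalence.
    from itertools import product

    def ev(f, v):
        if isinstance(f, str): return v[f]                    # atomic letter
        op = f[0]
        if op == '!': return not ev(f[1], v)
        if op == '&': return ev(f[1], v) and ev(f[2], v)
        if op == '|': return ev(f[1], v) or ev(f[2], v)
        if op == '->': return (not ev(f[1], v)) or ev(f[2], v)
        if op == 'nand': return not (ev(f[1], v) and ev(f[2], v))

    def equivalent(f, g, letters):
        return all(ev(f, dict(zip(letters, vals))) == ev(g, dict(zip(letters, vals)))
                   for vals in product([True, False], repeat=len(letters)))

    # (f -> g) reads as (!f | g)
    print(equivalent(('->', 'A', 'B'), ('|', ('!', 'A'), 'B'), ['A', 'B']))   # True
    # NAND alone expresses negation and conjunction
    print(equivalent(('!', 'A'), ('nand', 'A', 'A'), ['A']))                  # True
    print(equivalent(('&', 'A', 'B'),
                     ('nand', ('nand', 'A', 'B'), ('nand', 'A', 'B')), ['A', 'B']))  # True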
A set of WFFs G is satisfiable if there exists a truth valuation which makes all the WFFs in it true. We call G finitely satisfiable if for every finite subset of G, there is a valuation which makes all its elements true. The compactness theorem states that if G is a set of WFFs that is finitely satisfiable, then it is satisfiable. This is provable in several ways. (1st proof) Apply recursion, showing that for each letter Ai, either G U {Ai} or G U {!Ai} must be finitely satisfiable. Adjoin Ai or !Ai, whichever keeps the set finitely satisfiable – call the adjoined literal Ki. Then since any WFF f in G is composed of a finite number of letters, after finitely many steps of this induction the literals Ki for the letters of f, together with f itself, comprise a satisfiable set. Therefore the valuation which sets all Ki to true satisfies all the elements of G.
Another proof is as follows: we use Konig’s lemma, which claims that if T is a binary branching tree such that every branch is finite, then the number of nodes is finite. This is easily proven using a counting argument. We may derive a second compactness proof as follows: (2nd proof) Let G be a finitely satisfiable set, and set up a horizontally-growing tree, such that on the ith level, a branch grows up if Ai is true, and it grows down if Ai is false. A branch of length n exists if there is a valuation on the first n letters which satisfies all elements of G with dependencies on just the first n letters. Now, since G is finitely satisfiable, there are branches of every finite length, and so there must be an infinite branch (by Konig’s lemma, arguing by contradiction), which gives us a truth valuation for all of G. There are deep connections between this and Tychonoff’s theorem . . .
A tautology is something which is always true, under any truth valuation; we write |= f. |= (!f) means that f is a contradiction. The interpolation theorem of Craig states that if |= (f → g), then 1) if f and g have no letters in common, either |= (!f) or |= g, and 2) if f, g have letters in common, then there exists a WFF h, built just from the common letters of f and g, such that |= (f → h) and |= (h → g). The proof is hairy, and can be summarized (à la Artin) as "cook until done," or "beat it until it’s dead." Essentially we use induction on the number of letters, building up intermediate formulas with certain letters set to desired values and creating intermediate tautologies.
First-Order Logic
A first-order language L is specified by its atomic symbols, which take several forms, including logical symbols (propositional connectives such as !, &, |, →, ↔, quantifiers ∃ "there exists" and ∀ "for all" (note ∀ can be defined as !∃! – that is, ∀xi f is the same thing as !∃xi !f), variables x0, x1, . . . , and the equality symbol =), and nonlogical symbols (relation symbols, like < – just symbols with a specified number of ‘slots’, function symbols (which are treated just like relation symbols), and individual constants like 0, pi, 1, and infinity). Set theory has one nonlogical symbol, membership, and is sufficient to give us all of mathematics. A term is defined recursively as follows: every xi and individual constant is a term; if f is an n-place function symbol and t1, . . . tn are terms then f t1 . . . tn is a term. A well-formed formula in a first-order language is defined recursively as follows: if r is an n-place relation symbol and t1, . . . tn are terms, then r t1 . . . tn is a formula (an atomic formula); if s and t are terms, (s = t) is a WFF; if f and g are well-formed formulas then (f & g), (f | g), (!f), (f → g), (f ↔ g), etc. are all well-formed formulas; if f is a WFF then ∃xi f is a WFF (recall that ∃ is our notation for the existential quantifier!). An L-structure @ consists of a nonempty set A, the universe of @ (i.e., the integers in number theory); an n-place relation on the universe A for each n-place relation symbol of L; an n-place function f: A^n → A for each n-place function symbol of L; and a corresponding element of A for each individual constant of L. Think of it as an instance of the language.
Let f be a WFF; an occurrence of x in f is free if it does not lie within the scope of a quantifier on x. If x is not free it is bound. A sentence is a WFF in which no variable occurs freely, and can be assigned a truth value. We write @ |= f, "f is true in @", where f is a sentence of L and @ is an L-structure.
We add one more element to our L-structure to save us pain later on: for each constant term (term containing no variables) t, for example a function symbol applied to individual constants, we define n(t) to be the element of the universe A named by t. (We already specified the interpretations of the individual constants.) Thus @ |= (s = t) means n(s) = n(t) – i.e., they are the same element of A. This looks recursive, but we appeal to the equality of ordinary mathematical discourse for the second equality.
We now define truth after Tarski. @ |= f means f is true in @, where f is a sentence in the language L and @ is an L-structure. We will use our pain-saver in the last step to make things complicated, but it really isn’t so bad: for all a in A, create a new individual constant a’, and let La be the old language plus { a’ | a in A }. This is called the extended language. Then any sentence f of the extended language can be interpreted in @ by interpreting a’ as a. For example, @ |= (s = t) if n(s) = n(t); @ |= r t0 . . . tn-1 if R@ n(t0) . . . n(tn-1), where R@ is the relation interpreting r; @ |= (f & g) if @ makes f true and @ makes g true; @ |= ∃x f if there exists a in A such that @ |= f(a’), where f(a’) means we substituted a’ for each free occurrence of x in f.
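A concrete rendering of this recursive truth definition for a small finite structure (a sketch; the formula encoding, the example structure, and the function names are my own, not the notes'):

    # Tarski-style evaluation of formulas in a finite L-structure.
    def ev_term(t, structure, assignment):
        kind = t[0]
        if kind == 'var':   return assignment[t[1]]
        if kind == 'const': return structure['constants'][t[1]]
        if kind == 'func':  return structure['functions'][t[1]](
            *[ev_term(u, structure, assignment) for u in t[2]])

    def holds(f, structure, assignment=None):
        assignment = assignment or {}
        op = f[0]
        if op == 'rel':
            args = tuple(ev_term(u, structure, assignment) for u in f[2])
            return args in structure['relations'][f[1]]
        if op == 'eq':  return ev_term(f[1], structure, assignment) == ev_term(f[2], structure, assignment)
        if op == 'not': return not holds(f[1], structure, assignment)
        if op == 'and': return holds(f[1], structure, assignment) and holds(f[2], structure, assignment)
        if op == 'exists':
            return any(holds(f[2], structure, dict(assignment, **{f[1]: a}))
                       for a in structure['universe'])

    # the structure ({0,1,2}, <): check that there is a least element,
    # i.e. @ |= exists x (not exists y (y < x))
    less = {(a, b) for a in range(3) for b in range(3) if a < b}
    S = {'universe': range(3), 'relations': {'<': less}, 'functions': {}, 'constants': {}}
    f = ('exists', 'x', ('not', ('exists', 'y', ('rel', '<', [('var', 'y'), ('var', 'x')]))))
    print(holds(f, S))   # True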
A model @ of a set of sentences G is an L-structure that satisfies @ |= f for all f in G. We next prove the first-order logic compactness theorem, which states that if a set of sentences G is such that each finite subset of G has a model, then G has a model. We will have to build a large infrastructure of ultraproducts before beginning to contemplate this complicated sequence of thought. This has many applications, such as Robinson’s principle – if f is a sentence in the first-order language of fields, such that f is true in every field of characteristic zero, then there exists p0 such that f holds in all fields with characteristic p > p0.
Let @i be a sequence of L-structures, and define @inf = prod(@i, i), the infinite product (Cartesian) of the @i’s, which is also an L-structure. We define Ainf = prod(Ai, i), the universe of the product structure; we define R, an n-place relation symbol of L, as Rinf f1 . . . fn, if for all i, R@i f1(i) . . . fn(i) – all definitions are appropriately coordinatewise; we define g, an n-place function symbol of L, as ginf (f1 . . . fn )= fn+1 if for all i, gi( f1(i) . . .fn(i ) ) = fn+1(i); the individual constants cinf of Ainf are just defined such that cinf(i) = c@i. Must we use the axiom of choice here, since we’re picking one element from each set . . . ? Note that orders might become partial orders, and comparabilities might break down, but it’s still a universe and the relations are still relations.
But we don’t want partial orders. We need the ultraproduct.
A base is a set B of subsets of ω such that no finite intersection of elements of B is empty. A filter on ω is a collection B of subsets of ω such that 0 (the null set) is not in B, B is closed under finite intersections, and B has upward closure (i.e., if X is in B and X is a subset of Y, then Y is in B). The principal filter generated by x0 is the smallest filter containing x0, and is trivial to define. An ultrafilter B is a filter with the additional property that for every subset x of ω, either x or its complement is an element of B. It is easy to see that any base is extendable to a filter, and that any filter can be extended to an ultrafilter (for the latter, use some form of induction, adding sets or their complements). These ideas come from the general theory of Boolean algebras, but they are closely linked to measure theory as well. Thus we define a finitely additive 0-1 measure as a function u on the power set of ω such that for any subset x of ω, u(x) is 0 or 1, u(0) = 0 and u(ω) = 1, and if u(x) = u(y) = 0, then u(x U y) = 0. It is easy to see that such finitely additive measures and ultrafilters are the same thing.
The ultraproduct UP = prod(@i, i) / D, where D is an ultrafilter on ω and @i is a sequence of L-structures, is defined as follows: if f, g are elements of prod(Ai, i), then f ~ g if { i | f(i) = g(i) } is in D – that is, f and g are equal almost everywhere, which is a common horrible thing in mathematical terminology, but which our definition of finitely additive measure makes precise. But (!) ~ is an equivalence relation, so it’s well-defined, and it partitions prod(Ai, i) into equivalence classes [ f ] = { g | g ~ f }, and the universe of the ultraproduct is prod(@i, i) / D = { [ f ] | f in prod(Ai, i) }. We define relations, functions and so on in the natural way: RUP [ f1 ] . . . [ fn ] means that R@i f1(i) . . . fn(i) for almost all i, that is, except for a set of measure zero with respect to the ultrafilter; gUP( [ f1 ], . . . [ fn ] ) = [ fn+1 ] if { i | g@i( f1(i), . . . fn(i) ) = fn+1(i) } is in the ultrafilter D. An individual constant cUP = [ f ] if { i | f(i) = c@i } is an element of D. In other words, since D is the set of subsets of ω that have measure 1, we may take a statement about the ultraproduct to hold if its coordinatewise version holds on a set of measure 1. (Is this really truth, or is this just a convenience?) For rigorousness one shows that these are consistent definitions – i.e., if f ~ g and { i | R@i f(i) } is in D, then so is { i | R@i g(i) }.
We now prove the fundamental theorem of ultraproducts, due to Los. It states,
prod(@i, i) / D |= F( [ f1 ], . . . [ fn ] ) ↔ { i | @i |= F( f1(i), . . . fn(i) ) } is an element of D
that is, being true in the ultraproduct is the same as being true almost everywhere. We apply induction on the complexity of F, by showing that it’s true for relational statements, conjunctions, disjunctions, negations (this is where the "ultra" part of the ultrafilter comes in!!!), and quantifiers.
Finally, we are ready to prove the compactness theorem for first-order logic! Suppose G = {f0, f1, . . .} is our set of sentences. Let I = { b, c, d, . . . } be the set of finite subsets of G, and for each b, choose a structure @b so that @b |= f for all f in b. Then prod(@b, b in I) / D is the ultraproduct under consideration, and it is the model which makes all the sentences of G true. We construct a base B as follows: let Ib = { c | b contained in c }, the set of supersets of b (a subset of I), and let B = { Ib | b in I }. This is a base: the intersection of Ib and Ic contains the set of supersets of b U c, which contains b U c itself and so is nonempty. Extend to a filter and then an ultrafilter, D. Let F be any sentence in G, and let’s see that prod(@b, b in I) / D |= F. By the fundamental theorem this is the same thing as saying { b | @b |= F } is an element of D. But that set contains I{F}, which is in the base B and hence in D, so by upward closure it is in D. And so prod(@b, b in I) / D |= F, as desired. (!)
For example, consider a sequence of fields @i, and let prod(@i, i) be the set of sequences whose ith element is from the ith field. Then the product is not a field – e.g., (1, 0, 1, 1, 1, . . . ) is not zero, but neither is it invertible. The trick is to take the quotient by a nonprincipal ultrafilter D: an element that agrees with an invertible element except on a set of measure zero (here, the single coordinate where it is 0) becomes identified with that invertible element, so (1, 0, 1, 1, 1, . . . ) is simply equal to 1 in the ultraproduct. In general every element of the ultraproduct is either zero almost everywhere (hence zero) or nonzero almost everywhere (hence invertible), so the ultraproduct is a field.
We can use this to prove Robinson’s principle, stated above. [ Note that this only applies to first-order statements, which can only talk about individual elements (see the definition of first-order language above to make sure that there aren’t any inconsistencies). ] Suppose there is no such p0. Then for all n, there must exist t > n and a field Ft of characteristic t such that in Ft, f is false. This gives us a countably infinite set { Ft | t in I }, so we can form an ultraproduct prod(Ft, t in I) / D, with D a nonprincipal ultrafilter on I. This is a field of characteristic zero (proof below). But Ft |= !f for all t in I, hence !f holds in the ultraproduct as well, by the fundamental theorem. This is a contradiction, since the assumption was that f held in every field of characteristic 0. Justification of saying the ultraproduct has characteristic zero: assume to the contrary that it has some finite characteristic p. Then { t | Ft has characteristic p } would have to be in D. But that set has at most one element (only Fp could have characteristic p, since Ft has characteristic t), and a nonprincipal ultrafilter contains no finite sets. That’s absurd.
[ I wish I could find a more formal proof of all these little ultraproduct facts. ] As another example, if you project a variety down a dimension, the image is a finite union of intersections of varieties and complements of varieties (a constructible set). Throw that at the Nullstellensatz!
We define an ultrafilter to be a principal ultrafilter if it is generated by a point. It’s easy to see that any infinite set, such as ω, has a nonprincipal ultrafilter (just take the cofinite, or finite-complement, subsets as a base and generate a filter and ultrafilter from the base). If one makes an ultraproduct with a principal ultrafilter generated by { n }, it collapses to the nth factor @n. This is weird behavior…
Towards Completeness and Consistency
Now we introduce the fundamental concept of provability. So that we can deal with WFFs and sentences equally well, we define the universal closure of a WFF f to be the sentence obtained by prefixing f with universal quantifiers so as to bind all free variables of f. A sentence is valid if it’s true in every structure; a WFF is valid if its universal closure is valid.
An axiom (schema) is something we take for granted. Most people would take (f | !f) to be an axiom; an axiom schema is a family of WFFs of some specified syntactic form. An axiom schema is valid if the universal closure of every instance is true in every structure. A rule has premises and a conclusion. For example, modus ponens has premises F, F → G leading to the conclusion G. A rule is valid if for every structure @, if the universal closures of all the premises are true in @, then the universal closure of the conclusion is true in @. A proof is a sequence of WFFs, f0 . . . fn-1, such that for all i < n, either fi is an axiom or the conclusion of some rule whose premises lie among formulas occurring earlier in the sequence. A WFF f is provable if there exists a proof whose last member is f.
We can easily prove that, if the axiom schemas and rules are valid, then every provable formula is valid. The converse, Godel’s completeness theorem, is nontrivial, and depends on selecting a good set of axioms and rules. First, we prove the fundamental theorem of first-order logic: let S be a logically consistent set of sentences; then S has a model. Definitions are in order: S is logically consistent if no contradiction can be derived from S using the axioms and rules of first-order logic. A derivation from a set of sentences S is a sequence of WFFs, f0 . . . fn-1, such that for all i < n, fi is either a member of S, an axiom, or the conclusion of a rule whose premises lie among f0 . . . fi-1. Thus provable means derivable from the empty set, and conversely derivable means provable with the added help of appeals to the set S.
This shows that a syntactic formalism (derivation) leads to semantic meaning (models). The fundamental theorem implies compactness of first-order logic (!!) and it also implies Godel’s completeness (!!!): if f is a valid sentence (true in every structure), then f is provable (using the rules and axioms of first-order-logic).
Proof of the fundamental theorem of first-order-logic: usually you sit down and write out a bunch of rules and axioms, and then show that they work. We’ll do it the other way around – try and prove it from what we know, and invent rules and axioms whenever we get stuck. This way we really see where the elements of language are important, and where our additional assumptions come in. We begin with Henkinization, a process used to define a set of supersets, which hopefully will eventually give us completeness.
Define a sequence of sets by recursion on n: Sn (n = 0, 1, 2, . . . ), each an extension of the one before, and Ln, each an extension of the previous language, as follows: Sn+1 is Sn plus all sentences of the form ( ∃x F(x) ) → F(cF(x)), where F(x) is a WFF of Ln and cF(x) is a new individual constant substituted for each free occurrence of x; Ln+1 is Ln plus all the constants cF(x). Thus for each existential statement we have added a constant that witnesses it. Sinf = union(Sn, n) is called the Henkinization of S, and is logically consistent. (Proof: if Sinf isn’t logically consistent, then some Sn+1 must not be; take the least such. We invent some rules (contradiction, implication, disjunction, conjunction, quantifiers, and so on) to help us process statements of first-order logic, among them the Deduction theorem, which says that if G is a set of sentences and f a sentence, then (G U { f }) |- h implies G |- (f → h). In detail: Sn+1 |- contradiction implies Sn U {Henkin sentence} |- contradiction, so Sn |- (Henkin sentence → contradiction), i.e., Sn |- !Henkin sentence; that is, Sn |- ∃x F(x) and Sn |- !F(cF(x)). Since the new constant cF(x) does not occur in Sn, the second yields Sn |- ∀x !F(x), contradicting the first; so Sn was already inconsistent. The case of more than one Henkin sentence is the same.)
The resultant language Linf is also well-defined. Let F(x) be a WFF of Linf. Then (∃x F(x)) → F(c) belongs to Sinf for some constant c of Linf. If F is a sentence of L0 and Sinf |- F, then S |- F – i.e., Sinf is a conservative extension of S. We next extend Sinf to a set Sinf,inf, which we would like to be logically consistent and such that for every sentence F of Linf, either F or !F is in Sinf,inf – compare to the ultrafilter.
Back to the proof! Assume that S is countable for now. Then Linf is countable – i.e., there is a list f1, f2, f3, . . . of all sentences of Linf. We try each sentence in turn, adding either it or its negation. Let Sinf,0 = Sinf, and let n ≥ 0. Assume Sinf,n is logically consistent, and let Sinf,n+1 = Sinf,0 U { g1, g2, . . . gn+1 }, where gi is either fi or !fi, and we just adjoined gn+1. One of the two choices for gn+1 gives a logically consistent set: if adjoining fn+1 and adjoining !fn+1 both led to contradictions, then (by the Deduction theorem) Sinf,n itself would derive a contradiction. Sinf,inf is union(Sinf,n, n).
Now we can read off the model from Sinf,inf. This is a cook-until-done step. If r, t are constant terms of Linf, we define r ~ t if (r = t) is a member of Sinf,inf. Note that a sentence is derivable from Sinf,inf if and only if it is a member of Sinf,inf, since everything or its negation is in it and Sinf,inf is consistent. ~ is an equivalence relation on the constant terms; write [ r ] = { t | r ~ t }. An element of the universe A is just one of these equivalence classes: A = { [ s ] | s a constant term of Linf }, with n(s) = [ s ], and we put R@([ r1 ], . . . [ rn ]) if R r1 . . . rn is in Sinf,inf. (We can prove this is well-defined using induction on the complexity of the constant terms.)
But why is this a model of S? There’s a lot of junk in Sinf,inf. We do know that a sentence F is derivable from Sinf,inf iff F is in Sinf,inf, and that everything or its negation is in Sinf,inf. Thus we must show that, for all sentences f of Linf, @ |= f ↔ f in Sinf,inf – then we will be done. If f is an equation (s = t), s and t constant terms, then we just apply the naming map, which sends a constant term to its equivalence class: @ |= (s = t) iff n(s) = n(t) iff (s = t) is in Sinf,inf. A relation among constant terms is handled similarly: @ |= R t1 . . . tn iff R@(n(t1), . . . n(tn)), which holds iff R t1 . . . tn is in Sinf,inf, and conversely. Now use induction on conjunction/disjunction/negation/etc. to complete the proof. Suppose f = (g & h), and @ |= f. Then @ |= g and @ |= h, so by induction g and h are in Sinf,inf and hence derivable, so certainly (g & h) is derivable and thus in Sinf,inf. Negation: if @ |= !g, then by induction @ |= g better not be true, so g is not in Sinf,inf and thus !g is in Sinf,inf. The quantifier: if @ |= ∃x f(x), then @ |= f(t) for some constant term t, so by induction f(t) is in Sinf,inf, and ∃x f(x) then follows from our rules and axioms. The argument is not reversible, and we must use Henkin sentences to get the converse, ∃x f(x) in Sinf,inf → @ |= ∃x f(x). Thus, suppose that ∃x f(x) is in Sinf,inf; then (∃x f(x)) → f(cf(x)) is also in Sinf,inf – use modus ponens to see that f(cf(x)) is in Sinf,inf. This is shorter than the statement of interest, so by induction f(cf(x)) in Sinf,inf implies @ |= f(cf(x)), and finally @ |= ∃x f(x). (It’s interesting to see that ultrafilters correspond to negation/measure, almost exclusively.)
We quickly prove Godel’s completeness theorem. Suppose G |= f; we must show G |- f. Suppose there is no derivation of f from G. Then G U {!f} is logically consistent: if it weren’t, then by the Deduction theorem G |- (!f → f), hence G |- f, contrary to assumption. By the fundamental theorem, G U {!f} has a model @. Then @ |= G, so @ |= f (since G |= f), but also @ |= !f – a contradiction. This is very subtle and important.
Applications of first-order logic: dense linear orders, fields. Model theory is the art of defining and exploring such models. Let @, & be L-structures. We define an equivalence relation: @ ≡ &, @ is elementarily equivalent (EE) to &, if for every sentence f of L, @ |= f ↔ & |= f. This is also known as first-order equivalence. Isomorphism is a stronger equivalence: there is a bijection preserving all relations, functions, etc. as well. EE is weaker than isomorphism, as can easily be seen: the complex numbers and the algebraic numbers are elementarily equivalent, but they are clearly not isomorphic. (Containing a transcendental element cannot be expressed with a single first-order sentence; by contrast, ‘every polynomial in x of degree n has at most n distinct roots’ can.) We define an elementary extension of a model @ to be a model &, written @ ≺ &, such that @ is a substructure of & and, for any formula f(x1, . . . xn) and any a1, . . . an in A, @ |= f(a1, . . . an) ↔ & |= f(a1, . . . an). Obviously @ ≺ & implies @ ≡ &. We can prove the upward Skolem (Lowenheim-Skolem) theorem, which says that for a countable language L, an infinite L-structure @, and an infinite set X with |X| ≥ |A|, there exists an L-structure & such that @ ≺ & and |&| = |X|. We prove this using compactness: let S be the set of sentences f(a1, . . . an) such that @ |= f(a1, . . . an) – i.e., all the true sentences about elements of @ – together with a new constant ci for each i in X and the sentences !(ci = cj) for distinct i, j. Let K be any finite subset of S. We claim K has a model: @ itself can serve, since K mentions only finitely many of the new constants, which can be interpreted as any distinct elements of A. By compactness, S has a model, say &. It is clear that |&| ≥ |X|, since & contains |X| distinct elements interpreting the ci. To get equality one looks back at the proof of the fundamental theorem, which controls the size of the model it builds.
For example, we can show that ACF0, the theory of algebraically closed fields of characteristic zero, is complete. Also, since it has a decidable set of axioms, is countable, and is complete, ACF0 is decidable – there exists a procedure to see whether any particular sentence or its negation is provable. A more general theorem, easily proven, is that if T is a theory and all models of T are elementarily equivalent, then T is complete. This is trivial to see.
Recursion Theory – compare this to Sipser’s Computation, Part II
Clarity is of the essence.
This is the theory of computable functions. It is based on recursion. Many things like Turing machines, etc. have been invented and they are all equivalent, but this is a nice formalism because it links back to the logic of theorems and mathematics. We will define computable, then discard our definition and rely on intuition from now on.
A primitive recursive function is constructed from a small stock of initial functions and two rules, all defined on the natural numbers N. The three kinds of initial functions are f(x1, . . . xn) = c (constants), f(x1, . . . xn) = xi (projections), and f(x1, . . . xn) = x1 + 1 = x1’ (successor). The rules are composition, f(x1, . . . xn) = g( h1(x1, . . . xn), . . . hm(x1, . . . xn) ), and primitive recursion, f(0, x2, . . . xn) = g(x2, . . . xn), f(x1’, x2, . . . xn) = h( f(x1, . . . xn), x1, x2, . . . xn). Any primitive recursive function can be built up this way. For example, f(x,y) = x + y can be constructed as follows: p(y) = y (1-projection), f(0,y) = p(y), f(x’,y) = f(x,y)’ (primitive recursion).
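A sketch of the schema in executable form (my own encoding, not the notes'), building addition exactly as in the example above and multiplication on top of it:

    # Initial functions and rules of the primitive recursion schema.
    def proj(i):
        return lambda *xs: xs[i]                      # projection onto argument i (0-indexed)

    def succ(x):
        return x + 1                                  # successor

    def compose(g, *hs):
        return lambda *xs: g(*(h(*xs) for h in hs))   # composition rule

    def prim_rec(g, h):
        # f(0, ys) = g(ys); f(x+1, ys) = h(f(x, ys), x, ys)
        def f(x, *ys):
            acc = g(*ys)
            for t in range(x):
                acc = h(acc, t, *ys)
            return acc
        return f

    # addition: add(0, y) = y; add(x', y) = add(x, y)'
    add = prim_rec(proj(0), lambda prev, t, y: succ(prev))
    print(add(3, 4))    # 7

    # multiplication: mult(0, y) = 0; mult(x', y) = add(mult(x, y), y)
    mult = prim_rec(lambda y: 0, compose(add, proj(0), proj(2)))
    print(mult(3, 4))   # 12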
Ackermann’s theorem: There exists a computable function which is not primitive recursive. The proof can be done by explicit construction, or by a diagonal argument. Associated with every primitive recursive function is a derivation from the initial functions and rules. Coding of any derivation, and hence of any primitive recursive function, can be accomplished, by which we mean we may represent it as a string of digits. For example, you could describe it in English, then code it in ASCII, which then can be represented in binary form. Thus if d is the code number of a derivation, the derivation can be recovered from d, and vice versa. Now define f(n) = g(n) + 1 if n is the code number of a derivation of a one-place primitive recursive function g, and f(n) = 0 otherwise. Clearly f(n) is computable, since we can just decode n, check whether it is such a derivation, and if so run the derived function. But is f(n) primitive recursive? If it were, then f would have a derivation as a primitive r.f., with some code number m. But then f(m) = f(m) + 1. This is shocking, but that is the whole point of computation theory.
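For the explicit construction, the standard example is the Ackermann function; here is a sketch (the memoization and recursion-limit fiddling are just practical conveniences):

    import sys
    from functools import lru_cache

    sys.setrecursionlimit(100000)

    @lru_cache(maxsize=None)
    def ackermann(m, n):
        # grows faster in m than any primitive recursive function
        if m == 0:
            return n + 1
        if n == 0:
            return ackermann(m - 1, 1)
        return ackermann(m - 1, ackermann(m, n - 1))

    print(ackermann(2, 3))   # 9
    print(ackermann(3, 3))   # 61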
To extend our definition we consider a partial recursive function. This defeats the diagonal argument, and gives us a more irrefutable definition of computability. We define the domain and range of a p.r.f. (which now stands for partial recursive function!) to be their natural definitions; f is total if dom f = N, the natural numbers. We wish to define a function to be computable if it is total and partial recursive.
Kleene’s schemes for prfs are just like the rules for primitive recursive functions, except that we no longer restrict domains. Thus composition may not always work out correctly, because we may compose two functions that don’t have the right ranges and domains. Thus some computations may fail, or diverge, or be undefined, or enter an infinite loop – we don’t really care what happened, but it can’t be good. If a program returns correctly, it is said to converge. Divergence and convergence are sometimes denoted by an up arrow and a down arrow, respectively. There is no way to decide whether f is total, as we will see later. We also add the least number operator μ, given by f(x) = μy[g(x,y) = 0], the least y for which g(x,y) is zero. Since such a y may not even exist, it is clear that we now explicitly have a schema under which f may not be total. But clearly f is computable, since I can just start computing g(x,0), g(x,1), . . . and stop when I get to a y for which g(x,y) returns zero. This is controversial, but see the next paragraph. This function "has a very partial look to it," to quote Sacks.
What if we try a diagonal argument? The attempted diagonal f(x) = μy[g(x,y) = 0] + 1 is ironically saved by the fact that the computation on the right may never even return, even if g is computable and total, so the equality as stated cannot be asserted! Does this make the class of computable functions too big? If so, then any proofs of uncomputability are even stronger than we need! We take now the identification prf = computable. This is equivalent to Turing machines, for reference. Diagonalization is a method for stepping outside a class, to see that there is something beyond the class of interest. But this is now impossible.
Consider applying this formalism to sets of numbers:
The characteristic function of a set A is CA(x) = (x in A) ? 1 : 0. (The characteristic function of a set is simply the answer to the question, ‘is x in the set?’) It is required to make a decision.
A is recursive if CA(x) is a recursive (computable, total partial recursive) function.
A is recursively enumerable (Turing-recognizable) if A is the range (or, equivalently, the domain – see below) of a partial recursive function.
Recursively enumerable can be thought of as being able to find the positive cases, but not the negative ones – i.e., one cannot falsify some inclusion in the set, since the program may not converge. The characteristic function may not be partial recursive (i.e., computable).
Recursive implies falsifiability, which is intuitively a close relative of decidability. This has deep implications for science, perhaps. If something isn’t true then it really isn’t!
The diagonal argument is key.
Fundamental theorem of Recursion theory, also known as the Enumeration theorem: There exists a partial recursive function Y(x,y) such that for any partial recursive function f(x), there exists e such that f(x) = Y(e,x). In other words, prfs make up a one-parameter family. First, consider the repercussions of the enumeration theorem: for each f(x), there exists an e such that f(x) = fe(x), where fe is called the eth prf. We define We to be the eth RE set, the domain (or range, at your convenience) of fe(x). The theorem is stating that all such functions can be enumerated, and not only that, all of them can be derived from one function Y(e,x)! We define simultaneous enumeration to be the following process: write down in columns the different arguments x = 0, 1, 2, . . . and in rows the different functions f0, f1, . . . We can visit each point infinitely often through a variety of schemes, and eventually each convergent point will be seen to converge.
Proof of enumeration theorem: We define Y(e,x) to be as follows: if e is the code number of a program for computing a prf of x, then run the program with input x, and if it converges, return Y(e,x) = fe(x). If e is not such a code number, then diverge. We can compute Y, so it must be prf. This is the universal Turing machine. Q.E.D. enumeration theorem.
Taking the domain version of the definition of recursively enumerable, and letting We = dom fe, the eth RE set, the enumeration W0, W1, . . . is a list of all RE sets. (The definitions are subtle: A is RE if it’s the domain (or range) of a prf, but it’s recursive only if its characteristic function, which decides whether any particular x is in the set, is recursive, i.e., a total prf.) Furthermore the RE sets can be simultaneously recursively enumerated, as described previously: we can just start up all the programs fe(x) and let them run, and a point x is added to We if fe(x) ever converges. It is interesting to note that we have established a link between syntax and semantics.
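A sketch of simultaneous enumeration (dovetailing). Here step(e, x, s) is an assumed helper, not defined in the notes or here, that runs the eth program on input x for s steps and reports whether it has converged:

    def simultaneous_enumeration(step):
        """Yield pairs (e, x) as each x is discovered to lie in W_e."""
        seen = set()
        s = 0
        while True:
            s += 1
            # at stage s, give every program e < s on every input x < s a budget of s steps
            for e in range(s):
                for x in range(s):
                    if (e, x) not in seen and step(e, x, s):
                        seen.add((e, x))
                        yield (e, x)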
Lemma: if A is recursive, A is RE. Proof: we construct a partial recursive function for which A is the domain: let f(x) be 1 if CA(x) = 1, and let it be undefined (i.e., diverge) otherwise. Thus x in dom f ↔ CA(x) = 1, and A is RE.
There exists a nonrecursive recursively enumerable set. Proof: Let K = { e | e is in We }, so that e in K ↔ e in We ↔ Y(e,e) converges. Then K is RE, since it is recognized by the procedure just given (run Y(e,e) and accept if it converges). Now suppose K were recursive. Then cK = N – K, the complement of K, would also be recursive (its characteristic function is obtained by flipping the answer of CK), hence RE, so cK = We for some e. Suppose e is in We: then e is in K by the definition of K, but also e is in We = cK – a contradiction. On the other hand, suppose e isn’t in We: then e isn’t in K, so e is in cK = We – again a contradiction. The diagonal argument strikes again.
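The same diagonal, phrased as a program (a sketch; membership_decider is a hypothetical total procedure claimed to decide "is e in We?", which is exactly what cannot exist):

    def diagonal(e, membership_decider):
        # If membership_decider(e) truthfully answered "is e in W_e?", this program,
        # run on its own code number, would have to both converge and diverge.
        if membership_decider(e):    # the decider claims e is in W_e
            while True:              # ... so we diverge
                pass
        return 0                     # ... otherwise we converge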
See below for a Turing-machine style argument:
Theorem: if A, cA are RE, then A is recursive. Proof: enumerate A, cA simultaneously. Eventually n is enumerated in A or in cA (since recognizing/accepting is guaranteed for RE sets, and every n lies in exactly one of the two), and at that moment we answer accordingly. Thus we have a total procedure that decides membership in A, i.e., CA is recursive.
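A sketch of that decision procedure, assuming enum_A and enum_cA are generator functions enumerating A and its complement (every n eventually appears in exactly one of them, so the loop terminates):

    def decide(n, enum_A, enum_cA):
        a, b = enum_A(), enum_cA()
        while True:
            if next(a) == n:
                return 1    # n is in A
            if next(b) == n:
                return 0    # n is in cA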
Theorem: suppose A is the range of a partial recursive function. Show A is RE. Proof: we must show that A is the domain of some partial recursive function. Let A be the range of a partial recursive function f, and define g(x) = μy[f(y) = x], which is partial recursive because f is partial recursive and the least number operator μ is among the Kleene schemes for constructing partial recursive functions. (Strictly, one dovetails the computations of f(0), f(1), . . . so that a divergent value of f does not block the search.) Then x is in dom g ↔ x is in range f, so A = dom g is RE.
Theorem: Suppose f is recursive (i.e., total partial recursive) and monotonically increasing, ∀x f(x+1) > f(x). Then the range of f is recursive. Proof: Let A = range f. Let us construct a CA(y) that returns 1 if there exists an x such that f(x) = y, and 0 if there is no such x. Since f is total, we can compute f(0), f(1), . . . , f(x0) for any x0, and all converge in a finite time. Since f(x+1) > f(x), by induction f(x) ≥ x, and so we need only check f(x) for x up to x0 = y to find a definite value for CA(y). Hence we can decide whether or not y is in the set, in a finite amount of time.
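A sketch of that decision procedure for the range of a strictly increasing total recursive f (here f is any Python function with those properties):

    def in_range(y, f):
        x = 0
        while True:
            v = f(x)
            if v == y:
                return True
            if v > y:       # f is increasing, so no later x can hit y
                return False
            x += 1

    print(in_range(9, lambda x: x * x))    # True
    print(in_range(10, lambda x: x * x))   # False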
Theorem: Define K = { <e, x> | x in We }. Show that K is RE but not recursive. (This is equivalent to the universal Turing machine, Y(e,x), being undecidable.) Proof: <e, x> in K ↔ fe(x) converges ↔ Y(e,x) converges, so K is RE: it is recognized by running the universal function on the pair. Now suppose K were recursive. Then the diagonal set { e | e in We } of the previous theorem would be recursive too, since e in We ↔ <e, e> in K: to decide whether e is in We, form the pair <e, e> and ask K. But we just showed that set is not recursive – a contradiction.
Key to diagonalization arguments: take the opposite of something, compose it with itself, and test to see if it is like itself.
Church invented the lambda calculus: y = f(x) is ambiguous; it’s much better to write λx | f(x). (Hence LISP.)
We define a set A to be simple if 1) A is RE, 2) cA is infinite (i.e., the complement is infinite), and 3) for all infinite RE sets B, the intersection of B and A is nonempty. Note that 2) and 3) are at odds – 2) seems to say we want a small set, but 3) says we want a really large set. And how do we know that a set is infinite? We will show, two theorems down, that there exists a simple set. This theorem is due to Post. Requirement 3) can be rewritten as ∀e(We is infinite → intersection of We and A is nonempty). Post devised an approximation to the requirement of infinitude, which says that if a number bigger than 2e ever arrives, we would try to get it into A . . .
Theorem: a simple set A cannot be recursive. Proof: If a simple set A were recursive, then cA would be recursive, so cA would be RE; but cA is infinite, so by requirement 3) cA must intersect A, which is absurd. (The definition was precisely constructed to rule out recursiveness.)
Theorem: there exists a simple set. Proof: Let us enumerate the set A in stages labeled by s, such that A<s is the part of A enumerated prior to stage s, and As is the part enumerated in stage s itself. We write, in the lambda calculus, λs | As; we compute A0, A1, A2, . . . and define A to be the union of all the As. Then A is RE, since it is the union of finite sets produced by a computable enumeration.
Enumeration process (defining λs | As by recursion on s):
stage s = 0: A0 = {}, the empty set.
stage s > 0: let e = (s)0, so that stage s works on the requirement for We, and define We,s to be that part of We enumerated by the end of stage s of the simultaneous enumeration process. Then:
case 1: A<s and We,s do not intersect, and ∃x(x > 2e & x in We,s). We can tell whether this is true or not, since both sets are finite and we can just scan them for such an x. In this case, let xs be the least such x, and let As = A<s U {xs}.
case 2: case 1 is false. Then As is just set to A<s.
Let us verify that this is a simple set. 1) Clearly A is RE because we just expressed it as the union of an enumerated sequence of finite sets. 3) Suppose that We is infinite, and let us show that A and We have nonempty intersection. Since We is infinite, ∃x(x > 2e & x in We); by simultaneously enumerating long enough, there is an s0 such that ∃x(x > 2e & x in We,s) for all s > s0. Fix such an s with s > s0 and (s)0 = e (there are infinitely many stages devoted to e). Consider stage s: if A<s ∩ We,s isn’t empty, then neither is A ∩ We, and we’re done. If A<s ∩ We,s is empty, then case 1 holds, and we pick up an xs at that stage which belongs to both A and We. 2) We show that cA is infinite. [bloated proof] First, 0 isn’t in A (0 isn’t greater than 2e for any e). Next, for each e there is at most one s such that (s)0 = e and case 1 holds: suppose (s)0 = (t)0 = e, s < t, and case 1 held at both s and t; then at stage s we put some xs from We,s into A, so A<t ∩ We,t is not empty, and case 1 fails at t – a contradiction.
Now, each number is placed in A for the sake of some We, and for each e this happens at most once: for W0, some x > 0 goes into A; for W1, some x > 2; for W2, some x > 4; . . . Suppose n < 2e and n is put into A. It must have been put in for the sake of some Wd with d < e, since anything put in for Wd has to exceed 2d. There are only e such d (namely 0, . . . , e-1), and each contributes at most one number, so at most e of the 2e numbers below 2e ever get into A, and at least e of them stay in cA. Since e was arbitrary, cA is infinite. QED.
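A sketch of the staged construction above, under the simplifying assumption that W_stage(e, s) returns (as a Python set) the finite part of We enumerated by the end of stage s – such a function exists by the enumeration theorem but is not written out here – and with (s)0 read off as the exponent of 2 in s, one standard coding convention:

    def first_component(s):
        # (s)_0: the exponent of 2 in s
        e = 0
        while s > 0 and s % 2 == 0:
            s //= 2
            e += 1
        return e

    def post_simple_set(W_stage, stages):
        A = set()
        for s in range(1, stages + 1):
            e = first_component(s)              # stage s works on the requirement for W_e
            We_s = W_stage(e, s)
            if not (A & We_s):                  # case 1: still disjoint from W_{e,s} ...
                witnesses = [x for x in We_s if x > 2 * e]
                if witnesses:                   # ... and a witness > 2e has shown up
                    A.add(min(witnesses))
            # case 2: otherwise nothing happens at this stage
        return A                                # the part of the simple set enumerated so far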
Second Fundamental Theorem of Recursion Theory: Kleene’s fixed point theorem (which persuaded him that the general definition of recursion theory is correct): Let g be any recursive function (total prf); then there exists e such that fe = fg(e). Proof: (which "one logician in 100 could get") We claim that there exists a recursive function t such that for all e, ft(e) = ffe(e), where fe(x) = Y(e,x), fe(e) = Y(e,e), ffe(e) = fY(e,e), ffe(e)(x) = fY(e,e)(x) = Y(Y(e,e), x). ("That’s about as far as we can go with the equations.")
Recall the definition of the Y function: e is a code number for a prf, and Y(e,x) computes fe(x) if e is the code of a derivation of a prf (and diverges otherwise).
Let t(e) be the code number of a derivation (program) De of a prf h(x) which depends on e as follows: to compute h(x), first compute fe(e); if it converges, then h(x) = ffe(e)(x) = Y(fe(e), x), and if fe(e) diverges, h(x) diverges. Thus we have invented a function h which depends on e, converted it to a Kleene derivation, and computed its code number t(e); t itself is recursive.
Consider g ∘ t. This is a bizarre function, a strange code-number computer composed with the given function: as a function of x it is g(t(x)), where t(x) codes ffx(x). Being recursive, it must be fd for some d. Thus ft(d) = ffd(d) = fg(t(d)). Let c = t(d). Then fc = fg(c). This is the first result not solved by an appeal to computability.
Corollary: Let f be a recursive function. Then there exists c so that Wf(c) = Wc; indeed Wc = dom fc = dom ff(c) = Wf(c).
A set A is creative if A is RE and there exists a recursive function g so that for all e, if We is contained in cA then g(e) is in cA – We. Theorem: If A is creative, then A is not recursive. Proof: If A were recursive, then its complement cA would be recursive and therefore RE as well, say cA = We. Then We is contained in cA, so g(e) would be in cA – We = cA – cA, which is empty – a contradiction.
An example of a creative set is D = { e | e is in We }. A quick proof couldn’t hurt: take g(x) = x, and suppose Wx is contained in cD. If x were in Wx, then x would be in D, but also x would be in Wx, which is contained in cD – a contradiction. So x is not in Wx, hence x is not in D; that is, g(x) = x lies in cD – Wx, as required.
A is many-one reducible to B, denoted by A <m B, if there exists a recursive f so that ∀x(x in A ↔ f(x) in B). Thus B contains information about A, and A can be computed from B. Many-one reducibility is reflexive and transitive: A <m A, and A <m B, B <m C → A <m C. A and B are many-one equivalent, A ≡m B, if A is many-one reducible to B and B is many-one reducible to A. The many-one degree of A is { B | B ≡m A } – the equivalence class of A. A is many-one complete if (i) A is RE, and (ii) ∀B[B RE → B is many-one reducible to A]. Note that if A is reducible to B and B is recursive, then A is recursive; so if A is not recursive, B cannot be recursive either.
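The argument above that K = { <e,x> | x in We } is not recursive was implicitly such a reduction. Here it is spelled out as a sketch (the pairing function is one standard choice, not anything fixed by the notes):

    def pair(e, x):
        # Cantor pairing: a recursive, injective coding of pairs by numbers
        return (e + x) * (e + x + 1) // 2 + x

    def reduction(e):
        # e is in { e | e in W_e }  <->  pair(e, e) is in { <e,x> | x in W_e }
        return pair(e, e)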
Post found many RE sets by listing a bunch of axioms, proofs, and code numbers. Sometimes the RE set is recursive – i.e., for most simple systems. Otherwise, one could always apply Godel’s argument. But one couldn’t always show that it was complete… It was suspected that all ‘naturally occurring’ sets were recursive or many-one complete. That there would be any complete sets at all was a mysterious fact!!
Myhill’s Theorem: If A is creative then A is many-one complete. Proof: Let B be any RE set; we show B <m A, where A is creative. Since A is creative, there is a recursive f such that ∀x(Wx contained in cA → f(x) in cA – Wx). We assert that there exists a recursive function h so that Wh(d,n) = { f(d) } if n is in B, and Wh(d,n) is the null set if n is not in B. [ Recall, Y(e,x) is the universal program. Every prf and trf is on the list. Also, an English description of the function can be found (just as we said above), and can be translated into Kleene schemes. Thus the code number h(d,n) of this RE set is a computable function of d, n. For each fixed n, we may think of λd | h(d,n) as a recursive function of d, with n as a parameter. ]
Then λd | h(d,n) has a fixed point c(n), which depends on the parameter n. We claim c(n) is a recursive function of n. To see this one must return to the fixed point theorem proof, and confirm that in general the fixed point is computable: the fixed point of g was t(d), where fd = g ∘ t, so the fixed point is computable from a code number for g, and a code number for λd | h(d,n) is computable from n, as desired. Thus Wc(n) = Wh(c(n), n), which is { f(c(n)) } if n is in B, and null otherwise. A is creative, so Wc(n) contained in cA → f(c(n)) in cA – Wc(n). Thus we have two cases:
suppose n is in B: then Wc(n) = { f(c(n)) }. If Wc(n) were contained in cA, then f(c(n)) would be in cA – Wc(n); but f(c(n)) is the sole element of Wc(n), a contradiction. So Wc(n) is not contained in cA, and since its only element is f(c(n)), that element must be in A. I.e., n in B → f(c(n)) in A. (!!!)
suppose n is not in B. Then Wc(n) is the null set, which is (trivially) contained in cA. Thus f(c(n)) is in cA – Wc(n), so in particular f(c(n)) is not in A. Thus n not in B → f(c(n)) not in A. So n is in B ↔ f(c(n)) is in A, and the recursive function sending n to f(c(n)) witnesses B <m A. Q.E.D.
The basic point of the proof: we obtain a fixed point of λd | h(d,n) whose W-set is defined differently according to whether n is in B or not in B. We show the fixed point is recursive in n, apply the definition of creative, and lo and behold a miracle occurs…
A <1 B, A is 1-1 reducible to B, if there is a one-to-one recursive function f such that ∀x(x in A ↔ f(x) in B). (Clearly this too is reflexive and transitive, and there is an equivalence relation, 1-1 equivalence, defined by A ≡1 B if both A <1 B and B <1 A.) A is one-one complete if A is RE and ∀B(B RE → B <1 A). Myhill’s Theorem was originally that A creative → A one-one complete. I.e., in the previous version the reducibility function was f ∘ c, and this time around it needs to be one-to-one. This is just a detail, says Sacks. p is a recursive permutation of N if p is recursive and p maps the natural numbers to themselves in a one-to-one, onto fashion. (It’s just a bijection of N with itself, an infinite permutation, a rearrangement…) We also require that p-1 be computable – but this is automatic: to compute p-1(x), one can just compute p(0), p(1), … and stop when you find an argument whose value is x.
So what Myhill actually stated, round 2, was that if A and B are one-one equivalent, A ≡1 B, then there exists a recursive permutation p of N so that p(A) equals B, i.e., A is recursively isomorphic to B. Hence creative → one-one complete → any two creative sets are one-one equivalent → related by a recursive permutation. Basically, there’s only one creative set!
Theorem (Myhill): If A ≡1 B, then A is recursively isomorphic to B. Proof: Suppose f, g are 1-1 recursive functions associated with the hypotheses ∀x(x in A ↔ f(x) in B), ∀x(x in B ↔ g(x) in A). Then we can construct, in a diagrammatic way, a 1-1 function using the functions f and g as follows: draw two lines N1, N2:
Draw a line from 0 in N1 to f(0) in N2, then go to the first element not yet used in N2 (0, in our case) and apply g to it to get an element of N1, etc. If you ever land on an already occupied spot, just keep going back and forth applying f, g, f, g, . . . until you hit an element which hasn’t been used. This establishes a computable one-to-one, onto function p between N1 and N2 . . . except, does p(A) really equal all of B? Does our bouncing-around method really map all of A to all of B? How should we modify f ∘ c from Myhill’s theorem (above) to solve the problem here? NOTATION CHANGE.
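A sketch of this back-and-forth construction as a program. Here f and g are assumed to be total, 1-1, computable functions with x in A ↔ f(x) in B and y in B ↔ g(y) in A; the sketch builds the bijection on the first n numbers, chasing through the finite map already built whenever it lands on an occupied spot:

    def myhill_isomorphism(f, g, n):
        h, hinv = {}, {}                 # finite 1-1 map and its inverse; preserves A/B membership

        def extend_domain(x):
            if x in h:
                return
            y = f(x)
            while y in hinv:             # occupied: chase through the map and apply f again
                y = f(hinv[y])
            h[x], hinv[y] = y, x

        def extend_range(y):
            if y in hinv:
                return
            x = g(y)
            while x in h:                # symmetric chase on the other side using g
                x = g(h[x])
            h[x], hinv[y] = y, x

        for k in range(n):
            extend_domain(k)             # make sure k has an image
            extend_range(k)              # make sure k has a preimage
        return h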
Let A be creative; there exists a recursive function c so that ∀e(We contained in cA → c(e) in cA – We). We want a 1-1 recursive c* so that ∀e(We contained in cA → c*(e) in cA – We) – i.e., c* is a 1-1 recursive creative function. Define a recursive function d(s,e) as follows: d(0,e) = c(e), and let Ks,e = the union of We and { d(t,e) | t ≤ s }. There exists a recursive g so that Wg(s,e) = Ks,e. (Just enumerate; translate into a derivation; get another code number.) Then set d(s+1,e) = c(g(s,e)). [Here the reader will cheerfully admit that his true calling is not logic, and go home.]
Assume d(t,e) is in cA for all t ≤ s and that We is contained in cA; we show that d(s+1,e) is in cA. Ks,e is then contained in cA, i.e., Wg(s,e) is contained in cA, so d(s+1, e) = c(g(s,e)) is in cA – Wg(s,e), just by creativity of A. Therefore, if We is contained in cA, ∀s(d(s,e) in cA), and also ∀s(d(s+1,e) is not in {d(0,e), . . . d(s,e)}), since for t ≤ s, d(t,e) is in Wg(s,e) and d(s+1,e) avoids Wg(s,e). So the d(s,e) are infinitely many distinct elements of cA – We.
Define c*(0) = c(0); given c*(0), . . . c*(e), let t = μs[d(s,e+1) not in {c*(0), . . . c*(e)}], and c*(e+1) = d(t,e+1). This is clearly 1-1 by construction, and it is a creative function for A because of our definition of d.
As for the fixed-point function, we must create neoplasms. E.g., if f is recursive, there exists e so that We = Wf(e), and in fact for every such e there exist e0 < e1 < …, We = Wei, where basically one adds NOPs to the derivation – just lengthen the program, without actually changing what is computed. All of these programs enumerate the same set. [Recall the proof of the fixed point theorem: fd = f ∘ t, with fixed point e = t(d) – other fixed points exist, e.g. t of any neoplasm for d! Thus fd = fdi for a lot of i’s, with corresponding fixed points ei = t(di). There are infinitely many fixed points!]
Recall that λe | h(e,n) had a fixed point d(n) (in the changed notation, c is now the creative function and d(n) the fixed point). Myhill’s round-1 argument used many-one recursive functions to get many-one reducibility via n ↦ c(d(n)); we must now transform c(d(n)) into c*(d*(n)), where c* and d* are 1-1 functions. We choose d*(0) = d(0), then define d*(n+1) to be a fixed point different from d*(0), d*(1), … d*(n), using neoplasms. Q.E.D.
The fixed point theorem, some say, is the heart of computability theory. It is a very strong way to form circularity while making it reasonable. Nothing related to it is intuitive, however.
Back to intuitive computability theory… we state a new theorem, the Reduction Theorem. Let A, B be recursively enumerable sets. Then there exist disjoint RE sets A0, B0 so that A0 is contained in A, B0 is contained in B, and the union of A0 and B0 is equal to the union of A and B. The proof is extraordinarily simple: simultaneously enumerate A and B, and suppose at some stage you are enumerating the element n.
If it appears in A, then if it hasn’t appeared in B yet, just put it in A0. Else, throw it away.
If it appears in B, then if it hasn’t appeared in A yet, just put it in B0. Else, throw it away.
That the new sets are recursively enumerable is obvious, and the construction ensures they are disjoint and exhaust the union of A and B. (Compare to the ‘complement’ method that we used a few pages back.)
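A sketch of this construction, assuming enum_A and enum_B are generator functions enumerating the two RE sets; the result is the finite approximation to A0 and B0 after the given number of stages:

    def reduction_theorem(enum_A, enum_B, stages):
        A0, B0 = set(), set()
        seen_A, seen_B = set(), set()
        a, b = enum_A(), enum_B()
        for _ in range(stages):
            n = next(a)                    # next element to appear in A's enumeration
            seen_A.add(n)
            if n not in seen_B:
                A0.add(n)                  # n reached A first: claim it for A0
            m = next(b)                    # next element to appear in B's enumeration
            seen_B.add(m)
            if m not in seen_A:
                B0.add(m)                  # m reached B first: claim it for B0
        return A0, B0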
A set is co-RE if it is the complement of an RE set. (This is a natural definition that you must have been anticipating for some time now.) Suppose V, W are disjoint co-RE sets; then there exists a recursive set A so that V is wholly contained in A, and W is wholly contained in cA. This is called a separation (compare Hausdorff separation, with points as co-RE sets and open sets as recursive sets). [Problem 14] Proof: The sets cV, cW are RE, and since V and W are disjoint, cV U cW = N. By the Reduction Theorem we get disjoint RE sets A0, B0 with 1) A0 contained in cV, 2) B0 contained in cW, 3) A0 U B0 = cV U cW = N. Since A0 and B0 are disjoint with union N, each is the complement of the other, so each is RE with RE complement, hence recursive. From 1), V is contained in cA0 = B0; from 2), W is contained in cB0 = A0. Take A = B0: then V is contained in A, W is contained in cA, and A is recursive. Q.E.D. Separation does not hold for RE sets: there are disjoint RE sets that no recursive set separates, as we now show. The proof is interesting, historically:
ω-consistency was what Godel used for his original incompleteness proof. The assumption is "There is no formula P(n) so that all of the following are provable: ∃n P(n), !P(0), !P(1), . . ." – i.e., an infinite family of conditions. This was improved by Rosser (Sacks’ advisor) in 1936; his argument relied on the existence of inseparable RE sets. Let A = { 2^c 3^d | 2^c 3^d appears in Wd before it does in Wc }, B = { 2^c 3^d | 2^c 3^d appears in Wc before it does in Wd }. A and B are then disjoint RE sets, and no recursive set separates them. For suppose D is recursive with A contained in D and B contained in cD. Then D and cD are both RE, say D = Wc and cD = Wd, and consider n = 2^c 3^d; n lies in exactly one of D, cD. If n is in D = Wc, then n never appears in Wd = cD, so n appears in Wc before it appears in Wd; hence n is in B, which is contained in cD – a contradiction. If n is in cD = Wd, then n appears in Wd before it appears in Wc, so n is in A, which is contained in D – again a contradiction.
Post's Problem
This proof is interesting historically and philosophically. It uses a priority argument, also known as an injury argument. Some definitions are in order: an RE set B is complete (or Turing complete) if for all RE sets D, D <T B. The symbol <T means Turing-reducible: A <T B if A is recursive in B, that is, if there exists a derivation of CA from the Kleene schemes together with the characteristic function of B. The membership facts about B are the oracle; we just assume that the oracle always answers (and in complexity theory, that each answer takes unit time). In other words, we get B for free; we use the set of all true facts about B in order to facilitate the derivation/execution of A. For functions f, g, f <T g means that f has a derivation (coding) in terms of just the Kleene schemes and the function g. Clearly <1 implies <m implies <T; therefore this is the broadest category of uncomputable stuff that we will concern ourselves with.
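A small illustration of how a Turing reduction can do what a many-one reduction cannot (a sketch; oracle_K is an assumed total function answering membership in K): the complement of K is not RE, so it is not many-one reducible to K, yet it is trivially decidable relative to an oracle for K.

    def in_complement_of_K(e, oracle_K):
        # one oracle query, then negate: cK is Turing- but not many-one-reducible to K
        return not oracle_K(e)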
Some observations.
If A is recursive then A <T X for all X, automatically (indeed one doesn’t even need X in the derivation of A). A <T A, obviously. And A <T B, B <T C implies A <T C – one can just call the programs recursively. The Turing degree of A is the set of all B such that B ≡T A, where A ≡T B iff A <T B and B <T A. We say the degree of A is below the degree of B if A <T B; since the Turing degree is an equivalence class this is a good definition. Among the Turing degrees of the RE sets we have 0, the degree of the empty set (i.e., the degree consisting of all recursive sets, since they can be computed using no oracle), and 0’, the degree of the universal Turing machine’s set {<x,e> | x in We} (not every representative of which is RE; a set and its complement have the same Turing degree, but the complement of an RE set is not RE unless the set is recursive), and, well…
Post’s problem asks if there are any other Turing degrees. Certainly the UTM can answer any question about any RE set; this is sort of the ‘most unsolvable’ possible Turing degree. Might there be ‘intermediate degrees’ which aren’t so unsolvable?
YES.
First, let B be a nonrecursive RE set. Then there is a simple set S ≡T B. Proof: Let B be the range of f, f a recursive 1-1 function. Let Df be the deficiency set of f, the set { n | there exists t such that t > n and f(t) < f(n) } – i.e., a measure of the degree to which the enumeration given by f is ‘out of order.’ Then Df is RE. Start enumerating B, Df simultaneously… We claim that Df is simple: we show that 1) cDf is infinite and that 2) every infinite RE set intersects Df.
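A sketch of why Df is RE: membership can be semi-decided by searching for the witness t (here f is any total, 1-1 Python function standing in for the recursive enumeration of B; the loop halts exactly when n is in Df):

    def in_deficiency_set(n, f):
        t = n + 1
        while True:
            if f(t) < f(n):     # found t > n with f(t) < f(n): n is "out of order"
                return True
            t += 1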
Finally, we must show that Df has the same degree as B. We show 1) Df <T B, 2) B <T Df.
All that remains is Post’s problem and a few examples from forcing and the theory of predicates and genericity and Godel’s theorem.