Lecture 20 (2018-03-29)
Lambda calculus: recursion and the Y combinator

The lambda calculus doesn't have recursion built in, but we can do it
anyway.

<h3>Recursion: finite prefixes</h3>

Consider a recursive version of `plus` on Church numerals:

```pre
plus = λm n. (isZero m) n (plus (pred m) (succ n))
```

Now, such a definition isn't mathematically valid---we've defined a
lambda calculus expression in terms of itself, which isn't finitely
solvable. But what about this:

```pre
plusF = λplusRec m n. (isZero m) n (plusRec (pred m) (succ n))
```

Now, `plusF` is a perfectly valid definition, and no matter what we
give it as the first argument, it'll give the right answer when `m` is
zero:

```
plusF Ω zero n =β (isZero zero) n (Ω (pred zero) (succ n))
               =β true n (Ω (pred zero) (succ n))
               =β n
```

Now observe that if we give it an argument that will do another
recursive step, we work on more inputs:

```
plusF (plusF Ω) one n =β (isZero one) n ((plusF Ω) (pred one) (succ n))
                      =β false n ((plusF Ω) (pred one) (succ n))
                      =β ((plusF Ω) (pred one) (succ n))
                      =β ((plusF Ω) zero (succ n))
                      =β (isZero zero) (succ n) (Ω (pred zero) (succ (succ n)))
                      =β true (succ n) (Ω (pred zero) (succ (succ n)))
                      =β (succ n)
```

If we give it `plusF (plusF (plusF Ω))`, then we'll be good for every
`m` up to two, and so on: each extra `plusF` buys one more recursive
step. So: if we could only have an *infinite number* of `plusF` calls
available, then we'd be able to work on any input.
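
To see the finite prefixes concretely, here is a minimal Python sketch. It makes some assumptions that are not in the notes: native integers stand in for Church numerals, Python's `if` expression stands in for the Church conditional, and an exception (the made-up `Diverges`) stands in for Ω so that the experiment terminates. The helper names `bottom`, `unroll`, and `works` are hypothetical.

```python
class Diverges(Exception):
    """Stand-in for Ω: raised instead of looping forever."""


def bottom(m):
    # Plays the role of Ω: any attempt to call it "diverges".
    raise Diverges("ran out of unrollings")


def plus_f(plus_rec):
    # plusF = λplusRec m n. (isZero m) n (plusRec (pred m) (succ n))
    return lambda m: lambda n: n if m == 0 else plus_rec(m - 1)(n + 1)


def unroll(k):
    """Return plusF applied k times to the Ω stand-in."""
    f = bottom
    for _ in range(k):
        f = plus_f(f)
    return f


def works(k, m, n):
    """Does k-fold unrolling compute m + n without hitting Ω?"""
    try:
        return unroll(k)(m)(n) == m + n
    except Diverges:
        return False
```

With this sketch, `unroll(k)` handles exactly the inputs with `m < k`; asking for one more recursive step than was unrolled runs into the Ω stand-in.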

What we want is technically known as a fixpoint combinator: a function
`Y` such that, for any `e`:

```
Y e = e (Y e)
    = e (e (Y e))
    = ...
    = e (... arbitrarily many times ... (Y e))
```

<h3>Recursion: infinite prefixes</h3>

Consider the term `ω = λx. x x`. What does ω do when applied to
itself? `ω ω` reduces in one step right back to `ω ω`! This is kind of
like running forever---`Ω = (ω ω)` will happily churn away, looping on
its own forever.
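
Python evaluates strictly, so it is a convenient place to watch Ω misbehave. This is only a sketch: the name `loops_forever` is made up for illustration, and Python's recursion limit turns the infinite loop into a `RecursionError` rather than actually running forever.

```python
# ω = λx. x x
omega = lambda x: x(x)


def loops_forever():
    """Try to evaluate Ω = ω ω and report whether Python gave up on it."""
    try:
        omega(omega)        # ω ω reduces to ω ω, which reduces to ω ω, ...
    except RecursionError:  # Python's stand-in for "runs forever"
        return True
    return False
```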

We can use behavior like this to get recursion in the lambda calculus,
by using the *paradoxical Y combinator*.

```pre
Y = λf. (λx. f (x x)) (λx. f (x x))
```

We have, for all expressions e:

```pre
Y e =β (λx. e (x x)) (λx. e (x x))
    =β e ((λx. e (x x)) (λx. e (x x)))
    =  e (Y e) = e (e (Y e)) = ...
```

Whoa. How do we use this? The intuition is that any terminating
recursive function will call itself some finite number of
times. Let's try it on `plusF`:

```
plus = Y plusF

plus three one =β (isZero three) one ((Y plusF) (pred three) (succ one))
               =β false one ((Y plusF) (pred three) (succ one))
               =β (Y plusF) (pred three) (succ one)
               =β plusF (Y plusF) (pred three) (succ one)
               =β plusF (Y plusF) two two
               =β (isZero two) two ((Y plusF) (pred two) (succ two))
               =β false two ((Y plusF) (pred two) (succ two))
               =β (Y plusF) (pred two) (succ two)
               =β plusF (Y plusF) (pred two) (succ two)
               =β plusF (Y plusF) one three
               =β (isZero one) three ((Y plusF) (pred one) (succ three))
               =β false three ((Y plusF) (pred one) (succ three))
               =β (Y plusF) (pred one) (succ three)
               =β plusF (Y plusF) (pred one) (succ three)
               =β plusF (Y plusF) zero four
               =β (isZero zero) four ((Y plusF) (pred zero) (succ four))
               =β true four ((Y plusF) (pred zero) (succ four))
               =β four
```

Try to do this derivation on your own, without consulting these notes.

<h3>The call-by-value Y combinator</h3>

You may have noticed that we've used the equational theory rather than
an evaluation function or a stepping relation. The equational theory
is the ground truth of the lambda calculus and the easiest way for
humans to reason about it. But your interpreters for
[HW06](../hw/Hw06.html) run using call-by-value (CBV) semantics. Is
that a problem?

Take a closer look at the derivation above. Notice that I chose
*specific* beta reductions to make. If I wanted to, I could have derived:

```
plus three one =β (Y plusF) three one
               =β plusF (Y plusF) three one
               =β plusF (plusF (Y plusF)) three one
               =β plusF (plusF (plusF (Y plusF))) three one
               =β plusF (... however many times I want (Y plusF)) three one
```

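What's going on? Here, I've simply chosen to keep β-reducing the
`Y plusF` subterm instead of ever applying `plusF` to its arguments.
Incidentally, this is what CBV evaluation does: it insists on fully
evaluating the argument `Y plusF` before it calls `plusF`.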

Should we be worried? Yes and no. Algebra has tons of equations we can
use, and sometimes using them doesn't lead anywhere we want to go. For
example, an algebraic proof that `n + 2 = (n + 1) + 1` might
proceed along the lines of:

```
  (n + 1) + 1
=  n + (1 + 1)   (associativity of +)
=  n + 2         (definition of +)
```

But not every series of algebraic manipulations gets us somewhere
worthwhile; I could just as easily have written an infinite series of
irrelevant manipulations:

```
  (n + 1) + 1
= (1 + n) + 1    (commutativity of +)
= (n + 1) + 1    (commutativity of +)
= (1 + n) + 1    (commutativity of +)
= (n + 1) + 1    (commutativity of +)
= ...
```

The *existence* of this second set of equalities doesn't change our
earlier, more meaningful example.

So much for algebraic proof: how do we reconcile the equational theory
and CBV evaluation? `Y` doesn't behave right in CBV evaluation---your
program just runs forever. The trick is to make sure that we don't
automatically evaluate the fixpoint---we'll only unroll `Y e` as
demanded by the program. We want a fixpoint that behaves like:

```
Y e = e (λx. (Y e) x)
```

CBV evaluation stops here, because the `Y e` is hidden under a
lambda. So the call-by-value Y combinator can be defined as:

```
Y = λf. (λx y. f (x x) y) (λx y. f (x x) y);
```
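
Since Python is itself call-by-value, both combinators can be tried out directly. The sketch below makes assumptions that are not in the notes: native integers and Python's `if` expression replace the Church encodings, the CBV combinator is named `Z` to keep it apart from the original `Y`, and the helper `naive_y_diverges` is made up for illustration. Python reports the runaway unrolling as a `RecursionError`.

```python
# Original combinator: Y = λf. (λx. f (x x)) (λx. f (x x))
Y = lambda f: (lambda x: f(x(x)))(lambda x: f(x(x)))

# CBV variant: λf. (λx y. f (x x) y) (λx y. f (x x) y)
Z = lambda f: (lambda x: lambda y: f(x(x))(y))(lambda x: lambda y: f(x(x))(y))

# plusF, with native ints standing in for Church numerals.
plus_f = lambda plus_rec: lambda m: lambda n: n if m == 0 else plus_rec(m - 1)(n + 1)


def naive_y_diverges():
    """Under strict evaluation, Y plusF unrolls before plusF is ever called."""
    try:
        Y(plus_f)
    except RecursionError:
        return True
    return False


# The CBV variant hides each unrolling under a lambda, so this returns at once.
plus = Z(plus_f)
```

Here `plus(3)(1)` evaluates to `4`, while merely constructing `Y(plus_f)` never finishes: the extra `y` parameter is what keeps `x x` from being evaluated until the recursive call is actually made.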

For more information, the [Wikipedia article on the Y
combinator/fixpoint
combinators](https://en.wikipedia.org/wiki/Fixed-point_combinator#Fixed_point_combinators_in_lambda_calculus)
is excellent. (They use "strict" to mean, for our purposes,
call-by-value. What I've given you is a variant of the Z combinator.)

<h3>Another way to use the Y combinator</h3>

It might be hard to figure out how to use Y. Here's a step-by-step
recipe for how to go from Haskell code to code using the Y combinator.

> plus 0 n = n
> plus m n = plus (m-1) (n+1)

First step: eliminate pattern matching, since the lambda calculus
doesn't have that. Let's rewrite this to use an if expression.

> plus m n = if m == 0 then n else plus (m-1) (n+1)

Second step: write explicit lambdas.

> plus = \m n -> if m == 0 then n else plus (m-1) (n+1)

Third step: use Church encodings.

> plus = \m n -> (isZero m) n (plus (pred m) (succ n))

Fourth step: eliminate explicit recursion using Y. To do this, we come
up with a new name---here, `plusF`---and add it as a new parameter to
our function. We'll use `plusF` to do a recursive call, and pass our
whole function to Y.

> plus = Y \plusF m n -> (isZero m) n (plusF (pred m) (succ n))

Fifth and final step: translate to lambda calculus syntax! This
amounts to changing arrows to dots and giving the slashes little legs
to make them lambdas.

```
plus = Y (λplusF m n. (isZero m) n (plusF (pred m) (succ n)))
```

<h3>CBV equivalents</h3>

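The above recipe isn't *exactly* right under CBV. `isZero m` returns a
Church boolean, and CBV evaluates both of the boolean's arguments
before the boolean gets to pick one... and one of those arguments is a
recursive call! So the recursion never bottoms out.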

The solution is to 'delay' a little bit more. We can write the delayed
Church booleans as:

```
true = λa b. a (λx. x)
false = λa b. b (λx. x)
```

The expectation here is that each choice (`a` or `b`) takes a single
argument, which it ignores. By putting each choice under a lambda, we
can delay evaluation until the choice is made.

Try to figure out how to write `and`, `or`, and `not` on your own. In
the meantime, we can write our conditional:

```
plus = Y (λplusF m n. (isZero m) (λx. n) (λx. (plusF (pred m) (succ n))))
```

Such functions that ignore their arguments are called "thunks".
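
The difference between the plain and delayed booleans can be checked in Python, which is call-by-value. This is a sketch under assumptions: the definitions of `zero`, `succ`, `pred`, and `isZero` below are the usual Church encodings rather than anything fixed by these notes, `Z` is the call-by-value Y combinator, and the names `to_int`, `eager_plus`, `eager_plus_diverges`, and the `_plain` suffix are made up for illustration.

```python
# Church numerals and the usual helpers.
zero = lambda s: lambda z: z
succ = lambda n: lambda s: lambda z: s(n(s)(z))
pred = lambda n: lambda s: lambda z: n(lambda g: lambda h: h(g(s)))(lambda u: z)(lambda u: u)
to_int = lambda n: n(lambda k: k + 1)(0)  # hypothetical helper for reading results

# Call-by-value Y combinator.
Z = lambda f: (lambda x: lambda y: f(x(x))(y))(lambda x: lambda y: f(x(x))(y))

# Plain booleans: choose between two arguments that were already evaluated.
true_plain = lambda a: lambda b: a
false_plain = lambda a: lambda b: b
is_zero_plain = lambda n: n(lambda _: false_plain)(true_plain)

# Delayed booleans: call the chosen thunk with a dummy argument.
true = lambda a: lambda b: a(lambda x: x)
false = lambda a: lambda b: b(lambda x: x)
is_zero = lambda n: n(lambda _: false)(true)

# Recipe as written: the recursive call is evaluated even when m is zero.
eager_plus = Z(lambda rec: lambda m: lambda n:
               is_zero_plain(m)(n)(rec(pred(m))(succ(n))))

# Thunked version: each branch waits under a lambda until it is chosen.
plus = Z(lambda rec: lambda m: lambda n:
         is_zero(m)(lambda _: n)(lambda _: rec(pred(m))(succ(n))))


def eager_plus_diverges():
    """The un-thunked recipe recurses without end, even on zero."""
    try:
        eager_plus(zero)(zero)
    except RecursionError:
        return True
    return False


one = succ(zero)
three = succ(succ(succ(zero)))
```

In this sketch `to_int(plus(three)(one))` gives `4`, while `eager_plus` hits Python's recursion limit on every input, including `zero`.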