Homework 5

Parsing

This homework is written in literate Haskell; you can download the raw source to fill in yourself. You’re welcome to submit literate Haskell yourself, or to start fresh in a new file, literate or not.

Please submit homeworks via the DCI submission page.

There is a lot of coding in this assignment. Good luck! You can download working lexers and parsers for the pure lambda calculus to help you with problems 2 and 3. The starter includes two lexers—one in alex and one by hand—and two parsers—one in happy and one by hand, by recursive descent. (To install these tools, run cabal install alex and cabal install happy on your command line. They should already be installed on the lab machines.)

You should submit only one solution. You can mix and match: write the lexer however you want, then write the parser however you want. I find it most straightforward to lex manually and parse using happy, but play around and decide for yourself. Get started early.

You can test the code in the starter by unzipping it and running make in the hw05_lc directory. This should create an executable named Main, which can be run on any of the .lc files. Don’t worry if you don’t understand the code in Main yet—we’ll get to it soon.

You will need to submit all of your code, whether all in a single file (viable only if you write everything manually) or many files zipped up in one. If you submit more than one file, make it absolutely clear to the TAs which function runs your lexer and which runs your parser.

If we can’t run your code, we can’t grade it.

Problem 1: parse trees

Please do problem 4.2 from Mitchell, page 83.

Example 4.2 specifies that multiplication and division have higher precedence than addition and subtraction, and that operators of the same precedence are left associative, e.g., 6 - 2 - 1 is interpreted as being equivalent to (6 - 2) - 1 and not 6 - (2 - 1).
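
To see why associativity matters, here is a minimal sketch that writes the two possible readings of 6 - 2 - 1 as trees and evaluates them. The Arith type and the eval function are hypothetical names invented for this illustration; they are not part of Mitchell's text or of the Expr language used later in this homework.

```haskell
-- Hypothetical arithmetic tree, used only to illustrate associativity
-- and precedence. It is not part of the assignment's Expr type.
data Arith
  = Num Int
  | Sub Arith Arith
  | Mul Arith Arith
  deriving (Show, Eq)

-- Evaluate a tree bottom-up.
eval :: Arith -> Int
eval (Num n)   = n
eval (Sub l r) = eval l - eval r
eval (Mul l r) = eval l * eval r

-- 6 - 2 - 1 read left associatively: (6 - 2) - 1
leftAssoc :: Arith
leftAssoc = Sub (Sub (Num 6) (Num 2)) (Num 1)

-- The reading that left associativity rules out: 6 - (2 - 1)
rightAssoc :: Arith
rightAssoc = Sub (Num 6) (Sub (Num 2) (Num 1))

-- 6 - 2 * 3: multiplication binds tighter, so it sits lower in the tree.
precedence :: Arith
precedence = Sub (Num 6) (Mul (Num 2) (Num 3))
```

Here eval leftAssoc is 3 while eval rightAssoc is 5, so the shape of the parse tree changes the meaning of the expression. Operators with higher precedence end up deeper in the tree, as in precedence, which evaluates to 0.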

Problem 2: lexing the Expr language

import Data.Char

Write a lexer for the Expr language.

type Id = String

data Expr =
    EVar Id
  | ETrue
  | EFalse
  | EIf Expr Expr Expr
  | ENum Int
  | EIncr Expr
  | EDecr Expr
  | EIsZero Expr
  | EApp Expr Expr
  | ELam Id Expr
  deriving (Show, Eq)

Here are some sample programs in the Expr language’s syntax:

incr 5
if isZero x then true else f (decr x)
\s -> \z -> s (s z)
(\increment -> increment 5) (\x -> incr x)

Here are some invalid samples, which should fail in the lexer:

?**&
\x -> (!)

What kinds of tokens should you use? How do you make sure that keywords override identifiers (for example, \incr -> incr isn’t valid because incr clashes with the built-in incr operation) while identifiers can still harmlessly include keywords, like increment above?
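
One common way to handle this by hand is to read the longest possible run of letters and digits first, and only then decide whether the whole word is a keyword. The sketch below shows that idea in isolation. WordToken, keywords, and lexWord are hypothetical names, the keyword list is an assumption read off the sample programs, and a real lexer would also need tokens for numbers, parentheses, \ and ->.

```haskell
import Data.Char (isAlpha, isAlphaNum)

-- Hypothetical token type covering only words; a full lexer needs more.
data WordToken
  = TKeyword String
  | TIdent String
  deriving (Show, Eq)

-- Assumed keyword list, read off the sample programs above.
keywords :: [String]
keywords = ["if", "then", "else", "true", "false", "incr", "decr", "isZero"]

-- Read the longest run of alphanumerics that starts with a letter,
-- then classify the whole word. Returns the token and the unread input,
-- or Nothing if the input does not start with a letter.
lexWord :: String -> Maybe (WordToken, String)
lexWord s@(c:_)
  | isAlpha c =
      let (word, rest) = span isAlphaNum s
          tok | word `elem` keywords = TKeyword word
              | otherwise            = TIdent word
      in Just (tok, rest)
lexWord _ = Nothing
```

With this approach, lexWord "incr 5" yields the keyword incr, while lexWord "increment 5" yields the identifier increment, because the comparison against the keyword list happens only after the entire word has been consumed.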

To help you get started, you can download a lexer and parser for the pure lambda calculus. Note that the syntaxes are already slightly different: that parser uses a . in lambdas, but we’re going to use ->.

You have a choice for how to implement your lexer. You can either use alex, the automatic lexer generator, or you can write it by hand. There are pros and cons either way: alex gets keyword/identifier overriding correct automatically, but it’s a new tool to learn. Both versions are in Lexer.x—you only need to do one.

Problem 3: parsing the Expr language

Once you have a lexer, you must write a parser for the Expr language. If you haven’t by this point, you should consider downloading working lexers and parsers for the pure lambda calculus. Here are some more sample programs demonstrating the syntax I want you to use for the Expr language, which is slightly different from that in the starter:

A file containing:

if isZero 0 then incr 1 else decr 5

should parse to EIf (EIsZero (ENum 0)) (EIncr (ENum 1)) (EDecr (ENum 5)). If you evaluated this code (which is different from parsing it!), it should return 2.

A file containing:

(\x -> true) false

should parse to EApp (ELam "x" ETrue) EFalse.

A file containing:

a b true

should parse to EApp (EApp (EVar "a") (EVar "b")) ETrue.
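
The nesting in that last example is left-associative application: a is applied to b, and the result is applied to true. As a rough sketch, if a hand-written parser has already parsed a sequence of atomic expressions, it can combine them with a left fold. The helper name applyAll is hypothetical, and it assumes the list is non-empty; a happy grammar would get the same shape from a left-recursive rule instead.

```haskell
type Id = String

-- The Expr type from the assignment, repeated so this sketch stands alone.
data Expr =
    EVar Id
  | ETrue
  | EFalse
  | EIf Expr Expr Expr
  | ENum Int
  | EIncr Expr
  | EDecr Expr
  | EIsZero Expr
  | EApp Expr Expr
  | ELam Id Expr
  deriving (Show, Eq)

-- Hypothetical helper: fold a non-empty list of already-parsed atoms
-- into nested applications, associating to the left.
-- foldl1 raises an error on an empty list, so callers must rule that out.
applyAll :: [Expr] -> Expr
applyAll = foldl1 EApp
```

For instance, applyAll [EVar "a", EVar "b", ETrue] builds EApp (EApp (EVar "a") (EVar "b")) ETrue, matching the expected parse above.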

Here are some invalid samples, which should fail in the parser:

incr 1 2
\x ->
\1 -> 1

Problem 4: extending the lexer and parser

We’re going to add let and let rec to our language by merely extending the parser—Expr won’t change at all. That is, let and let rec will be syntactic sugar, clever encodings done in the parser.

I recommend you do this problem in two steps: first get let working, then get let rec working.

First, you’ll need to add tokens to support let syntax. The syntax of let is let id = expr in expr. What tokens do you need to add?

Once you’ve added the appropriate tokens, you need to get the parser to encode the let in Expr. Suppose you have let x = e1 in e2. You can encode this in the lambda calculus as (λx. e2) e1… how can you translate that to Expr?

Once you’ve gotten let working, you should work on let rec. The syntax for let rec is let rec id = expr in expr. Let rec should allow recursion, i.e., in let rec x = e1 in e2, the expression e1 should be allowed to recursively reference x.

We can encode recursive definitions using the y combinator. When a programmer writes let rec f = e1 in e2, you can encode it as (λf. e2) (y (λf. e1)). What tokens do you need to add? Why is that encoding correct?
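
The two encodings above can be written down directly as functions that build Expr values. This is only a sketch of the shape of the result: desugarLet and desugarLetRec are hypothetical names, and the fixed-point combinator is passed in as an argument, because how to express y as an Expr (and whether a given choice behaves well under your evaluation strategy) is left open here.

```haskell
type Id = String

-- The Expr type from the assignment, repeated so this sketch stands alone.
data Expr =
    EVar Id
  | ETrue
  | EFalse
  | EIf Expr Expr Expr
  | ENum Int
  | EIncr Expr
  | EDecr Expr
  | EIsZero Expr
  | EApp Expr Expr
  | ELam Id Expr
  deriving (Show, Eq)

-- let x = e1 in e2   becomes   (\x -> e2) e1
desugarLet :: Id -> Expr -> Expr -> Expr
desugarLet x e1 e2 = EApp (ELam x e2) e1

-- let rec f = e1 in e2   becomes   (\f -> e2) (y (\f -> e1))
-- The first argument is whatever Expr you use for the y combinator;
-- this sketch makes no assumption about its definition.
desugarLetRec :: Expr -> Id -> Expr -> Expr -> Expr
desugarLetRec y f e1 e2 = EApp (ELam f e2) (EApp y (ELam f e1))
```

For example, desugarLet "x" (ENum 5) (EIncr (EVar "x")) produces EApp (ELam "x" (EIncr (EVar "x"))) (ENum 5), which is the Expr form of (\x -> incr x) 5.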

Good luck, and happy hacking!

Final “please submit the right things” plea

If you write at least one of your lexer and parser by hand, you only need to turn in one file:

  • both lexer and parser in one,
  • Lexer.x (for your alex lexer) with your parser included in the bottom, or
  • Parser.y (for your happy parser) with your lexer included in the bottom.

If you use both alex and happy, you’ll need to turn in a zipfile containing Lexer.x and Parser.y.

You don’t need to turn in the Main.hs driver or Makefile we give you, though it’s fine if you do. We’ll ignore them.

Remember: if we can’t run your code, we can’t grade it.