CS334 PROGRAMMING LANGUAGES

Assignment 3 Due 3/11/97

Suppose you are given the following grammar for simple English sentences:
```
	<sentence> -> <noun-phrase> <verb-phrase> '.'
	<verb-phrase> ->  <verb> <noun-phrase>

	<verb> -> sees | likes | grabs

	<noun-phrase>  -> <article> <noun>

	<article>  ->  a | the

	<noun> -> girl | dog
```
a. How would you modify the grammar to introduce a new start symbol <paragraph> that will allow you to generate any number of sentences?
b. Suppose you wish to be able to generate sentences with all possible pronouns as subjects and objects (e.g., you, he, she, it, they). What problems do you encounter in generating grammatically correct English? How can you solve these problems? Provide a modification of the grammar above which does reasonably well at supporting pronouns.
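To get a feel for how small the language generated by the original grammar is, the following ML sketch enumerates every sentence it derives. The names cross, nounPhrases, verbPhrases, and sentences are made up for this illustration and are not part of the assignment; the sketch also covers only the grammar as given, not the modifications asked for in parts a and b.

```sml
(* Terminals for each lexical category, copied from the grammar. *)
val articles = ["a", "the"]
val nouns    = ["girl", "dog"]
val verbs    = ["sees", "likes", "grabs"]

(* cross (xs, ys): every string from xs followed by every string from ys. *)
fun cross (xs, ys) =
    List.concat (map (fn x => map (fn y => x ^ " " ^ y) ys) xs)

val nounPhrases = cross (articles, nouns)        (* <article> <noun>        *)
val verbPhrases = cross (verbs, nounPhrases)     (* <verb> <noun-phrase>    *)

(* <sentence> -> <noun-phrase> <verb-phrase> '.' *)
val sentences = map (fn s => s ^ ".") (cross (nounPhrases, verbPhrases))
```

Because the grammar has no recursion, the list is finite: 4 noun phrases times 3 verbs times 4 noun phrases gives 48 sentences.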
Two famous sentences used as examples in linguistics are "Time flies like an arrow" and "Fruit flies like a banana." (I believe they are due to Noam Chomsky.) Please give plausible syntax rules for English that would allow you to parse both of these sentences. The point of the examples is to indicate the difficulty in parsing (and understanding) natural languages. Please explain this difficulty.
Please do problem 8 on page 93 of Louden. (You might want to look at the solution to the similar problem 9 in the back of the text for a hint.)
Suppose we have a language which uses dynamic scope. Ignoring pointers, is it the case that the scope of a variable is the same as its lifetime? Why or why not?
Please do problem 9 on page 146 of Louden.
In this problem, I want you to begin to design an ML interpreter for a simple functional language. Our language is relatively simple, but more sophisticated than the arithmetic expressions of last week since it involves functions. The expressions are written in the language given by the following simple BNF grammar.
```
   e ::= x | n | true | false | succ | pred | iszero | 
         if e then e else e | (fn x => e) | (e e) | rec x => e
```
In the above, "x" is a variable, "n" stands for an integer, "true" and "false" are the truth values, "succ" and "pred" are unary functions which add 1 to and subtract 1 from their argument, respectively, "iszero" is a unary function which returns "true" if its argument is 0 and "false" otherwise, "if...then...else..." is a conditional expression, "fn x => e" is a function with formal parameter "x" and body "e", and "(e e)" represents function application. (Don't worry about "rec x => e" for now! It is used for defining recursive functions.)

As in last week's assignment, we will presume that we have a parser which parses input into an abstract syntax tree, which your interpreter should use. The definition of the ML datatype is
```
   datatype term = 
        AST_ID of string | AST_NUM of int | AST_BOOL of bool
      | AST_SUCC | AST_PRED | AST_ISZERO
      | AST_IF of (term * term * term) | AST_ERROR
      | AST_FUN of (string * term) | AST_APP of (term * term)
      | AST_REC of (string *term) 
```
As before, this definition mirrors the BNF grammar given above; for instance, the constructor AST_ID makes a string into an identifier or variable, and the constructor AST_FUN combines a string representing the formal parameter and a term representing the body into a function. Interpreting abstract syntax trees is much easier than trying to interpret the concrete text of terms directly.
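As an illustration of how concrete expressions correspond to values of type term, here are two trees written out by hand. The names example1 and example2 are chosen only for this sketch, and the trees are my reading of the grammar rather than output copied from the parser.

```sml
datatype term =
     AST_ID of string | AST_NUM of int | AST_BOOL of bool
   | AST_SUCC | AST_PRED | AST_ISZERO
   | AST_IF of (term * term * term) | AST_ERROR
   | AST_FUN of (string * term) | AST_APP of (term * term)
   | AST_REC of (string * term)

(* (if (iszero 0) then 1 else 2) *)
val example1 =
    AST_IF (AST_APP (AST_ISZERO, AST_NUM 0), AST_NUM 1, AST_NUM 2)

(* (fn x => (succ x)) *)
val example2 = AST_FUN ("x", AST_APP (AST_SUCC, AST_ID "x"))
```

Note that application is always the binary constructor AST_APP, even when the function being applied is one of the built-in constants such as AST_ISZERO.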
You are to write an ML function interp that takes an abstract syntax tree representing a term and returns the result of evaluating it, which will also be an abstract syntax tree. The reduction should be done according to the rules given below. The expression "e => v" means that the term "e" evaluates to "v" (and then can be evaluated no further). The rules below are written for the expressions in the original grammar. Your program should be written for the equivalent expressions using the abstract syntax trees (elements of type "term").
The base cases are:
(1) n => n for n an integer.
(2) true => true, and similarly for false
(3) error => error
(4) succ => succ, and similarly for the other initial functions
The other cases are slightly more complicated. They are written in the form of a rule in the manner of the following example:
```
         b => true         e1 => v
    (5) ---------------------------
         if b then e1 else e2 => v
```
We read the rule from the bottom up: if the expression is an if-then-else with components b, e1, and e2, and b evaluates to true and e1 returns v, then the entire expression returns v. Of course, we also have the symmetric rule
```
         b => false        e2 => v
    (6) ----------------------------
         if b then e1 else e2 => v
```
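Rules (1) through (6) translate fairly directly into ML pattern matching. The following is only a sketch under stated assumptions, not the required interp: the name interpSketch is made up, the case where the condition evaluates to something other than true or false is not covered by the rules (returning AST_ERROR there is my assumption), and every other kind of term, including applications, is left as a placeholder.

```sml
datatype term =
     AST_ID of string | AST_NUM of int | AST_BOOL of bool
   | AST_SUCC | AST_PRED | AST_ISZERO
   | AST_IF of (term * term * term) | AST_ERROR
   | AST_FUN of (string * term) | AST_APP of (term * term)
   | AST_REC of (string * term)

fun interpSketch (AST_NUM n)  = AST_NUM n           (* rule 1 *)
  | interpSketch (AST_BOOL b) = AST_BOOL b          (* rule 2 *)
  | interpSketch AST_ERROR    = AST_ERROR           (* rule 3 *)
  | interpSketch AST_SUCC     = AST_SUCC            (* rule 4 *)
  | interpSketch AST_PRED     = AST_PRED            (* rule 4 *)
  | interpSketch AST_ISZERO   = AST_ISZERO          (* rule 4 *)
  | interpSketch (AST_IF (b, e1, e2)) =
      (case interpSketch b of
           AST_BOOL true  => interpSketch e1        (* rule 5 *)
         | AST_BOOL false => interpSketch e2        (* rule 6 *)
         | _              => AST_ERROR)             (* assumption: no rule applies *)
  | interpSketch _ = AST_ERROR                      (* placeholder: not handled in this sketch *)
```

Only the branch selected by the condition is evaluated, which matches the way rules (5) and (6) each mention just one of e1 and e2 above the line.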
The following are some of the cases for applications:
```
         e1 => succ        e2 => n
    (7) ----------------------------
             (e1 e2) => (n+1)

         e1 => pred        e2 => 0       e1 => pred   e2 => (n+1)
    (8) ---------------------------     --------------------------
             (e1 e2) => 0                      (e1 e2) => n

         e1 => iszero   e2 => 0          e1 => iszero  e2 => (n+1)
    (9) ------------------------        ---------------------------
            (e1 e2) => true                    (e1 e2) => false
```
Here is a simple example using these rules: Evaluate (if (iszero 0) then 1 else 2)
According to rules (5) and (6), we must first evaluate (iszero 0). By rule (9), this evaluates to true. Now by rule (5) (and the fact that 1 => 1 via rule (1)), the whole expression evaluates to 1.
a. Use these rules to write an interpreter, interp: term -> term, for the subset of the language which does not include terms of the form AST_ID, AST_FUN, or AST_REC. If your interpreter tries to evaluate these three types of expressions, it should return the error, AST_ERROR.
Note: In my directory, ~kim/cs334stuff/ML.interps, you will find a file, parser.sml, an ML program that parses files containing an expression from the simple BNF grammar given above into an expression using the AST terms. (For example, if a file "foo" contains succ 7, parsefile foo returns AST_APP(AST_SUCC, AST_NUM 7), an expression in the proper form for use by your interpreter.) Feel free to use this method to generate abstract syntax trees, which is much easier than typing in the long AST terms directly. You will find in the same directory the skeleton of a program called "PCF.interp.student.sml", which also contains brief explanations and examples.
b. The notation e[x := v] indicates the textual substitution of v for all free occurrences of x in e. For example, (succ x) [x:=1] is the expression (succ 1). Please write an ML function subst that takes a term, t, a string representing a variable, v, and a term, s, and returns t with all free occurrences of v (actually AST_ID v) replaced by s. Thus, the function application (corresponding to (succ x) [x:=1], above),
```
        subst (AST_APP(AST_SUCC, AST_ID "x")) "x" (AST_NUM 1)
```
gives the answer AST_APP(AST_SUCC, AST_NUM 1).
Do not substitute for bound occurrences of variables. For example, substituting 3 for x in (x + ((fn x => 2+x) 8)) should result in (3 + ((fn x => 2+x) 8)). (The + here is used informally; it is not part of our grammar.) The formal parameter x and its occurrences in the body of the function are not affected by the substitution because of the static scoping rules. (Hint: use pattern-matching on each constructor of the abstract syntax tree, calling subst recursively when you need to.)
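The distinction between free and bound occurrences can be made concrete with a small helper that merely tests whether a variable occurs free in a term. This is a sketch, not the subst function itself: the name occursFree is made up, and treating AST_REC as binding its variable in the same way as AST_FUN is my assumption, since the rule for AST_REC has not been given.

```sml
datatype term =
     AST_ID of string | AST_NUM of int | AST_BOOL of bool
   | AST_SUCC | AST_PRED | AST_ISZERO
   | AST_IF of (term * term * term) | AST_ERROR
   | AST_FUN of (string * term) | AST_APP of (term * term)
   | AST_REC of (string * term)

(* occursFree x t is true when the variable x has a free occurrence in t. *)
fun occursFree x (AST_ID y) = (x = y)
  | occursFree x (AST_IF (b, e1, e2)) =
      occursFree x b orelse occursFree x e1 orelse occursFree x e2
  | occursFree x (AST_APP (e1, e2)) =
      occursFree x e1 orelse occursFree x e2
    (* A binder for x hides every occurrence of x in its body. *)
  | occursFree x (AST_FUN (y, body)) = x <> y andalso occursFree x body
  | occursFree x (AST_REC (y, body)) = x <> y andalso occursFree x body   (* assumption *)
  | occursFree _ _ = false            (* constants contain no variables *)
```

For instance, occursFree "x" returns true on AST_APP(AST_SUCC, AST_ID "x") but false on AST_FUN("x", AST_APP(AST_SUCC, AST_ID "x")), because in the second term the only occurrence of x is bound by the enclosing fn.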
c. Using your substitution function, extend your interp function from part a to include AST_FUN terms. The reduction for the terms involving AST_FUN should be done according to the rules given below:
Functions by themselves don't do anything (just like succ and pred above).
```
    (10)	(fn x => e) => (fn x => e)	  
```
Computations occur when you apply these functions to arguments. The next rule defines call-by-value function application, as in ML. If the function is of the form fn x => e, evaluate the operand to a value, v1, substitute v1 in for the formal parameter in e, and then evaluate the modified body:
```
            e1 => (fn x => e3)      e2 => v1    e3[x:=v1] => v
    (11)    --------------------------------------------------------
                              (e1 e2) => v
```
For instance, in evaluating the application
```
		((fn x => (succ x)) (succ 0))
```
we first note that the function is already fully evaluated, so we evaluate (succ 0) to 1, and then plug this in for x in the body, (succ x), of the function, obtaining (succ 1), which evaluates to 2.
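Written as abstract syntax trees, the stages of this evaluation look as follows. The value names are made up for this illustration, and the trees are written out by hand rather than computed, so no interpreter or subst function is implied here.

```sml
datatype term =
     AST_ID of string | AST_NUM of int | AST_BOOL of bool
   | AST_SUCC | AST_PRED | AST_ISZERO
   | AST_IF of (term * term * term) | AST_ERROR
   | AST_FUN of (string * term) | AST_APP of (term * term)
   | AST_REC of (string * term)

(* ((fn x => (succ x)) (succ 0)) *)
val application =
    AST_APP (AST_FUN ("x", AST_APP (AST_SUCC, AST_ID "x")),
             AST_APP (AST_SUCC, AST_NUM 0))

(* e2 => v1: the operand (succ 0) evaluates to 1 by rule (7). *)
val operandValue = AST_NUM 1

(* e3[x:=v1]: the body (succ x) with 1 substituted for x. *)
val bodyAfterSubst = AST_APP (AST_SUCC, AST_NUM 1)

(* e3[x:=v1] => v: (succ 1) evaluates to 2, again by rule (7). *)
val result = AST_NUM 2
```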
Notice that while terms of the form (AST_ID s) can appear whenever s is a formal parameter, we never need to evaluate terms of the form (AST_ID s), because they are always replaced by the subst function before we evaluate the body of the function.
We have not yet provided a reduction rule for AST_REC terms. We will do that for the next homework. For now, just return AST_ERROR if your interpreter is applied to terms of the form AST_ID s or AST_REC(s,e).