CS334 Assignment 4

CS334 PROGRAMMING LANGUAGES
Assignment 4
Due 3/18/97

We've seen, both in ML and in multi-dimensional arrays in Pascal, that functions of types S -> (T -> U) and (S * T) -> U are almost interchangeable, in the sense that any function of one type can be easily rewritten as a function of the other type.
Show that the ML types S -> (T -> U) and (S * T) -> U are essentially "the same" by defining higher-order ML functions
```
        Curry:  ((S * T) -> U) -> (S -> (T -> U)) and 
        UnCurry:  (S -> (T -> U)) -> ((S * T) -> U)
```
such that for all f : (S * T) -> U and g: S -> (T -> U), (i) UnCurry (Curry (f)) = f, and (ii) Curry (UnCurry (g)) = g. This shows a one-to-one correspondence between the two types.
That is, you must write ML functions Curry and UnCurry with types as above and then "prove" that the two equations above always hold for your functions and any f and g of the appropriate types. The "proof" is most easily done by applying both the left-hand and right-hand sides of the equation to a general term of the appropriate type (e.g., for (i) apply it to a pair (s,t): S*T) and showing that they both give the same answer. Thus, to prove (i) show UnCurry(Curry (f)) (s,t) = f (s,t).
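As a concrete instance of the two function shapes in this problem, the following sketch writes a single addition function in both forms. The names addPair, addCurried, and addThree are illustrative only and are not part of the assignment; the sketch deliberately does not define Curry or UnCurry.

```sml
(* Uncurried form: one argument, which is a pair.
   Type: int * int -> int *)
fun addPair (s, t) = s + t;

(* Curried form: takes s and returns a function awaiting t.
   Type: int -> int -> int *)
fun addCurried s t = s + t;

(* Partial application is only available for the curried form. *)
val addThree = addCurried 3;

val fromPair = addPair (3, 4);
val fromCurried = addCurried 3 4;
val fromPartial = addThree 4;
```

All three results are the same integer, which is the sense in which the two types carry the same information.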
What are the requirements for type compatibility between a formal and actual parameter of a procedure or function in Pascal or Modula-2? (I.e., are they required to be name equivalent, name equivalentA or assignment compatible?) Be sure to give answers for both value and var parameters (hint: the answers are different). Consult a Pascal or Modula-2 reference manual to help with the answer. The Think Pascal manual (hopefully found in the Mac lab) is one source for the answer.
Problem 6, page 185 of Louden. You need only answer for Pascal. Be sure to explain your answers. (See also problem 38 on page 190 and its answer in the back of the book to see how C handles this - but don't turn in a solution to 38!)
Investigate how Clu or Algol 68 handles variant records in a type-safe way. Give a careful comparison with the way variant records are handled in Ada (as explained in class). Be sure to discuss how each guarantees type safety. Which method would you prefer if you had the choice? Give reasons for your choice. (Articles on Clu by Barbara Liskov can be found in Horowitz's Programming Languages: A Grand Tour, which is on the reserve shelf. You may have to look a bit harder for information on Algol 68.)
Do problem 36, page 190 of Louden expressing the answer as an ML program:
Create an ML datatype to represent trees corresponding to Pascal type declarations (define something like the type "term" - representing abstract syntax trees - used in your interpreter). Now write a type_equal function (in ML) which determines whether two such types are structurally equivalent (i.e., don't use Pascal equivalence, use structural equivalence). Include integer, real, subranges of integers, arrays (with integer subranges as subscripts), records, and pointers. (Note: If you'd like more of a challenge, do this problem instead in Pascal, Modula-2, or C.)
Please discuss the added difficulties of handling recursive types as well (see problem 37). A complete solution for this extension earns you lots of bonus points! Here you need only tell me what would make the solution hard.
a. What value does your interpreter return in evaluating:
```
        let x = 1
            in let g = fn y => succ x
            in let x = 7
            in g 0 end end end
```
where (let x = M in N end) is shorthand for ((fn x => N) M)? (You can find the translated term in file ~kim/cs334stuff/ML/ML.interp/let.pcf.)
b. What is the correct answer (independent of your interpreter) for the case in which static scoping is desired?

c. What is the correct answer for dynamic scoping?
This program builds on the last problem from the previous assignment. Now we will see how to program an interpreter in a much more efficient manner. For efficiency reasons, no practical interpreter uses explicit substitution of code. Most interpreters instead use the idea of a closure to implement substitution.

Define the datatypes
```
    datatype value = NUM of int | BOOL of bool | SUCC | PRED | 
            ISZERO | CLOSURE of (string * term * env) | 
            THUNK of term * env |  ERROR         
    withtype env = string -> value;
```
where env represents the type of environments. The withtype construct allows one to define a datatype with a mutually recursive type definition. Environments are one way of encoding substitution: the bindings of free variables in a PCF term are given by an environment. The type of values represents the final answers, or values, returned by PCF programs. The first five are self-explanatory. The sixth value, a closure, is the representation for functions: the first part of a closure is the formal parameter, the second part is the body of the function, and the third part is an environment that gives meaning to the free variables in the body of the function. A THUNK is a way of suspending evaluation of a term (similar to a closure), which involves saving the term and its current environment.
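To see the withtype construct in isolation, here is a minimal sketch on an unrelated example. The types tree and forest and the function sumTree are hypothetical names chosen for illustration; they are not part of the interpreter.

```sml
(* A tree whose internal nodes hold a forest, where forest abbreviates
   a list of trees. withtype lets the constructor NODE mention forest
   even though forest is itself defined in terms of tree. *)
datatype tree = LEAF of int | NODE of forest
withtype forest = tree list;

(* Sum all the leaves of a tree. *)
fun sumTree (LEAF n) = n
  | sumTree (NODE ts) = foldl (fn (t, acc) => sumTree t + acc) 0 ts;

val example = NODE [LEAF 1, NODE [LEAF 2, LEAF 3], LEAF 4];
val total = sumTree example;
```

The value and env definitions above have the same shape: the constructors CLOSURE and THUNK mention env, and env is an abbreviation for a type that mentions value.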

a. For practice, first define a default environment default that returns ERROR given any string s.

b. Define a function, update env s t, that, given an environment, env, a string, s, representing a variable, and a value, t, returns a new environment that, when applied to s, returns t, and otherwise returns what env would have returned. We write this update operation in our rules below as env[s:=t].

Discussion: In the interpreter defined in part c, applying a function to an argument will result in an updated environment (the formal parameter will be associated with the value of the actual parameter in the updated environment). The function body will then be evaluated in this new environment. The only question is, which environment should be updated to reflect the actual parameter? Since we are considering statically scoped languages here, the environment to be updated is the one in effect when the function was defined, rather than the one in place when the function is called! Since we don't have many names floating around here, it may be difficult for you to appreciate this distinction. However, look at the following example:
```
        let f = fn x => (iszero (succ x))
            in let g = fn y => f y
            in let f = fn x => (iszero x)
            in g 0
```
where (let x = M in N) is shorthand for ((fn x => N) M). This should evaluate to false under static scoping (the f in the definition of g is the outermost f), while it returns true under dynamic scoping (the f in the definition of g refers to the most recently defined value of f - the innermost one). Make sure your interpreter evaluates this expression (after you have gotten rid of all of the let's) properly.
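Since ML itself is statically scoped, the same distinction can be observed directly in ML. The following is a small sketch with different functions from the PCF example above; the name scopedResult is illustrative only.

```sml
val scopedResult =
  let val f = fn x => x + 1          (* outer f *)
  in
    let val g = fn y => f y          (* g captures the outer f *)
    in
      let val f = fn x => x * 10     (* inner f shadows the outer one *)
      in
        g 5                          (* uses the f that g was defined with *)
      end
    end
  end;
```

Here scopedResult is 6, because g uses the f in effect where g was defined. A dynamically scoped language would instead use the innermost f and produce 50.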

c. Define a function, newinterp t env, that takes a term, t, (possibly involving free variables) and an environment, env, that gives meaning to all the free variables in the term, and returns a value representing the evaluation of the term in that environment. (Note: In the initial call to evaluate a term, you will want to pass your default environment from above. Nevertheless, the function should still take two arguments, since in recursive calls---particularly with applying functions---you will want to call newinterp with different environments. See rule 11 below.) In the rules below, rho stands for an environment:
```
(1) (n, rho) >> n  for n an integer.

(2) (true, rho ) >> true, and similarly for false

(3a) (x, rho) >> rho(x), if rho(x) is not a thunk

(3b) (x, rho) >>  v, if rho(x) is of the form thunk(e, rho') and 
        (e, rho') >> v

(4) (error, rho) >> error

(5) (succ, rho) >> succ, and similarly for the other initial functions

(6) (fn x => e, rho) >> closure(x, e, rho)
```
Notice that we save the defining environment along with the formal parameter and function body.
```
       (b, rho) >> true,       (e1, rho) >> v     
(7)    --------------------------------------      
          (if b then e1 else e2, rho) >> v       

       (b, rho) >> false,       (e2, rho) >> v
(7a)   --------------------------------------
          (if b then e1 else e2, rho) >> v

       (e1, rho) >> succ,       (e2, rho) >> n
(8)    -------------------------------------
          ((e1 e2), rho) >> (n+1)


        (e1, rho) >> pred,       (e2, rho) >> 0           
(9a)    -------------------------------------
           ((e1 e2), rho) >> 0                      

        (e1, rho) >> pred,      (e2, rho) >> (n+1)
(9b)    -------------------------------------
           ((e1 e2), rho) >> n

    
        (e1, rho) >> iszero, (e2, rho) >> 0
(10a)   -------------------------------------
           ((e1 e2), rho) >> true               

        (e1,rho) >> iszero, (e2, rho) >> (n+1)
(10b)   -------------------------------------
           ((e1 e2), rho) >> false


        (e1, rho) >> closure(x, e3, rho_f),  (e2, rho) >> v1, 
                    (e3, rho_f[x:=v1]) >> v
(11)    ----------------------------------------------------
                    ((e1 e2), rho) >> v
```
Notice that the body of the function is interpreted in the environment from the closure, updated to reflect the assignment of the actual parameter value to the formal.
```
        (e, rho[x:=thunk(rec x => e, rho)]) >> v
(12)    ----------------------------------------
                (rec x => e, rho) >> v
```
Notice that in evaluating a recursive expression we simply evaluate the body in an environment in which the recursive name stands for the recursive expression. It is important to put this in the environment so that other recursive calls can be evaluated properly. We use thunks to suspend the evaluation of the call so that it will not be evaluated until it is needed. When the thunk is encountered, the term can be evaluated in the environment stored with it (as in rule 3b).
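The general idea of suspending a computation can be seen in ML using a function of unit. This is only an analogy for the THUNK value above, which stores a term together with its environment rather than an ML function; the names runs, suspended, and answer are hypothetical.

```sml
(* Count how many times the suspended computation actually runs. *)
val runs = ref 0;

(* A suspended computation: nothing inside the fn runs until it is applied. *)
val suspended = fn () => (runs := !runs + 1; 6 * 7);

val runsBefore = !runs;        (* defining the suspension did no work *)
val answer = suspended ();     (* applying it runs the body *)
val runsAfter = !runs;
```

The counter changes only when the suspension is applied, which mirrors how rule 3b evaluates the saved term only when the variable is looked up.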