Williams College CS334 - Programming Languages

CS 334
Programming Languages
Spring 2002

Lecture 9

Adding a run-time environment to interpreter

We have earlier described substitution as a reasonable mechanism for interpreting function application (called beta-conversion), but there are a few places where we must be very careful with name clashes if terms have free variables. (See section 10.7 in the text for details. PCF is actually a slightly enriched version of the lambda calculus where we write fn x => e rather than λx. e.)

We normally expect that changing the names of formal parameters should not make any difference, but ...

Suppose we evaluate:

   let  fun g x y = x + y  in g y end;

(or in our language PCF:

   (fn g => g y) (fn x => fn y => x+y))

If we evaluate blindly we get:

   fn y => y + y

Notice that because of scoping, the actual parameter y has become captured by the formal parameter y!

We should get: fn w => y + w, which has a very different meaning!!

(Note that we did not run into this problem earlier since during our evaluations we never worked with terms with free variables - when going inside functions we replaced all formal parameters by the actual parameters, which didn't involve free variables).

A different order of evaluation would have brought forth the same problem, however.

We would like to have fn x => B to represent the same function as fn y => B[x:=y] as long as y doesn't occur freely in B. (Called alpha-conversion)

If you always alpha-convert to a new bound variable before substituting in, you will never have problems, but this is a pain in the neck.
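The capture problem and its alpha-conversion fix can be sketched in a few lines of Python. This is only an illustration, not the course interpreter: the tuple encoding of terms, the helper names (free_vars, naive_subst, subst), and the fresh names w0, w1, ... are all assumptions made for this example, and the sketch assumes no source term already uses a name of the form w<i>.

```python
import itertools

# Hypothetical term encoding (not from the lecture):
#   ("var", name), ("fn", param, body), ("app", fun, arg), ("add", left, right)

def free_vars(t):
    """Set of identifiers occurring freely in term t."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "fn":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])  # "app" or "add"

def naive_subst(t, x, s):
    """Replace free occurrences of x in t by s with no renaming (may capture)."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "fn":
        if t[1] == x:
            return t  # x is rebound here, so nothing below is free
        return ("fn", t[1], naive_subst(t[2], x, s))
    return (tag, naive_subst(t[1], x, s), naive_subst(t[2], x, s))

# Supply of fresh names; assumes the program never uses w0, w1, ... itself.
_fresh = (f"w{i}" for i in itertools.count())

def subst(t, x, s):
    """Capture-avoiding substitution: alpha-convert a binder that would capture."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "fn":
        param, body = t[1], t[2]
        if param == x:
            return t
        if param in free_vars(s):
            new = next(_fresh)
            body = naive_subst(body, param, ("var", new))  # safe: new is fresh
            param = new
        return ("fn", param, subst(body, x, s))
    return (tag, subst(t[1], x, s), subst(t[2], x, s))

# Body of g after stripping "fn x": substitute x := y in (fn y => x + y).
inner = ("fn", "y", ("add", ("var", "x"), ("var", "y")))
captured = naive_subst(inner, "x", ("var", "y"))  # fn y => y + y
safe = subst(inner, "x", ("var", "y"))            # fn w0 => y + w0
```

The second result still has y free, as it should, while the first has silently turned the free y into a bound one.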

Instead we will evaluate terms with respect to environments. Intuitively, an environment is a mapping from strings (representing identifiers) to values. That is, it tells us for every identifier currently in scope, what the value of that identifier is.

Rather than representing the environment as a function, we will represent it as an "association list", a list of pairs of strings and values:

   env = (string * value) list

We can look up the value of an identifier in an environment by simply searching for the first occurrence of the identifier in the list.
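A minimal sketch of that lookup, written in Python rather than ML. The function name lookup and the sample bindings are assumptions for illustration. Because the search stops at the first match, a newer binding placed at the front of the list shadows an older one for the same identifier.

```python
def lookup(name, env):
    """Return the value paired with the first occurrence of name in env."""
    for key, value in env:
        if key == name:
            return value
    raise KeyError(f"unbound identifier: {name}")

# An environment is a list of (identifier, value) pairs.
env = [("x", 12), ("y", 2)]

# Prepending a pair shadows the older binding of x without destroying it.
shadowed = [("x", 3)] + env
```

Here lookup("x", env) gives 12, while lookup("x", shadowed) gives 3 and lookup("y", shadowed) still gives 2.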

We write [[e]] ev for the meaning of e with respect to environment ev.

E.g. if ev(x) = 12 and ev(y) = 2, then [[x+y]] ev = 14.

How does function application result in change of environment?

   [[(fn x => body) actual]]ev = [[body]] (ev [x := [[actual]]ev])

where ev[x := v] is the environment that is like ev except that x has value v.

This and rec are the only rules in which the environment changes!

Rest of rules look like the old interpreter (except identifiers are now looked up in the environment)!

Replaces all uses of subst!

This means that computation no longer takes place by rewriting terms into new terms; interp is now a function from terms (and an environment) to values.

Note that

	let val x = arg in e end

is equivalent to

	(fn x => e) arg

Must worry about scoping problems:

   val test = let 
                  val x = 3;
                  fun f y = x + y;
                  val x = 12
              in 
                  x + (f 7)
              end;

What is value of test? In particular, what is the value of (f 7)?

Change in scope is reflected by change in environment.

With functions must remember environment function was defined in!

When apply function, apply in defining environment.

test is equivalent to

   (fn x => (fn f => ((fn x => x + (f 7)) 12)) (fn y => x + y)) 3

Then

 
   [[(fn x => (fn f => ((fn x => x + (f 7)) 12)) (fn y => x + y)) 3]] ev0
	= [[(fn f => ((fn x => x + (f 7)) 12)) (fn y => x + y)]] ev1
	= [[(fn x => x + (f 7)) 12]] ev2
	= [[x + (f 7)]] ev3
	= 12 + ([[fn y => x + y]] ev1) 7
	= 12 + [[x + y]] ev4
	= 12 + 3 + 7 
	= 22

where ev0 is the starting environment and

   ev1 = ev0 [x := [[3]] ev0] = ev0[x := 3]
   ev2 = ev1 [f := [[fn y => x + y]] ev1] 	<-	Closure for f
   ev3 = ev2 [x := [[12]] ev2] = ev2[x := 12]
   ev4 = ev1 [y := 7]

Notice that ev2 is created by adding a closure to the environment to represent the meaning of f. In the formal language, we would write that particular closure more like CLOSURE ("y", x + y, ev1). (The only thing we can't do directly in the language is define x + y.)
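The whole evaluation can be replayed with a small environment-passing interpreter. The following is a sketch in Python under stated assumptions, not the interpreter used in the course: the tuple encoding of terms, the Closure class, and the names interp and lookup are hypothetical, and addition is treated as a primitive for convenience.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Closure:
    """A function value: parameter, body, and the environment it was defined in."""
    param: str
    body: tuple
    env: tuple

def lookup(name, env):
    """First binding of name in env, a tuple of (identifier, value) pairs."""
    for key, value in env:
        if key == name:
            return value
    raise KeyError(f"unbound identifier: {name}")

def interp(term, env):
    """Evaluate term in env. Terms: num, var, add, fn, app (hypothetical encoding)."""
    tag = term[0]
    if tag == "num":
        return term[1]
    if tag == "var":
        return lookup(term[1], env)
    if tag == "add":
        return interp(term[1], env) + interp(term[2], env)
    if tag == "fn":
        # Remember the defining environment instead of substituting.
        return Closure(term[1], term[2], env)
    if tag == "app":
        f = interp(term[1], env)
        arg = interp(term[2], env)
        # Apply in the closure's environment, extended with [param := arg].
        return interp(f.body, ((f.param, arg),) + f.env)
    raise ValueError(f"unknown term: {tag}")

# (fn x => (fn f => ((fn x => x + (f 7)) 12)) (fn y => x + y)) 3
inner = ("app",
         ("fn", "x", ("add", ("var", "x"), ("app", ("var", "f"), ("num", 7)))),
         ("num", 12))
f_def = ("fn", "y", ("add", ("var", "x"), ("var", "y")))
test_term = ("app", ("fn", "x", ("app", ("fn", "f", inner), f_def)), ("num", 3))

result = interp(test_term, ())  # 22
```

The body of f sees x = 3 because the closure carries ev1, even though the call happens where x = 12. Applying the function in the caller's environment instead would give 12 + 12 + 7 = 31, which is dynamic scoping.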

Back to types!

Mappings:

Encompasses functions w/ both infinite and finite domains.

Arrays:

homogeneous collection of data.

Mapping from index type to range type
E.g. Array [1..10] of Real corresponds to {1,...,10} -> Real

Operations and relations: selection (indexing, written A[i]), =, ==, and occasionally slices.

E.g. A[2..6] represents an array composed of A[2] to A[6]

Index range and location where array stored can be bound at compile time, unit activation, or any time.

size and location bound at compile-time (static): FORTRAN
size bound at compile time, location bound at procedure invocation (semi-static): Pascal, C.
size and location bound at creation (dynamic): ALGOL 60, Ada, Java
size and location can be changed at any time (flex): Java vectors.

The key to these differences is binding time, as usual!

Function abstractions:

S->T ... function f(s:S):T (where S could be n-tuple)

What if S were a record instead of an n-tuple?

Operations: abstraction and application, sometimes composition.

What is difference from an array? Efficiency, esp. w/update.

	update f arg result x = if x = arg then result else f x

	update f arg result = fn x => if x = arg then result else f x
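The same update can be sketched in Python to make the efficiency point concrete. The names square, g, and h are made up for this example.

```python
def update(f, arg, result):
    """Return a function like f except that it maps arg to result."""
    return lambda x: result if x == arg else f(x)

square = lambda n: n * n
g = update(square, 3, 0)   # g agrees with square except g(3) = 0
h = update(g, 4, -1)       # h adds one more wrapper on top of g
```

Each update wraps the previous function in one more test, so after k updates an application may perform k comparisons before reaching the original function. An array update, by contrast, overwrites a cell in place and leaves lookup cost unchanged.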

Procedure can be treated as having type S -> unit for uniformity.

Recursive types:

Examples:

  	tree = Empty | Mktree of int * tree * tree
	list = Nil | Cons of int * list

In most lang's built by programmer from pointer types.

Sometimes supported by language (e.g. Miranda, Haskell, ML).

Why can't we have direct recursive types in ordinary imperative languages?

OK if use ref's:

	list = POINTER TO RECORD
			first:integer;
			rest: list
		END;

Recursive types may have many sol'ns

E.g. list = {Nil} union (int x list) has following sol'ns:

finite sequences of integers followed by Nil: e.g., (2,(5,Nil))
finite or infinite sequences, where if finite then end with Nil

Similarly with trees, etc.

Theoretical result: Recursive equations always have a least solution, though it is an infinite set if the recursion is genuine.

Can get via finite approximation. I.e.,

   list₀ = {Nil}
   list₁ = {Nil} union (int x list₀) 
         = {Nil} union {(n, Nil) | n in int}

   list₂ = {Nil} union (int x list₁) 
         = {Nil} union {(n, Nil) | n in  int}
                 union {(m,(n, Nil)) | m, n in int}

      ...

   list = Union_n list_n

Very much like unwinding definition of recursive function

	fact = fn n => if n = 0 then 1 else n * fact (n-1)
	
	fact₀ = fn n => if n = 0 then 1 else undef
	
	fact₁ = fn n => if n = 0 then 1 else n * fact₀(n-1)
	      = fn n => if n = 0 orelse n = 1 then 1 else undef
	      
	fact₂ = fn n => if n = 0 then 1 else n * fact₁(n-1)
	      = fn n => if n = 0 orelse n = 1 then 1 else 
	                if n = 2 then 2 else undef
	...


	fact = Union_n fact_n
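The approximations can be built mechanically. In the Python sketch below, approx(n) plays the role of the n-th approximation and None stands in for undef; both choices are assumptions of this example rather than anything fixed by the notes.

```python
def approx(n):
    """Return the n-th approximation to fact, defined only on 0..n.

    None stands for undef.
    """
    if n == 0:
        return lambda k: 1 if k == 0 else None
    prev = approx(n - 1)

    def fact_n(k):
        if k == 0:
            return 1
        rest = prev(k - 1)  # use the previous approximation, not fact itself
        return None if rest is None else k * rest

    return fact_n

fact2 = approx(2)
fact5 = approx(5)
```

Here fact2 is defined on 0, 1, 2 and undefined at 3, while fact5(5) is 120. Each approximation extends the previous one without changing any value already defined, which is why taking the union makes sense.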

Notice solution to T = A + (T->T) is inconsistent with classical mathematics!
In spite of that, however, it can be used in Computer Science:

	datatype univ = Base of int | Func of (univ -> univ);

Sequence:

Lists

Supported in most functional languages

operations: hd, tail, cons, length, etc.

sequential files

File operations: Erase, reset, read, write, check for end.

Persistent data - files.

strings:

ops: <, length, substr

Are strings primitive or composite?

Composite (arrays) in C, Pascal, Modula-2, ...
Primitive in ML
Lists in Miranda and Prolog: provides more flexibility (no length bound)

User-Defined Types

User gets to name new types. Why?

more readable
Easy to modify if localized
Factorization - why copy same complex def. over and over (possibly making mistakes)
Added consistency checking in many cases.

STATIC VERSUS DYNAMIC TYPING

Static: Most languages use static binding of types to variables, usually in declarations

	var x : integer  {bound at translation time}

The variable can only hold values of that type. (Pascal/Modula-2/C, etc.)

FORTRAN has implicit declaration using naming conventions

If start with "I" to "N", then integer, otherwise real.

Other languages will "infer" type of undeclared variables.

In either case, run real danger of problems due to typos.

Example in ML, if

	datatype Stack = Nil | Push of int;

then define

	fun f Push 7 = ...

What error occurs?

Answer: Push is taken as a parameter name, not a constructor.
Therefore f is given type: A -> int -> B rather than the expected: Stack -> B

Dynamic: Variables typically do not have a declared type. Type of value may vary during run-time. Esp. useful w/ heterogeneous lists, etc. (LISP/SCHEME).

Dynamic more flexible, but more overhead since must check type before performing operations (therefore must store tag w/ value).

Dynamic typing found in APL and LISP.

Type of variable may change during execution.
E.g., may have x := 0 at one point and x := [5,2,3] at some other point, yet x is only declared once.

Dynamic typing harder to implement since can't allocate a fixed amount of space for variables. Therefore often implemented as pointer to memory holding value.
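The cost of dynamic typing can be made visible by modelling it explicitly. The sketch below is an assumption-laden illustration in Python: the Tagged class, the add operation, and the store dictionary are invented for this example. Every value carries a tag, every operation checks tags before acting, and a variable is just a slot that refers to whatever tagged value it currently holds.

```python
from dataclasses import dataclass

@dataclass
class Tagged:
    """A run-time value stored together with its type tag."""
    tag: str
    payload: object

def add(a, b):
    """Check the tags first, then perform the operation they select."""
    if a.tag == "int" and b.tag == "int":
        return Tagged("int", a.payload + b.payload)
    if a.tag == "list" and b.tag == "list":
        return Tagged("list", a.payload + b.payload)
    raise TypeError(f"cannot add {a.tag} and {b.tag}")

# The variable x is a slot referring to a value, so the type of its
# contents can change between assignments.
store = {}
store["x"] = Tagged("int", 0)
first_tag = store["x"].tag
store["x"] = Tagged("list", [5, 2, 3])
second_tag = store["x"].tag
```

The tag check in add is the overhead mentioned above, and the indirection through store is why no fixed amount of space needs to be reserved for x.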

Type compatibility

Problems arose in the language definition of Pascal having to do with type equivalence and compatibility.

Assignment compatibility:

When is x := y legal? x : integer, y : 1..10? reverse?
type hex = 0..15; ounces = 0..15;
var x : hex; y : ounces;
Is x := y legal?

Original report said both sides must have identical types.

When are types identical?

Ex.:

    Type    T = Array [1..10] of Integer;
    Var  A, B : Array [1..10] of Integer;
             C : Array [1..10] of Integer;
             D : T;
             E : T;

Which variables have the same type?

  T1 = record a : integer; b : real  end; 
  T2 = record c : integer; d : real  end;
  T3 = record b : real; a : integer  end;

Which are the same?

Worse:

  T = record info : integer; next : ^T  end; 
  U = record info : integer; next : ^V  end; 
  V = record info : integer; next : ^U  end;

Ada uses Name Equivalence
Pascal & Modula-2 use Name Equivalence for most part. C similar, but a bit different from previous two. Java uses structural equivalence only for array types - name for everything else.
Modula-3 uses Structural Equivalence

Two types are assignment compatible iff they

are equivalent, or
one is a subrange of the other, or
both are subranges of the same base type.

Things are more complicated in object-oriented languages. Then assignment is OK if type of source is a subtype of the receiver type.


kim@cs.williams.edu
