CS 334 Lecture 7

Contents:

    1. Semantics
    2. Specifying an interpreter with "natural semantics"
  1. TYPES
    1. Built-in types
    2. Aggregates
      1. Cartesian products
      2. Records (COBOL, Pascal, Ada) or Structures (PL/I, C, and ALGOL 68).
      3. Disjoint Union
      4. Mappings
        1. Arrays
        2. Function abstractions
      5. Powerset
      6. Recursive types
      7. Sequence
        1. Lists
        2. sequential files
        3. strings
    3. User-Defined Types
  2. STATIC VERSUS DYNAMIC TYPING

Semantics

Meaning of a program (once we know it is syntactically correct). Work with a virtual (or abstract) machine when discussing the semantics of programming language constructs. Run a program by loading it into memory and initializing the ip (instruction pointer) to the beginning of the program.

Official language definitions: Standardize syntax and semantics - promote portability.

Often better to standardize after experience - Ada was standardized before there was a real implementation.

Common Lisp, Scheme, ML now standardized, Fortran '9x.

Good formal description of syntax, semantics still hard.

Backus, in the Algol 60 Report, promised formal semantics.

Specifying an interpreter with "natural semantics"

Semantics given in style of "natural semantics". Kind of operational semantics.

"e => v" means that when "e" is evaluated, it should return the value "v".

E.g., the first few rules say that there is nothing to do for simple values and function names:

  1. n => n for n an integer.

  2. true => true, and similarly for false

  3. error => error

  4. succ => succ, and similarly for the other initial functions.
Therefore, if we encounter a simple value or function name, just return it - no further evaluation is possible. Think of these as base cases for the interpreter.

More interesting rules say that in order to evaluate a complex expression, first evaluate particular parts and then use those partial results to get the final value.

Look at following rule:

		b => true         e1 => v
	(5)	---------------------------
		if b then e1 else e2 => v
We read the rule from the bottom up: if the expression is an if-then-else with components b, e1, and e2, and b evaluates to true and e1 returns v, then the entire expression returns v. Of course, we also have the symmetric rule
		 b => false        e2 => v
	(6)	----------------------------
		 if b then e1 else e2 => v
Thus if we wish to evaluate an expression of the form "if b then e1 else e2" then first evaluate "b". If b evaluates to true, then, using rule (5), evaluate e1 to get some value, v. Return the value, v, as the final value of the "if" expression. If b evaluates to false, then use rule (6) and return the value of e2.

The application rules in homework 3 are similar. Essentially, evaluate the function. If it evaluates to one of the primitive functions, evaluate the argument and return the result of applying the primitive function to the value of the argument. Thus, the actual rule to be used is determined by the value of the function.

The following is an example which shows why you must evaluate the function:

	(if false then succ else pred) (pred 7)
The function evaluates to pred and the argument evaluates to 6. Using rule (8) from the homework, this should evaluate to 5.
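
A minimal sketch of rules (1)-(6) plus application, written as a small Python interpreter. The tuple encoding of expressions and the names `evaluate` and `PRIMITIVES` are assumptions made for illustration. The application case is also an assumption modeled on the description above, since the homework's own rules are not reproduced in these notes.

```python
# Hypothetical encoding of expressions:
#   int / bool literals, the string "error", primitive names "succ" / "pred",
#   ("if", b, e1, e2) and ("app", f, arg).
PRIMITIVES = {"succ": lambda n: n + 1, "pred": lambda n: n - 1}

def evaluate(e):
    # Rules (1)-(3): integers, booleans and error evaluate to themselves.
    if isinstance(e, (bool, int)) or e == "error":
        return e
    # Rule (4): names of initial functions evaluate to themselves.
    if isinstance(e, str) and e in PRIMITIVES:
        return e
    if isinstance(e, tuple) and e[0] == "if":
        _, b, e1, e2 = e
        bv = evaluate(b)
        if bv is True:          # rule (5): only e1 is evaluated
            return evaluate(e1)
        if bv is False:         # rule (6): only e2 is evaluated
            return evaluate(e2)
        raise ValueError("condition did not evaluate to a boolean")
    if isinstance(e, tuple) and e[0] == "app":
        _, f, arg = e
        fv = evaluate(f)        # the function position must be evaluated first
        av = evaluate(arg)
        return PRIMITIVES[fv](av)
    raise ValueError(f"unknown expression: {e!r}")

# (if false then succ else pred) (pred 7)
example = ("app", ("if", False, "succ", "pred"), ("app", "pred", 7))
result = evaluate(example)      # 5
```

Each `if` branch of `evaluate` corresponds to one rule read bottom-up: match the shape of the expression, evaluate the premises, return the value in the conclusion.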

TYPES

Support abstractions of set of elements and operations on them.

Built-in types:

  1. Hide representation
  2. Allow type-checking at compile and/or run-time
  3. Help disambiguate operators
  4. Allow expression of constraints on accuracy of representation.

Aggregates

Also come with built-in operations.

Cartesian products:

S x T = {<s,t> | s in S , t in T}.
Can also write as PROD_{i in I} S_i = S_1 x S_2 x ... x S_n. If all are the same, write S^n.

Tuples of ML: type point = int * int

How many elts in product?

What if have S^0? Called unit in ML.
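
A small sketch that answers the two questions by brute force, using Python tuples as stand-ins for product elements. The sets `S` and `T` are made up for the example.

```python
from itertools import product

S = {"a", "b", "c"}
T = {0, 1}

# S x T as a set of pairs: it has |S| * |T| elements.
pairs = set(product(S, T))

# S^n is the n-fold product of S with itself.
s_cubed = set(product(S, repeat=3))

# The empty product has exactly one element, the empty tuple
# (compare the single value () of type unit in ML).
s_zero = set(product(S, repeat=0))
```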

Records (COBOL, Pascal, Ada) or Structures (PL/I, C, and ALGOL 68).

Heterogeneous collections of data.

Differ from Cartesian product since fields associated with labels

E.g.

	record				record
	   x : integer;	   /=  	           a : integer;
	   y : real			   b : real
	end;				end

Operations and relations: selection ".", :=, =.

Can use generalized product notation: PROD_{l in Lab} T(l)

Ex. in first example above, Lab = {x,y}, T(x) = integer, T(y) = real.

Disjoint Union:

Variant record - type1 union type2 w/discriminant

Support alternatives w/in type:

Ex.

		RECORD
		   name : string;
		   CASE status : (student, faculty) OF
		      student: gpa : real;
		               class : INTEGER;
		   |  faculty: rank : (Assis, Assoc, Prof);
		   END;
		END;

Save space yet (hopefully) provide type security. Saves space because the amount of space reserved for a variable of this type is the larger of the variants.

Fails in Pascal / MODULA-2 since variants not protected.

How is this supported in ML?

datatype IntReal = INTEGER of int | REAL of real;
Can think of enumerated types as variant w/ only tags!

NOTICE: Type safe. Clu and Ada also support type-safe case for variants:

Ada: Variants - declared as parameterized records:

type geometric (Kind: (Triangle, Square) := Square) is
	record
	   color : ColorType := Red ;
	   case Kind is
	      when Triangle =>
	             pt1,pt2,pt3:Point;
	      when Square =>
	             upperleft : Point;
	             length : INTEGER range 1..100;
	   end case;
	end record;

ob1 : geometric;            -- default is Square
ob2 : geometric(Triangle);  -- frozen, can't be changed
Avoids Pascal's problems w/holes in typing.

Illegal to change "discriminant" alone.

ob1 := ob2;   -- OK
ob2 := ob1;   -- generate run-time check to ensure Triangle
If want to change discriminant, must assign values to all components of record:
ob1 := (Color=>Red,Kind=>Triangle,pt1=>a,pt2=>b,pt3=>c);

If write code

	... ob1.length...
then converted to run-time check:
	if ob1.Kind = Square then ... ob1.length ....
	                     else raise constraint_error
	end if;

Fixes type insecurity of Pascal

Note disjoint union is not same as set-theoretic union, since have tags.

	IntReal = {INTEGER} x int + {REAL} x real

C supports undiscriminated unions:

	typedef union {int i; float r;} utype;
As usual with C, it is presumed that the programmer knows what he/she is doing and no static or run-time checking is performed.
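
To contrast with the unchecked C union, here is a rough Python sketch of a discriminated union whose component access is guarded by a run-time tag check, in the spirit of the Ada translation shown earlier. The class `Geometric`, the helper constructors, and `ConstraintError` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass

class ConstraintError(Exception):
    """Stands in for Ada's constraint_error."""

@dataclass(frozen=True)
class Geometric:
    kind: str      # the tag (discriminant): "Triangle" or "Square"
    fields: dict   # only the components that belong to this variant

    def get(self, name):
        # Run-time check: the component must exist in the current variant.
        if name not in self.fields:
            raise ConstraintError(f"{name} not present when kind = {self.kind}")
        return self.fields[name]

def make_square(upperleft, length):
    return Geometric("Square", {"upperleft": upperleft, "length": length})

def make_triangle(pt1, pt2, pt3):
    return Geometric("Triangle", {"pt1": pt1, "pt2": pt2, "pt3": pt3})

ob1 = make_square((0, 0), 10)
ob2 = make_triangle((0, 0), (1, 0), (0, 1))

side = ob1.get("length")   # allowed: ob1 is a Square
# ob2.get("length") would raise ConstraintError: ob2 is a Triangle
```

Because the tag and the components are only ever set together by the constructors, the tag cannot be changed on its own, which is the property Pascal's variant records lack.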

Mappings:

Encompasses functions w/ both infinite and finite domains.

Arrays:

homogeneous collection of data.

Mapping from index type to range type
E.g. Array [1..10] of Real corresponds to {1,...,10} -> Real

Operations and relations: selection ". [.]", :=, =, and occasionally slices.

E.g. A[2..6] represents an array composed of A[2] to A[6]

Index range and location where array stored can be bound at compile time, unit activation, or any time.

For both static and semi-static arrays, the index set of the array is bound at compile time. The difference is that with static arrays, the location of the array in memory is also bound at compile time (as in FORTRAN), while with semi-static arrays, the size is bound at compile time but the location is determined at run-time.

For instance, in Pascal, an array stored in a local variable is allocated on the run-time stack, and its location may vary in different invocations of the procedure.

With semi-dynamic (or dynamic) arrays, the index set (and hence size) of the array may vary at run-time. For instance, in ALGOL 60 or Ada, an array held in a local variable may have index bounds determined by a parameter to the routine. It is called semi-dynamic because the size is fixed once the routine has been activated.

A flexible array is one whose size can change at any time during the execution of a program. Thus, while a particular size array may be allocated when a procedure is invoked, the array may be expanded in the middle of a loop if more space is needed.

The key to these differences is binding time, as usual!

Function abstractions:

S->T ... function f(s:S):T (where S could be an n-tuple).

Operations: abstraction and application, sometimes composition.

What is difference from an array? Efficiency, esp. w/update.

	fun update f arg result x = if x = arg then result else f x
or
	fun update f arg result = fn x => if x = arg then result else f x
Procedure can be treated as having type S -> unit for uniformity.
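
A sketch of the same `update` in Python, next to the array counterpart, to make the efficiency point concrete. The names `sq`, `g`, `h`, and `a` are illustrative only.

```python
def update(f, arg, result):
    # Returns a new function that agrees with f everywhere except at arg.
    return lambda x: result if x == arg else f(x)

def sq(n):
    return n * n

g = update(sq, 3, 0)     # g is sq, except g(3) = 0
h = update(g, 4, -1)     # each update wraps one more test around the function

# Array counterpart: the update overwrites one slot in place.
a = [n * n for n in range(6)]
a[3] = 0
```

Every call to `update` adds a layer, so applying the result costs one comparison per earlier update, while the array update leaves lookup cost unchanged.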

Powerset:

	set of elt_type;
Typically implemented as bitset or linked list of elts

Operations and relations: All typical set ops, :=, =, subset, .. in ..

Why need base set to be primitive type? What if base set records?
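
A minimal sketch of the bitset implementation, assuming a small enumerated base type (the `COLORS` list here is made up). Each element gets one bit position, which hints at why a small primitive base type is convenient: the element must map cheaply to a bit index.

```python
COLORS = ["red", "green", "blue", "yellow"]      # hypothetical base type
INDEX = {c: i for i, c in enumerate(COLORS)}     # element -> bit position

def make_set(*elts):
    bits = 0
    for e in elts:
        bits |= 1 << INDEX[e]
    return bits

def member(e, s):
    return ((s >> INDEX[e]) & 1) == 1

def union(s, t):
    return s | t

def inter(s, t):
    return s & t

def subset(s, t):
    # s is a subset of t when s has no bit that t lacks.
    return (s & ~t) == 0

warm = make_set("red", "yellow")
primary = make_set("red", "green", "blue")
```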

Recursive types:

Examples:
	tree = Empty | Mktree of int * tree * tree

	list = Nil | Cons of int * list

In most lang's built by programmer from pointer types.

Sometimes supported by language (e.g. Miranda, Haskell, ML).

Why can't we have direct recursive types in ordinary imperative languages?

OK if use ref's:

	list = POINTER TO RECORD
			first:integer;
			rest: list
		END;

Recursive types may have many sol'ns

E.g. list = {Nil} union (int x list) has following sol'ns:

  1. finite sequences of integers followed by Nil: e.g., (2,(5,Nil))

  2. finite or infinite sequences, where if finite then end with Nil
Similarly with trees, etc.

Theoretical result: Recursive equations always have a least solution - though infinite set if real recursion.

Can get via finite approximation. I.e.,

list_0 = {Nil}

list_1 = {Nil} union (int x list_0) = {Nil} union {(n, Nil) | n in int}

list_2 = {Nil} union (int x list_1) = {Nil} union {(n, Nil) | n in int} union {(m,(n, Nil)) | m, n in int}

...

list = Union_n list_n
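
The successive approximations above can be computed directly if int is replaced by a tiny finite set, so that every stage stays finite. This is only a sketch: `INTS`, `NIL`, and `step` are assumed names, and nested Python tuples stand in for the pairs.

```python
INTS = (0, 1)      # tiny stand-in for int
NIL = "Nil"

def step(lists):
    # One unfolding of  list = {Nil} union (int x list)
    return {NIL} | {(n, l) for n in INTS for l in lists}

approx = [{NIL}]                        # the starting approximation {Nil}
for _ in range(3):
    approx.append(step(approx[-1]))     # the next three approximations

sizes = [len(a) for a in approx]        # [1, 3, 7, 15]
```

Each stage contains the previous one, so the union over all stages is well defined and holds exactly the finite lists.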

Very much like unwinding definition of recursive function
	fact = fun n => if n = 0 then 1 else n * fact (n-1)

	fact_0 = fun n => if n = 0 then 1 else undef

	fact_1 = fun n => if n = 0 then 1 else n * fact_0(n-1)
	       = fun n => if n = 0, 1 then 1 else undef

	fact_2 = fun n => if n = 0 then 1 else n * fact_1(n-1)
	       = fun n => if n = 0, 1 then 1 else
	                  if n = 2 then 2 else undef
	...

	fact = Union_n fact_n
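
The unwinding can be sketched in Python, with `None` playing the role of undef. The helper names `unfold` and `approximation` are assumptions for this example.

```python
UNDEF = None    # stands for "undef"

def fact0(n):
    # The starting approximation: defined only at 0.
    return 1 if n == 0 else UNDEF

def unfold(f):
    # One unwinding: build the next approximation from f.
    def g(n):
        if n == 0:
            return 1
        prev = f(n - 1)
        return UNDEF if prev is UNDEF else n * prev
    return g

def approximation(k):
    # Apply unfold k times to fact0; the result is defined on 0..k.
    f = fact0
    for _ in range(k):
        f = unfold(f)
    return f

table = [approximation(2)(n) for n in range(4)]   # [1, 1, 2, None]
```

Each approximation extends the previous one by one more defined argument, and fact is the function that agrees with all of them.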

Notice solution to T = A + (T->T) is inconsistent with classical mathematics!
In spite of that, however, it can be used in computer science:
	datatype univ = Base of int | Func of (univ -> univ);

Sequence:

Lists

Supported in most functional languages

operations: hd, tail, cons, length, etc.

sequential files

File operations: Erase, reset, read, write, check for end.

Persistent data - files.

strings:

ops: <, length, substr

Are strings primitive or composite?

User-Defined Types

User gets to name new types. Why?
  1. more readable
  2. Easy to modify if localized
  3. Factorization - why copy same complex def. over and over (possibly making mistakes)
  4. Added consistency checking in many cases.

STATIC VERSUS DYNAMIC TYPING

Most languages use static binding of types to variables, usually in declaration
	var x : integer  {bound at translation time}

FORTRAN has implicit declaration using naming conventions

If start with "I" to "N", then integer, otherwise real.

Other languages will "infer" type of undeclared variables.

In either case, run real danger of problems due to typos.

Example in ML, if

	datatype Stack = Nil | Push of int;
then define
	fun f Push 7 = ...
What error occurs?

Answer: Push is taken as a parameter name, not a constructor.

Therefore f is given type: A -> int -> B rather than the expected: Stack -> B

Dynamic binding found in APL and LISP.

Type of variable may change during execution.

E.g., may have x := 0 at one point and x := [5,2,3] at some other point, yet x is only declared once.

Dynamic binding harder to implement since can't allocate a fixed amount of space for variables. Therefore often implemented as pointer to memory holding value.

Another problem is not knowing which version of overloaded operations to use (e.g., "+") until ready to execute the statement.

Must carry around type tag with every variable.
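
Python is itself dynamically typed, so a short sketch can show both points: the same variable holding values of different types, and the choice of "+" being made from run-time tags. The function `plus` is a hypothetical illustration of the dispatch, not how any particular implementation does it.

```python
x = 0
tags = [type(x).__name__]       # the value carries its own type tag
x = [5, 2, 3]                   # same variable, now bound to a list
tags.append(type(x).__name__)   # tags is now ["int", "list"]

def plus(a, b):
    # Which "+" to run is decided only now, by inspecting the tags.
    if isinstance(a, int) and isinstance(b, int):
        return ("int add", a + b)
    if isinstance(a, list) and isinstance(b, list):
        return ("list concat", a + b)
    raise TypeError(f"no '+' for {type(a).__name__} and {type(b).__name__}")
```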