CS 334 Lecture 12

Contents:

    1. Natural Semantics for commands
    2. Iterators
    3. Exceptions
      1. Exception mechanism in programming languages:
      2. Exception handling in Ada:
      3. Resuming after exceptions
  1. ABSTRACTION
    1. Accessing non-local information:
      1. Data Parameters
        1. Call by Reference (FORTRAN, Pascal):
        2. Call by Copy (Algol 60, Pascal, C, etc.):
        3. Call by Name (Algol-60)
    2. Two major problems which arise with subprograms:
      1. Side-effects:
      2. Aliasing:
    3. Correspondence Principle
    4. Problems with writing large programs:
      1. Characteristics of solution:
    5. Abstract Data Types
      1. Specification:
      2. Implementation (Representation):

Natural Semantics for commands

Can write natural semantics for various commands:

With commands must keep track of store: locations -> storable values.

If expressions can have side-effects then must update rules to keep track of effect on store. Rewriting rules now have conclusions of form (e, rho, s) >> (v, s') where v is a storable value, rho is an environment (mapping from identifiers to denotable values - including locations), s is initial state (or store), and s' is state after evaluation of e.

    (b, rho, s) >> (true, s')    (e1, rho, s') >> (v, s'')
    ------------------------------------------------------
          (if b then e1 else e2, rho, s) >> (v, s'')
Thus if evaluations of b and e1 have side-effects on memory, they show up in the "answer" (the final state s'').

Axioms - no hypotheses!

    (id, rho, s) >> (s(loc), s)        where  loc = rho(id)

    (id++, rho, s) >> (v, s[loc:=v+1])     where loc = rho(id), v = s(loc)

Note s[loc:=v+1] is state, s', identical to s except s'(loc) = v+1.

    (e1, rho, s) >> (v1, s')    (e2, rho, s') >> (v2, s'')
    ------------------------------------------------------
            (e1 + e2, rho, s) >> (v1 + v2, s'')
When evaluating a command, the "result" is a state only.

E.g.,

        (e, rho, s) >> (v, s')
    ------------------------------   where rho(x) = loc
    (x := e, rho, s) >> s'[loc:=v]

    (C1, rho, s) >> s'    (C2, rho, s') >> s''
    ------------------------------------------
             (C1; C2, rho, s) >> s''

    (b, rho, s) >> (true, s')   (C1, rho, s') >> s''
    ------------------------------------------------
          (if b then C1 else C2, rho, s) >> s''

+ similar rule if b false

     (b, rho, s) >> (false, s')
    ---------------------------
    (while b do C, rho, s) >> s'

    (b, rho, s) >> (true, s')    (C, rho, s') >> s''   
             (while b do C, rho, s'') >> s'''
    ------------------------------------------------
              (while b do C, rho, s) >> s'''

Notice how similar definition of semantics for

    while E do C
is to
    if E then begin 
        C; 
        while E do C 
    end

Iterators

Clu allows definition of user-defined iterators (abstract over control structures):
        for c : char in string_chars(s) do ...
where have defined:
        string_chars = iter (s : string) yields (char);
            index : Int := 1;
            limit : Int := string$size (s);
            while index <= limit do
                yield (string$fetch(s, index));
                index := index + 1;
            end;
        end string_chars;

Behave like restricted type of co-routine.

Each time at top of loop continue executing iterator code from where last left off.

When hit "yield" statement then return the associated value.

When hit end of iterator, quit loop.

Can be implemented on stack similarly to procedure call.

Now available in Sather and C++.
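
For comparison, generators in Python behave much like the Clu iterator above: the body is suspended at each yield and resumed at the next trip around the loop. The following is a rough analogue of string_chars, not a translation of Clu semantics; note the change to 0-based indexing.

```python
def string_chars(s):
    """Yield the characters of s one at a time."""
    index = 0                  # Python strings are indexed from 0
    limit = len(s)
    while index < limit:
        yield s[index]         # suspend here; resume on the next iteration
        index = index + 1
    # falling off the end of the body ends the caller's for loop

collected = []
for c in string_chars("abc"):
    collected.append(c)
```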

Exceptions

Need mechanism to handle exceptional conditions.

Example: Using a stack, and try to pop element off of empty stack.

Clearly corresponds to mistake of some sort, but stack module doesn't know how to respond.

In older languages main way to handle is to print error message and halt or include boolean flag in every procedure telling if succeeded. Then must remember to check!

Another option is to pass in a procedure parameter which handles exceptions.

Exception mechanism in programming languages:

Can raise an exception and send back to caller who is responsible for handling exception.

Call program robust if recovers from exceptional conditions, rather than just halting (or crashing).

Typical exceptions: Arithmetic or I/O faults (e.g., divide by 0, read int and get char, array or subrange bounds, etc.), failure of precondition, unpredictable conditions (read past end of file, end of printer page, etc.), tracing program flow during debugging.

When exception is raised, it must be handled or program will fail!

Exception handling in Ada:

Raise exception via: raise excp_name

Attach exception handlers to subprogram body, package body, or block.

Ex:

    begin
        C
    exception
        when excp_name1 => C'
        when excp_name2 => C''
        when others => C'''
    end

When raise an exception, where do you look for handler? In most languages, start with current block (or subprogram). If not there, force return from unit and raise same exception to routine which called current one, etc., up the dynamic links until find handler or get to outer level and fail. (Clu starts at calling routine.)

Semantics of raising and handling exceptions is dynamic rather than static!

Handler can attempt to handle exception, but give up and raise another exception.
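
A small sketch of this dynamic search for a handler, written in Python (whose exceptions also propagate up the chain of callers). All names here (EmptyStack, ParseError, close_paren, parse, check) are hypothetical and chosen only for illustration.

```python
class EmptyStack(Exception):
    """Raised by pop when the stack is empty."""

class ParseError(Exception):
    """Raised by a handler that gives up on EmptyStack."""

def pop(stack):
    if not stack:
        raise EmptyStack()       # no handler here: return is forced
    return stack.pop()

def close_paren(stack):
    # No handler in this routine either, so an exception raised in pop
    # passes through it to whoever called close_paren.
    return pop(stack)

def parse(stack):
    try:
        return close_paren(stack)
    except EmptyStack:
        # The handler gives up and raises a different exception.
        raise ParseError("unmatched ')'")

def check(stack):
    try:
        parse(stack)
        return "ok"
    except ParseError as err:
        return "error: " + str(err)
```

Which handler runs depends on who called whom at run time, not on where pop is written in the program text.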

Resuming after exceptions

What happens after have found exception handler and successfully executed it (i.e., no further exceptions raised)?

In Ada, return from the procedure (or unit) containing the handler - called termination model.

PL/I has resumption model - go back to re-execute statement where failure occurred (makes sense for read errors, for example) unless GOTO in handler code.

Eiffel (an OOL) uses variant of resumption model.
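
The difference between the two models can be sketched in Python. Python itself follows the termination model, so resumption is only simulated here by passing in a handler procedure and retrying the failed read. The function names and the on_error parameter are assumptions made for illustration.

```python
def read_int_terminate(tokens):
    """Termination model: once the handler has run, control leaves this
    unit and the failed read is not retried."""
    try:
        return int(tokens.pop(0))
    except ValueError:
        return None

def read_int_resume(tokens, on_error):
    """Resumption, simulated: after the handler on_error has run,
    the read is executed again."""
    while True:
        token = tokens.pop(0)
        try:
            return int(token)
        except ValueError:
            on_error(token)      # handler runs, then the loop retries

skipped = []
first = read_int_terminate(["x", "7"])
second = read_int_resume(["x", "y", "7"], skipped.append)
```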

Exceptions in ML can pass parameter to exception handlers (like datatype defs). Otherwise very similar to Ada.

Example:

datatype 'a stack = EmptyStack | Push of 'a * ('a stack);
exception empty;
fun pop EmptyStack = raise empty
  | pop(Push(n,rest)) = rest;
fun top EmptyStack = raise empty
  | top (Push(n,rest)) = n;
fun IsEmpty EmptyStack = true
  | IsEmpty (Push(n,rest)) = false;
  
exception nomatch;
 
(* Character patterns such as #"(" assume SML '97, where explode yields a char list. *)
fun buildstack nil initstack = initstack
  | buildstack (#"("::rest) initstack = buildstack rest (Push(#"(",initstack))
  | buildstack (#")"::rest) (Push(#"(",bottom)) = buildstack rest bottom
  | buildstack (#")"::rest) initstack = raise nomatch
  | buildstack (fst::rest) initstack = buildstack rest initstack;
        
fun balanced string = (buildstack (explode string) = EmptyStack) 
                        handle nomatch => false;

Notice awkwardness in syntax. Need to put parentheses around the expression to which the handler is associated!

Some would argue shouldn't use exception nomatch since really not unexpected situation. Just a way of introducing goto's in code!

ABSTRACTION

Distinction between what something does and how it does it.

Interested in supporting abstraction (separation between what and how).

Originally, designers attempted to create languages w/ all types and statements that were necessary.

Realized quickly that needed extensible languages.

First abstractions for statements and expressions - Procedures and Functions

Arrays and records, then pointers introduced to build new types and operations on them.

Built-in types have associated operations - representation is hidden (for most part)

Support of ADT's is most important innovation of 1970's.

Simula 67 - package op's w/ data types - representation not hidden

Clu, Mesa, Modula-2, Ada, Smalltalk

Come back to them in Chapter 8.

Iterators correspond to abstraction over control structure

- higher-order functions in ML even more so!

Book discusses selector abstractions: Calculate location rather than value.

The more support for abstraction, generally the more expressive the language.

Use of parameters supports abstraction -
Creates more flexible program phrases.

Accessing non-local information:

Common, Global variables (in block-structured languages),

Parameters - data, subprograms, types

Data Parameters

1. Call by Reference (FORTRAN, Pascal):

Pass address of actual parameter.

Access via indirection.

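What if parameter is expression or constant? E.g., CHGTO4(2), where CHGTO4 assigns 4 to its parameter: in some FORTRAN implementations this could overwrite the stored constant 2 with 4.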

2. Call by Copy (Algol 60, Pascal, C, etc.):

Actual parameter copies value to formal parameter (and/or vice-versa).

value (in), result (out), value-result (in-out)

result and value-result parameters must be variables, value can be any storable value.

Can be expensive for large parameters.

3. Call by Name (Algol-60)

Actual parameter provides expression to formal parameter - re-evaluated whenever accessed.

Ex.

        Procedure  swap(a, b : integer);
            var temp : integer;
            begin
                temp := a;
                a := b;
                b := temp
            end;
Won't always work, e.g.

swap(i, a[i]) with i = 1, a[1] = 3, a[3] = 17.

No way to define a correct swap in Algol-60!
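
To see what goes wrong, call-by-name can be simulated in Python by passing each actual parameter as a pair of thunks, one to read the expression and one to assign to it. The ByName class and the env dictionary are assumptions of this sketch, not part of Algol 60.

```python
class ByName:
    """A call-by-name parameter: the actual expression is re-evaluated
    on every access (a getter thunk and a setter thunk)."""
    def __init__(self, get, set_):
        self.get = get
        self.set = set_

def swap(a, b):
    temp = a.get()       # temp := a
    a.set(b.get())       # a := b
    b.set(temp)          # b := temp

env = {"i": 1, "a": {1: 3, 3: 17}}

def set_i(v):
    env["i"] = v

def set_a_i(v):
    env["a"][env["i"]] = v     # subscript i is evaluated at the time of the store

i_param = ByName(lambda: env["i"], set_i)
a_i_param = ByName(lambda: env["a"][env["i"]], set_a_i)

swap(i_param, a_i_param)       # swap(i, a[i])
```

After the call i is 3 as expected, but a[1] is still 3 and a[3] has been overwritten with 1: by the time b := temp is executed, i has changed, so a[i] names a different element.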

Expressive power - Jensen's device:

To compute: x = Sum for i=1 to n of Vi

    real procedure sum (k, lower, upper, ak);
        value lower, upper;     
        integer k, lower, upper;
        real ak;
        begin
            real s;
            s := 0;
            for k := lower step 1 until upper do
                s := s + ak;
            sum := s
        end;

What is result of sum(i, 1, m, A[i])?

What about sum(i, 1, m, sum(j, 1, n, B[i,j]))?
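
Both questions can be explored with a Python simulation in which the by-name parameters k and ak become thunks. The names jensen_sum, set_i, set_j, and env are assumptions of this sketch.

```python
def jensen_sum(set_k, lower, upper, ak):
    """sum with k and ak passed by name: set_k assigns to the caller's
    index variable and ak() re-evaluates the caller's expression."""
    s = 0.0
    for value in range(lower, upper + 1):
        set_k(value)         # k := value, visible inside the caller's expression
        s = s + ak()         # ak is re-evaluated with the new k
    return s

env = {"i": 0, "j": 0}

def set_i(v):
    env["i"] = v

def set_j(v):
    env["j"] = v

A = {1: 10.0, 2: 20.0, 3: 30.0}
B = {(1, 1): 1.0, (1, 2): 2.0, (2, 1): 3.0, (2, 2): 4.0}

# sum(i, 1, 3, A[i])
total_A = jensen_sum(set_i, 1, 3, lambda: A[env["i"]])

# sum(i, 1, 2, sum(j, 1, 2, B[i,j]))
total_B = jensen_sum(set_i, 1, 2,
                     lambda: jensen_sum(set_j, 1, 2,
                                        lambda: B[env["i"], env["j"]]))
```

The first call adds up A[1] through A[3]; the nested call adds up every entry of B, since the inner sum is re-evaluated for each value of i.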

If evaluating parameters has side-effects (e.g., read), then must know how and how many times parameter is evaluated to predict what will happen.

Therefore try to avoid call-by-name with expressions with side-effects.

Lazy evaluation is an efficient implementation of call-by-name where the parameter is evaluated at most once. Requires that there be no side-effects, since otherwise the two can give different results.

Implement call-by-name using thunks - procedures which evaluate expressions - difficult and slow. Must pass around code for evaluating expression (including environment defined in). Can use the same THUNK's as show up in environment based interpreter.
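
A minimal Python sketch of the difference between re-evaluating a thunk at every use (call-by-name) and caching its first value (lazy evaluation), using a parameter expression with a side effect. The names read_next, memoize, twice_by_name, and twice_lazy are hypothetical.

```python
counter = {"reads": 0}

def read_next():
    """An expression with a side effect: each evaluation consumes input."""
    counter["reads"] += 1
    return counter["reads"] * 10

def twice_by_name(thunk):
    return thunk() + thunk()       # parameter re-evaluated at each use

def memoize(thunk):
    """Wrap a thunk so that it is evaluated at most once."""
    cache = []
    def forced():
        if not cache:
            cache.append(thunk())
        return cache[0]
    return forced

def twice_lazy(thunk):
    once = memoize(thunk)
    return once() + once()         # second use returns the cached value

by_name_result = twice_by_name(read_next)   # two reads: 10 + 20
lazy_result = twice_lazy(read_next)         # one read: 30 + 30
```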

Note different from call-by-text (which would allow capture of free vbles).

Can classify parameter passing by copying (value, result, or value-result) or definitional.

Definitional have constant, variable, procedural, and functional.

Constant parameters are treated as values, not variables - different from call-by-value.
Default for Ada in parameters.

Can think of call-by-name as definitional with expression parameter.

Note that difference in parameter passing depends on what is bound (value or address) and when it is bound.

Already seen how to pass functional (& procedural) parameters in our interpreter using closures.

Two major problems which arise with subprograms:

Side-effects:

Modifications of non-local environment

Often happens with global vbles

Also call by reference parameters, very dangerous in call-by-name.

Very disturbing in functions since can make it hard to figure out values of expressions. Example:

        A[f(j)] := j * f(j) + j 
Makes it harder to optimize - e.g. evaluate f(j) only once.
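
A Python sketch of why the optimization is unsafe when f has a side effect. The function f below is a hypothetical example that counts its own calls, and the two assign functions spell out the two evaluation strategies.

```python
calls = {"n": 0}

def f(j):
    """A function with a side effect: its result depends on the call count."""
    calls["n"] += 1
    return j + calls["n"]

def assign_evaluating_twice(A, j):
    index = f(j)                 # subscript evaluated first
    A[index] = j * f(j) + j      # f(j) evaluated again, with a different result

def assign_evaluating_once(A, j):
    v = f(j)                     # "optimized": f(j) computed once and reused
    A[v] = j * v + j

A1 = {}
assign_evaluating_twice(A1, 2)

calls["n"] = 0                   # reset so both runs start from the same state
A2 = {}
assign_evaluating_once(A2, 2)
```

Starting from the same state, the two versions store different values, so a compiler cannot replace one by the other unless it knows f is free of side effects.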

Aliasing:

More than one name for a variable

Most common ways of arising: global and parameter, two parameters, pointers

Example:

        Procedure swap( var x, y: integer);
        begin   
            x := x + y;
            y := x - y;
            x := x - y
        end;

Tricky way of completing swap of x and y w/out extra space.

Doesn't always work - e.g., swap(a, a) sets a to 0 (but does work with value-result!)

Can get similar probs with A, A[i] as parameters and pointers

Another problem: Overlap btn global vble and by-reference parameter.

Causes problems with correctness since any two vbles may refer to the same object.

Also makes it difficult to optimize if can't predict when a vble might be changed.

If no aliasing, can't detect difference btn call-by-reference and call-by-value-result.

But not semantically equivalent if aliasing is possible.
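
The difference can be made concrete by simulating both mechanisms in Python, with variables modeled as explicit cells. The Cell class and the left-to-right copy-out order are assumptions of this sketch.

```python
class Cell:
    """A variable: a location holding a value."""
    def __init__(self, value):
        self.value = value

def swap_by_reference(x, y):
    # x and y are the caller's own locations
    x.value = x.value + y.value
    y.value = x.value - y.value
    x.value = x.value - y.value

def swap_by_value_result(x, y):
    lx, ly = x.value, y.value    # copy in
    lx = lx + ly
    ly = lx - ly
    lx = lx - ly
    x.value = lx                 # copy out, left to right (assumed order)
    y.value = ly

p, q = Cell(1), Cell(2)
swap_by_reference(p, q)          # no aliasing: p and q are exchanged

a = Cell(5)
swap_by_reference(a, a)          # aliased: a ends up 0

b = Cell(5)
swap_by_value_result(b, b)       # aliased: b keeps its value
```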

Leads to problems in Ada where language definition does not specify whether in-out parameters are to be implemented by reference or value-result.

(Illegal program if it makes a difference - but not detectable!)

Unfortunately Ada doesn't enforce no aliasing.

Therefore possible problems with in out parameters.

Euclid (variant of Pascal) designed to write verifiable programs.

Attempted to eliminate aliasing.

Unfortunately some can only be caught at run-time, e.g. p(A[i], A[j])

legality assertions generated to check run-time problems.

Global vbles had to be explicitly imported to avoid problems

i.e. treated as implicit parameters

Correspondence Principle

Each parameter mechanism corresponds to declaration in language:

Correspondence Principle: For each form of declaration there exists a corresponding parameter mechanism, and vice-versa.

E.g., constant, variable (def. & declaration), procedure & function, type(?)

Problems with writing large programs:

Wulf and Shaw: Global Variable Considered Harmful (1973)
  1. Side effects - hidden access

  2. Indiscriminate access - can't prevent access - may be difficult to make changes later

  3. Screening - may lose access via new declaration of vble

  4. Aliasing - control shared access to prevent more than one name for reference variables.

Characteristics of solution:

  1. No implicit inheritance of variables

  2. Right to access by mutual consent

  3. Access to structure does not imply access to substructure

  4. Provide different types of access (e.g. read-only)

  5. Decouple declaration, name access, and allocation of space. (e.g. scope indep of where declared, similarly w/allocation of space - like Pascal new)

Abstract Data Types

(Major thrust of programming language design in 70's)

Package data structure and its operations in same module - Encapsulation

Data type consists of set of objects plus set of operations on the objects of the type (constructors, inspectors, destructors).

Want mechanism to build new data types (extensible types).

Should be treated same way as built-in types.

Representation should be hidden from users (abstract).

Users only have access via operations provided by the ADT.

Distinguish between specification and implementation.

Specification:

Book states language should provide:

Method for defining data type and the operations on that type (all in same place). The definitions should not depend on any implementation details. The definitions of the operations should include a specification of their semantics.

Provides user-interface with ADT.

Typically includes

  1. Data structures: constants, types, & variables accessible to user (although details may be hidden)

  2. Declarations of functions and procedures accessible to user (bodies not provided here).

May also include axioms specifying behavior "promised" by any implementation. The following is an algebraic specification of behavior (see text for details).

Ex:

        pop(push(S,x)) = S,

        if not empty(S) then push(pop(S), top(S)) = S

Data + Operations (+ possibly equations) = Algebra
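
One way to read the equations is as checks that any implementation must pass. The Python sketch below uses one possible representation (nested pairs); that choice and the helper check_axioms are assumptions made for illustration. Testing a few cases does not prove the axioms, it only illustrates them.

```python
EMPTY = ()                # the empty stack

def push(S, x):
    return (x, S)         # builds a new stack; S itself is unchanged

def pop(S):
    return S[1]

def top(S):
    return S[0]

def empty(S):
    return S == EMPTY

def check_axioms(S, x):
    """Check both equations for one stack S and one element x."""
    ok = pop(push(S, x)) == S
    if not empty(S):
        ok = ok and push(pop(S), top(S)) == S
    return ok

s = push(push(EMPTY, 1), 2)
```

A user of the stack sees only push, pop, top, and empty; swapping the nested pairs for another representation would not change the outcome of check_axioms.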

Implementation (Representation):

Again from text:

Method for collecting the implementation details of the type and its operations (in one place), and of restricting access to these details by programs that use the data type.

Usually not accessible to user.

Provides details on all data structures (including some hidden to users) and bodies of all operations.

Note that ADT methodology is orthogonal to top-down design

How to represent ADT's in programming languages?

Three predominant concerns in language design:

Reusable modules to represent ADT's quite important.