CS 334 Lecture 11

Contents:

  1. DYNAMIC MEMORY MANAGEMENT
    1. Memory management in the heap
    2. Reclamation of free storage
      1. Reference Counting
      2. Garbage Collection
  2. COMMANDS OR STATEMENTS:
    1. Assignment:
    2. Sequencing: S; T
    3. Selection: If .. then ... else ...
    4. Repetition: while ... do ...
    5. Natural Semantics for commands
    6. Iterators
    7. Exceptions
      1. Exception mechanism in programming languages:
      2. Exception handling in Ada:
      3. Resuming after exceptions
Exam: Chapters 1-7, 10 and extra material covered in class and homework.

It will be distributed in Tuesday's class after the break, and is due promptly at the beginning of Friday's class.

DYNAMIC MEMORY MANAGEMENT

(See section 10.8 in text)

We cannot use a stack-based discipline for function calls in a functional language because of difficulties in returning functions as values from other functions.

As a result, activation records must be allocated from a heap. Similar difficulties in passing around closures result in most object-oriented languages relying on heap allocated memory for objects. Because it is often not clear when memory can be safely freed, such languages usually rely on an automatic mechanism for recycling memory.

In this lecture we discuss methods for automatically managing and reclaiming free space. We begin with the simpler task of managing free space.

Memory management in the heap

A heap is usually maintained as a list or stack of blocks of memory. Initially all of the free space is maintained as one large block, but requests (whether explicit or implicit) for storage and the subsequent recycling of blocks of memory will eventually result in the heap being broken down into smaller pieces.

When a request is made (e.g., via a "new" statement) for a block of memory, some strategy is used to allocate a block of the desired size. For instance, one might search the list of free space for the first block which is at least as large as the block desired ("first fit"), or one might look for a "best fit": a block which is as small as possible, yet large enough to satisfy the request.
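
A rough sketch of first fit in ML (purely illustrative: firstFit is our own name, and a free block is modeled as an (address, size) pair):

    (* Scan the free list for the first block big enough; allocate *)
    (* from its front and return any remainder to the free list.   *)
    fun firstFit (request, []) = NONE
      | firstFit (request, (addr, size) :: rest) =
          if size >= request then
            SOME (addr,
                  if size > request
                  then (addr + request, size - request) :: rest
                  else rest)
          else
            (case firstFit (request, rest) of
                 NONE => NONE
               | SOME (a, rest') => SOME (a, (addr, size) :: rest'))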

Whichever technique is chosen, only as much memory as is needed will be allocated, with the remainder of the block returned to the heap of available space.

Unless action is taken, the heap will eventually be composed of smaller and smaller blocks of memory. In order to prevent this, the operating system will normally attempt to merge or coalesce adjacent blocks of free memory. Thus whenever a block of memory is ready to be returned to the heap, the adjacent memory locations are examined to determine whether they are already on the heap of available space. If either (or both) are, then they are merged with the new block and put in the heap.
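
For instance, if the free list is kept sorted by address, coalescing can be a single pass. A sketch, with free blocks again modeled as (address, size) pairs:

    (* Merge any free block that ends exactly where the next begins. *)
    fun coalesce ((a1, s1) :: (a2, s2) :: rest) =
          if a1 + s1 = a2
          then coalesce ((a1, s1 + s2) :: rest)
          else (a1, s1) :: coalesce ((a2, s2) :: rest)
      | coalesce blocks = blocks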

Even with coalescing, the heap can still become fragmented, with many small blocks in use alternating with small free blocks in the heap of available space.

This can be fixed by occasionally compacting memory by moving all blocks in use to one end of memory and then coalescing all the remaining space into one large block. This can be very complex since pointers in all data structures in use must be updated. (The Macintosh requires the use of handles in order to accomplish this!)

Reclamation of free storage

Aside from the manual reclamation of storage using operations like "dispose", there are two principal automatic mechanisms for reclaiming storage: reference counting and garbage collection. The first is an eager mechanism while the second is lazy.

Reference Counting

Reference counting is conceptually simpler than garbage collection, but often turns out to be less efficient overall. The idea behind reference counting is that each block of memory is required to reserve space to count the number of separate pointers to it.

Thus when an assignment of pointers of the form p := q is executed, the block of memory that p originally points to has its reference count decreased by one, while the block pointed to by q has its count increased by one.

If the count on a block of memory is reduced to zero, it should be returned to the heap of available space. However, if it has pointers to other blocks of memory, those blocks should also have their reference counts reduced accordingly.
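
A sketch of the bookkeeping for p := q, modeling a block as an ML record with a mutable count (incr, decr, and assign are illustrative names; a real implementation would also free a block, decrementing whatever it points to, when its count reaches zero):

    type block = {count : int ref}

    fun incr (b : block) = #count b := !(#count b) + 1
    fun decr (b : block) = #count b := !(#count b) - 1

    (* a pointer is modeled as a ref holding an optional block; note *)
    (* q's target is incremented first, so that p := p is safe       *)
    fun assign (p : block option ref, q : block option ref) =
        ((case !q of SOME b => incr b | NONE => ());
         (case !p of SOME b => decr b | NONE => ());
         p := !q)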

One drawback of this system is that each block of memory allocated must have sufficient space available to maintain its reference count. However, a more serious problem is the existence of circular lists. Even if nothing else points to the circular list, each item of the list will have another item of the list pointing to it. Thus even if a circular list is inaccessible from the program, the reference counts of all of its cells will still be positive and it will not be reclaimed.

Garbage Collection

Garbage collection is a more common way of handling automatic storage reclamation. The basic idea is that computation continues until there is no storage left to allocate. Then the garbage collector marks all of the blocks of memory that are currently in use and gathers the rest (the garbage) into the heap of available space.

The mark and sweep algorithm for garbage collection starts with all objects accessible from the current environment (or symbol table), marks them and then does the same with all objects accessible from those, etc. After this phase the algorithm sweeps through all of memory, collecting those blocks which were not marked in the first phase (and unmarking the rest). Normal processing then resumes.
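
A sketch of both phases over a toy heap model: an array of cells, each holding a mark bit and the indices of the cells it points to (all names are illustrative):

    type cell = {marked : bool ref, children : int list}

    (* phase 1: mark everything reachable from a root *)
    fun mark (heap : cell Array.array) i =
        let val {marked, children} = Array.sub (heap, i)
        in if !marked then ()
           else (marked := true; List.app (mark heap) children)
        end

    (* phase 2: sweep all of memory, collecting the indices of *)
    (* unmarked cells and unmarking the rest                   *)
    fun collect (heap : cell Array.array) (roots : int list) =
        (List.app (mark heap) roots;
         Array.foldri
           (fn (i, {marked, ...} : cell, free) =>
                if !marked then (marked := false; free) else i :: free)
           [] heap)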

There are two problems with this technique. The first is the space necessary in order to hold the mark (though this can just be one bit). A more serious problem is that this algorithm requires two passes through memory: The first to mark and the second to collect. This can often take a significant amount of time (notice the delays in emacs, for example), making this sort of garbage collection unsuitable for real-time systems. This disadvantage has led to this method being abandoned by most practical systems (though still described in texts).

There have been several recent improvements in garbage collection algorithms. The first is sometimes known as a copying collector.

In this algorithm the memory is first divided into two halves, the working half and the free half. When memory is exhausted in the working half, the live nodes are copied to the free half of memory, and the roles of the two halves are switched. Notice that the collector only looks at live cells, rather than all of memory. Collection can be done incrementally, so that very little cost is paid at any one time (probably fewer than 50 instructions). This tends to work well with a virtual memory system.
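
A sketch of the copying idea over the same toy heap model (copyCollect and the forwarding table are our own invention for illustration). Note that the collector touches only cells reachable from the roots; garbage is never examined:

    (* copy live cells from fromSpace into toSpace; forward remembers *)
    (* each moved cell's new index, so shared cells are copied once   *)
    fun copyCollect (fromSpace : int list Array.array) (roots : int list) =
        let
          val n = Array.length fromSpace
          val forward = Array.array (n, ~1)      (* old index -> new index *)
          val toSpace = Array.array (n, [] : int list)
          val next = ref 0
          fun copy i =
              if Array.sub (forward, i) >= 0
              then Array.sub (forward, i)        (* already copied *)
              else
                let val j = !next
                in next := j + 1;
                   Array.update (forward, i, j); (* reserve the new slot *)
                   Array.update (toSpace, j,
                                 List.map copy (Array.sub (fromSpace, i)));
                   j
                end
          val newRoots = List.map copy roots
        in (toSpace, !next, newRoots) end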

Another strategy is to use a generational collector, which only bothers to garbage collect recently allocated blocks of memory. Older blocks are moved into stable storage and collected much less often. Studies have shown that most reclaimed garbage comes from more recently allocated blocks of memory.

On highly parallel architectures, garbage collection can take place in the background, minimizing or eliminating delays.

COMMANDS OR STATEMENTS:

Change "state" of machine.

State of computer corresponds to contents of memory and any external devices (I/O)

State sometimes called "store"

Note distinction between "state" and "environment". Environment is mapping between identifiers and values (including locations). State includes mapping between locations and values.

Values in store or memory are "storable" versus "denotable" (or "bindable")

Symbol table depends on declarations and scope - static

Environment tells where to find values - dynamic

State depends on previous computation - dynamic

If have compiler, use symbol table when generating code to determine meaning of all identifiers. At run-time, symbol table no longer needed (hard coded into compiled code), but state and environment change dynamically.

In interpreter, may have to keep track of symbol table, environment, and state at run-time. (In fact could avoid using state if there is no "aliasing" in the language.)

Assignment:

vble := expression

Order of evaluation can be important, especially if there are side-effects. Usually left-side evaluated first, then right-side.

		A[f(j)] := j * f(j) + j    -- difficult to predict value if f has side effect of changing j

Two kinds of assignments:

1. assignment by copying and

2. assignment by sharing (often handy w/dynamic typing or OOL's)
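
A rough analogy in ML, using refs to make the difference visible (these are bindings rather than true assignments, but the sharing behavior is the same):

    val p = ref [1, 2, 3];
    val q = p;           (* "sharing": q and p name the same cell *)
    val r = ref (!p);    (* "copying": r gets its own cell        *)
    val _ = (p := [9]);  (* afterwards !q = [9], but !r = [1,2,3] *)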

Most statements are actually control structures for combining other expressions and statements:

Sequencing: S; T
Selection: If .. then ... else ...
Repetition: while ... do ...

FORTRAN started with very primitive control structures:

Very close to machine instructions

Why need repetition - can do it all with goto's?

"The static structure of a program should correspond in a simple way with the dynamic structure of the corresponding computation." Dijkstra letter to editor.

ALGOL 60 was more elaborate.

Pascal expanded but simplified these. Ada is like Pascal, but with a more uniform loop:
		iteration specification loop
			loop body
		end loop;
where the iteration specification can be "while condition" or "for identifier in range". Can also have a vanilla loop (no iteration specification at all), which is left with an exit statement.

Ada also provides "exit when condition", syntactic sugar for "if condition then exit".

Can also exit from several depths of loops

Interesting theoretical result of Böhm and Jacopini (1966): every flowchart can be programmed entirely in terms of sequencing, if, and while commands.

Natural Semantics for commands

Can write natural semantics for various commands:

With commands must keep track of store: locations -> storable values.

If expressions can have side-effects then must update rules to keep track of effect on store. Rewriting rules now have conclusions of form (e, rho, s) >> (v, s') where v is a storable value, rho is an environment (mapping from identifiers to denotable values - including locations), s is initial state (or store), and s' is state after evaluation of e.

    (b, rho, s) >> (true, s')    (e1, rho, s') >> (v, s'')
    ------------------------------------------------------
          (if b then e1 else e2, rho, s) >> (v, s'')
Thus if evaluation of b and e1 has side effects on memory, they show up in the "answer".

Axioms - no hypotheses!

    (id, rho, s) >> (s(loc), s)        where  loc = rho(id)

    (id++, rho, s) >> (v, s[loc:=v+1])    where loc = rho(id), v = s(loc)

Note s[loc:=v+1] is the state s' identical to s except that s'(loc) = v+1.

    (e1, rho, s) >> (v1, s')    (e2, rho, s') >> (v2, s'')
    ------------------------------------------------------
            (e1 + e2, rho, s) >> (v1 + v2, s'')
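
For example, suppose rho(x) = loc and s(loc) = 1, and write s' for s[loc:=2]. The rules above thread the store left to right:

    (x++, rho, s) >> (1, s')      (x, rho, s') >> (2, s')
    ------------------------------------------------------
               (x++ + x, rho, s) >> (3, s')

Evaluating x + x++ instead would yield 2, which is why the rules must fix the order of evaluation.
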
When evaluate a command, "result" is a state only.

E.g.,

        (e, rho, s) >> (v, s')
    ------------------------------   where rho(x) = loc
    (x := e, rho, s) >> s'[loc:=v]

    (C1, rho, s) >> s'    (C2, rho, s') >> s''
    ------------------------------------------
             (C1; C2, rho, s) >> s''

    (b, rho, s) >> (true, s')   (C1, rho, s') >> s''
    ------------------------------------------------
          (if b then C1 else C2, rho, s) >> s''

+ similar rule if b false

     (b, rho, s) >> (false, s')
    ---------------------------
    (while b do C, rho, s) >> s'

    (b, rho, s) >> (true, s')    (C, rho, s') >> s''   
             (while b do C, rho, s'') >> s'''
    ------------------------------------------------
              (while b do C, rho, s) >> s'''

Notice how similar definition of semantics for

    while E do C
is to
    if E then begin 
        C; 
        while E do C 
    end
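
These rules translate almost directly into an ML definitional interpreter. Here is a minimal sketch, simplifying in several ways: the environment is dropped (an identifier is its own location), expressions have no side effects, and a nonzero value counts as true. All the names are our own, not from any library.

    datatype exp = Num of int | Var of string | Plus of exp * exp
    datatype com = Assign of string * exp
                 | Seq of com * com
                 | If of exp * com * com
                 | While of exp * com

    type state = string -> int

    fun update (s : state) x v = fn y => if y = x then v else s y

    (* (e, s) >> v : with no side effects, only the value comes back *)
    fun eval (Num n) (s : state) = n
      | eval (Var x) s = s x
      | eval (Plus (e1, e2)) s = eval e1 s + eval e2 s

    (* (C, s) >> s' : a command takes a state to a new state. Note how *)
    (* the While clause mirrors the if/while unrolling above.          *)
    fun exec (Assign (x, e)) s = update s x (eval e s)
      | exec (Seq (c1, c2)) s = exec c2 (exec c1 s)
      | exec (If (b, c1, c2)) s =
          if eval b s <> 0 then exec c1 s else exec c2 s
      | exec (While (b, c)) s =
          if eval b s <> 0 then exec (While (b, c)) (exec c s) else s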

Iterators

CLU allows definition of user-defined iterators (abstract over control structures):
    for c : char in string_chars(s) do ...
where have defined:
    string_chars = iter (s : string) yields (char);
       index : int := 1;
       limit : int := string$size(s);
       while index <= limit do
          yield (string$fetch(s, index));
          index := index + 1;
       end;
    end string_chars;

Behave like restricted type of co-routine.

Can be implemented on stack similarly to procedure call.

Now available in Sather, C++, and Java.
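
In a language with first-class functions one can fake a simple iterator by passing the loop body to it. A rough ML sketch (string_chars here is our own function, not library code):

    (* the "iterator" yields each character by calling the body on it *)
    fun string_chars (s : string) (body : char -> unit) =
        CharVector.app body s

    (* "for c : char in string_chars(s) do print(c)" becomes: *)
    val _ = string_chars "hello" (fn c => print (str c))

Note this is weaker than a true iterator/co-routine: the body cannot, for example, break out of the loop early.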

Exceptions

Need mechanism to handle exceptional conditions.

Example: using a stack, we try to pop an element off of an empty stack.

Clearly corresponds to mistake of some sort, but stack module doesn't know how to respond.

In older languages the main ways to handle this were to print an error message and halt, or to include a boolean flag in every procedure telling whether it succeeded. Then the caller must remember to check the flag!

Another option is to pass in a procedure parameter which handles exceptions.

Exception mechanism in programming languages:

Can raise an exception and send back to caller who is responsible for handling exception.

Call a program robust if it recovers from exceptional conditions, rather than just halting (or crashing).

Typical exceptions: Arithmetic or I/O faults (e.g., divide by 0, read int and get char, array or subrange bounds, etc.), failure of precondition, unpredictable conditions (read past end of file, end of printer page, etc.), tracing program flow during debugging.

When exception is raised, it must be handled or program will fail!

Exception handling in Ada:

Raise exception via: raise excp_name

Attach exception handlers to subprogram body, package body, or block.

Ex:

	begin
		C
	exception
		when excp_name1 => C'
		when excp_name2 => C''
		when others => C'''
	end

When raise an exception, where do you look for handler? In most languages, start with current block (or subprogram). If not there, force return from unit and raise same exception to routine which called current one, etc., up the dynamic links until find handler or get to outer level and fail. (CLU starts at calling routine.)

Semantics of raising and handling exceptions is dynamic rather than static!

Handler can attempt to handle exception, but give up and raise another exception.

Resuming after exceptions

What happens after an exception handler has been found and successfully executed (i.e., no further exceptions raised)?

In Ada, return from the procedure (or unit) containing the handler - called termination model.

PL/I has resumption model - go back to re-execute statement where failure occurred (makes sense for read errors, for example) unless GOTO in handler code.

Eiffel (an OOL) uses variant of resumption model.

Exceptions in ML can pass parameters to exception handlers (exception declarations look like datatype definitions). Otherwise very similar to Ada.
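
For instance, a sketch of an exception that carries a parameter (BadIndex and nth' are made up for illustration):

    exception BadIndex of int;    (* declared like a datatype constructor *)

    fun nth' (x :: _, 0) = x
      | nth' (_ :: rest, n) = nth' (rest, n - 1)
      | nth' ([], n) = raise BadIndex n;

    (* the handler pattern binds the value carried by the exception *)
    val v = nth' ([10, 20, 30], 5) handle BadIndex i => (print (Int.toString i); 0);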

Example:

datatype 'a stack = EmptyStack | Push of 'a * ('a stack);
exception empty;
fun pop EmptyStack = raise empty
  | pop(Push(n,rest)) = rest;
fun top EmptyStack = raise empty
  | top (Push(n,rest)) = n;
fun IsEmpty EmptyStack = true
  | IsEmpty (Push(n,rest)) = false;
  
exception nomatch;
 
fun buildstack nil initstack = initstack
  | buildstack (#"(" :: rest) initstack = buildstack rest (Push(#"(", initstack))
  | buildstack (#")" :: rest) (Push(#"(", bottom)) = buildstack rest bottom
  | buildstack (#")" :: rest) initstack = raise nomatch
  | buildstack (fst :: rest) initstack = buildstack rest initstack;

fun balanced string = (buildstack (explode string) EmptyStack = EmptyStack)
                        handle nomatch => false;
Notice awkwardness in syntax. Need to put parentheses around the expression to which the handler is associated!

Some would argue shouldn't use exception nomatch since really not unexpected situation. Just a way of introducing goto's in code!