CS 334 Lecture 10

Type completeness principle:

No operation should be arbitrarily restricted in the types of the values involved.

Avoid second-class types.

Ex. in Pascal: Restrictions on return values of functions, lack of procedure variables, etc.

ML comes much closer to satisfying this principle.

Summary of types so far:

postpone ADT's until later

Modern tendency to strengthen static typing and avoid implicit holes in type system.

- usually explicit (dangerous) means for bypassing type system, if desired

Try to push as many errors to compile time as possible by:

Requiring overspecification through typing
Distinguishing btn diff. uses of same types (name equiv.)
Mandating constructs designed to eliminate typing holes
Minimizing or eliminating use of explicit pointers (esp. user-controlled deallocation of ptrs).

Problem: loss of flexibility which is obtainable from dynamic typing or lack of any typing.

Important direction of current research in computer science:

Provide type safety, but increase flexibility.

Important progress over last 20 years:

Polymorphism, ADT's, Subtyping & other aspects of object-oriented languages.

STORAGE

What are storable values of language? Those that cannot be selectively updated.

Varies between languages.

Pascal: primitive (integer, real, char, boolean), sets, pointers

ML: primitive, records, tuples, lists, function abstractions, ref's to vbles.

Examine how variables allocated and lifetime.

Program Units:

Separate segments of code - usually allow separate declaration of local variables.

E.g. procedures, functions, and blocks (from ALGOL 60 & C; blocks are like parameterless procedures located in-line).

Program unit represented during execution by unit instance, composed of code segment and activation record (gives info on parameters and local variables, and where to return after execution).

Activation Record Structure:

Return address
Access info on parameters
Space for local variables

Units often need access to non-local variables.

How is procedure call made?

To call:

Make parameters available to callee.
Save state of caller (registers, prog. counter).
Make sure callee knows how to find where to return to.
Enter callee at 1st instruction.

To return:

Get return address and transfer execution to that point.
Caller restores state.
If fcn, make sure result value left in accessible location (register, on top of stack, etc.)

Memory allocation

Three types of languages:

Static: E.g. FORTRAN and COBOL.
Stack-Based: E.g. ALGOL-like languages (including Pascal and C).
Dynamic: LISP, PROLOG, APL, ML, Miranda, Eiffel, etc. as well as aspects of Pascal, Ada, etc.

Static using FORTRAN as example

Units: Main program, Subroutines, and Functions.

All storage (local and global) known at translation time (hence static).

Activation records can be associated with each code segment.

Structure:

Return address
Access info on parameters
Space for local variables

At compile time, both instructions and vbles can be accessed by

(unit name, offset)

At link time can resolve to absolute addresses.
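
The two-stage addressing above can be sketched as follows. This is only an illustration in Python: the unit names, their sizes, the base address 1000, and the helpers `link` and `resolve` are all hypothetical, not taken from any FORTRAN implementation.

```python
# Hypothetical sizes (in words) of each unit's statically allocated
# activation record; a real compiler derives these from the declarations.
unit_sizes = {"MAIN": 6, "SUB1": 4, "FUNC1": 3}

def link(sizes, base=0):
    """Assign each unit a fixed base address, as a linker would."""
    bases = {}
    next_free = base
    for name, size in sizes.items():
        bases[name] = next_free
        next_free += size
    return bases

def resolve(bases, unit, offset):
    """Turn a compile-time (unit name, offset) pair into an absolute address."""
    return bases[unit] + offset

bases = link(unit_sizes, base=1000)
print(resolve(bases, "SUB1", 2))   # 1008: SUB1 starts at 1006, plus offset 2
```

Since nothing is allocated at run time, every address is fixed before execution begins.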

Global info shared via common statement:

	COMMON/NAME1/A,B,S(25)

Statement must occur in all units wishing to share information. Name of the block must be identical, though can give different names to variables. Identifiers are matched in order w/ no checking of types across unit boundaries. (Gives rise to holes in typing.)

Space for all common blocks allocated and available globally.
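
The typing hole can be imitated outside FORTRAN. The sketch below is a Python analogy, not FORTRAN semantics: a 4-byte buffer stands in for one slot of a common block, and the two functions (hypothetical names) stand in for two units that declare that slot with different types.

```python
import struct

# One shared 4-byte storage cell, standing in for the first slot of a COMMON block.
common_block = bytearray(4)

def unit_one():
    # This unit treats the slot as a 32-bit real and stores 1.0 in it.
    struct.pack_into("<f", common_block, 0, 1.0)

def unit_two():
    # This unit declares the same slot as a 32-bit integer; nothing checks
    # that the two declarations agree, so the bits are simply reinterpreted.
    (value,) = struct.unpack_from("<i", common_block, 0)
    return value

unit_one()
print(unit_two())   # 1065353216, the IEEE 754 bit pattern of 1.0 read as an integer
```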

Procedure call and return straightforward

Stack-based languages using ALGOL 60/Pascal

Problem during procedure activation

Static (e.g. scope) vs.
Dynamic (e.g. return address) Environments

Stack reflects dynamic environment.

Activation record pushed onto stack each time there is a procedure call.

Popped off after return.

Ex.

Program main;
    type array_type = array [1..10] of real;
    var a : integer;
         b : array_type;
    Procedure x (var c : integer; d : array_type);
        var e : array_type;
        procedure y (f : array_type);
            var g : integer;
            begin
                    :
                z(a+c);
                    :
            end; {y}
        begin {x}
             : ..... := b[6].......
            y(e);
             :
        end; {x}
    Procedure z (h : integer);
        var a : array_type;
        begin 
                :
            x (h,a);
                :
        end;
    begin {main}
            :
        x (a,b);
            :
    end. {main}

Draw static scopes.

Dynamic calling sequences:

	Main => x =>  y => z => x => y ....

Look at picture of run-time stack after x calls y 2nd time.

How do we get reference to b in x? to a in y? Where can these variables be held?

Dynamic link (called control link in text) provides pointer to beginning (usually) of caller's activation record.

Static link provides pointer to beginning of activation record of statically containing program unit.

How do we find location of variable? Easy in FORTRAN! Not here!

Must keep track of the static nesting level of each variable and procedure.

When accessing a vble or procedure, subtract static nesting level of its definition from static nesting level at the point of reference (the level of the referencing unit's own locals).

Tells how far down static chain to environment of definition.

Example:

Name            Level       Name            Level

main            0                   y       2
    a           1                       f   3
    b           1                       g   3
    x           1               z           1
        c       2                   h       2
        d       2                   a       2
        e       2

Notes:

Static chain from any fixed procedure to main program always has same length (doesn't depend on which activation we are in).
Any non-local vble will be found after some fixed number of static links (doesn't depend on which activation we are in).
This # of links is a constant determinable at compile time! (Difference between nesting level of the reference and nesting level of the definition.)

Thus represent identifier references in program as pair:

<chain position, offset>

E.g., from within y represent d as <1, nx+2> where nx is size of activation record of x before parameters. Similarly a (the one declared in main) is represented as <2, nmain+1>.
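
A small simulation of the lookup, assuming a simplified layout: the `Frame` class and `lookup` function are hypothetical, each frame holds only its variables, and offsets count from 0 with no bookkeeping area, so d is <1, 1> and main's a is <2, 0> here rather than <1, nx+2> and <2, nmain+1>. The frames are built by hand for the calling sequence Main => x => y => z => x => y.

```python
class Frame:
    """An activation record: locals plus static and dynamic links."""
    def __init__(self, name, slots, static_link, dynamic_link):
        self.name = name
        self.slots = slots                # local variables and parameters, by offset
        self.static_link = static_link    # frame of the statically enclosing unit
        self.dynamic_link = dynamic_link  # frame of the caller

def lookup(frame, chain_position, offset):
    """Follow chain_position static links, then index by offset."""
    for _ in range(chain_position):
        frame = frame.static_link
    return frame.slots[offset]

# Slot contents are placeholder strings naming the variable they hold.
main = Frame("main", ["a", "b"], None, None)
x1 = Frame("x", ["c", "d", "e"], static_link=main, dynamic_link=main)
y1 = Frame("y", ["f", "g"], static_link=x1, dynamic_link=x1)
z1 = Frame("z", ["h", "a (of z)"], static_link=main, dynamic_link=y1)
x2 = Frame("x", ["c", "d", "e"], static_link=main, dynamic_link=z1)
y2 = Frame("y", ["f", "g"], static_link=x2, dynamic_link=x2)

print(lookup(y2, 1, 1))   # d
print(lookup(y2, 2, 0))   # a
```

Note that z's static link points to main while its dynamic link points to y: the two chains diverge as soon as a procedure is called from somewhere other than its enclosing unit.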

Allocation of Activation Record

Activation record size known statically.

All local vbles and parameters have sizes known at translation time - called semi-static.

Sol'n: Piece of cake - each activation record for a given procedure is identical in size and location of info. E.g., Pascal.

Size known at unit activation (semi-dynamic).

E.g. Array bounds depend on parameters -

E.g. x : Array [m..n] of integer;

often called semi-dynamic

Sol'n: Space for vble descriptors allocated at fixed offset.

Location of local vbles and parameters may have to be calculated from this info at each invocation (e.g., store starting location on stack and use indirection).
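
One way to picture the descriptor scheme, as a sketch only: the run-time stack is a Python list, and the three-word descriptor layout (lower bound, upper bound, start address) and the helper names are assumptions made for this illustration.

```python
def activate(stack, m, n):
    """Push a frame for a unit declaring x : Array [m..n] of integer."""
    frame_base = len(stack)
    # Fixed-offset part: a descriptor with lower bound, upper bound, and the
    # stack address where the elements start (filled in at activation time).
    stack.extend([m, n, frame_base + 3])
    # Variable-size part: the elements themselves, allocated above the descriptor.
    stack.extend([0] * (n - m + 1))
    return frame_base

def element_address(stack, frame_base, i):
    """Address of x[i], computed through the descriptor (one indirection)."""
    low, high, start = stack[frame_base:frame_base + 3]
    if not low <= i <= high:
        raise IndexError(f"index {i} outside {low}..{high}")
    return start + (i - low)

stack = []
base = activate(stack, 3, 7)
stack[element_address(stack, base, 5)] = 42
print(stack)   # [3, 7, 3, 0, 0, 42, 0, 0]
```

The descriptor sits at a compile-time offset; only the element addresses have to be computed at run time.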

Size can vary at any time - Dynamic variables.

E.g. Flexible arrays, pointer data - new, dispose can be invoked at any time.

Sol'n: Separate data area called heap is required.

Lifetime of data independent of lifetime of calling unit.

Allocates and deallocates new space when necessary.

Complicated as size of blocks needed varies.

Requires careful handling to preserve space.

Stack and heap space do not overlap during execution.

Come back and talk about management of heap later!

(Notice that the difference btn the three cases above is binding time!)

Dynamic Language - Dynamic Scoping & Typing

Implementation of dynamic types:

Keep type descriptor of each variable available at run-time

Since type can change dynamically, so can size and contents of descriptor (e.g. # dim's and bounds).

Activation record contains ptr to descriptor which contains ptr to vble.

All accesses provide for run-time check on type - slow.
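
A rough model of descriptor-based access, assuming a much simplified descriptor (a type tag plus the value); the `Descriptor` class, the tag strings, and the helpers are hypothetical.

```python
class Descriptor:
    """Run-time type descriptor: a type tag plus a reference to the value."""
    def __init__(self, type_tag, value):
        self.type_tag = type_tag
        self.value = value

def assign(frame, name, type_tag, value):
    # Assignment may change the variable's type, so a fresh descriptor is built.
    frame[name] = Descriptor(type_tag, value)

def add(frame, left, right):
    """Add two variables, checking their type tags at run time."""
    a, b = frame[left], frame[right]
    if a.type_tag != "int" or b.type_tag != "int":
        raise TypeError(f"cannot add {a.type_tag} and {b.type_tag}")
    return a.value + b.value

frame = {}
assign(frame, "p", "int", 2)
assign(frame, "q", "int", 3)
print(add(frame, "p", "q"))              # 5
assign(frame, "q", "array", [1, 2, 3])   # q now has a different type and size
```

After the last assignment, the same `add` call would fail its run-time check; the check is repeated on every access, which is where the cost comes from.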

Implementation of dynamic scope:

Static link now unnecessary: follow dynamic links to find closest activation record w/ name.

Name of vble must be stored in activation record (w/ ptr to descriptor).

Example:

	program A;
		var B : integer;
		procedure C;
			begin
				..... B ....
			end;
		procedure D(B:integer);
			begin
				C;
			end;
		begin
			D(12);
		end.

Trace stack during execution.
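
The trace can be sketched in Python. The record layout and `lookup` are hypothetical, and the value 0 for A's B is an assumption (the program never initializes it).

```python
def lookup(stack, name):
    """Search from the top of the stack for the closest record holding name."""
    for record in reversed(stack):
        if name in record["vars"]:
            return record["unit"], record["vars"][name]
    raise NameError(name)

stack = []
stack.append({"unit": "A", "vars": {"B": 0}})    # program A; 0 is an assumed value
stack.append({"unit": "D", "vars": {"B": 12}})   # call D(12)
stack.append({"unit": "C", "vars": {}})          # D calls C
print(lookup(stack, "B"))   # ('D', 12): C sees D's parameter, not A's variable
stack.pop()                 # C returns
stack.pop()                 # D returns
print(lookup(stack, "B"))   # ('A', 0)
```

Under static scoping the reference to B inside C would always mean A's variable; here it depends on who called C.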

Costs: More space, slower access.

Gains: Flexibility.

Possible to implement by keeping table of loc'ns of active variables.

Overhead when entering and leaving procedures.
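
A sketch of the table approach, assuming one stack of bindings per name; the helper names and the value 0 for A's B are assumptions for illustration.

```python
from collections import defaultdict

# Central table: for each name, a stack of the bindings currently active.
table = defaultdict(list)

def enter(bindings):
    """On procedure entry, push each new binding onto its name's stack."""
    for name, value in bindings.items():
        table[name].append(value)

def leave(bindings):
    """On procedure exit, pop the bindings that entry pushed."""
    for name in bindings:
        table[name].pop()

def current(name):
    """Access is a single table lookup; no search of the run-time stack."""
    return table[name][-1]

a_vars, d_vars = {"B": 0}, {"B": 12}
enter(a_vars)              # program A starts
enter(d_vars)              # call D(12)
seen_in_c = current("B")   # C has no locals, so entering it pushes nothing
leave(d_vars)              # D returns
after_return = current("B")
print(seen_in_c, after_return)   # 12 0
```

The search cost moves from each variable access to each procedure entry and exit.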

Dynamic Memory Management

(See section 10.8 in text)

We cannot use a stack-based discipline for function calls in a functional language because of difficulties in returning functions as values from other functions.

As a result, activation records must be allocated from a heap. Similar difficulties in passing around closures result in most object-oriented languages relying on heap allocated memory for objects. Because it is often not clear when memory can be safely freed, such languages usually rely on an automatic mechanism for recycling memory.
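
The difficulty with returning functions shows up in a few lines. Python is used here only as a convenient language with first-class functions; `make_adder` is a hypothetical example.

```python
def make_adder(n):
    # n lives in make_adder's activation record.
    def add(x):
        return x + n      # refers to n after make_adder has returned
    return add

add5 = make_adder(5)      # make_adder's activation is finished here
print(add5(10))           # 15: n is still needed after make_adder returned
```

If `make_adder`'s record had been popped off a stack on return, `add5` would refer to storage that no longer exists.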

In this lecture we discuss methods for automatically managing and reclaiming free space. We begin with the simpler task of managing free space.

Memory management in the heap

A heap is usually maintained as a list or stack of blocks of memory. Initially all of the free space is maintained as one large block, but requests (whether explicit or implicit) for storage and the subsequent recycling of blocks of memory will eventually result in the heap being broken down into smaller pieces.

When a request is made (e.g., via a "new" statement) for a block of memory, some strategy will be undertaken to allocate a block of memory of the desired size. For instance one might search on the list of free space for the first block which is at least as large as the block desired or one might look for a "best fit" in the sense of finding a block which is as small as possible, yet large enough to satisfy the need.

Whichever technique is chosen, only as much memory as is needed will be allocated, with the remainder of the block returned to the stack of available space.
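
The two search strategies and the splitting step might look like this. It is a sketch under simplifying assumptions: the free list is a Python list of (start address, length) pairs, and all function names are hypothetical.

```python
def first_fit(free_list, size):
    """Index of the first free block that is large enough, or None."""
    for i, (_, length) in enumerate(free_list):
        if length >= size:
            return i
    return None

def best_fit(free_list, size):
    """Index of the smallest free block that is large enough, or None."""
    candidates = [(length, i) for i, (_, length) in enumerate(free_list)
                  if length >= size]
    return min(candidates)[1] if candidates else None

def allocate(free_list, size, strategy):
    """Carve size units out of the chosen block; return its start address."""
    i = strategy(free_list, size)
    if i is None:
        raise MemoryError("no free block is large enough")
    start, length = free_list.pop(i)
    if length > size:
        # Return the unused remainder to the free list.
        free_list.insert(i, (start + size, length - size))
    return start

blocks = [(0, 50), (100, 20), (200, 30)]   # (start address, length)
print(allocate(list(blocks), 20, first_fit))   # 0
print(allocate(list(blocks), 20, best_fit))    # 100
```

First fit splits the 50-unit block and leaves a 30-unit remainder; best fit uses the 20-unit block exactly and leaves the larger blocks intact.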

Unless action is taken, the heap will eventually be composed of smaller and smaller blocks of memory. In order to prevent this, the operating system will normally attempt to merge or coalesce adjacent blocks of free memory. Thus whenever a block of memory is ready to be returned to the heap, the adjacent memory locations are examined to determine whether they are already on the heap of available space. If either (or both) are, then they are merged with the new block and put in the heap.
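
Coalescing can be sketched as below, again with free blocks as (start address, length) pairs. For brevity this hypothetical `release` re-sorts and rescans the whole list instead of examining only the two neighbours of the freed block.

```python
def release(free_list, start, length):
    """Return a block to an address-ordered free list, merging neighbours."""
    free_list.append((start, length))
    free_list.sort()
    merged = []
    for block_start, block_length in free_list:
        if merged and merged[-1][0] + merged[-1][1] == block_start:
            # The previous free block ends exactly where this one begins.
            prev_start, prev_length = merged.pop()
            merged.append((prev_start, prev_length + block_length))
        else:
            merged.append((block_start, block_length))
    free_list[:] = merged

free_list = [(0, 10), (30, 10)]
release(free_list, 10, 20)     # freed block touches both neighbours
print(free_list)               # [(0, 40)]
```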

Even with coalescing, the heap can still become fragmented, with lots of small blocks of memory being used alternating with small blocks in the heap of available space.

This can be fixed by occasionally compacting memory by moving all blocks in use to one end of memory and then coalescing all the remaining space into one large block. This can be very complex since pointers in all data structures in use must be updated. (The Macintosh requires the use of handles in order to accomplish this!)
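
The role of handles can be illustrated with a toy compactor. Memory is a Python list, and a handle table maps a handle id to (start, length); this layout and the name `compact` are assumptions for the sketch, not a description of the Macintosh memory manager.

```python
def compact(memory, handles):
    """Slide all blocks in use to the low end of memory.

    Programs hold handle ids rather than raw addresses, so only the
    handle table needs updating when a block moves.
    """
    next_free = 0
    for handle_id, (start, length) in sorted(handles.items(),
                                             key=lambda item: item[1][0]):
        # Move the block's contents down, then record its new address.
        memory[next_free:next_free + length] = memory[start:start + length]
        handles[handle_id] = (next_free, length)
        next_free += length
    # Everything from next_free upward is now one large free block.
    for i in range(next_free, len(memory)):
        memory[i] = None
    return next_free

memory = [None, "a1", "a2", None, None, "b1", None, "c1", "c2", None]
handles = {"A": (1, 2), "B": (5, 1), "C": (7, 2)}
free_start = compact(memory, handles)
print(memory, free_start)
```

Without the extra level of indirection, every pointer stored anywhere in the program's data structures would have to be found and rewritten.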
