CS 334
Programming Languages
Spring 2002

Lecture 10

A separate example shows how recursion works in the environment-based interpreter. It will make the most sense if you have a copy of the evaluation rules in front of you when you read it.

Type compatibility

Problems arose in the language definition of Pascal having to do with type equivalence and compatibility.

Assignment compatibility:

Original report said both sides must have identical types.

When are types identical?

Ex.:

    Type    T = Array [1..10] of Integer;
    Var  A, B : Array [1..10] of Integer;
             C : Array [1..10] of Integer;
             D : T;
             E : T;
Which variables have the same type?

Name Equivalence (strict)

Same type iff have same name --> D, E only

Name Equivalence (loose, also called declaration equivalence)

Same type iff have same name or declared together

--> A, B and D, E only.

Structural Equivalence

Same type iff have same structure --> all same.
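The three rules above can be tried out on the five variables of the example. This is only a minimal sketch in Python: the `Var` record, its fields, and the function names are made up for illustration, and structure is compared as plain text, which is enough for this example but not in general.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str        # variable name
    type_name: str   # name of the declared type, "" if the type is anonymous
    decl: int        # which declaration introduced the variable
    structure: str   # the type written out in full

ARRAY = "array [1..10] of integer"
VARS = [
    Var("A", "", 1, ARRAY),
    Var("B", "", 1, ARRAY),   # declared together with A
    Var("C", "", 2, ARRAY),
    Var("D", "T", 3, ARRAY),
    Var("E", "T", 4, ARRAY),
]

def same_name(x, y):
    # Same type iff both types carry the same name.
    return x.type_name != "" and x.type_name == y.type_name

def same_name_or_declared_together(x, y):
    # Same type iff same name, or introduced by the same declaration.
    return same_name(x, y) or x.decl == y.decl

def same_structure(x, y):
    # Same type iff the expanded types agree.
    return x.structure == y.structure

def equivalent_pairs(rule):
    # All pairs of distinct variables that the rule puts in the same type.
    return [(x.name, y.name)
            for i, x in enumerate(VARS) for y in VARS[i + 1:] if rule(x, y)]

print(equivalent_pairs(same_name))                       # [('D', 'E')]
print(equivalent_pairs(same_name_or_declared_together))  # [('A', 'B'), ('D', 'E')]
print(len(equivalent_pairs(same_structure)))             # 10, i.e. every pair
```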

Structural equivalence is not always easy to decide. Let

  T1 = record a : integer; b : real  end; 
  T2 = record c : integer; d : real  end;
  T3 = record b : real; a : integer  end;
Which are the same?

Worse:

  T = record info : integer; next : ^T  end; 
  U = record info : integer; next : ^V  end; 
  V = record info : integer; next : ^U  end; 
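One way to decide structural equivalence for recursive types like these is to assume a pair of types equal while comparing their components, so that the comparison terminates when it comes back around to the same pair. The sketch below is an illustration under assumptions, not the rule of any particular language: the tuple encoding, the `DEFS` table, and the three `policy` settings (which answer "Which are the same?" differently for T1, T2, T3) are all invented here.

```python
# Type definitions from the examples above, encoded as nested tuples.
DEFS = {
    "T1": ("record", (("a", "integer"), ("b", "real"))),
    "T2": ("record", (("c", "integer"), ("d", "real"))),
    "T3": ("record", (("b", "real"), ("a", "integer"))),
    "T": ("record", (("info", "integer"), ("next", ("ptr", "T")))),
    "U": ("record", (("info", "integer"), ("next", ("ptr", "V")))),
    "V": ("record", (("info", "integer"), ("next", ("ptr", "U")))),
}

def expand(t):
    """Replace a type name by its definition; leave other types alone."""
    return DEFS[t] if isinstance(t, str) and t in DEFS else t

def equiv(s, t, policy="strict", assumed=frozenset()):
    """Structural equivalence of s and t.

    policy: "strict"       field names and field order both matter
            "ignore_names" fields are compared by position only
            "ignore_order" fields are matched up by name
    """
    if s == t or (s, t) in assumed:
        return True                      # identical, or already assumed equal
    assumed = assumed | {(s, t)}         # assume equal while checking the parts
    s, t = expand(s), expand(t)
    if isinstance(s, str) or isinstance(t, str):
        return s == t                    # base types such as integer, real
    if s[0] != t[0]:
        return False                     # record versus pointer
    if s[0] == "ptr":
        return equiv(s[1], t[1], policy, assumed)
    fs, ft = s[1], t[1]                  # both are records
    if len(fs) != len(ft):
        return False
    if policy == "ignore_order":
        fs = sorted(fs, key=lambda f: f[0])
        ft = sorted(ft, key=lambda f: f[0])
    for (n1, a), (n2, b) in zip(fs, ft):
        if policy != "ignore_names" and n1 != n2:
            return False
        if not equiv(a, b, policy, assumed):
            return False
    return True

print(equiv("T1", "T2"), equiv("T1", "T2", "ignore_names"))  # False True
print(equiv("T1", "T3"), equiv("T1", "T3", "ignore_order"))  # False True
print(equiv("T", "U"), equiv("U", "V"))                      # True True
```

Under this reading T, U, and V all describe the same infinite structure, so they come out equivalent even though no two of the definitions look alike on the surface.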

Different languages use different versions of type equivalence.

Two types are assignment compatible iff

  1. they are equivalent, or

  2. one is a subrange of the other, or

  3. both are subranges of the same base type.
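The three clauses can be written down almost directly. In this sketch the classes `Base` and `Subrange` and the function `assignment_compatible` are hypothetical names, and "equivalent" is approximated by plain equality of the type descriptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Base:
    name: str            # e.g. integer, real

@dataclass(frozen=True)
class Subrange:
    base: Base           # the type this is a subrange of
    lo: int
    hi: int

INTEGER = Base("integer")
REAL = Base("real")

def base_of(t):
    return t.base if isinstance(t, Subrange) else t

def assignment_compatible(target, source):
    if target == source:                       # clause 1: equivalent types
        return True
    t_sub = isinstance(target, Subrange)
    s_sub = isinstance(source, Subrange)
    if t_sub != s_sub:                         # clause 2: one is a subrange of the other
        return base_of(target) == base_of(source)
    if t_sub and s_sub:                        # clause 3: subranges of the same base type
        return target.base == source.base
    return False

digit = Subrange(INTEGER, 0, 9)
hexdigit = Subrange(INTEGER, 0, 15)
print(assignment_compatible(digit, INTEGER))   # True  (clause 2)
print(assignment_compatible(digit, hexdigit))  # True  (clause 3)
print(assignment_compatible(digit, REAL))      # False
```

Compatibility only says the assignment is allowed; whether the value actually fits in the target subrange still has to be checked when the program runs.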

Things are more complicated in object-oriented languages. There, assignment is OK if the type of the source is a subtype of the type of the receiver.

Overloading

Several languages allow overloading of procedures, functions, or methods, while others also allow overloading of operators. This is simply a syntactic convenience for the programmer: the compiler resolves each overloaded name to one specific operation, so the overloading goes away in compiled code.

It can be useful if the similar names make it easier for the programmer to remember operation names. For example, overloading the plus symbol for integer and real arithmetic is helpful for scientific users. However, it is important to only use overloading where the semantics of the operators or functions is substantially similar, otherwise it will confuse readers. For example, Java's use of "+" for string concatenation is likely more confusing than helpful, especially when programmers write expressions like: m + n + " are the answers", as the language interprets the first "+" as addition and the second as concatenation. See the text for details on how different languages support overloading. (C++ provides extensive support, but Ada's is the most flexible.)

Overloading does not mix well with type inference. How is a language (e.g., ML) to infer the type of

   fun add x y = x + y;
   
ML used to provide an error message if you typed this in, but now it instead interprets "+" as integer plus. Hence the type is int -> int -> int. If you want it to be real addition, you must include type information. The problem is that "+" is NOT a polymorphic operation - applicable to a large number of types using the same code. Instead it is simply providing the same name to DIFFERENT operations.
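To see that an overloaded "+" is just one name for several different operations, here is a small sketch of how a compiler might pick the operation from the static types of the operands. The table `OVERLOADS` and the operation names in it are invented for illustration; the entries mixing int and string follow the Java behavior described above.

```python
# (operator, left type, right type) -> (operation chosen, result type)
OVERLOADS = {
    ("+", "int", "int"): ("int_add", "int"),
    ("+", "real", "real"): ("real_add", "real"),
    ("+", "string", "string"): ("concat", "string"),
    ("+", "int", "string"): ("concat", "string"),
    ("+", "string", "int"): ("concat", "string"),
}

def resolve(op, left, right):
    """Pick the one operation that matches the operand types."""
    try:
        return OVERLOADS[(op, left, right)]
    except KeyError:
        raise TypeError(f"no version of {op} for {left} and {right}")

def resolve_chain(op, types):
    """Resolve a left-associative chain such as m + n + " are the answers"."""
    chosen = []
    result = types[0]
    for t in types[1:]:
        operation, result = resolve(op, result, t)
        chosen.append(operation)
    return chosen, result

# m + n + " are the answers": addition first, then concatenation.
print(resolve_chain("+", ["int", "int", "string"]))  # (['int_add', 'concat'], 'string')
# "The answers are " + m + n: concatenation both times.
print(resolve_chain("+", ["string", "int", "int"]))  # (['concat', 'concat'], 'string')
```

Note that resolution needs the operand types up front. In fun add x y = x + y nothing is known about x and y, so there is no entry to pick without a default or a type annotation.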

Type-checking and inference algorithms

Type-checking programming languages is generally pretty straightforward. Type-checking rules can be written down in a way similar to the formal semantics we have been using. Let E be a "static type environment" that provides type information for identifiers. Thus (x : T) in E implies that x has type T in environment E. Type checking a program starts with an empty environment (no identifiers have yet been assigned types), but E is expanded every time an identifier is introduced. I've written some rules below for a typed version of PCF.

     E |- f : T -> U,  E |- x : T
     ----------------------------
             E |- f x : U


         E + (x : T) |- B : U 
     ---------------------------
     E |- fn (x:T) => B : T -> U


             E |- x : T,    if (x : T) in E
These can be seen as a specification of a type-checking algorithm in the same way that our rules for natural semantics were.
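Read as an algorithm, the three rules give a recursive function on terms. The following is a minimal sketch, assuming terms are encoded as tuples ("var", "app", "fn") and function types as ("->", T, U); this encoding and the name `type_of` are made up here and are not taken from the course interpreter.

```python
def type_of(term, env):
    """Return the type of term in static type environment env (a dict)."""
    kind = term[0]
    if kind == "var":                       # E |- x : T, if (x : T) in E
        name = term[1]
        if name not in env:
            raise TypeError(f"unbound identifier {name}")
        return env[name]
    if kind == "fn":                        # E + (x : T) |- B : U
        _, param, param_type, body = term
        body_type = type_of(body, {**env, param: param_type})
        return ("->", param_type, body_type)
    if kind == "app":                       # E |- f : T -> U,  E |- x : T
        _, func, arg = term
        func_type = type_of(func, env)
        arg_type = type_of(arg, env)
        if not (isinstance(func_type, tuple) and func_type[0] == "->"):
            raise TypeError("operator is not a function")
        if func_type[1] != arg_type:
            raise TypeError("operator and operand don't agree")
        return func_type[2]
    raise ValueError(f"unknown term {term!r}")

# fn (f : int -> int) => fn (x : int) => f x
int_to_int = ("->", "int", "int")
apply = ("fn", "f", int_to_int,
         ("fn", "x", "int", ("app", ("var", "f"), ("var", "x"))))
print(type_of(apply, {}))
# ('->', ('->', 'int', 'int'), ('->', 'int', 'int'))
```

Each subterm is visited once, which is why checking is essentially linear in the size of the term.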

In general, type-checking for programming languages is very straightforward and efficient (essentially linear in the size of the term being checked in most cases). A more difficult problem is inferring the types of terms.

ML's type system was designed carefully in order to allow type inference to work. Here are some of the basic rules for type inference:

  1. An identifier should be assigned the same type throughout its scope.

  2. In an "if-then-else" expression, the condition must have type boolean and the "then" and "else" portions must have the same type. The type of the expression is the type of the "then" and "else" portions.

  3. A user-defined function has type 'a -> 'b, where 'a is the type of the function's parameter and 'b is the type of its result.

  4. In a function application of the form f x, there must be types T and U such that f has type T -> U and x has type T, and the application itself has type U.

Here is an example of ML type inference (ignoring issues of pattern matching). Define

   fun map f l = if l = [] then [] else (f (hd l)):: (map f (tl l))
   
which will be internally translated to:
   val rec map = fn f => fn l => if l = [] then [] else (f (hd l)):: (map f (tl l))
Let's look carefully at the clues obtainable from the definition to determine the type of map.

By rule (3), map is assigned type 'a -> 'b, for 'a and 'b type variables. Thus the type of f is 'a. Because the body of the function (the part after "fn f =>") is also a function, 'b = 'c -> 'd, where the type of l is 'c. Now let's examine the if expression to see what more we can deduce.

Rule (2) above states that in an "if-then-else" expression the condition must have type boolean and the "then" and "else" subexpressions must have the same type, which is also the type of the entire expression, here 'd. Thus we get:

   {f: 'a, l: 'c} |- l = [] : boolean
   {f: 'a, l: 'c} |- [] : 'd
   {f: 'a, l: 'c} |- (f (hd l)):: (map f (tl l)) : 'd
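We can make the following deductions from this.

  1. Because l is compared with [] (and hd and tl are applied to it), l must be a list, so 'c = 'e list for some type 'e.

  2. Because [] has type 'd, 'd must be a list type, so 'd = 'g list for some type 'g.

  3. Because (f (hd l)):: (map f (tl l)) has type 'g list, f (hd l) must have type 'g. By rule (4), f then has type 'h -> 'g for some 'h, so 'a = 'h -> 'g, and since the argument hd l has type 'e, 'h = 'e.

  4. The recursive call map f (tl l) must also have type 'g list, which agrees with the type 'a -> 'c -> 'd already assumed for map, so it adds no new constraint.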

Let's write down all of the constraints we have derived:

   'c = 'e list
   'd = 'g list
   'a = 'h -> 'g
   'h = 'e
   

We can solve these to get

   'a = 'e -> 'g
   'c = 'e list
   'd = 'g list
   
Thus the type of map is ('e -> 'g) -> ('e list) -> ('g list) where 'e and 'g are any types. We can make the quantification explicit by writing the type as ∀ 'e. ∀ 'g. ('e -> 'g) -> ('e list) -> ('g list), where ∀ is read "for all".
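Solving such a system of constraints is done by unification. Below is a small sketch of a unifier applied to the four constraints derived for map. The encoding (type variables as strings, other types as tuples) and the helper names `resolve`, `unify`, and `show` are assumptions made for this example; this is not how any particular ML compiler represents types.

```python
def is_var(t):
    return isinstance(t, str)            # type variables: 'a, 'c, ...

def resolve(t, subst):
    """Apply the substitution to t until no solved variable remains."""
    if is_var(t):
        return resolve(subst[t], subst) if t in subst else t
    return (t[0],) + tuple(resolve(a, subst) for a in t[1:])

def occurs(v, t):
    """Does variable v occur inside type t?"""
    if is_var(t):
        return v == t
    return any(occurs(v, a) for a in t[1:])

def show(t):
    """Print a type in ML notation."""
    if is_var(t):
        return t
    if t[0] in ("->", "list"):
        first = show(t[1])
        if not is_var(t[1]) and t[1][0] == "->":
            first = "(" + first + ")"
        return first + " -> " + show(t[2]) if t[0] == "->" else first + " list"
    return t[0]                          # base type such as ("int",)

def unify(s, t, subst):
    """Extend subst so that s and t become equal, or raise TypeError."""
    s, t = resolve(s, subst), resolve(t, subst)
    if s == t:
        return subst
    if is_var(s):
        if occurs(s, t):
            raise TypeError("circular type")
        return {**subst, s: t}
    if is_var(t):
        return unify(t, s, subst)
    if s[0] != t[0] or len(s) != len(t):
        raise TypeError(f"cannot unify {show(s)} with {show(t)}")
    for a, b in zip(s[1:], t[1:]):
        subst = unify(a, b, subst)
    return subst

constraints = [
    ("'c", ("list", "'e")),
    ("'d", ("list", "'g")),
    ("'a", ("->", "'h", "'g")),
    ("'h", "'e"),
]
subst = {}
for left, right in constraints:
    subst = unify(left, right, subst)

map_type = ("->", "'a", ("->", "'c", "'d"))   # map : 'a -> 'c -> 'd
print(show(resolve(map_type, subst)))
# ('e -> 'g) -> 'e list -> 'g list
```

A constraint with no solution, such as 'Z list = int, makes `unify` raise an error instead of returning a substitution.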

There are three possibilities when the ML type inferencer tries to solve the system of constraints generated by the analysis of an ML expression:

  1. It is overconstrained. Thus there is no solution, and the expression has a type error. This is the cause of error messages like:

    - tl 7;
    stdIn:22.1-22.5 Error: operator and operand don't agree [literal]
      operator domain: 'Z list
      operand:         int
      in expression:
        tl 7
        
    The error message above results from rule 4 above, and indicates that tl can only be applied to arguments of the form 'Z list, for 'Z a type variable, while the actual argument, 7, is of type int. The system could not solve the constraint 'Z list = int because no substitution for 'Z could make this true.

  2. It is underconstrained. In this case there are many solutions. This can arise in two ways: the expression may be ambiguous due to overloading (in which case the system either reports a type error or chooses one interpretation of the overloaded operators), or it may be polymorphic, resulting in a type containing type variables.

  3. It is uniquely determined. In this case there is a unique solution and the expression has exactly one type.

    The type of a function in ML is not allowed to contain a universal quantifier (∀) on the inside. All of these quantifiers must appear at the outer level. This means that a function may not be defined to take a polymorphic function as an argument, though it can be applied to a specialization of a polymorphic function.

    Related to this is a type restriction that you may run into in your programming. An identifier introduced by a val binding (including the implicit "it") may not be given a polymorphic type unless the expression on the right side of the binding is a "value".

    These values are similar to the ones used in this week's PCF interpreter homework. A value is something that cannot be further evaluated. Thus constants, lists, and function definitions are all values, but a function application is not a value.

    For example, look at the following transcript of an ML session:

    - fun double f x = f(f x);
    val double = fn : ('a -> 'a) -> 'a -> 'a
    - tl;
    val it = fn : 'a list -> 'a list
    - val dbleTl = double tl;
    stdIn:11.1-11.10 Warning: type vars not generalized because of
       value restriction are instantiated to dummy types (X1,X2,...)
    val dbleTl = fn : ?.X1 list -> ?.X1 list
    - dbleTl [1,2,3];
    stdIn:12.1-12.11 Error: operator and operand don't agree [literal]
      operator domain: ?.X1 list
      operand:         int list
      in expression:
        dbleTl (1 :: 2 :: 3 :: nil)
    - double tl [1,2,3];
    val it = [3] : int list
    
    double is defined as a polymorphic curried function that applies its first argument twice to its second argument. tl is a predefined polymorphic function returning the tail of a list. Applying double to tl runs into the value restriction: the result would be polymorphic, but the expression double tl is not a value. Hence the warning shown above. Oddly, ML prints only a warning, but the resulting value is not usable, as shown by applying dbleTl to [1,2,3]. On the other hand, writing double tl [1,2,3] causes no problems because the result is not polymorphic!

    I don't want to go into detail about the reason for the "value restriction", as it is intended to avoid a problem with polymorphic references (variables that can hold polymorphic values), but I wanted you to see this in order to recognize what is going on when you see such an error. Various versions of ML have had different restrictions on typing in order to avoid problems with polymorphic references, but this one seems not to cause many problems in practice. In fact, the above problem with dbleTl can be solved by writing the definition as:

    - fun dbleTl x = double tl x;
    val dbleTl = fn : 'a list -> 'a list
    
    A moment's thought will show you that this defines the same function as before. However, because it is now given as a function definition (functions are always "values" in the technical sense above) rather than a val definition, it is not subject to the value restriction.
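The test for being a "value" is purely syntactic, which is why rewriting dbleTl as a function definition helps. Here is a rough sketch of such a test, assuming expressions are encoded as tuples; the encoding and the name `is_value` are invented here, and treating a bare identifier as a value is an assumption that follows the usual ML convention rather than anything stated above.

```python
def is_value(expr):
    """True if expr is a syntactic value, i.e. needs no further evaluation."""
    kind = expr[0]
    if kind in ("const", "fn"):
        return True                      # constants and function definitions
    if kind == "var":
        return True                      # assumption: identifiers count as values
    if kind == "list":
        return all(is_value(e) for e in expr[1])
    return False                         # "app": an application must be evaluated

double_tl = ("app", ("var", "double"), ("var", "tl"))

# val dbleTl = double tl          -- right side is an application
print(is_value(double_tl))                                         # False

# fun dbleTl x = double tl x      -- right side is fn x => double tl x
print(is_value(("fn", "x", ("app", double_tl, ("var", "x")))))     # True
```

Only in the second case may the bound identifier be given a polymorphic type.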

    Typing limitations in Pascal

    Variant types are unsafe

    No semi-dynamic arrays. Result of 2 principles:

    1. All types must be determinable at compile time.

    2. Array bounds are part of type.

    Therefore, must have statically determinable array bounds.

    Type of actual parameters must agree w/ type of formals

    Therefore, no general sort routines, etc. This is the major problem with Pascal.

    Ada

    Ada's Types

    Built-In:

    Integer, Real, Boolean, Char, strings.

    Enumeration types.

    Character and boolean are predefined enumeration types.

    e.g., type Boolean is (False, True)

    Can overload values:

        type Color is (Red, Blue, Green);
        type Mood is (Happy, Blue, Mellow);
    
    If ambiguous can qualify w/ type names:
        Color'(Blue), Mood'(Blue)
    

    Subranges

    Declared w/range attribute.

    e.g., type Hex is range 0..15

    Other attributes available to modify type definitions:

    	type Accurate is digits 20;
    	type Money is delta 0.01 range 0.00 .. 1000.00;     -- fixed pt!
    
    Can extract type attributes:
    	Hex'FIRST -> 0
    	Hex'LAST  -> 15
    
    Can initialize variables in declaration:
    	declare k : integer := 0
    

    Arrays

    "Constrained" - semi-static like Pascal
    	type Two_D is array (1..10, 'a'..'z') of Real 
    or "Unconstrained" (what we called semi-dynamic earlier)
    	type Real_Vec is array (INTEGER range <>) of REAL;
    Generalization of open array parameters of MODULA-2.

    Of course, to use, must specify bounds,

    	declare x : Real_Vec (1..10)
    or, inside procedure:
       Procedure sort (Y: in out Real_Vec; N: integer) is -- Y is open array parameter
          Temp1 : Real_Vec(1..N);             -- depends on N
          Temp2 : Real_Vec (Y'FIRST..Y'LAST); -- depends on parameter Y
          begin 
             for I in Y'FIRST ..Y'LAST loop
                ...
             end loop;
             ... 
          end sort;
    
    Note Ada also has local blocks (like ALGOL 60)

    All unconstrained types (w/ parameters) elaborated at block entry (semi-dynamic)

    String type is predefined open array of chars:

    	array (POSITIVE range <>) of character;

    Can take slice of 1-dim'l array.

    E.g., if

        Line : string(1..80)
    Then can write
        Line(10..20) := ('a','b','c','d','e','f','g','h','i','j','k')
                                             -- gives assignment to slice
    
    Because of this structure assignment, can have constant arrays.

    Ada Subtypes and derived types:

    Types have static properties - checked at compile time

    and dynamic properties - checked at run time

    Example of dynamic are range, subscript, etc.

    Specify dynamic properties by defining subtype. E.g.,

       subtype digit is integer range 0..9;
    Subtypes also constrain parameterized array or variant record.
    	subtype short_vec is Real_Vec(1..3);
    	subtype square_type is geometric (square)
    Subtypes do not define a new type; they only add dynamic constraints.

    Therefore can mix different subtypes of same type w/ no problems

    Derived types define new types:

    type Hex is new Integer range 0..15;
    type Ounces is new Integer range 0..15;
    Now Hex, Ounces, and Integer are incompatible types: treated as distinct copies of 0..15

    Can convert from one to other:

    Hex(I), Integer(H), Hex(Integer(G))
    Derived types inherit operators and literals from parent type.
    E.g., Hex gets 0,1,2,... +,-,*,...
    Use for private (opaque) types and when don't want mixing.
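The difference between a subtype and a derived type can be imitated in Python, with one important caveat: Ada rejects mixing Hex and Ounces at compile time, while this sketch can only complain at run time. All names here (`subtype`, `Derived`, and so on) are hypothetical.

```python
def subtype(lo, hi):
    """A subtype: the same type as before, plus a range check on entry."""
    def check(value):
        if not lo <= value <= hi:
            raise ValueError(f"{value} not in {lo}..{hi}")
        return value                     # still an ordinary integer
    return check

digit = subtype(0, 9)                    # subtype digit is integer range 0..9

class Derived:
    """A derived type: a distinct copy of a range of the integers."""
    lo, hi = 0, 15

    def __init__(self, value):
        if isinstance(value, Derived):   # explicit conversion, e.g. Hex(G)
            value = value.value
        if not self.lo <= value <= self.hi:
            raise ValueError(f"{value} not in {self.lo}..{self.hi}")
        self.value = value

    def __add__(self, other):            # "+" is inherited from the parent
        if type(other) is not type(self):
            return NotImplemented        # no mixing of different derived types
        return type(self)(self.value + other.value)

    def __int__(self):                   # conversion back, e.g. Integer(H)
        return self.value

class Hex(Derived):
    pass

class Ounces(Derived):
    pass

print(digit(3) + digit(4))               # 7: subtypes of integer mix freely
print((Hex(3) + Hex(4)).value)           # 7
print(Hex(Ounces(5)).value)              # 5: allowed via explicit conversion
# Hex(3) + Ounces(4) raises TypeError
```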

    Compare Ada's solutions w/ Pascal's problems:

    Helped by removing dynamic features from def of type subrange or index of array.

    Can now have open array parameters (also introduced in ISO Pascal).

    Variants fixed

    Name equivalence in Ada to prevent mixing of different types. E.g., can't add Hex and Ounces.

    Can define overloaded multiplication such that if

    	l:Length;
    	w:Width;
    then l * w : Area.
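The same idea can be sketched in Python by overloading `*` on a Length class. The three classes are made up for illustration, and the check again happens at run time rather than at compile time as it would in Ada.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Area:
    value: float

@dataclass(frozen=True)
class Width:
    value: float

@dataclass(frozen=True)
class Length:
    value: float

    def __mul__(self, other):
        # Only Length * Width is defined, and the result is an Area.
        if isinstance(other, Width):
            return Area(self.value * other.value)
        return NotImplemented

l = Length(3.0)
w = Width(2.0)
print(l * w)                             # Area(value=6.0)
# l * l raises TypeError: no multiplication is defined for two Lengths
```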

Kim Bruce, kim@cs.williams.edu