Type Systems and Programming Languages

Kim Bruce
Williams College

Types and type systems have played a prominent role in the design of new and more robust programming languages over the last quarter century. The introduction of language support for abstract data types and for generic or polymorphic functions, procedures, and modules, based on parameterized types, has made it easier to design programs which are secure, yet whose components are more likely to be reusable in other programs.

Extensions of traditional type systems to second-order constructs have been at the heart of much of this progress. Language support for abstract data types of the sort found in CLU, Modula-2, and Ada consists of a mechanism for encapsulating declarations of types and operations on them, while providing an abstraction barrier for hiding the implementation of these operations. These abstract data types can be modeled as (second-order) existential types. For example, an ADT defining an integer stack along with push, pop, and top operations has an interface which can be modeled as the type

IntStackADT = 
   Exists stack. (stack x int -> unit) x (stack -> stack) x (stack -> int)
The idea is that there are many implementations which can be packed up with an interface which looks like this. Because the user only depends on the type (or public interface), the programmer may replace one implementation (e.g., array based) of the existential type with any other implementation (e.g., pointer based), as long as the public interfaces are the same. (Of course, we will want the specifications of the operations to be equivalent as well, but we will ignore that for the purposes of this discussion.)
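As an illustrative sketch of this abstraction barrier in a modern language (Rust here; the module and names are invented for this example, not taken from the text), the representation is a private field, so clients can depend only on the public operations:

```rust
mod int_stack {
    // The representation (a growable array) is private: this is the
    // abstraction barrier that the existential quantifier models.
    pub struct IntStack {
        items: Vec<i32>,
    }

    impl IntStack {
        pub fn new() -> IntStack {
            IntStack { items: Vec::new() }
        }
        // push : stack x int -> unit
        pub fn push(&mut self, x: i32) {
            self.items.push(x);
        }
        // pop : stack -> stack (rendered here as in-place mutation)
        pub fn pop(&mut self) {
            self.items.pop();
        }
        // top : stack -> int (None if the stack is empty)
        pub fn top(&self) -> Option<i32> {
            self.items.last().copied()
        }
    }
}
```

Swapping in, say, a linked-list representation changes only the module body; client code, which sees only the public interface, continues to compile unchanged.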

Parameterized types give rise to polymorphic functions, procedures, and abstract data types. Thus if Array of T is a type for each T, then we may write a polymorphic function

   map(T,U:Type; f: T -> U; A :Array of T): Array of U
which applies the function f to all elements of the array A. The type of map would be
   Forall T. Forall U.(T -> U) x (Array of T) -> Array of U
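In a language with parametric polymorphism this is directly expressible; a sketch in Rust, where the type parameters T and U are written explicitly in the signature:

```rust
// map : Forall T. Forall U. (T -> U) x (Array of T) -> Array of U
fn map<T, U>(f: impl Fn(&T) -> U, a: &[T]) -> Vec<U> {
    // Apply f to each element in turn, collecting the results.
    a.iter().map(|x| f(x)).collect()
}
```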
Similarly, the above IntStackADT can be parameterized over the type of elements of the stack, giving rise to
PolyStackADT = 
   Forall T. Exists stack. (stack x T -> unit) x (stack -> stack) x (stack -> T)
This stack can be instantiated to any particular type T, so that if ArrayStack: PolyStackADT, then, for example, ArrayStack[int]: IntStackADT.
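The PolyStackADT above can be sketched in Rust as a structure whose hidden representation carries a type parameter; instantiating the parameter at int recovers the integer stack (names again invented for illustration):

```rust
mod poly_stack {
    // Forall T . Exists stack . ... : the element type T is a
    // parameter of the abstract type; the Vec representation is hidden.
    pub struct Stack<T> {
        items: Vec<T>,
    }

    impl<T> Stack<T> {
        pub fn new() -> Stack<T> {
            Stack { items: Vec::new() }
        }
        pub fn push(&mut self, x: T) {
            self.items.push(x);
        }
        pub fn pop(&mut self) {
            self.items.pop();
        }
        pub fn top(&self) -> Option<&T> {
            self.items.last()
        }
    }
}
```

A single definition thus serves for stacks of integers, strings, or any other element type, just as a PolyStackADT value can be instantiated at any T.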

More elaborate constructs involve restricting both universal and existential quantifiers to guarantee that certain operations exist on the parameterized types. For instance, a binary search tree ADT should only be instantiated on types which support an ordering relation.
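This is exactly what bounded quantification provides; in Rust the bound appears as a trait constraint. A minimal sketch, in which the `Ord` bound plays the role of the restriction on the quantifier:

```rust
// A minimal binary search tree, instantiable only at types that
// support an ordering relation (the T: Ord bound on the quantifier).
enum Tree<T: Ord> {
    Leaf,
    Node(Box<Tree<T>>, T, Box<Tree<T>>),
}

impl<T: Ord> Tree<T> {
    fn insert(self, x: T) -> Tree<T> {
        match self {
            Tree::Leaf => Tree::Node(Box::new(Tree::Leaf), x, Box::new(Tree::Leaf)),
            Tree::Node(l, y, r) => {
                if x < y {
                    Tree::Node(Box::new(l.insert(x)), y, r)
                } else if x > y {
                    Tree::Node(l, y, Box::new(r.insert(x)))
                } else {
                    Tree::Node(l, y, r) // already present
                }
            }
        }
    }

    fn contains(&self, x: &T) -> bool {
        match self {
            Tree::Leaf => false,
            Tree::Node(l, y, r) => {
                if x < y {
                    l.contains(x)
                } else if x > y {
                    r.contains(x)
                } else {
                    true
                }
            }
        }
    }
}
```

A tree of integers or strings is accepted, while instantiating the tree at a type supplying no ordering (a function type, say) is rejected at compile time.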

All of these constructs make it easier to develop, modify, and reuse code. Abstract data types are easily reused in different programs, because both the types and the operations are designed in a single package which can easily be imported where it is needed. One can also easily replace one implementation of an ADT by another because importing programs cannot take advantage of hidden details of the implementation. Similarly a polymorphic operation can be used in a variety of situations by instantiating it with appropriate types. These polymorphic operations are almost always used in connection with appropriate parameterized types.

Corresponding to the introduction of these more flexible types in real programming languages, there has been much work on the understanding of these new type constructs by theorists. Much interesting work over the last decade has resulted in the creation of formal type systems and semantics for various (usually toy) programming languages. Once the formal type systems and semantics have been written down, one may then attempt to prove "safety" theorems showing both that the type system describes computations accurately and that type errors cannot occur in computations of well-typed programs. For instance, subject-reduction theorems show that the result of a computation has the same type as the original expression. Similar type-safety theorems show that computations of well-typed programs never get stuck due to type errors.

Concerns:

  1. Ever since the early 1960s, language designers have presented the syntax for languages by giving a formal description via a context-free grammar (or, equivalently, syntax diagrams). Given that similar formalisms for type-checking rules have been well-known for many years, why is it that language definition documents and reference manuals of languages rarely include formal type-checking rules? There may be a perception that these rules are obvious, but, particularly in object-oriented languages, the type-checking rules are often subtle, and several mutually inconsistent type systems may be implemented in compilers.

    Type-checking rules are more complex than the context-free grammars for languages, but they are not as difficult as a formal semantics (nor are there as many choices for ways of describing type-checking rules as there are for formal semantics). Moreover, readers can be taught how to read these rules in a relatively short period. Isn't it time to expect all language definitions to include formal type-checking rules?
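    To make the scale of the task concrete: the typing rules for a tiny expression language fit in a few lines and transcribe almost directly into a checker. In the sketch below (all names invented), each match arm is the transliteration of one formal rule; for example, the application arm implements "if f : t1 -> t2 and a : t1, then f(a) : t2".

```rust
#[derive(Clone, PartialEq, Debug)]
enum Ty {
    Int,
    Arrow(Box<Ty>, Box<Ty>),
}

enum Term {
    Var(String),
    Lit(i64),
    // Lam(x, t, body) is "fun (x : t) -> body"
    Lam(String, Ty, Box<Term>),
    App(Box<Term>, Box<Term>),
}

// Each arm implements one typing rule over the environment `env`
// (a list of variable/type assumptions); None signals a type error.
fn type_of(env: &[(String, Ty)], t: &Term) -> Option<Ty> {
    match t {
        // A variable has whatever type the environment assumes for it.
        Term::Var(x) => env
            .iter()
            .rev()
            .find(|(y, _)| y == x)
            .map(|(_, ty)| ty.clone()),
        // Integer literals have type Int.
        Term::Lit(_) => Some(Ty::Int),
        // If body : t2 under the assumption x : t1,
        // then (fun (x : t1) -> body) : t1 -> t2.
        Term::Lam(x, ty_x, body) => {
            let mut env2 = env.to_vec();
            env2.push((x.clone(), ty_x.clone()));
            let ty_body = type_of(&env2, body)?;
            Some(Ty::Arrow(Box::new(ty_x.clone()), Box::new(ty_body)))
        }
        // If f : t1 -> t2 and a : t1, then f(a) : t2.
        Term::App(f, a) => {
            let tf = type_of(env, f)?;
            let ta = type_of(env, a)?;
            match tf {
                Ty::Arrow(t1, t2) if ta == *t1 => Some(*t2),
                _ => None,
            }
        }
    }
}
```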

  2. Over the last ten years, researchers have obtained a fairly complete understanding of both the semantics and sound type rules for languages containing subtyping, in particular for object-oriented languages. While early research discussed only small parts of these languages, more recent work allows the modeling of larger and larger parts of these languages. This research has uncovered many type problems in existing object-oriented languages.

    This has resulted in several proposals for new type systems in these languages, many of which are accompanied by proofs of type safety. In the next few years, language designers and researchers need to evaluate these new type systems in order to uncover their strengths and weaknesses, particularly in terms of making it possible for programmers to more easily express their thoughts. Then one or more industrial-strength languages can be developed which take advantage of the best of these type systems. It is not unreasonable to expect a language to be both expressive and provably type safe.

  3. While object-oriented languages have become quite popular, and their semantics are now reasonably well understood, some important type-checking problems remain unsolved. One of the more difficult of these has to do with how to handle so-called "binary methods". Binary functions, procedures, and relations, which take two parameters of the same type, are quite common in imperative and functional languages. They include arithmetic operators and comparison relations as well as less obvious operations such as procedures which add one node to the right of another in linked structures.

    Because object-oriented languages typically model these operations by taking one of the arguments to be the receiver of a message (this is an over-simplification, of course), binary methods are defined to be those methods with one or more parameters of the same type as the receiver. Thus an "equal" method with a single parameter with the same type as the receiver is considered to be a binary method.

    If subclasses are designed so that the type of the parameter of a binary method from the superclass changes to match the type of the subclass, the type of objects from the subclass will fail to be a subtype of the type of objects from the superclass. The paper "On Binary Methods" by Bruce et al. (to appear in Theory and Practice of Object Systems) describes the problem in more detail and examines several approaches to the problem. However, all of these approaches have weaknesses. An important problem is to design an expressive and safe type system for object-oriented languages which overcomes the difficulties with binary methods.
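    The tension is visible even outside class-based languages. In Rust, for instance, a trait method whose parameter has type Self (the analogue of a binary method) works smoothly under parametric polymorphism, but the trait then cannot be used as a trait object: precisely the subtype-like dynamic view is lost. A sketch (trait and type names invented):

```rust
// A "binary method": the parameter has the same type as the receiver.
trait SameEq {
    fn equal(&self, other: &Self) -> bool;
}

struct Point {
    x: i32,
}

impl SameEq for Point {
    fn equal(&self, other: &Self) -> bool {
        self.x == other.x
    }
}

// Parametric polymorphism handles binary methods without trouble:
// both arguments are statically known to have the same type T.
fn both_equal<T: SameEq>(a: &T, b: &T) -> bool {
    a.equal(b)
}

// But the subtype-like view is rejected: the `Self` parameter makes
// the trait not dyn-compatible, so this line would not compile:
//
//     let p: Box<dyn SameEq> = Box::new(Point { x: 0 });
```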

  4. Module systems for programming languages provide support for (among other things) name-space control, information hiding, and separate compilation. While great progress has been made in the design of such systems, languages like ML which have fairly expressive systems for modules (including higher-order modules) have had difficulty in supporting the desired amount of information hiding while retaining separate compilation.

    While current research in ML appears likely to solve many of these problems, little work has been done in this area for object-oriented languages. While originally it was assumed that classes could take the place of modules in object-oriented languages, it has become clearer over the years that modules are also necessary, in particular to provide finer control over information hiding. While imperative and functional languages often find it useful to support opaque types (those for which no information is exported), these types are essentially useless in object-oriented languages since objects are expected to carry their own operations with them. Hence exported types need to reveal the names and types of their publicly visible operations.

    Because of the conflicting claims of information hiding (so that components can be replaced or modified without affecting clients) and inheritance (so that one implementation can take advantage of another), it may be necessary for module bodies to have more than one interface: one for clients and another for inheritors. To allow flexibility of mixing and matching implementations with interfaces, the attachment of interfaces to implementations might best be done through a "programming-in-the-large" sublanguage which might also allow renaming of components. While the basic ideas are in place, there are many details to be worked out to find a useful solution which provides support for the competing goals.

  5. Because of the growing importance of distributed programming, we might need to rethink some of the ideas in type theory that were appropriate for languages designed to be run on a single processor. For example, notions of type equivalence based on the name of the type may have to be replaced by structural notions of type equivalence unless some way is found to allow type names to be distributed. On the other hand, this may get in the way of information hiding, which is also an important goal.

    A solution may involve providing for partial revelations of type interfaces which allow multiple types to satisfy a particular interface. In object-oriented languages subtyping can provide this flexibility. However, in the presence of binary methods, subtyping becomes essentially useless. We speculate that partial revelations based on matching rather than subtyping may provide a better solution to this problem.

  6. Warning: Heresy ahead! There has been much concern in the types community with the problem of type inference. However, as type systems become richer, the complexity of these systems can become very high, and the problem is known to be undecidable in some cases. Moreover, once subtyping is introduced, it is not clear whether the results returned by type inference will actually be useful to a programmer, as principal types are no longer available.

    As a result, it is worth asking whether full type inference for programs is a worthwhile goal. Most languages instead require programmers to over-specify the types of language constructs. For instance, there is no reason for the compiler in Pascal to require the programmer to write down the return type of a user-defined function. The type-checker could just as easily deduce the return type. However, most languages require this sort of redundancy both to catch programmer errors and to provide better documentation to readers of the program.

    The functional languages ML, Miranda, and Haskell all come equipped with very powerful type-inference systems which are the result of very impressive research in types. But what would be lost if these languages required the return and parameter types of functions to be specified? In most cases, the answer is: only a few keystrokes. In the case of polymorphic functions, a stronger case could be made for type inference, as type parameters can make function definitions and applications appear more complex. However, even in these cases it may be reasonable to expect the programmer to specify the type parameters in function definitions, while omitting the type parameters in applications and instead merely indicating where the parameter would go.

    Would ML still retain its essence without type inference? I believe so. Perhaps it is time to investigate what parts of type inference are truly essential, and which are simply a handy short-cut for the programmer.
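    Interestingly, Rust, designed much later, takes roughly the position argued here: parameter and return types of functions must be written out, while local bindings and closures are inferred, and the type parameters of a polymorphic function are usually omitted at the use site. A sketch:

```rust
// The signature is required and fully explicit, including the
// type parameters A, B, C -- "only a few keystrokes" more.
fn compose<A, B, C>(f: impl Fn(B) -> C, g: impl Fn(A) -> B) -> impl Fn(A) -> C {
    move |x| f(g(x))
}

// Inside function bodies, inference does the work: the lets below
// carry almost no annotations, and compose's type parameters are
// inferred at the call site rather than written out.
fn demo() -> String {
    let inc = |n| n + 1;
    let show = |n: i32| n.to_string();
    let f = compose(show, inc);
    f(41)
}
```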