Friday, September 11, 2009

Been There

So, I'm not the first idiot to think about doing type inference in object-oriented languages, obviously. There's plenty of prior work in this area. Here's a list of some relevant research, by project.

  • O'Caml: that's what the O stands for, maybe? Anyway, it seems like O'Caml's object layer doesn't get much love, but the type system behind it is pretty interesting. It implements structural typing via row variables. "The Essence of Type Inference in ML", a chapter of Advanced Topics in Types and Programming Languages, is a very readable introduction to this (and indeed many interesting aspects of type inference in ML-like languages), written by Francois Pottier and Didier Remy.
  • Self: there's been a lot of work on type inference for Self, generally it seems for the purpose of efficient implementation. Ole Agesen's work on the Cartesian Product Algorithm (CPA) has been the basis for a number of different lines of research; for instance, Olin
  • Python: I haven't heard much on this front recently, but Mike Salib's thesis describes an application of the Cartesian Product Algorithm to Python which is pretty neat, though the subset of Python it is implemented for is a bit smaller than some might like.
  • Java: An extension of the CPA for dealing with data polymorphism (more on what that means in a second) called the Data-polymorphic CPA has been applied to Java in the context of detecting invalid downcasts.
  • More: we'll get to others at some point.

So, why don't I just implement some of this previous work and be done with it? Well, other than the fact that it isn't as fun as trying something new, each of the previously described approaches leaves something to be desired. So, let's first examine what we (ok, what I) want from the inference algorithm and language, and then see which criteria each of the aforementioned approaches meets.

  1. Structural Typing: If an expression implements a particular method, we should be able to call that method (subject to the "correctness" constraints we've already talked about).
  2. Recursive Types and Functions: A class C should be able to contain members that are instances of class C, see? And members should be able to call themselves recursively. Oh, and make that mutually in both cases.
  3. Data polymorphism: Variables should be able to hold objects of more than one type, again subject to the constraints already mentioned. This property is critical to useful data structures (we want lists that hold variables of several types, and that can be accessed in an intelligent way, for example. Or maps from strings to different kinds of entities in a super awesome MMOORRPPGG).

First, row variables a la O'Caml. They handle the structural typing end of things, and recursive types and functions, just fine. But row variables don't say anything about mutable data (and therefore data polymorphism), and O'Caml (like most ML-like languages) doesn't treat mutable data in the way that I wish it would. In O'Caml, a mutable data 'cell' is stored in a reference. When the reference created it is filled with some initial data, which can later be updated. However, the type that the reference holds cannot change; if the reference is first filled with a string, it cannot later be filled with an int.

In addition, the structural typing supported by row variables isn't quite as strong as we might like. It can deal with inferring the types of arguments to a function by the way that they are used, but some simple constructs throw the algorithm for a loop. For instance:

let someFun b =
  if b then
    new Foo
  else
    new Bar

The above will not compile: there is no implicit subsumption rule in O'Caml. The algorithm will not find a subtype of Foo and Bar, and use that in each branch of the if expression. It will only work if you explicitly "cast" to such a type (note that the cast is safe, and not a downcast). This is exactly the kind of code we will be writing (and be expecting to work!), so row variables as implemented in O'Caml aren't going to cut it.

I was going to continue with the rest of my bullets, but I've run out of ammo (and time), so you'll have to wait until next time.

No comments:

Post a Comment