Wednesday, August 19, 2009

Loopiness and Stringiness

Actually, there will be no strings here; right now I'm just going to be talking about loops. Earlier I mentioned that the simple algorithm for inferring types couldn't handle loops (amongst other things), so let's look more closely at why they are an issue, and try to solve it.

I gave a simple example last time:

main()
{
  var i = "Hello!";
  for(...)
  {
    i.SomeStringMethod();
    i = new Foo();
  }
  i.SomeFooMethod();
}

The problem is that the call to SomeStringMethod (which is presumably valid when i is a string) is valid if the loop runs only once, but isn't if the loop runs more than once. So, we'll assume that the loop runs more than once; now how do we build this assumption into the algorithm?

At first glance we might simply analyse the body of the loop twice; the first time when we check the call to SomeStringMethod i is known only to have type String[], and so the call succeeds. The second time i's typeset contains both String[] and Foo[...], and so the call fails (assuming that Foo doesn't implement SomeStringMethod, which is rather the point of that name).
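The two-pass idea can be sketched concretely. This is just an illustration: the method table, the `call_ok` helper, and the set-of-names representation of a typeset are all invented here, not part of Mix.

```python
# A sketch of "analyse the loop body twice" for the first example.
# The method table below is an assumption made up for illustration.
methods = {
    "String": {"SomeStringMethod"},
    "Foo": {"SomeFooMethod"},
}

def call_ok(typeset, method):
    # A call is valid only if every concrete type the variable might
    # hold implements the method.
    return all(method in methods[t] for t in typeset)

i_types = {"String"}                            # var i = "Hello!";

# First pass over the loop body:
first = call_ok(i_types, "SomeStringMethod")    # i can only be String
i_types |= {"Foo"}                              # i = new Foo();

# Second pass, now with the enlarged typeset:
second = call_ok(i_types, "SomeStringMethod")   # Foo lacks the method

print(first, second)  # True False
```

On the first pass the call checks out; on the second, the typeset also contains Foo, so the same call is rejected.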

That sounds good, and it's certainly simple. Unfortunately, it's not right. Consider this more convoluted piece of code:

main()
{
  var a = new SomeClass();
  var b = new SomeOtherClass();
  var c = new YetAnotherClass();
  
  for(...)
  {
    a = b;
    b = c;
    c = a;
  }
  a.SomeMethod();
}

It's pretty obvious what's going on here: depending on how many times the loop runs, this call could be valid, or it could be invalid. Let's say that both SomeClass and SomeOtherClass implement SomeMethod, but YetAnotherClass does not. Then after 0 iterations the call would succeed, and after 1 it would succeed, but after 2 it would fail. So, let's analyse the loop body 3 times; 3 is enough for anybody, right?

But wait, you say: we could keep adding variables (d, e, and so on) to cause a greater number of iterations to succeed before failure. Right you are. Instead of hardcoding the number of times we should check the body of the loop, we need a way to determine that during analysis.

One way we could approach this is by analysing the code in a separate pass and calculating the number of times, perhaps by checking the number of variables. That would work in the above case, but what about in more complicated cases? Trust me, there are cases in which that sort of simple analysis won't work. Maybe there's a more complicated analysis? If so, I don't know it.

Another way, and indeed the way I will choose, is to iterate to a fixed point. That is, we keep analysing the body of the loop until no typeset changes; at that point every typeset should be complete. Fixed point analyses are really common in Computer Science, and you seem to bump into them quite frequently when working with compilers. The first time I ran into one (I think) was when I was implementing a compiler for Tiger (a simple imperative language) based on Andrew Appel's book Modern Compiler Implementation in ML, which is, incidentally, also the first time I programmed in Standard ML. The specific usage was during liveness analysis, solving dataflow equations.

Anyway, to see how this works, let's run through the last example.

At the outset, each typeset (for a, b, and c) has a single element in it. After the first iteration, a has SomeClass[] and SomeOtherClass[], b has SomeOtherClass[] and YetAnotherClass[], and c has YetAnotherClass[] and SomeClass[].

After the second iteration, each typeset has all three classes; we could stop at this point, but how would we know to in the general case? Instead we continue.

After the third iteration, each typeset still has all three classes; no typeset has changed. Now we can stop; we know that further iterations will not yield additional information.
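The whole fixed-point loop fits in a few lines. This is only a sketch under some assumptions: the loop body is reduced to (target, source) assignment pairs, typesets are plain sets of class names, and assignments are processed in source order within a pass, so the intermediate sets may differ slightly from the walkthrough above even though the final fixed point is the same.

```python
# A minimal sketch of iterating to a fixed point: re-analyse the loop
# body until no typeset changes. Representations are assumptions made
# for illustration; a real analysis tracks far more than assignments.
def analyse_loop(assignments, typesets):
    """Return how many passes it took to reach a fixed point."""
    passes = 0
    while True:
        passes += 1
        changed = False
        for target, source in assignments:
            # An assignment merges the source's typeset into the target's.
            before = set(typesets[target])
            typesets[target] |= typesets[source]
            if typesets[target] != before:
                changed = True
        if not changed:
            return passes

typesets = {
    "a": {"SomeClass"},
    "b": {"SomeOtherClass"},
    "c": {"YetAnotherClass"},
}
# The loop body from the example, as (target, source) pairs.
body = [("a", "b"), ("b", "c"), ("c", "a")]

n = analyse_loop(body, typesets)
print(n)                      # 3: the third pass changes nothing
print(sorted(typesets["a"]))  # all three classes
```

After the analysis, every typeset holds all three classes, and the final pass exists only to confirm that nothing changed.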

There are a number of questions that remain to be answered. First, how do we know that we'll eventually stop adding concrete types to the typesets, and thereby terminate? Clearly there are an infinite number of concrete types, so we could conceivably continue to add types indefinitely.

Second, how do we know that the collection of types in the typesets is accurate after we reach a fixed point? With the simple algorithm I didn't talk about correctness because it was obviously correct (for the ridiculously limited subset of Mix to which it applied), but now correctness is less obvious.

Stay with me, and I'll get to answering these questions (or at least trying to) soon enough.
