Monday, August 17, 2009

Algorithmics Simplistic

Last time I said that my notion of correctness is simple: you should never invoke a method on an object that doesn't implement it. To start off discussion, let's look at a very simple algorithm that statically determines this. The subset of Mix that it handles is one without loops, without recursive calls, and with no inheritance; basically a fairly useless subset of Mix, but let's go with it.

The idea is simple: starting from the programs entrypoint (a free function I'll call main), we'll interpret the program, but instead of using regular values like ints, and strings, and objects, we'll use types; that is, we'll perform an abstract interpretation. Abstract interpretation, invented and first formalized by Cousot and Cousot, has a long history in static analysis; I'll be more or less ignoring the formalization from here on in, and just going with an informal approach. Check this out for a fairly readable introduction to "real" abstract interpretation.

Moving on to the actual algorithm, consider the following trivial code:

class A
{
  public var Member;
  
  public new(arg){ this.Member = 1; };
  
  public Method1(){};
  public Method2(){ this.Method1() };
};

main()
{
  var a = new A();
  a.Method2();
  a.Member;
  return;
}

So, starting in main, let's interpret the program. At the outset, we have a variable declaration; let's pretend that it was really written as a variable declaration followed by an assigment:

var a;
a = new A();

That let's us look at the two steps separately, instead of all at once. So, first we see a variable declaration, so we add a to the environment; we associate with a the empty typeset. A typeset can be thought of as a set containing all types that the variable is known to hold; by looking at how all of the types in a typeset intersect, we can understand what operations on a are guaranteed to be legal.

Next, there's an assignment: the effect of the assignment under the interpretation is that the typeset associated with a should be augmented by the right side's value; but how can we determine the right side's value? Let's continue interpreting. new (with a particular class name and constructor arguments, of which there are none in this case) corresponds to constructing a concrete type based on the class and then invoking the class's construction operator.

Concrete types are values that represent the properties of an object, namely it's class name and its fields (both members and methods). Concrete types are what go into typesets. From now on, we can think of the value of a particular expression (under the abstract interpretation) as a typeset containing 0 or more concrete types.

So, first we construct the concrete type: it has name A, and it has 1 member and 2 methods, namely Method1 and Method2; for now we'll indicate this concrete type as follows: A[Member: {}; Method1: {function()}; Method2: {function()}]. This should be read as a concrete type "A with field 'Member' having the empty typeset, 'Method1' having a typeset of 1 element, and 'Method2' having an typeset of 1 element." The elements in the typesets for Method1 and Method2 are callable; I'll talk about callables in some other post, but for now the name should give you enough information to go on.

Now that we've constructed the concrete type, we interpret the constructor. This means we analyze the body of the new operator for class A, using an environment in which this is bound to a typeset consisting of only the just-constructed concrete type. So, we look inside the operator's definition, and the first thing we see is another assignment.

Once again we add the value of the thing on the right to the typeset of whatever's on the left, and this time the thing on the right is just an integer; for now we'll say that the concrete type of an integer is Int[], therefore the value of the right side is a typeset containing only 1 concrete type. Essentially we're setting the typeset of the thing on the left to be the union of itself and the typeset of the thing on the right.

So, the result is that the original concrete type we constructed in the previous paragraph is now the following: A[Member: {Int[]}; Method1: {function()}; Method2: {function()}]. Finally we return to main; the important thing to notice now is that we know that the concrete type has a member Member that at some point could be instantiated with an Int[], and that a's intersection contains this concrete type.

Next we come to an method invocation:

  ...
  a.Method2();
  ...

It actually consists first of a field lookup and then a call. In particular, we first check to see that all of the types that a could be have implement Method1, and then we check that each of the fields' types are callable. Again, a can have only 1 type, which does indeed implement Method1, and it is callable. So, we interpret the call, by looking in the definition of Method1, passing the relevant concrete type in a's intersection as this.

In Method1 we follow a similar process to check that this implements Method2, which it does. We go into the (vacuous) body of Method2, and eventually come all the way back to main.

Next we check that all of the types in a's intersection have a field named Member; again they do, and in this case its type is dropped on the floor (just as the value of a.Member would be dropped on the floor at runtime). Finally, we return, and that's that: all of the code as been interpreted, and no invocation failed, so the code is deemed correct.

That got really long winded, and we didn't even seem to do much: there were no errors, and a only ever had 1 type in its typeset, so it wasn't too interesting. So, let's briefly look at a slightly more interesting example; we'll ignore some aspects (like where rand comes from) to keep it simple.

class C
{
  public new(){};
  public Foo(){ if(rand() == 0) return 1; else return new A(); };
}

main()
{
  var c = new C();
  var i = c.Foo();
  i.ToString();
}

Interpreting main we see that, after the call to c.Foo(), i's typeset should have 2 concrete types, one corresponding to an integer, and the other to an instance of A. When we check to see if all of these types have a field named ToString in the last line of main we encounter an error: C doesn't implement ToString.

Hopefully that was pretty clear, and also pretty obvious. You might recall that I pretty severely limited the language at the outset, and now you might be able to guess why: if we allow recursion with this simple algorithm, it will go into an infinite loop, following recursive invocations over and over. If we allow loops, the problem is only slightly more subtle. Consider the following code:


main()
{
  var i = "Hello!";
  for(...)
  {
    i.ToString();
    i = new A();
  }
}

Our simple algorithm, with no modification, would not see an error here, though there clearly is one if the loop runs more than once. It will check the invocation i.ToString() when i's typeset contains only Int[], and not when it contains also A[...].

So, lot's of new stuff (not like there was much beforehand). I'll talk more about typesets, concrete types, callables, and I'll start to address both of these algorithmic issues in the next few posts.

No comments:

Post a Comment