Last time I said that my notion of correctness is simple: you should never invoke a method on an object
that doesn't implement it. To start off discussion, let's look at a very simple algorithm that statically determines
this. The subset of Mix that it handles is one without loops, without recursive calls, and with no inheritance; basically a fairly
useless subset of Mix, but let's go with it.
The idea is simple: starting from the programs entrypoint (a free function I'll call main
), we'll interpret
the program, but instead of using regular values like ints, and strings, and objects, we'll use types; that is, we'll
perform an abstract interpretation. Abstract
interpretation, invented and first formalized by Cousot and Cousot, has a long history in static analysis; I'll be more
or less ignoring the formalization from here on in, and just going with an informal approach. Check
this out for a fairly readable introduction to "real" abstract interpretation.
Moving on to the actual algorithm, consider the following trivial code:
class A
{
public var Member;
public new(arg){ this.Member = 1; };
public Method1(){};
public Method2(){ this.Method1() };
};
main()
{
var a = new A();
a.Method2();
a.Member;
return;
}
So, starting in main
, let's interpret the program. At the outset, we have a variable declaration; let's
pretend that it was really written as a variable declaration followed by an assigment:
var a;
a = new A();
That let's us look at the two steps separately, instead of all at once. So, first we see a variable declaration, so
we add a
to the environment; we associate with a
the empty typeset. A typeset can
be thought of as a set containing all types that the variable is known to hold; by looking at how all of the types
in a typeset intersect, we can understand what operations on a
are guaranteed to be legal.
Next, there's an assignment: the effect of the assignment under the interpretation is that the typeset associated
with a
should be augmented by the right side's value; but how can we determine the right side's value? Let's
continue interpreting. new
(with a particular class name and constructor arguments, of which there are none in
this case) corresponds to constructing a concrete type based on the class and then invoking the class's construction operator.
Concrete types are values that represent the properties of an object, namely it's class name and its fields (both members
and methods). Concrete types are what go into typesets. From now on, we can think of the value of a particular expression
(under the abstract interpretation) as a typeset containing 0 or more concrete types.
So, first we construct the concrete type: it has name A
, and it has 1 member and 2 methods, namely Method1
and
Method2
; for now we'll indicate this concrete type as follows: A[Member: {}; Method1: {function()}; Method2: {function()}]
.
This should be read as a concrete type "A with field 'Member' having the empty typeset, 'Method1' having a typeset of 1 element,
and 'Method2' having an typeset of 1 element." The elements in the typesets for Method1
and Method2
are callable; I'll talk about callables in some other post, but for now the name should give you enough information
to go on.
Now that we've constructed the concrete type, we interpret the constructor. This means we analyze the body of the
new
operator for class A
, using an environment in which this
is bound to a
typeset consisting of only the just-constructed concrete type. So, we look inside the operator's definition, and the first
thing we see is another assignment.
Once again we add the value of the thing on the right to the typeset of whatever's on the left, and this time the
thing on the right is just an integer; for now we'll say that the concrete type of an integer is Int[]
,
therefore the value of the right side is a typeset containing only 1 concrete type. Essentially we're setting the
typeset of the thing on the left to be the union of itself and the typeset of the thing on the right.
So, the result is that the original concrete type we constructed in the previous paragraph is now the following:
A[Member: {Int[]}; Method1: {function()}; Method2: {function()}]
. Finally we return to main;
the important thing to notice now is that we know that the concrete type has a member Member
that at some
point could be instantiated with an Int[]
, and that a
's intersection contains this
concrete type.
Next we come to an method invocation:
...
a.Method2();
...
It actually consists first of a field lookup and then a call.
In particular, we first check to see that all of the types that a
could be have implement Method1
,
and then we check that each of the fields' types are callable. Again, a
can have only 1 type,
which does indeed implement Method1
, and it is callable. So, we interpret the call, by looking
in the definition of Method1
, passing the relevant concrete type in a
's intersection
as this
.
In Method1
we follow a similar process to check that this
implements Method2
,
which it does. We go into the (vacuous) body of Method2
, and eventually come all the way back to
main
.
Next we check that all of the types in a
's intersection have a field named Member
; again they
do, and in this case its type is dropped on the floor (just as the value of a.Member
would be dropped on the
floor at runtime). Finally, we return, and that's that: all of the code as been interpreted, and no invocation failed,
so the code is deemed correct.
That got really long winded, and we didn't even seem to do much: there were no errors, and a
only ever
had 1 type in its typeset, so it wasn't too interesting. So, let's briefly look at a slightly more interesting
example; we'll ignore some aspects (like where rand
comes from) to keep it simple.
class C
{
public new(){};
public Foo(){ if(rand() == 0) return 1; else return new A(); };
}
main()
{
var c = new C();
var i = c.Foo();
i.ToString();
}
Interpreting main
we see that, after the call to c.Foo()
, i
's typeset
should have 2 concrete types, one corresponding to an integer, and the other to an instance of A
.
When we check to see if all of these types have a field named ToString
in the last line of main
we encounter an error: C
doesn't implement ToString
.
Hopefully that was pretty clear, and also pretty obvious. You might recall that I pretty severely limited the language
at the outset, and now you might be able to guess why: if we allow recursion with this simple algorithm, it will go into
an infinite loop, following recursive invocations over and over. If we allow loops, the problem is only slightly more
subtle. Consider the following code:
main()
{
var i = "Hello!";
for(...)
{
i.ToString();
i = new A();
}
}
Our simple algorithm, with no modification, would not see an error here, though there clearly is one if the loop runs
more than once. It will check the invocation i.ToString()
when i
's typeset contains only
Int[]
, and not when it contains also A[...]
.
So, lot's of new stuff (not like there was much beforehand). I'll talk more about typesets, concrete types, callables,
and I'll start to address both of these algorithmic issues in the next few posts.