Friday, October 23, 2009

Methods Methodical

The last couple of posts have been a bit off topic (where topic = Mix), so let's get back to it. These words are about methods in Mix, how they are implemented, and how the implementation choice corresponds closely to the language's semantics (which seems like a bad idea, but you'll see what I mean).

So, first let me note that, in the spirit of object orientationification, in Mix the goal is that every first class entity is an object. By this I mean that numbers and strings are objects, class instances are objects, instance methods are objects, and functions in general are objects. These are all first class entities, so they're objects; since classes are not first class (you can't go passing around classes, unfortunately, at least at this point) they aren't objects, and in fact they don't have any runtime representation.

It's easy to see how regular class instances are objects (it's by definition, right?), but perhaps slightly less clear what I mean when I state that, say, a function is really an object. All I mean is that a function is an instance of a class that has a method op(). This gets us into a bit of a loop; if methods are objects, then this method is an object as well. So, at the bottom we need to have some notion of a callable object; a "special" object with no methods at all. From the programmer's perspective these can be ignored, because a callable object can always be treated as an object with an op() method that is callable.
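As a rough analogy in Python (not Mix), `__call__` plays the role that op() plays above. The class name `Adder` is made up for illustration, and this is only a sketch of the idea, not a claim about how Mix implements it.

```python
class Adder:
    """A function-like object: an instance of a class with a call method."""

    def __init__(self, amount):
        self.amount = amount

    def __call__(self, x):
        # Plays the role of Mix's op() in this analogy.
        return x + self.amount


add_two = Adder(2)
result = add_two(40)  # calling the object invokes its __call__ method

# The method is itself a callable object, and so is *its* __call__, and
# so on; the chain bottoms out in a primitive callable that just runs.
inner = add_two.__call__.__call__.__call__
same_result = inner(40)
```

Both `result` and `same_result` come out the same, which is the sense in which the primitive callable at the bottom can be ignored.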

So, methods are really just function objects; sounds great, very symmetric. Now, we have a question to answer: do these function objects take this as an argument, or do they contain a reference to this that is bound once and for all when the function object is instantiated? It turns out that this choice will dictate the semantics and correctness of certain kinds of statements, and so we should try to make a reasonable and (hopefully) reasoned choice.
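To make the two options concrete, here is a minimal Python sketch of both representations. The names `ExplicitThisMethod`, `BoundThisMethod`, `Obj`, and `describe` are all hypothetical, chosen only for this illustration.

```python
class ExplicitThisMethod:
    """Option 1: `this` is passed as an argument on every call."""

    def __init__(self, body):
        self.body = body

    def __call__(self, this, *args):
        return self.body(this, *args)


class BoundThisMethod:
    """Option 2: `this` is fixed once, when the function object is created."""

    def __init__(self, this, body):
        self.this = this
        self.body = body

    def __call__(self, *args):
        return self.body(self.this, *args)


class Obj:
    pass


def describe(this):
    return "I am " + this.name


bob = Obj()
bob.name = "Bob C."

explicit = ExplicitThisMethod(describe)
bound = BoundThisMethod(bob, describe)

explicit_result = explicit(bob)  # caller must supply the receiver
bound_result = bound()           # receiver was captured at creation
```

The two calls produce the same string; what differs is who is responsible for supplying the receiver, and when.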

Consider the following Mix code, which defines a class implementing ToString, and then derives from this a class that does not override ToString. Finally, it adds a ToString method to an instance of the derived class. Assume that print just takes a string and writes it to standard output.

class Base
{
  ToString(){ return "Base!"; }
}

class C : Base
{
  public var Name;
  new(n) { this.Name = n; }
}

Int main()
{
  var c = new C("Bob C.");
  var f = c.ToString;                  // (1)
  
  c.ToString = function(self)
    { return self.Name + " " + f(); }; // (2)
  c.ToString = function()
    { return c.Name + " " + f(); };    // (3)
  
  print(c.ToString());                 // (4)
  
  return 0;
}

In this code there are a few issues. First, at point 1 we store a reference to the original value of c.ToString (which is more or less Base's implementation of ToString). I think it is clear that we want f to contain a function object that takes 0 arguments (so that later we can write f() and get the same result as if we wrote c.ToString()); this implies that the function object implementing c.ToString has c (or this) bound as a member variable, and does not take it as an argument.

Then, at point 4, we use the new method in the same way as we used the original one; we aren't obligated to pass c to the new ToString method.

Next, check out points 2 and 3, which highlight the difference between the two treatments. In the first case we get explicit access to this (though without some syntactic sugar we can't name it the keyword this, and so give it a different name). In the second we only have access to this (actually c) because it is contained in an outer scope. While at first glance they seem equivalent, they aren't quite the same.

To see the difference, consider this function that, given a function object (which should correspond to a ToString method), returns a new method that annotates the output of the given method with the operated-on object's Name field:

function annotate(f)
{
  return function(self)
    { return self.Name + "(" + f() + ")"; };
}
o.ToString = annotate(o.ToString);

While this works under the interpretation highlighted by point 2 above, it would not work under the interpretation shown in point 3:

function annotate(f)
{
  return function() { return X.Name + "(" + f() + ")"; };
}

The question is, "What is X?" One answer is that we can work around the issue by passing this (or at least, the object that corresponds to it) to the scope enclosing the new method's definition:

function annotate(o, f)
{
  return function() { return o.Name + "(" + f() + ")"; };
}
o.ToString = annotate(o, o.ToString);

So we don't gain expressivity going the second route, though we do perhaps lose code clarity. And in fact, this seems to be exactly how it works in Python:

class Foo:
  def __init__(self, i):
    self.i = i

  def Get(self):
    return self.i

f = Foo(7)
f.Get()  # = 7

g = f.Get                 # bound method: f is already attached
f.Get = lambda: g() + 1
f.Get()  # = 8

f.Get = lambda self: self.i
f.Get()  # Error! (nothing is passed for self)
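For comparison, the annotate example from earlier can be sketched in Python under both interpretations. The class `Person` and the function names are made up for this sketch. The explicit-self version relies on `types.MethodType` to bind the function by hand, since Python does not bind functions stored directly on an instance.

```python
import types


class Person:
    def __init__(self, name):
        self.name = name

    def to_string(self):
        return "Person!"


def annotate_closure(o, f):
    # Second route: the object is captured from the enclosing scope.
    return lambda: o.name + "(" + f() + ")"


def annotate_explicit(f):
    # First route: the object arrives as an explicit argument.
    return lambda self: self.name + "(" + f() + ")"


a = Person("Ann")
a.to_string = annotate_closure(a, a.to_string)
closure_output = a.to_string()

b = Person("Bob")
# Bind the one-argument function to b so it can be called with no arguments.
b.to_string = types.MethodType(annotate_explicit(b.to_string), b)
explicit_output = b.to_string()
```

Here `closure_output` is "Ann(Person!)" and `explicit_output` is "Bob(Person!)".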

Anyways, if they are kind of the same, but kind of not, then where does the real difference reside?

One important point is that the first route allows us to use a single function object instance as a method on many instances regardless of whether the function object needs to access this, while the second option only lets us reuse a function object if the object does not access this. That is, under the first interpretation many instances could share a single function object as a method.
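The sharing difference can be sketched in Python as follows. The `call_method` helper is a hypothetical stand-in for a method call under the first interpretation (it passes the receiver explicitly); it is not how Python or Mix actually dispatches.

```python
class Person:
    def __init__(self, name):
        self.name = name


def call_method(obj, method_name, *args):
    # Hypothetical dispatch for the first route: the receiver is passed
    # to the function object as an explicit first argument.
    method = obj.__dict__[method_name]
    return method(obj, *args)


def greet(self):
    return "Hello, " + self.name


def make_greet(o):
    # Second route: each call builds a new function object closed over o.
    return lambda: "Hello, " + o.name


ann, bob = Person("Ann"), Person("Bob")

# First route: one function object serves both instances.
ann.greet = greet
bob.greet = greet
shared = ann.greet is bob.greet
first_route = [call_method(ann, "greet"), call_method(bob, "greet")]

# Second route: one function object per instance.
ann.greet_bound = make_greet(ann)
bob.greet_bound = make_greet(bob)
separate = ann.greet_bound is not bob.greet_bound
second_route = [ann.greet_bound(), bob.greet_bound()]
```

Both routes give the same greetings; only the first lets `ann` and `bob` hold the very same function object.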

However, a caveat: the "workaround" described for the second situation has a small "strangeness", in that because the variable holding the equivalent of this is captured, it could be updated, but only by the directly enclosing scope.
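A small Python sketch of that strangeness, assuming a variant of annotate that also hands back a `retarget` function (a name invented here) living in the same enclosing scope:

```python
class Person:
    def __init__(self, name):
        self.name = name

    def to_string(self):
        return "Person!"


def annotate(o, f):
    method = lambda: o.name + "(" + f() + ")"

    def retarget(other):
        # Only code in this enclosing scope can rebind the captured o.
        nonlocal o
        o = other

    return method, retarget


ann, bob = Person("Ann"), Person("Bob")
method, retarget = annotate(ann, ann.to_string)

before = method()
retarget(bob)      # the stand-in for `this` now points at bob
after = method()
```

The same function object reports "Ann(Person!)" before the rebinding and "Bob(Person!)" after it, even though it is still installed nowhere near `bob`.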

All of this leads me to a decision.

  • Since at the outset a new function object instance is created whenever the containing class is instantiated, it seems more symmetric to use the second route (which would in general encourage always creating a new function object instance when setting an object's methods).
  • Since method use doesn't require this to be passed explicitly, and since method definition within a class also doesn't require it, method definition from outside of a class should not require it either.

There is a second, simpler issue here as well. At point 2 notice that we really do have access to this; does this mean that under the first interpretation we should have access there to the private members of this, and therefore to the private members of c? I am inclined to believe that we should not: private should have a lexical meaning, which is that only code lexically within the class, or lexically within a derived class, should have access to said members. What's more, this is in keeping with the choice made above (that is, as we've picked the second interpretation, we don't even face such an issue).

Furthermore, allowing such access would produce an asymmetry between function objects intended to become methods, and those intended to be simple function objects, which could only be solved with more syntax. Namely, it would allow a function object's body to access the private members of an argument only when the function object was intended to become a method; we would need to require that in such cases the argument be made explicitly this. Anyway, lots of special casing for an idiom that could lead to some seriously obfuscated code.
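Python's name mangling of double-underscore attributes happens to behave lexically in a loosely similar way, so it can serve as a rough illustration (unlike the rule proposed here, Python's mangling does not extend to derived classes). The names `Account` and `peek` are invented for this sketch.

```python
import types


class Account:
    def __init__(self):
        self.__balance = 10  # stored as _Account__balance

    def balance(self):
        return self.__balance  # mangled: written inside the class


def peek(self):
    # Written outside the class, so the name is NOT mangled and the
    # lookup for a literal "__balance" attribute fails.
    return self.__balance


acct = Account()
acct.peek = types.MethodType(peek, acct)  # attach as a method from outside

try:
    acct.peek()
    outsider_allowed = True
except AttributeError:
    outsider_allowed = False
```

Even though `peek` receives the instance as `self`, being attached as a method does not grant it the access that code written inside the class body has.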
