The last couple of posts have been a bit off topic (where topic = Mix), so let's get back to it. These words are about methods in Mix, how they are implemented, and how the implementation choice corresponds closely to the language's semantics (which seems like a bad idea, but you'll see what I mean).
So, first let me note that, in the spirit of object orientationification, in Mix the goal is that every first class entity is an object. By this I mean that numbers and strings are objects, class instances are objects, instance methods are objects, and functions in general are objects. These are all first class entities, so they're objects; since classes are not first class (you can't go passing around classes, unfortunately, at least at this point) they aren't objects, and in fact they don't have any runtime representation.
It's easy to see how regular class instances are objects (it's by definition, right?), but perhaps slightly less clear what I mean when I state that, say, a function is really an object. All I mean is that a function is an instance of a class that has a method op(). This gets us into a bit of a loop; if methods are objects, then this method is an object as well. So, at the bottom we need to have some notion of a callable object; a "special" object with no methods at all. From the programmer's perspective these can be ignored, because a callable object can always be treated as an object with an op() method that is callable.
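As a rough analogy only (Python stands in for Mix here, and __call__ stands in for op()), the idea of a function being an instance of a class with a call method might be sketched like this. The Adder class and its names are hypothetical, made up for illustration:

```python
class Adder:
    """Hypothetical function object: an ordinary instance whose class defines __call__."""

    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        # Plays the role of Mix's op() in this analogy.
        return x + self.n


add_two = Adder(2)
direct = add_two(5)                    # sugar for add_two.__call__(5)
nested = add_two.__call__.__call__(5)  # the call method is itself a callable object
```

The last line shows the loop mentioned above: the call method is an object that can itself be called, so the language has to bottom out in some primitive callable somewhere.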
So, methods are really just function objects; sounds great, very symmetric. Now, we have a question to answer: do these function objects take this as an argument, or do they contain a reference to this that is bound once and for all when the function object is instantiated? It turns out that this choice will dictate the semantics and correctness of certain kinds of statements, and so we should try to make a reasonable and (hopefully) reasoned choice.
Consider the following Mix code, which defines a class implementing ToString, and then derives from this a class that does not override ToString. Finally, it adds a ToString method to an instance of the derived class. Assume that print just takes a string and writes it to standard output.
    class Base
    {
      ToString() { return "Base!"; }
    }

    class C : Base
    {
      public var Name;
      new(n) { this.Name = n; }
    }

    Int main()
    {
      var c = new C("Bob C.");
      var f = c.ToString;  // (1)
      c.ToString = function(self) { return self.Name + " " + f(); };  // (2)
      c.ToString = function() { return c.Name + " " + f(); };  // (3)
      print(c.ToString());  // (4)
      return 0;
    }
In this code there are a few issues. First, at point 1 we store a reference to the original value of c.ToString (which is more or less Base's implementation of ToString). I think it is clear that we want f to contain a function object that takes 0 arguments (so that later we can write f() and get the same result as if we wrote c.ToString()); this implies that the function object implementing c.ToString has c (or this) bound as a member variable, and does not take it as an argument.
Then, at point 4, we use the new method in the same way as we used the original one; we aren't obligated to pass c to the new ToString method.
Next, check out points 2 and 3, which highlight the difference between the two treatments. In the first case we get explicit access to this (though without some syntactic sugar we can't name it the keyword this, and so give it a different name). In the second we only have access to this (actually c) because it is contained in an outer scope. While at first glance they seem equivalent, they aren't quite the same.
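For readers who think in Python, the two treatments can be approximated as below. This is only a sketch of the idea, not Mix's implementation: types.MethodType is used as a stand-in for "a function object that takes the receiver explicitly and has it bound when installed", and the helper name with_self is made up.

```python
import types


class Base:
    def ToString(self):
        return "Base!"


class C(Base):
    def __init__(self, name):
        self.Name = name


c = C("Bob C.")
f = c.ToString  # like point 1: a zero-argument function object with c already bound


# Like point 2: the replacement names its receiver explicitly; binding supplies it.
def with_self(self):
    return self.Name + " " + f()


c.ToString = types.MethodType(with_self, c)
via_self = c.ToString()

# Like point 3: the replacement reaches c only through the enclosing scope.
c.ToString = lambda: c.Name + " " + f()
via_closure = c.ToString()
```

For this single object both variants produce the same string, which is why they look equivalent at first glance.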
To see the difference, consider this function that, given a function object (which should correspond to a ToString method), returns a new method that annotates the output of the given method with the operated-on object's Name field:
    function annotate(f)
    {
      return function(self) { return self.Name + "(" + f() + ")"; };
    }

    o.ToString = annotate(o.ToString);
While this works under the interpretation highlighted by point 2 above, it would not work under the interpretation shown in point 3:
    function annotate(f)
    {
      return function() { return X.Name + "(" + f() + ")"; };
    }
The question is, "What is X?" One answer is that we can work around the issue by passing this (or at least, the object that corresponds to it) to the scope enclosing the new method's definition:
    function annotate(o, f)
    {
      return function() { return o.Name + "(" + f() + ")"; };
    }

    o.ToString = annotate(o, o.ToString);
So we don't gain expressivity going the second route, though we do perhaps lose code clarity. And in fact, this seems to be exactly how it works in Python:
    class Foo:
        def __init__(self, i):
            self.i = i

        def Get(self):
            return self.i

    f = Foo(7)
    f.Get()  # = 7

    g = f.Get
    f.Get = lambda: g() + 1
    f.Get()  # = 8

    f.Get = lambda self: self.i
    f.Get()  # Error!
Anyways, if they are kind of the same, but kind of not, then where does the real difference reside?
One important point is that the first route allows us to use a single function object instance as a method on many instances regardless of whether the function object needs to access this, while the second option only lets us reuse a function object if the object does not access this. That is, under the first interpretation many instances could share a single function object as a method.
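The sharing argument can be sketched in Python, again only as an analogy. Here types.MethodType approximates the first route (one underlying function, paired with each receiver by a thin binding), and a closure factory approximates the second. Named, describe and make_describe are hypothetical names:

```python
import types


class Named:
    def __init__(self, name):
        self.Name = name


def describe(self):
    # First route: one function object that receives its object explicitly.
    return "<" + self.Name + ">"


def make_describe(o):
    # Second route: each call builds a fresh closure that captures its own o.
    return lambda: "<" + o.Name + ">"


a, b = Named("a"), Named("b")

a.ToString = types.MethodType(describe, a)
b.ToString = types.MethodType(describe, b)
shares_function = a.ToString.__func__ is b.ToString.__func__

a.Describe = make_describe(a)
b.Describe = make_describe(b)
separate_closures = a.Describe is not b.Describe

outputs = [a.ToString(), b.ToString(), a.Describe(), b.Describe()]
```

Under the first route both objects end up backed by the same describe function; under the second, each object needs its own closure because the receiver is baked in.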
However, a caveat: the "workaround" described for the second situation has a small "strangeness", in that because the variable holding the equivalent of this is captured, it could be updated, but only by the directly enclosing scope.
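A small Python sketch of that strangeness, assuming closures capture variables rather than values (as they do in Python; whether Mix behaves identically is an assumption). The retarget helper is invented purely to show the enclosing scope rebinding the captured variable:

```python
class Named:
    def __init__(self, name):
        self.Name = name


def annotate(o, f):
    def method():
        # o is looked up in the enclosing scope every time method runs.
        return o.Name + "(" + f() + ")"

    def retarget(new_o):
        # Only code inside annotate's scope can rebind the captured o.
        nonlocal o
        o = new_o

    return method, retarget


alice, bob = Named("alice"), Named("bob")
method, retarget = annotate(alice, lambda: "x")

before = method()
retarget(bob)
after = method()
```

After retarget runs, the same method object reports a different object's Name, even though nothing about the method itself was reassigned.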
All of this leads me to a decision.
- Since at the outset a new function object instance is created whenever the containing class is instantiated, it seems more symmetric to use the second route (which would in general encourage always creating a new function object instance when setting an object's methods).
- Since method use doesn't require this to be passed explicitly, and since method definition within a class also doesn't require it, method definition from outside of a class should not require it either.
There is also a second, simpler issue here. In point 2 notice that we really do have access to this; does this mean that under the first interpretation we should at this point have access to the private members of this, and therefore the private members of c? I am inclined to believe that we should not: private should have a lexical meaning, which is that only code lexically within the class, or lexically within a derived class, should have access to said members. What's more, this is in keeping with the choice made above (that is, as we've picked the second interpretation, we don't even face such an issue).
Furthermore, granting such access would produce an asymmetry between function objects intended to become methods and those intended to be simple function objects, which could only be solved with more syntax. Namely, it would allow a function object's body to access the private members of an argument only when the function object was intended to become a method; we would need to require that in such cases the argument be made explicitly this. Anyway, lots of special casing for an idiom that could lead to some seriously obfuscated code.
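As a loose point of comparison, Python's double-underscore name mangling happens to be lexical in roughly this sense: the rewrite is applied only to code written inside the class body. The sketch below uses hypothetical names (Account, peek) and is an analogy rather than a statement about Mix. Note also that Python's mangling does not extend to derived classes, unlike the rule proposed above.

```python
class Account:
    def __init__(self, balance):
        # Inside the class body this is stored as _Account__balance.
        self.__balance = balance

    def Show(self):
        return "balance=" + str(self.__balance)


def peek(self):
    # Written outside the class, so the name is not mangled and the lookup fails,
    # even though the function receives the object explicitly.
    return self.__balance


acct = Account(10)
inside = acct.Show()

try:
    peek(acct)
    outside_ok = True
except AttributeError:
    outside_ok = False
```

Having the receiver in hand is not enough here; what matters is where the code was written.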