Wednesday, December 30, 2009

Points of Interest

Some scenic overlooks on the internet:

OcamlCore Planet
Fabulous Adventures...
Room 101
Andrew Kennedy's Blog

To be continued...

.ctor

I was wandering around the internet looking for interesting programming-language-related blogs and came across the blog of Gilad Bracha, one of the people involved with Newspeak. There were many interesting posts, but an older one caught my eye and got me thinking about object instantiation in Mix.

In "Constructors Considered Harmful", Mr. Bracha expounds on two problems with object constructors: they hurt the reusability and extensibility of a piece of code, and they break the uniformity of the language.

One way of thinking about the first point is to consider object construction to be separate from object initialization. In this scenario new is an operation that first constructs an uninitialized object of a particular class, and then calls the constructor to initialize it. Seen this way, the signature of the constructor in many object-oriented languages is rather arbitrarily constrained: it must return an instance of the class in which it is defined. For instance, new X() must return an instance of X, and can't return an instance of a related type Y. If your code ever changes so that you need such functionality, you'll have to traipse all around your code base replacing calls to new X(). That's the first problem.

What might be nice would be for us to separate these two concerns explicitly, so that new really does only instantiate an uninitialized object, and then some method on that object is responsible for initializing the object.
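Python happens to expose this split directly, so here is a minimal sketch in Python rather than Mix. The call to object.__new__ plays the role of a bare new that only instantiates, and a separate method does the initializing. The Point class and its init method are made up purely for illustration.

```python
class Point:
    """Hypothetical class used only to illustrate the split."""

    def init(self, x, y):
        # Initialization is an ordinary method that fills in the fields.
        self.x = x
        self.y = y
        return self


# Step 1: construct an uninitialized instance (no fields set yet).
raw = object.__new__(Point)
has_x_before = hasattr(raw, "x")

# Step 2: initialize it explicitly.
p = raw.init(3, 4)
```

Between the two steps raw is a perfectly real object that simply has no x or y yet, which is exactly the hazard discussed next.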

But wait, you say: won't this allow us to accidentally create and then use uninitialized instances of a particular class? And isn't that frightening and terrible? Well, sure, as presented thus far it would. Of course, there's nothing that prevents you (save for good practice, of course) from defining all your classes with an empty constructor and a method named Init. But, we can do a bit better (though we'll still trust ourselves not to amputate our own feet via firearm).

Let us say that one can only create an uninitialized instance of a particular class from within that class, so that a random programmer can't new up an uninitialized instance. Then let's restore a notion of constructor (call it an initializer) that is nothing but a (possibly named) constructor, but one that is obligated to return the object it is responsible for creating and initializing. Therefore our initializer can return instances of a different type if it so desires.

Aside: This is really just like static factory methods in Java, and in fact that is one technique for avoiding some of the problems with constructors that has found its way into general practice. However, in Java we can still write regular constructors with all of their supposed problems.

But aren't we back to regular old constructors? Not quite, because we are not required to return any particular type from our initializer. So we might have the following code:

class Foo
{
  new Foo(i)
  {
    var t = new Foo;
    t.Bar = null;
    t.Num = i;
    return t;
  }
  
  new FooBar(i)
  {
    var t = new Foo;
    t.Bar = Bar.Bar();
    t.Num = i + 8;
    return t;
  }
}

var f = Foo.Foo(42);
var g = Foo.FooBar(34);

Then, if ever we need to modify the constructor of Foo, so that, for instance, it returns an access controlled instance of a Foo, we can do the following:

class Foo
{
  new Foo(i)
  {
    var t = new Foo;
    t.Bar = null;
    t.Num = i;
    return AccessController.New(t);
  }
  ...
}
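The same move can be sketched in Python, assuming a named initializer written as a class method. AccessControlled is a hypothetical stand-in for the AccessController above (it only forwards attribute lookups), and the name create is likewise invented for the example.

```python
class AccessControlled:
    """Hypothetical wrapper standing in for the post's AccessController."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself.
        # A real controller would check permissions here; this sketch
        # simply forwards the lookup to the wrapped object.
        return getattr(self._target, name)


class Foo:
    @classmethod
    def create(cls, i):
        # Named initializer: builds the object, then chooses what to return.
        t = object.__new__(cls)
        t.bar = None
        t.num = i
        return AccessControlled(t)  # callers of Foo.create are unchanged


f = Foo.create(42)
```

Code that calls Foo.create(42) keeps working even though what comes back is no longer a Foo.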

Another complaint raised in the aforementioned blog post (the second point I mentioned) is that constructors behave differently than other "callable" objects. That is, creating an object using a constructor is fundamentally different at the language level (for many languages, like Java, C#, and so on) than calling a function (or static method, or whatever), so that you cannot, in general, pass a constructor around, or store it in a variable; they aren't first class. In the above presentation there is nothing precluding them from being so.
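To make "first class" concrete, here is a small Python sketch in which named initializers are ordinary values. Python is used only because Mix isn't runnable; the names plain and with_bar are invented, and the string "bar" stands in for whatever Bar.Bar() would produce.

```python
class Foo:
    def __init__(self, bar, num):
        self.bar = bar
        self.num = num

    @classmethod
    def plain(cls, i):
        # Mirrors the post's Foo(i) initializer.
        return cls(None, i)

    @classmethod
    def with_bar(cls, i):
        # Mirrors the post's FooBar(i); the string stands in for Bar.Bar().
        return cls("bar", i + 8)


# Initializers are ordinary values: store them, pass them, call them later.
make = Foo.with_bar
foos = list(map(Foo.plain, [1, 2, 3]))
g = make(34)
```

No wrapper function is needed: the initializer itself is handed to map or kept in a variable.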

So really, all we have done is introduce a limited form of static methods, and restrict the use of new as an object instantiation device so that it can only be used to create new, uninitialized objects of the same type as the class in which the call to new is found. What we gain is that constructors need not return a particular type, that we still are "encouraged" by the language to write constructors that return initialized objects, and that constructors behave in the same way as regular functions, in that they may be passed around, stored in variables, and so on.

In fact I had never really faced either of the problems that Mr. Bracha described in my own programming; perhaps this is because I haven't worked on any really large projects with many other participants. In general I've never needed a constructor like that of Foo above, and if I've ever needed to pass a constructor as a function I have just wrapped it, thus:

  var ctor = function(i, j, k){ return new SomeClass(i, j, k); };

But still, it is an interesting point that merits some thought, especially in the context of large codebases that act as a dependency of many other pieces of software, and whose signatures cannot easily be changed.

Friday, December 4, 2009

Down, Down, Down

So, back to the stacks. This time it's about functions and methods and objects and functions and methods and objects.

So, as I mentioned only briefly, functions are really objects that implement a particular method, namely operator(). This is nice because it means that we can easily define functors and use them in the same way we use "regular" functions and object methods. For instance, the following class wraps an "action" (for those of you not from the C# world, a unary function returning void) and returns an action that keeps track of how many times it has been called:

class CountedAction
{
  private var func;
  private var count = 0;

  // Assumed constructor: the original listing never assigns func.
  new(f)
  {
    this.func = f;
  }
  
  public operator()(arg)
  {
    this.count++;
    this.func(arg);
  }

  public Count()
  {
    return this.count;
  }
  
  public Reset()
  {
    this.count = 0;
    return;
  }
}
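
For comparison, here is roughly the same class as a runnable Python sketch, where __call__ is the counterpart of operator(). The wrapped function is passed in through the constructor here, which is my assumption about how func is meant to be supplied.

```python
class CountedAction:
    def __init__(self, func):
        self._func = func
        self._count = 0

    def __call__(self, arg):
        # Python's counterpart to operator(): makes instances callable.
        self._count += 1
        self._func(arg)

    def count(self):
        return self._count

    def reset(self):
        self._count = 0


seen = []
record = CountedAction(seen.append)
for word in ["a", "b", "c"]:
    record(word)
calls_before_reset = record.count()
record.reset()
```

The caller uses record(word) exactly as it would a plain function, and only finds out it is an object when it asks for the count.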

Now, to be fair there are other ways we could swing this sort of thing, but I'm less concerned with what is possible than with what is simple or even elegant.

So, what we'd like to say is that, whenever we call an object, we simply look up that object's operator() and call that. Of course, here be our turtles.

Retraction, start over. Let's say first that a callable object is either a "real" function, or an object which has a member operator() that is itself callable. Then calling an object goes one of two ways. If it is a "real" function, we actually call it (in terms of typechecking, this means creating a new environment with bindings for each function argument and then checking the function's body). Otherwise, we retrieve the object-to-be-called's member operator() and call that with the given arguments.
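
The rule above can be sketched as a small recursive function. This is a toy model in Python and not how Mix is actually implemented: RealFunction, Obj, and call are all hypothetical names, and a plain Python function stands in for a function body.

```python
class RealFunction:
    """Toy stand-in for a primitive function the language calls directly."""

    def __init__(self, body):
        self.body = body  # a plain Python function acting as the body


class Obj:
    """Toy object: just a dictionary of named members."""

    def __init__(self, members=None):
        self.members = dict(members or {})


def call(target, *args):
    # Case 1: a "real" function is invoked directly.
    if isinstance(target, RealFunction):
        return target.body(*args)
    # Case 2: otherwise fetch operator() and call that, recursively.
    if isinstance(target, Obj) and "operator()" in target.members:
        return call(target.members["operator()"], *args)
    raise TypeError("object is not callable")


double = RealFunction(lambda x: 2 * x)
inner = Obj({"operator()": double})
outer = Obj({"operator()": inner})  # callable through two layers
result = call(outer, 21)
```

The recursion bottoms out at a "real" function, which is what keeps the stack of turtles finite.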

That's a bit better, but it means that the programmer has to deal with both objects and these "real" functions, when the whole point of this mess is to treat everything as objects. So it remains for us to hide this additional complexity in the language, so that the programmer can pretend that everything is an object, and that anything implementing operator() can be called.

It's pretty easy to see that any "real" function can be easily treated as an object that has a single member, operator(), whose body is the exact same as the function. So in the end all we have to do is, whenever a "real" function is used in any way other than being called, promote it to an object.
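
Here is a minimal Python sketch of that promotion step, again using a toy model with hypothetical names (RealFunction, Obj, promote, get_member) and no claim about how Mix really does it.

```python
class RealFunction:
    """Toy stand-in for a primitive function."""

    def __init__(self, body):
        self.body = body


class Obj:
    """Toy object: just a dictionary of named members."""

    def __init__(self, members):
        self.members = dict(members)


def promote(value):
    # A "real" function used as a value becomes an object whose only
    # member, operator(), is that same function. Objects pass through.
    if isinstance(value, RealFunction):
        return Obj({"operator()": value})
    return value


def get_member(value, name):
    # Member access is a non-call use, so promote first.
    return promote(value).members[name]


inc = RealFunction(lambda x: x + 1)
op = get_member(inc, "operator()")
```

Since the promoted object's operator() is the original function, calling through the object and calling the function directly do the same thing.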

That's great, and thus topples our turtle stack. But what's the end result for me? Well, one nice thing is that now all functions and methods can be treated uniformly. For instance, consider the following class definition:

class Foo
{
  BarMethod(a)
  {
    return a.Bleck();
  }
}

That's equivalent to the following:

class Foo
{
  private var BarMethod;
  new()
  {
    this.BarMethod = function(a){ return a.Bleck(); };
  }
}
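
A runnable Python analogue of this equivalence, as a rough sketch: one class declares the method normally, the other stores a function in a field from its constructor. Bleckable is a hypothetical argument type that supplies the bleck method, and the class names are invented.

```python
class Bleckable:
    """Hypothetical argument type with the Bleck method the post calls."""

    def bleck(self):
        return "bleck!"


class FooWithMethod:
    def bar_method(self, a):
        return a.bleck()


class FooWithField:
    def __init__(self):
        # The "method" is an ordinary function stored in a field.
        self.bar_method = lambda a: a.bleck()


x = Bleckable()
same = FooWithMethod().bar_method(x) == FooWithField().bar_method(x)
```

From the caller's side the two are indistinguishable: both are reached as foo.bar_method(x).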

All because methods are just functions; see this post for more information on how methods are implemented in this way. And now I'll say goodnight.