
Thursday, April 22, 2010

Again with Immutability

So, a while ago I was telling all of my ardent readers that things can get kind of crazy when it comes to the question of using "real methods" for operator+= vs. using syntactic sugar.

Turns out I'm not the only person that's been troubled by this kind of a question; Scala programmers have to worry about it, as do the designers of Scala. Have a look at this post over on Stack Overflow.

Tuesday, March 9, 2010

Innate Immutabilities

A quick note: at some point when implementing the Mix runtime I had occasion to write the Int class. I think I've mentioned abstract classes before; they act as bindings to .NET classes that exist in the runtime, or in user supplied code (in case you want to use your regular .NET code from Mix, and that code can't be automatically imported using the import statement because the types that you want associated with your code are too complicated for Mix to infer).

Anywho I was writing this class and came across a decision I had made a while ago that caused me to scratch my head. In Mix operators are just method calls, which gives an easy-to-understand form of operator overloading, whereby an operator invocation is just a method invocation on one of the arguments. Which argument is operator-dependent, but I think almost all of them use the left argument, passing the right; of course, single-argument operators like ++ are easy.

The decision that I made was to make certain operators "real" operators, not just syntactic sugar. For instance, I made operators ++ and -- real operators, and didn't just expand them from i++; to i = i + 1;. Seems cleaner, actually, to not treat it as sugar, so what's the big deal?

Well, in Mix all expressions are objects -- we've talked about functions being objects, methods being objects, and so on. We also have integers as objects, booleans as objects, and so on (note: I write "and so on" because I don't like the strangeness of punctuating an etc. -- I like putting my punctuation on the outside of parens, it just feels right). Another note: for now I want to ignore the performance implications of this choice, and focus on a clean language design. Another other note: classes themselves aren't objects, unfortunately; maybe in Mix version 2.

So integers are just instances of class Int, and we all love to increment our integers using i++;. So how do we write the .NET class implementing the Mix class Int? First, let's look at the abstract class itself:

abstract class Int
{
  public ToInt(){ return null as Int; }
  public ToString() { return null as String; }
  // More conversion operators...
  
  public operator+(i)
  {
    i.ToInt() as Int;
    return null as Int;
  }
  // More arithmetic operators...
  
  public operator++()
  {
    return null as Int;
  }
  // And anything else...
}

Let's take this in a few bites, my hungry friends. First, the syntax term as Type does two things: at compile time it checks that the term has only the given Type, and then yields a typeset containing just Type. At runtime it does nothing, simply returning term; in an abstract class it really does nothing, as no code is ever generated for the class. The point of the null as Type idiom is to avoid writing new Int(), which makes it less obvious that the expression exists only for compile time.

Given this description, you should be able to understand the definition of operator+; it checks that its argument is convertible to Int, and then returns an Int; the code to actually perform the addition is found in the .NET class implementing Int:

public class Int
{
  public int NativeInt;
  
  public Int(int native) { NativeInt = native; }
  
  // Lots of methods...

  public Int operator_add(object other)
  {
    int i = ((Int)Mix.Call(other, "ToInt")).NativeInt;
    return new Int(NativeInt + i);
  }
  
  public Int operator_incr()
  {
    // What to do?
  }
}

So, in operator_add we invoke ToInt on the passed object and then assume that it worked, and that the result is an Int. We can safely assume these, because the Mix compiler verified that all code is well-typed. This lets us get the native .NET integer, add it to our own, and return a new wrapper with the sum of these two numbers.

Finally we come to the real issue: how should we implement operator_incr? Turns out the obvious way to implement it leads to some really crazy behavior! Let's say that the body is as follows:

public Int operator_incr()
{
  int original = NativeInt;
  NativeInt++;
  return new Int(original);
}

Well, that's craziness, because integers are, in almost everyone's mind, immutable. In the following, we pretty much all expect b to be 42:

var a = 42;
var b = a;
a++;

And of course with the above definition it isn't, not even a little. So, leave your integers immutable! But, how can we write the code, allowing for operator++ to be a real live operator? I'm not sure, but I certainly couldn't find a clean way (nor for operator+= and company).
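
To make the craziness concrete, here's a quick Python sketch (Python standing in for Mix; the class and method names are just for illustration) of what goes wrong when the increment operator mutates the wrapper:

```python
# Illustrative only: a Mix-style Int wrapper with a *mutating* increment.
class Int:
    def __init__(self, native):
        self.native = native

    def incr(self):
        # The "obvious" implementation: mutate in place, return the old value.
        original = self.native
        self.native += 1
        return Int(original)

a = Int(42)
b = a        # b is an alias for the very same wrapper object as a
a.incr()
print(b.native)  # 43 -- not the 42 everyone expects
```

Because a and b alias one wrapper, incrementing a is visible through b, which violates the value semantics everyone assumes for integers.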

One option is to say that when a class implements operator++ we call it, otherwise we interpret it in the "syntactic sugar style". The nice bit about this is that it also works for operator+= and friends, so we can, for instance, write a list that allows you to append elements to it using list += "new list item!". The problem is that it complicates both the mental model of the code, and also the implementation of the back end.

Maybe someone, somewhere has a thought or three?

Thursday, January 21, 2010

This post is brought to you by the Greek letter...

So, a long time ago I said that a Mix alpha release is coming, and it really is, I swear. Here are some examples of Mix code that the implementation can currently compile correctly.

1: This is an example of some code that cannot be handled under Hindley-Milner style inference. Specifically, notice that in the body of twice we apply f to the type of its own result. This requires that f have type 'a -> 'a, and therefore that twice have type ('a -> 'a) -> 'a -> 'a. Clearly in our application of twice the argument pair doesn't have such a restricted type.

twice(f)
{
    return function(x){ return f(f(x)); };
}

pair(x)
{
  return new Pair(x, x);
}

Int main(String args)
{
  var p = twice(pair)(1);
  return p.First.First;
}
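
For comparison, the same program runs fine in a dynamically typed language; this rough Python transliteration (with Pair defined inline) shows there's nothing operationally wrong with applying f to its own result:

```python
def twice(f):
    # Returns a function applying f to the result of f.
    return lambda x: f(f(x))

class Pair:
    def __init__(self, first, second):
        self.First = first
        self.Second = second

def pair(x):
    return Pair(x, x)

p = twice(pair)(1)      # Pair(Pair(1, 1), Pair(1, 1))
print(p.First.First)    # 1
```

HM rejects this because it assigns twice a single monomorphic type for f within its body, even though the program is perfectly well-behaved at runtime.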

2: This example shows how cool container classes are in Mix, at least to me. I love homogeneous containers because they (well, the type system) take care of ensuring I don't do anything too stupid with their elements. However, in a language like C# or Java you must introduce an (often artificial) interface in order to group objects of different types but related functionality in such a container -- otherwise you just have to relegate yourself to containers of object, and live with the lack of type safety. Here Mix infers what operations are possible on all elements of the array. The error arises because Mix infers that the array possibly contains both integers and strings, and strings do not implement the negation operator -.

Another note: arrays are enumerable (which just means that they implement GetEnumerator correctly), and enumerables implement ToString by simply concatenating the string representations of all of their elements. So as long as all of the elements of an enumerable implement ToString, the enumerable itself does as well, which is how we can call IO.print on the array.

Int main(String args)
{
  var a = new Array(10);
  for(var i = 0; i < a.Length(); i++)
  {
    a[i] = i;
  }
  a[random(10)] = "Hello!";

  IO.print(a);  //Enumerable.

  a[4] = -a[4]; //Error, strings cannot be negated.
  return 0;
}

3: Here's a funny example in which iterating to a fixed point proves critical. Essentially f gets a type that should be interpreted as "either a pointer to an integer, or a pointer to a pointer to a string, or a pointer to a pointer to a pointer to an integer, or..." Which essentially means that the only operations you can really perform on f are ToString and Get (which gets the value of a pointer).

Int main(String s)
{
  var f = new Ptr(0);
  var g = new Ptr("Foo!");
  
  for(var i = 0; i < 10; i += 1)
  {
    var temp = f;
    f = new Ptr(g);
    g = new Ptr(temp);
  }
  
  IO.print("f: " + f);
  IO.print("g: " + g);
  return 0;
}

I wanted to put a few more up, but it is time to get back to work.

Wednesday, December 30, 2009

.ctor

I was wandering around the internet looking for interesting programming-language-related blogs and came across the blog of Gilad Bracha, one of the people involved with Newspeak. There were many interesting posts, but an older one caught my eye and got me thinking about object instantiation in Mix.

In Constructors Considered Harmful Mr. Bracha expounds on the problems associated with object constructors, when it comes to both the reusability and extensibility of a piece of code, and the uniformity of the language.

One way of thinking about this first point is to consider object construction to be separate from object initialization. In this scenario new is an operation that first constructs an uninitialized object of a particular class, and then calls the constructor to initialize it. In this sense one can see that the signature of the constructor in many object oriented languages is rather arbitrarily constrained, in that it must return an instance of the class that it is defined in. This means, for instance, that new X() must return an instance of X, and can't return an instance of a related type Y. So, if ever your code changes such that you need that functionality, you'll have to traipse all around your code base replacing calls to new X(). That's the first problem.

What might be nice would be for us to separate these two concerns explicitly, so that new really does only instantiate an uninitialized object, and then some method on that object is responsible for initializing the object.

But wait, you say: won't this allow us to accidentally create and then use uninitialized instances of a particular class? And isn't that frightening and terrible? Well, sure, as presented thus far it would. Of course, there's nothing that prevents you (save for good practice, of course) from defining all your classes with an empty constructor and a method named Init. But, we can do a bit better (though we'll still trust ourselves not to amputate our own feet via firearm).

Let us say that one can only create an uninitialized instance of a particular class from within that class, so that a random programmer can't new up an uninitialized instance. Then let's restore a notion of constructor (call it an initializer): a (possibly named) method that is obligated to return the object it is responsible for creating and initializing. Therefore our initializer can return instances of a different type if it so desires.

Aside: This is really just like static factory methods as in Java, and in fact that is one technique for avoiding some of the problems with constructors that has found its way into general practice. However in Java we can still write regular constructors with all of their supposed problems.
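
In Python the static-factory idea is usually spelled with classmethods; here is a rough sketch (all of the names, including the AccessControlled wrapper, are made up for illustration):

```python
class AccessControlled:
    # Hypothetical wrapper, standing in for AccessController.New below.
    def __init__(self, wrapped):
        self.wrapped = wrapped

class Foo:
    def __init__(self):
        # Bare construction only; real initialization lives in the factories.
        self.bar = None
        self.num = 0

    @classmethod
    def make(cls, i):
        t = cls()
        t.num = i
        return t

    @classmethod
    def make_controlled(cls, i):
        # An "initializer" is free to return a different type entirely.
        return AccessControlled(cls.make(i))
```

Of course, nothing stops a caller from writing Foo() directly; the restriction that new Foo is legal only inside the class is exactly the part Python can't express.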

But aren't we back to regular old constructors? Not quite, because we are not required to return any particular type from our initializer. So we might have the following code:

class Foo
{
  new Foo(i)
  {
    var t = new Foo;
    t.Bar = null;
    t.Num = i;
    return t;
  }
  
  new FooBar(i)
  {
    var t = new Foo;
    t.Bar = Bar.Bar();
    t.Num = i + 8;
    return t;
  }
}

var f = Foo.Foo(42);
var g = Foo.FooBar(34);

Then, if ever we need to modify the constructor of Foo, so that, for instance, it returns an access controlled instance of a Foo, we can do the following:

class Foo
{
  new Foo(i)
  {
    var t = new Foo;
    t.Bar = null;
    t.Num = i;
    return AccessController.New(t);
  }
  ...
}

Another complaint raised in the aforementioned blog post (the second point I mentioned) is that constructors behave differently than other "callable" objects. That is, creating an object using a constructor is fundamentally different at the language level (for many languages, like Java, C#, and so on) than calling a function (or static method, or whatever), so that you cannot, in general, pass a constructor around, or store it in a variable; they aren't first class. In the above presentation there is nothing precluding them from being so.

So really, all we have done is introduced some limited form of static methods, and restricted the use of new as an object instantiation device so that it can only be used to create new, uninitialized objects of the same type as the class in which the call to new is found. What we gain is that constructors need not return a particular type, that we still are "encouraged" by the language to write constructors that return initialized objects, and that constructors behave in the same way as regular functions, in that they may be passed around, stored in variables, and so on.

In fact I had never really faced either of the problems that Mr. Bracha described in my own programming; perhaps this is because I haven't worked on any really large projects with many other participants. In general I've never needed a constructor like that of Foo above, and if I've ever needed to pass a constructor as a function I have just wrapped it, thus:

  var ctor = function(i, j, k){ return new SomeClass(i, j, k); };

But still, it is an interesting point that merits some thought, especially in the context of large codebases that act as a dependency to many other pieces of software, and whose signature cannot easily be changed.

Friday, December 4, 2009

Down, Down, Down

So, back to the stacks. This time it's about functions and methods and objects and functions and methods and objects.

So, as I mentioned only briefly, functions are really objects that implement a particular method, namely operator(). This is nice because it means that we can easily define functors and use them in the same way we use "regular" functions and object methods. For instance, the following class wraps an "action" (for those of you not from the C# world, a unary function returning void) and returns an action that keeps track of how many times it has been called:

class CountedAction
{
  private var func;
  private var count = 0;
  
  new(f)
  {
    this.func = f;
  }
  
  public operator()(arg)
  {
    this.count++;
    this.func(arg);
  }

  public Count()
  {
    return this.count;
  }
  
  public Reset()
  {
    this.count = 0;
    return;
  }
}
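
Python's __call__ plays the role of operator(), so the same functor can be sketched like this (method names kept from the Mix version):

```python
class CountedAction:
    """Wraps a unary function and counts how many times it has been called."""
    def __init__(self, func):
        self.func = func
        self.count = 0

    def __call__(self, arg):
        self.count += 1
        return self.func(arg)

    def Count(self):
        return self.count

    def Reset(self):
        self.count = 0

calls = []
counted = CountedAction(calls.append)  # wrap an "action"
counted("hello")
counted("world")
print(counted.Count())  # 2
```

The instance is used exactly like a plain function, which is the point: callers never need to know whether they hold a "real" function or a functor.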

Now, to be fair there are other ways we could swing this sort of thing, but I'm not as concerned with what is possible so much as I am with what is simple or even elegant.

So, what we'd like to say is that, whenever we call an object, we simply look up that object's operator() and call that. Of course, here be our turtles.

Retraction, start over. Let's say first that a callable object is either a "real" function, or it is an object which has a member operator() that is itself callable. Then calling an object consists of either actually calling it (in terms of typechecking, it means creating a new environment with bindings for each function argument and then checking the function's body) if it is a "real" function, or it proceeds by retrieving the object-to-be-called's member operator() and calling that with the given arguments.

That's a bit better, but it means that the programmer has to deal with both objects and these "real" functions, when the whole point of this mess is to treat everything as objects. So it remains for us to hide this additional complexity in the language, so that the programmer can pretend that everything is an object, and that anything implementing operator() can be called.

It's pretty easy to see that any "real" function can be easily treated as an object that has a single member, operator(), whose body is the exact same as the function. So in the end all we have to do is, whenever a "real" function is used in any way other than being called, promote it to an object.

That's great, and thus topples our turtle stack. But what's the end result for me? Well, one nice thing is that now all functions and methods can be treated uniformly. For instance, consider the following class definition:

class Foo
{
  BarMethod(a)
  {
    return a.Bleck();
  }
}

That's equivalent to the following:

class Foo
{
  private var BarMethod;
  new()
  {
    this.BarMethod = function(a){ return a.Bleck(); };
  }
}
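
Python makes the same equivalence easy to see; a sketch (Bleck is a made-up method, as above):

```python
class Thing:
    def Bleck(self):
        return "bleck!"

# A method defined on the class...
class FooA:
    def BarMethod(self, a):
        return a.Bleck()

# ...versus a function object stored on each instance by the constructor.
class FooB:
    def __init__(self):
        self.BarMethod = lambda a: a.Bleck()

print(FooA().BarMethod(Thing()))  # bleck!
print(FooB().BarMethod(Thing()))  # bleck!
```

Both spellings are called identically; the difference is only where the function object lives (on the class, shared, versus on each instance).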

All because methods are just functions; see this post for more information on how methods are implemented in this way. And now I'll say goodnight.

Thursday, November 5, 2009

Concrete Mix Alpha

For the past few days I have been working hard on getting Mix into a state where it can be released. It's slow going; I have been cutting out everything that isn't necessary for a "preview" (like integration with .NET, support for some level of nominal typing, and a few other things, all of which I intend to discuss at some point). So sorry about the dry spell, it will rain (if only lightly) soon enough.

Friday, October 23, 2009

Methods Methodical

The last couple of posts have been a bit off topic (where topic = Mix), so let's get back to it. These words are about methods in Mix, how they are implemented, and how the implementation choice corresponds closely to the language's semantics (which seems like a bad idea, but you'll see what I mean).

So, first let me note that, in the spirit of object orientationification, in Mix the goal is that every first class entity is an object. By this I mean that numbers and strings are objects, class instances are objects, instance methods are objects, and functions in general are objects. These are all first class entities, so they're objects; since classes are not first class (you can't go passing around classes, unfortunately, at least at this point) they aren't objects, and in fact they don't have any runtime representation.

It's easy to see how regular class instances are objects (it's by definition, right?), but perhaps slightly less clear what I mean when I state that, say, a function is really an object. All I mean is that a function is an instance of a class that has a method op(). This gets us into a bit of a loop; if methods are objects, then this method is an object as well. So, at the bottom we need to have some notion of a callable object; a "special" object with no methods at all. From the programmer's perspective these can be ignored, because a callable object can always be treated as an object with an op() method that is callable.

So, methods are really just function objects; sounds great, very symmetric. Now, we have a question to answer: do these function objects take this as an argument, or do they contain a reference to this that is bound once and for all when the function object is instantiated? It turns out that this choice will dictate the semantics and correctness of certain kinds of statements, and so we should try to make a reasonable and (hopefully) reasoned choice.

Consider the following Mix code, which defines a class implementing ToString, and then derives from this a class that does not override ToString. Finally, it adds a ToString method to an instance of the derived class. Assume that print just takes a string and writes it to standard output.

class Base
{
  ToString(){ return "Base!"; }
}

class C : Base
{
  public var Name;
  new(n) { this.Name = n; }
}

Int main()
{
  var c = new C("Bob C.");
  var f = c.ToString;                  // (1)
  
  c.ToString = function(self)
    { return self.Name + " " + f(); }; // (2)
  c.ToString = function()
    { return c.Name + " " + f(); };    // (3)
  
  print(c.ToString());                 // (4)
  
  return 0;
}

In this code there are a few issues. First, at point 1 we store a reference to the original value of c.ToString (which is more or less Base's implementation of ToString). I think it is clear that we want f to contain a function object that takes 0 arguments (so that later we can write f() and get the same result as if we wrote c.ToString()); this implies that the function object implementing c.ToString has c (or this) bound as a member variable, and does not take it as an argument.

Then, at point 4, we use the new method in the same way as we used the original one; we aren't obligated to pass c to the new ToString method.

Next, check out points 2 and 3, which highlight the difference between the two treatments. In the first case we get explicit access to this (though without some syntactic sugar we can't name it the keyword this, and so give it a different name). In the second we only have access to this (actually c) because it is contained in an outer scope. While at first glance they seem equivalent, they aren't quite the same.

To see the difference, consider this function that, given a function object (which should correspond to a ToString method), returns a new method that annotates the output of the given method with the operated-on object's Name field:

function annotate(f)
{
  return function(self)
    { return self.Name + "(" + f() + ")"; };
}
o.ToString = annotate(o.ToString);

While this works under the interpretation higlighted by point 2 above, it would not work under the interpretation shown in point 3:

function annotate(f)
{
  return function() { return X.Name + "(" + f() + ")"; };
}

The question is, "What is X?" One answer is that we can work around the issue by passing this (or at least, the object that correspond to it) to the scope enclosing the new method's definition:

function annotate(o, f)
{
  return function() { return o.Name + "(" + f() + ")"; };
}
o.ToString = annotate(o, o.ToString);

So we don't gain expressivity going the second route, though we do perhaps lose code clarity. And in fact, this seems to be exactly how it works in Python:

class Foo:
  def __init__(self, i):
    self.i = i
  
  def Get(self):
    return self.i
    
f = Foo(7)
f.Get()  # = 7

g = f.Get
f.Get = lambda: g() + 1
f.Get()  # = 8

f.Get = lambda self: self.i
f.Get()  # Error!

Anyways, if they are kind of the same, but kind of not, then where does the real difference reside?

One important point is that the first route allows us to use a single function object instance as a method on many instances regardless of whether the function object needs to access this, while the second option only lets us reuse a function object if the object does not access this. That is, under the first interpretation many instances could share a single function object as a method.

However, a caveat: the "workaround" described for the second situation has a small "strangeness", in that because the variable holding the equivalent of this is captured, it could be updated, but only by the directly enclosing scope.

All of this leads me to a decision.

  • Since at the outset a new function object instance is created whenever the containing class is instantiated, it seems more symmetric to use the second route (which would in general encourage always creating a new function object instance when setting an object's methods).
  • Since method use doesn't require this to be passed explicitly, and since method definition within a class also doesn't require it, method definition from outside of a class should not require it either.

There is also a second, simpler issue here as well. In point 2 notice that we really do have access to this; does this mean that under the first interpretation we should at this point have access to the private members of this, and therefore the private members of c? I am inclined to believe that we should not: private should have a lexical meaning, which is that only code lexically within the class, or lexically within a derived class, should have access to said members. What's more, this is in keeping with the choice made above (that is, as we've picked the second interpretation, we don't even face such an issue).

Furthermore, it would produce an asymmetry between function objects intended to become methods, and those intended to be simple function objects, which could only be solved with more syntax. Namely, it would allow a function object's body to access the private members of an argument only when the function object was intended to become a method; we would need to require that in such cases the argument be named explicitly this. Anyway, lots of special casing for an idiom that could lead to some seriously obfuscated code.

Thursday, October 1, 2009

Clarity

So, last time I said "the invocation n.Foo() will raise a null pointer exception", which it will of course, but in the more complete example involving Length no such exception could ever be raised: the body of the while loop was guarded by a test to determine whether current was null. Duh.

The point is, it seems that some simple form of nullness analysis would completely eliminate the problems with null that I talked about previously. But, I also said that such an analysis has its own problems. The thing is, this kind of analysis isn't simple after all. Consider:

test(n)
{
  if(n is null)
    return 42;
  else
    return 17;
}

main()
{
  var n = null;
  
  if(test(n).IsPrime())
    n.Blah();
}

This is a relatively simple (though convoluted) example, but it should illustrate the idea: to get "perfect" nullness analysis we would need to actually run the program (to determine whether the number is prime using the definition of IsPrime). This is something we really don't want to do (there goes any hope of a termination proof, for one thing). What if the code needs to ask the internet whether the number is prime? All kinds of chaos.

Perhaps a better idea is a very naive nullness analysis, one that only recognizes explicit guards of the form if(something is not null) { ... } or similar. Under such a system, if something had an empty typeset (indicating that it is always null), then the body of the if statement would not be examined. Then, unguarded method invocations on objects with empty typesets could raise a warning.

This would yield more accurate types during inference, and would limit the number of spurious warnings. The latter aspect should be obvious, but the former might not be as clear. Consider:

main()
{
  var n = null;
  var g = 42;

  if(n is not null)
  {
    n.Bleck();
    g = new Foo();
  }
  return g + 7;
}

Let's assume that Foo does not implement operator+. With the simple form of nullness analysis described above, this code typechecks correctly: g is never assigned to an instance of Foo, so its typeset contains only Int[] (which does implement operator+). Without nullness analysis, the invocation n.Bleck() would generate a warning, and the expression g + 7 would generate an error. Here nullness analysis (or nullness inference, as the Nice people call their version of it) is acting as a kind of poor-man's dataflow analysis.
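
As a sketch of the naive analysis (over an assumed toy statement representation, not Mix's real one), a variable's typeset is a set of class names, the empty set means "always null", and the only guard recognized is if(x is not null):

```python
def analyze(stmts, typesets, warnings):
    """Naive nullness analysis over a toy statement form.

    ('assign', var, types)    -- var may now hold these classes
    ('call', var)             -- a method invocation on var
    ('ifnotnull', var, body)  -- `if(var is not null) { body }`
    """
    for stmt in stmts:
        if stmt[0] == 'assign':
            _, var, types = stmt
            typesets.setdefault(var, set()).update(types)
        elif stmt[0] == 'call':
            _, var = stmt
            if not typesets.get(var):
                warnings.append(var)   # unguarded call on an always-null var
        elif stmt[0] == 'ifnotnull':
            _, var, body = stmt
            if typesets.get(var):      # empty typeset: body can never run
                analyze(body, typesets, warnings)

# The example above: n stays null, so the guarded body is skipped and
# g's typeset never picks up Foo.
typesets, warnings = {}, []
analyze([('assign', 'n', set()),
         ('assign', 'g', {'Int'}),
         ('ifnotnull', 'n', [('call', 'n'),
                             ('assign', 'g', {'Foo'})])],
        typesets, warnings)
print(typesets['g'])  # {'Int'}
```

Skipping the guarded body is what keeps Foo out of g's typeset, so g + 7 checks cleanly, and no warning is emitted for n.Bleck().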

We still need to analyze such a technique for correctness. For now I'll handwave it away; I've got work to do.

Monday, September 28, 2009

Funny Business with the Billion Dollar Mistake

First, null often gets a bad rap, and it deserves what it gets. Unfortunately, it can be quite handy, and it is pretty well understood by the average programmer, so I'm keeping it anyway, at least for now. Fortunately almost no one will use Concrete Mix, so it won't be an expensive mistake. Sidenote: I had to look this up while writing the above.

So, last time I said that null acts a bit strangely. To understand the how and the why, let's look at some code, and see how we would typecheck it:

main()
{
  var n = null;
  n.Foo();
  
  var m = null;
  m = new Bar();
  m.Bleck();
  
  return 0;
}

Some totally useless code, but it will prove illustrative, so bear with me. First, let's assume we're playing with the iterative algorithm I've been harping on about. So, after iterating to a fixed point, what does the typeset of n contain? Let's say that it contains nothing (that is, null indicates no type, and not an empty type, like void might indicate): {}. Then, should the call n.Foo() be deemed correct by the typechecker?

It turns out that classifying this call as incorrect has some unforeseen consequences. I noticed the problem first when playing with a simple singly-linked list implementation. Consider the following code for a naive Length method:

class List
{
  private var m_Head;
  public List() {...};
  public Append(element) {...};
  public Length()
  {
    var count = 0;
    var current = this.m_Head;
    
    while(current is not null)
    {
      count += 1;
      current = current.Next();
    }
    return count;
  }
}

(Syntax note: as operator== can be overridden, Mix offers the is and is not operators for referential equality and inequality, respectively).

This code is a tiny bit more useful, though it's only a sketch. I'm sure you get the idea. The point is that, if we call Length on a list that is always empty, and we treat null as described above, we have a problem: the typeset of current is always empty, and so the call current.Next() would be flagged as incorrect by the typechecker. Now, you might suggest we perform some kind of nullness analysis and thereby determine that the body of the while loop cannot be executed, but this brings its own issues, so let's leave it alone for now.

So maybe our initial decision was wrong, and null should indicate the empty type. But this is no good, for consider m in the first code listing: its typeset would be something like {Null[]; Bar[...]}, and no method call on that typeset could ever succeed, as Null presumably implements no methods.

So it seems like the only solution (until we take a harder look at nullness analysis) is to classify calls on the empty typeset as correct. And in fact, this doesn't violate our definition of correctness, because the invocation n.Foo() will raise a null pointer exception, not a "method does not exist" exception.

Thursday, September 24, 2009

Recursion: See "Recursion"

So, taking a brief break from history, let's take yet another look at type inference in Mix. Way back when I mentioned that one of the restrictions we needed to place on the subset of Mix for which the algorithm worked was that no recursive calls were allowed. First, let's recall why; consider the following code:

Factorial(i)
{
  if(i == 0)
    return 1;
  else
    return i * Factorial(i - 1);
}

main()
{
  var i = Factorial(7);
}

A simple factorial, no big deal. So, what should the type of i be? Well, first let's get our terminology straight: what concrete types should be in the typeset of i?

Just by thinking about it we should conclude that it should hold only Int[...]. But, if we run the original, or even the iterative algorithm on this code, what happens? Infinite loopiness. The terror!

(Incidentally: if the call to Factorial in main were Factorial(7.0f), i would hold both Int[...] and Float[...] in its typeset; the base case should tell you why).

The problem is that, when type checking the right hand side of the assignment in main, we are obligated to check the type of Factorial with an argument of Int[...]. In so doing, we check that Int[...] implements operator==, operator*, and operator-. And then we check Factorial with an argument of Int[...]. In so doing we check that... And on and on.

However, this is easy enough to fix (or at least, it's easy enough to partially fix): let's keep track of the "call chain", and whenever we start checking a new call, we'll search the call chain for said call. If it is not in the call chain, we'll add it to the call chain and check the type like normal. But if it is in the call chain, we get to stop type checking: we know that checking the call will only get us right back where we started.

Pause for a second: what exactly is a "call", and how do we represent a "call chain"? A call is just the (unique) name of a function or method, along with the type of each argument. This last bit is important, because otherwise we might realize that a previous call to AddTwoNumbers succeeded with integers, and imagine that it would succeed on files. A call chain is a list of such calls, in the order we've invoked them.
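
The call-chain check can be sketched in a few lines of Python (the representation is made up for illustration; a call is a function name plus its argument typesets):

```python
def check_call(name, arg_types, chain, bodies):
    """Type-check a call, stopping if the same call is already in progress."""
    call = (name, tuple(arg_types))
    if call in chain:
        return True                # same call at the same types: succeed
    return bodies[name](chain + [call], bodies)

# Toy model of Factorial(i): after checking that Int implements ==, *,
# and -, the body recurses at the very same argument type.
def factorial_body(chain, bodies):
    return check_call('Factorial', ['Int'], chain, bodies)

bodies = {'Factorial': factorial_body}
print(check_call('Factorial', ['Int'], [], bodies))  # True -- and it terminates
```

The recursive check bottoms out on the second visit to ('Factorial', ('Int',)), which is exactly the fix: without the chain lookup the checker would recurse forever.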

But, is finding a call in the call chain an error, or success? And anyway, what does it mean? Second question first: it means that there is a loop in the call graph, just like we saw in the code above, and furthermore that the recursive call is at the same types: the function is being used in a monomorphically recursive way. First question second: we'll treat this as a success, because it indicates that along this call chain all possible method invocations succeed (possibly while looping infinitely). If they succeed, it meets the correctness requirement we've already talked about. If it loops infinitely, it still meets this requirement. (Side note: we're essentially saying "it's correct, unless this other runtime error comes up, but that error will hide the incorrectness"; we'll see this with respect to null next time).

Side note: this is rather strange, in that it somehow means that we are interested in a greatest fixed point, whereas normally we are interested in least fixed points. We're taking the view (in a handwavish sort of way) that, if when trying to prove P, we eventually have to prove P, we'll consider this to be true, or provable.

So, if you take the above as correct, are we back to a terminating algorithm? Well... not necessarily. Once again we would have termination "for free" if only we had a finite number of types to deal with. Unfortunately, we don't, and so this doesn't terminate so easily. In fact we could force this to terminate by requiring that all recursive calls to a particular function have the same type (that is, make all functions monomorphically recursive), but that's no good -- check out this simple function that isn't monomorphically recursive:

identity(arg)
{
  if(identity(rand() == 42))
    return identity(arg);
  return arg;
}

It's just the identity function, but with a little twist: no matter what the type of arg is, the function might be recursively called with a Bool[...] argument. In ML the type of the function is 'a -> 'a, but it is used in a polymorphically recursive way. Anyway, a strange function, but they can be found in "real life" (especially when mutual recursion comes into play); Henglein had an interesting (and approachable) paper on inference in the presence of generalized polymorphic recursion in an ML-like setting -- check it out. If you don't bother, let me just mention that inference for such a language can be shown to be undecidable (indeed that's a contribution of the paper) by showing that it is equivalent to semi-unification. And that means that, in what I'm trying to do with Mix, I will inevitably face some tough choices to rein in this undecidability somehow; sounds fun!

Friday, September 11, 2009

Been There

So, I'm not the first idiot to think about doing type inference in object-oriented languages, obviously. There's plenty of prior work in this area. Here's a list of some relevant research, by project.

  • O'Caml: that's what the O stands for, maybe? Anyway, it seems like O'Caml's object layer doesn't get much love, but the type system behind it is pretty interesting. It implements structural typing via row variables. "The Essence of ML Type Inference", a chapter of Advanced Topics in Types and Programming Languages, is a very readable introduction to this (and indeed many interesting aspects of type inference in ML-like languages), written by Francois Pottier and Didier Remy.
  • Self: there's been a lot of work on type inference for Self, generally it seems for the purpose of efficient implementation. Ole Agesen's work on the Cartesian Product Algorithm (CPA) has been the basis for a number of different lines of research; for instance, Olin
  • Python: I haven't heard much on this front recently, but Mike Salib's thesis describes an application of the Cartesian Product Algorithm to Python which is pretty neat, though the subset of Python it is implemented for is a bit smaller than some might like.
  • Java: An extension of the CPA for dealing with data polymorphism (more on what that means in a second) called the Data-polymorphic CPA has been applied to Java in the context of detecting invalid downcasts.
  • More: we'll get to others at some point.

So, why don't I just implement some of this previous work and be done with it? Well, other than the fact that it isn't as fun as trying something new, each of the previously described approaches leaves something to be desired. So, let's first examine what we (ok, what I) want from the inference algorithm and language, and then see which criteria each of the aforementioned approaches meets.

  1. Structural Typing: If an expression implements a particular method, we should be able to call that method (subject to the "correctness" constraints we've already talked about).
  2. Recursive Types and Functions: A class C should be able to contain members that are instances of class C, see? And members should be able to call themselves recursively. Oh, and make that mutually in both cases.
  3. Data polymorphism: Variables should be able to hold objects of more than one type, again subject to the constraints already mentioned. This property is critical to useful data structures (we want lists that hold variables of several types, and that can be accessed in an intelligent way, for example. Or maps from strings to different kinds of entities in a super awesome MMOORRPPGG).

First, row variables a la O'Caml. They handle the structural typing end of things, and recursive types and functions, just fine. But row variables don't say anything about mutable data (and therefore data polymorphism), and O'Caml (like most ML-like languages) doesn't treat mutable data in the way that I wish it would. In O'Caml, a mutable data 'cell' is stored in a reference. When the reference is created, it is filled with some initial data, which can later be updated. However, the type that the reference holds cannot change; if the reference is first filled with a string, it cannot later be filled with an int.

In addition, the structural typing supported by row variables isn't quite as strong as we might like. It can deal with inferring the types of arguments to a function by the way that they are used, but some simple constructs throw the algorithm for a loop. For instance:

let someFun b =
  if b then
    new foo
  else
    new bar

The above will not compile: there is no implicit subsumption rule in O'Caml. The algorithm will not find a common supertype of foo and bar, and use that for both branches of the if expression. It will only work if you explicitly "cast" to such a type (note that the cast is safe, and not a downcast). This is exactly the kind of code we will be writing (and expecting to work!), so row variables as implemented in O'Caml aren't going to cut it.

I was going to continue with the rest of my bullets, but I've run out of ammo (and time), so you'll have to wait until next time.

Wednesday, August 19, 2009

Loopiness and Stringiness

Actually, there will be no strings here; right now I'm just going to be talking about loops. Earlier I mentioned that the simple algorithm for inferring types couldn't handle loops (amongst other things), so let's see more clearly why they are an issue, and try to solve it.

I gave a simple example last time:

main()
{
  var i = "Hello!";
  for(...)
  {
    i.SomeStringMethod();
    i = new Foo();
  }
  i.SomeFooMethod();
}

The problem is that the call to SomeStringMethod (which is presumably valid when i is a string) is valid if the loop runs only once, but isn't if the loop runs more than once. So, we'll assume that the loop runs more than once; now how do we build this assumption into the algorithm?

At first glance we might simply analyse the body of the loop twice; the first time, when we check the call to SomeStringMethod, i is known only to have type String[], and so the call succeeds. The second time, i's typeset contains both String[] and Foo[...], and so the call fails (assuming that Foo doesn't implement SomeStringMethod, which is rather the point of that name).

That sounds good, and it's simple, right? Unfortunately, it's not right, either. Consider this more convoluted piece of code:

main()
{
  var a = new SomeClass();
  var b = new SomeOtherClass();
  var c = new YetAnotherClass();
  
  for(...)
  {
    a = b;
    b = c;
    c = a;
  }
  a.SomeMethod();
}

It's pretty obvious what's going on here: depending on how many times the loop runs, this call could be valid, or it could be invalid. Let's say that both SomeClass and SomeOtherClass implement SomeMethod, but YetAnotherClass does not. Then after 0 iterations the call would succeed, and after 1 it would succeed, but after 2 it would fail. So, let's analyse the loop body 3 times; 3 is enough for anybody, right?

But wait, you say: we could keep adding variables (d, e, and so on) to cause a greater number of iterations to succeed before failure. Right you are. Instead of hardcoding the number of times we should check the body of the loop, we need a way to determine that during analysis.

One way we could approach this is by analysing the code in a separate pass and calculating the number of times we should check the body, perhaps by checking the number of variables. That would work in the above case, but what about in more complicated cases? Trust me, there are cases in which that sort of simple analysis won't work. Maybe there's a more complicated analysis? I don't know it.

Another way, and indeed the way I will choose, is to iterate to a fixed point. That is, we keep analysing the body of the loop until no typeset changes; at that point every typeset should be complete. Fixed-point analyses are really common in Computer Science, and you seem to bump into them quite frequently when working with compilers; for example, the first time I ran into one (I think) was when I was implementing a compiler for Tiger (a simple imperative language) based on Andrew Appel's book Modern Compiler Implementation in ML, which was, incidentally, also the first time I programmed in Standard ML. The specific usage was during liveness analysis, solving dataflow equations.
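Here's a rough Python sketch of the fixed-point iteration; the abstract effect of the loop body is hand-coded, and none of these names come from Mix's actual implementation:

```python
def analyze_loop(body, typesets):
    # Re-analyse the loop body until no typeset changes: a fixed point.
    while True:
        before = {var: set(ts) for var, ts in typesets.items()}
        body(typesets)              # abstract-interpret the body once
        if typesets == before:      # nothing changed, so we can stop
            return typesets

def swap_body(ts):
    # Abstract effect of { a = b; b = c; c = a; }: each assignment
    # unions the right side's typeset into the left side's.
    ts["a"] |= ts["b"]
    ts["b"] |= ts["c"]
    ts["c"] |= ts["a"]

ts = {"a": {"SomeClass[]"},
      "b": {"SomeOtherClass[]"},
      "c": {"YetAnotherClass[]"}}
analyze_loop(swap_body, ts)
# every typeset ends up holding all three concrete types
```

Because typesets only ever grow, and (in this example) there are finitely many concrete types in play, the iteration must eventually stop changing.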

Anyway, to see how this works, let's run through the last example.

At the outset, each typeset (for a, b, and c) has a single element in it. After the first iteration, a has SomeClass[] and SomeOtherClass[], b has SomeOtherClass[] and YetAnotherClass[], and c has YetAnotherClass[] and SomeClass[].

After the second iteration, each typeset has all three classes; we could stop at this point, but how would we know to in the general case? Instead we continue.

After the third iteration, each typeset still has all three classes; no typeset has changed. Now we can stop; we know that further iterations will not yield additional information.

There are a number of questions that remain to be answered. First, how do we know that we'll eventually stop adding concrete types to the typesets, and thereby terminate? Clearly there are an infinite number of concrete types, so we could conceivably continue to add types indefinitely.

Second, how do we know that the collection of types in the typesets is accurate after we reach a fixed point? With the simple algorithm I didn't talk about correctness because it was obviously correct (for the ridiculously limited subset of Mix to which it applied), but now correctness is less obvious.

Stay with me, and I'll get to answering these questions (or at least trying to) soon enough.

Monday, August 17, 2009

Algorithmics Simplistic

Last time I said that my notion of correctness is simple: you should never invoke a method on an object that doesn't implement it. To start off discussion, let's look at a very simple algorithm that statically determines this. The subset of Mix that it handles is one without loops, without recursive calls, and with no inheritance; basically a fairly useless subset of Mix, but let's go with it.

The idea is simple: starting from the program's entry point (a free function I'll call main), we'll interpret the program, but instead of using regular values like ints, and strings, and objects, we'll use types; that is, we'll perform an abstract interpretation. Abstract interpretation, invented and first formalized by Cousot and Cousot, has a long history in static analysis; I'll be more or less ignoring the formalization from here on in, and just going with an informal approach. Check this out for a fairly readable introduction to "real" abstract interpretation.

Moving on to the actual algorithm, consider the following trivial code:

class A
{
  public var Member;
  
  public new(){ this.Member = 1; };
  
  public Method1(){};
  public Method2(){ this.Method1(); };
};

main()
{
  var a = new A();
  a.Method2();
  a.Member;
  return;
}

So, starting in main, let's interpret the program. At the outset, we have a variable declaration; let's pretend that it was really written as a variable declaration followed by an assignment:

var a;
a = new A();

That lets us look at the two steps separately, instead of all at once. First, we see a variable declaration, so we add a to the environment; we associate with a the empty typeset. A typeset can be thought of as a set containing all types that the variable is known to hold; by looking at how all of the types in a typeset intersect, we can understand what operations on a are guaranteed to be legal.

Next, there's an assignment: the effect of the assignment under the interpretation is that the typeset associated with a should be augmented by the right side's value; but how can we determine the right side's value? Let's continue interpreting. new (with a particular class name and constructor arguments, of which there are none in this case) corresponds to constructing a concrete type based on the class and then invoking the class's construction operator.

Concrete types are values that represent the properties of an object, namely its class name and its fields (both members and methods). Concrete types are what go into typesets. From now on, we can think of the value of a particular expression (under the abstract interpretation) as a typeset containing 0 or more concrete types.

So, first we construct the concrete type: it has name A, and it has 1 member and 2 methods, namely Method1 and Method2; for now we'll indicate this concrete type as follows: A[Member: {}; Method1: {function()}; Method2: {function()}]. This should be read as a concrete type "A with field 'Member' having the empty typeset, 'Method1' having a typeset of 1 element, and 'Method2' having a typeset of 1 element." The elements in the typesets for Method1 and Method2 are callables; I'll talk about callables in some other post, but for now the name should give you enough information to go on.
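As a sketch of the data involved, one could model concrete types in Python like this (purely an illustration of the notation above, not Mix's real representation; callables are simplified to plain strings):

```python
from dataclasses import dataclass

# A concrete type: a class name plus a mapping from field names to
# typesets. A typeset here is just a Python set of abstract values.
@dataclass
class ConcreteType:
    name: str
    fields: dict  # field name -> typeset

a_type = ConcreteType("A", {
    "Member": set(),               # no types known for Member yet
    "Method1": {"function()"},     # a callable, simplified to a string
    "Method2": {"function()"},
})
# Mirrors A[Member: {}; Method1: {function()}; Method2: {function()}]
```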

Now that we've constructed the concrete type, we interpret the constructor. This means we analyze the body of the new operator for class A, using an environment in which this is bound to a typeset consisting of only the just-constructed concrete type. So, we look inside the operator's definition, and the first thing we see is another assignment.

Once again we add the value of the thing on the right to the typeset of whatever's on the left, and this time the thing on the right is just an integer; for now we'll say that the concrete type of an integer is Int[], therefore the value of the right side is a typeset containing only 1 concrete type. Essentially we're setting the typeset of the thing on the left to be the union of itself and the typeset of the thing on the right.

So, the result is that the original concrete type we constructed in the previous paragraph is now the following: A[Member: {Int[]}; Method1: {function()}; Method2: {function()}]. Finally we return to main; the important thing to notice now is that we know that the concrete type has a member Member that at some point could be instantiated with an Int[], and that a's intersection contains this concrete type.

Next we come to a method invocation:

  ...
  a.Method2();
  ...

It actually consists first of a field lookup and then a call. In particular, we first check that all of the types that a could hold implement Method2, and then we check that each of those fields' types is callable. Again, a can hold only 1 type, which does indeed implement Method2, and it is callable. So, we interpret the call by looking in the definition of Method2, passing the relevant concrete type in a's intersection as this.

In Method2 we follow a similar process to check that this implements Method1, which it does. We go into the (vacuous) body of Method1, and eventually come all the way back to main.

Next we check that all of the types in a's intersection have a field named Member; again they do, and in this case its type is dropped on the floor (just as the value of a.Member would be dropped on the floor at runtime). Finally, we return, and that's that: all of the code has been interpreted, and no invocation failed, so the code is deemed correct.

That got really long winded, and we didn't even seem to do much: there were no errors, and a only ever had 1 type in its typeset, so it wasn't too interesting. So, let's briefly look at a slightly more interesting example; we'll ignore some aspects (like where rand comes from) to keep it simple.

class C
{
  public new(){};
  public Foo(){ if(rand() == 0) return 1; else return new A(); };
}

main()
{
  var c = new C();
  var i = c.Foo();
  i.ToString();
}

Interpreting main we see that, after the call to c.Foo(), i's typeset should have 2 concrete types, one corresponding to an integer, and the other to an instance of A. When we check to see if all of these types have a field named ToString in the last line of main, we encounter an error: A doesn't implement ToString.
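The failing check on the last line can be sketched like so (the field tables are invented to mirror the example):

```python
# Which fields each concrete type implements (made up for this example).
fields = {
    "Int[]":  {"ToString"},
    "A[...]": set(),          # A doesn't implement ToString
}

def check_invocation(typeset, method):
    # The correctness criterion: every concrete type the target might
    # hold must implement the method being invoked.
    return all(method in fields[t] for t in typeset)

i_typeset = {"Int[]", "A[...]"}       # i's typeset after c.Foo()
ok = check_invocation(i_typeset, "ToString")
# ok is False, so the interpretation reports an error here
```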

Hopefully that was pretty clear, and also pretty obvious. You might recall that I pretty severely limited the language at the outset, and now you might be able to guess why: if we allow recursion with this simple algorithm, it will go into an infinite loop, following recursive invocations over and over. If we allow loops, the problem is only slightly more subtle. Consider the following code:


main()
{
  var i = "Hello!";
  for(...)
  {
    i.ToString();
    i = new A();
  }
}

Our simple algorithm, with no modification, would not see an error here, though there clearly is one if the loop runs more than once. It will check the invocation i.ToString() when i's typeset contains only String[], and not when it also contains A[...].

So, lots of new stuff (not like there was much beforehand). I'll talk more about typesets, concrete types, and callables, and I'll start to address both of these algorithmic issues in the next few posts.

Introductions

I said that Mix is an object-oriented language after the tradition of C# and Java, which is true enough. There are classes, and the syntax is C-like, and the operational semantics of the language are exactly what the average C# or Java programmer might expect. So, sample code:

class SomeClass
{
  private var someMember;
  
  public new(anArg, anotherArg)
  {
    if(anArg)
    {
      this.someMember = anotherArg.M(anArg);
      anotherArg.N();
    }
    else
    {
      this.someMember = new List();
    }
  }
  
  public GetSomething()
  {
    for(var i = 0; i < 42; i++)
    {
      this.someMember += i / 6;
    }
    return this.someMember;
  };
};

This class really doesn't do anything; it's here to highlight that Mix code really looks like some code you might have written in a language like C#. It has if and for, members and methods, and all kinds of normal stuff; if you don't like that everyday stuff you're not going to like Mix's syntax, unfortunately. In my opinion it is a reasonably usable and certainly widely known syntax, so I went with it.

The only interesting bit is that there aren't any types in the usual places. That is, you normally expect, in a statically typed language (unless your boss lets you write in O'Caml, or F#, or some other language with inference), to have to write types all over the place. If it were Java we'd need them for each member, for each method (return and argument types), and each local variable. In C# these days you can drop the types on locals. But here we haven't any at all, unless you count SomeClass itself, which I won't: it's more like a prototype, in that we use it to create an object, but not to classify objects later on, as we'll see.

So, this is the language in which I am interested: no types on members, methods, or local variables, looks somewhat like your everyday object-oriented language, simple enough. I could compile this down to Python, say, or Javascript, easy as that (where "that" is snapping your fingers). But I don't only care about compiling and running Mix programs, I care about analysing them for correctness at compile time (and really, all I would have done was give a different syntax to a fragment of one of those languages; that's no fun). Now, you might be the sort of person that isn't interested in static analysis; you probably won't care for Mix, either, so you might want to hang out with the folks that left when I mentioned braces.

Still here? Ok, so, what do I mean by correctness? I'm taking the very simple criterion of "doesn't call methods on objects that don't support those methods" as meaning "correct" with respect to the type system. So, I don't want to call Append on an Int, for example. We'll (eventually) see how this criterion can be used in some interesting and (hopefully) powerful ways, but for now let's leave it (my "correctness" criterion) at that.

To see what I mean, let's look at a (very slightly) more reasonable example, with PETS! (I like pets, in particular my cat, Mortimer). Here we go:

class Cat
{
  public Speak() { return "meow!"; };
  public Pounce(target) { ... };
}

class Dog
{
  public Speak() { return "arff!"; };
  public SearchForBombs() { ... };
};

main()
{
  var pet = null;
  if(somethingOrAnother)
  {
    pet = new Cat();
  }
  else
  {
    pet = new Dog();
  }
  print(pet.Speak());
};

So, super simple example, but you probably already get the idea: pet could be either a Dog or a Cat, but because both of them can Speak, we're all good. What if we tried to call Pounce on pet? According to Mix (that is, according to me), we'll treat it as an error: there is some path through main that leads to pet holding an instance (of the class Dog in this case) that cannot Pounce. Even if the programmer knows, somehow, that somethingOrAnother is always true, the type system will treat this as an error.

One way to think of this, which I'll make more concrete in future posts, is that every target of a method invocation must always (syntactically, in the source) be an instance of a class that implements the method. So, in the above example, in the expression pet.Speak(), pet will, no matter what, be set to an instance of a class that implements Speak, and is therefore correct.

Anyway, that should give you a bit of the flavor of what's to come.

Friday, August 14, 2009

Concrete Mix

Concrete Mix (or just Mix, if you prefer a shorter handle) is a statically typed object oriented language. The language itself follows the tradition of languages like Java and C#, with an important difference: Mix is intended to be an experiment in complete type inference for such languages (that is, whole program type inference without any annotations). This blog will mostly detail the language and my experiments with static analyses on Mix, though I might occasionally mention some programming languages- and type theory-related items.