Thursday, February 18, 2010

Artificial

I remember reading a book by Dan Simmons a long time ago; could have been Endymion or The Rise of Endymion. I've always been a science fiction fan, and the Hyperion series was pretty cool. In this book, whichever one it was, there are artificial intelligences, as tends to be usual for works of science fiction. But what was really interesting to me at the time (must have been 10 years ago, at least) was that, in the book, these artificial intelligences were described as having evolved, in the distant past, from an experiment in evolving programs. The best part was that Simmons actually mentioned an artificial life researcher by name: Thomas Ray, whose work (the Tierra project) I promptly investigated.

Turns out it was The Rise of Endymion; I should read my own links.

I eventually implemented my own little a-life simulator, had fun with it for a while, and then moved on to other things; I'm not a biologist. But recently while on vacation I decided to mess around with it again. I had stumbled across nanopond at some point, and it looked rather neat, so I decided to implement a system based loosely on that project -- it is unimaginatively called "PondLife".

The basic idea behind nanopond, PondLife, and a host of other similar projects is simple: you have a bunch of organisms that consist of some kind of "code", and which can reproduce. And you execute them over and over again. Usually the virtual machine that executes them isn't perfect, and makes mistakes; this leads to mutations. And usually the organisms compete for something (energy, processor time, memory, or other more artificial resources), which leads to a form of natural selection. And so you get evolution.

Nanopond is written in really efficient C, and uses all sorts of twiddling to pack 8 instructions into every int. I decided to implement my system in C# because I long ago lost any desire to program in the hellish twisted misery that is C ever again, you can't make me damnit! Bwaahhh!!!!

Sorry about that. So, not a fan of writing C when I don't have to (although, as we shall see, this could have been one of those times), and I just wanted to get something working immediately, which is easy when you have a huge standard library at your fingertips. Plus, everyone says C# can be just as fast as C, so why not?

Here's the scenario: there's a 2 dimensional grid of "cells", which wraps around like in Asteroids; its a torus. Each cell has a "genome", basically a vector of instructions that are its code. (In nanopond every cell gets the same amount of code; in Pondlife different cells can have different amounts of code). The instruction set is basically that of Brainfuck with a few tweaks: there's a register and a pointer, a cell can increment and decrement the register, and move the pointer forwards and backwards. A cell can read its code into the register, and write the register into the code (that is, PondLife code can self-modify). And there is an "output buffer", again which the cell can write into and read out of. Finally, the cell can "turn", updating the way it is facing in the 2D grid, "kill" a neighbor, or "share" with a neighbor (more on neighbor's in a second).

Every cycle a random cell is selected and executed until it is out of energy. If it has written anything into the output buffer, the buffer is copied into the neighbor's code (where "neighbor" depends on which way the cell is facing). Instructions take energy to execute, so each cell also has its own amount of energy; this is the resource I mentioned. And the instructions don't always go as planned; hence mutation.

A cell can gain energy by killing its neighbor, and from the sun or other operators. See, there are various "operators" that are executed every so often. One is the SunshineOperator, which adds energy to all cells. Another is the SedimentOperator. Remember how I said that each cell can have a different length of code? Well, if you think of that length as the depth of the world, then think of this operator as sediment accumulating beneath living cells. Eventually this sediment can accumulate to the point where the cell doesn't have enough room for its replication code, and therefore cannot reproduce.

To combat sedimentation we have the ErosionOperator, which works as you might expect, decreasing the depths of all cells and smoothing them as well. There are also a number of "randomization" operators; the CosmicRayOperator randomly flips bits in cell code every once in a while, and the RandomizeOperator more aggressively randomizes entire genomes. There are operators for decay, for seasons, and other fun things.

So, that's all well and good, but what do we get? Well, one fun thing we get is abiogenesis: starting with an empty environment and a few operators (sunshine and randomization are all that you really need in PondLife) after only a little while "viable replictors", cells which can copy themselves well enough that the copy can copy itself again, appear in the environment. And PondLife doesn't seem to "get stuck" too quickly, where only one life form exists in the environment; this is especially true if you include the sedimentation and erosion operators, which force cells to compete not only for energy but also for "depth".

"Pics or it didn't happen." I'll do you one better. I ran a simulation in a 128 by 128 world for 8 or so hours and have 2 videos. In the first video, different colors are different lineages; two cells have the same lineage if one was produced by the other, or if their parents have the same lineage. In the second video, depth is indicated in grayscale, with black being deeper (where more code can fit) and white being shallower (where less code can fit). In this simulation I also used an operator that caused less energy to be added to cells in "deeper water" for fun.

That simulation only ran for 2 billion cycles. That's not many, considering it took 8 hours (probably would have been the most intense computation this poor laptop has ever seen, except I did a 17 hour run right after). I need to optimize.

EDIT: Here are the videos:

Tuesday, February 16, 2010

Time Dilation

I haven't had much to say recently. I rather doubt anyone has noticed, and yet nevertheless I feel a bit guilty... Perhaps this indicates some underlying psychological problem? Anyway, I have an excuse: I was working on two different papers with various colleagues. If things work out I may infinitely increase my publication list!

The first, entitled Focused Inductive Theorem Proving describes the theory behind our interactive and automatic theorem prover for an intuitionist logic with support for generic quantification (using nabla, which is nice for reasoning about systems with names and binding) and definitions (via fixed points, which are useful for encoding inductive and coinductive data structures, like lists and streams, respectively). The system is called Tac, and is released under the GPL version 3. One fun note: we have (kind of, call it mostly) solved the POPLMark challenge parts 1A and 2A with Tac.

The second is A Meta-Programming Approach to Realizing Dependently Typed Logic Programming, and describes a technique that we developed to implement logic programming in LF via translation to λProlog, a higher order logic programming with an efficient bytecode based implementation. The tool, named Parinati, is also available under the GPL version 3.

Other than that I've been snowboarding and working on a fun little side project, which will be to topic of my next update, which I am sure you eagerly await.

Monday, January 25, 2010

System Eff

So, last time I mentioned that the first example was one "that cannot be handled under Hindley-Milner style inference". So the question is, "Ummm... what?" Let's look at the same code, but in an ML-like syntax:

let twice f x = f (f x)
let pair x = (x, x)

The types inferred for these lovely functions are:

val twice : ('a -> 'a) -> 'a -> 'a
val pair : 'a -> ('a * 'a)

So, what does the type of twice mean? Well, each 'a stands for a universally quantified variable, and in particular one that is quantified at the top level. So it means that, "for all types x, twice takes a function from x to x, and an item of type x, and returns something of type x". But is this really what we "meant" when we wrote out the definition of twice?

Maybe not. When we try to use it with pair we are treating it as though it really had the following type: "for all types x, twice takes a function from any type to the same type, and an item of type x, and returns an x". That is, we want twice to take, as its first argument, a universally quantified type. So, if we throw in some new syntax, we want this type:

val twice : (forall 'y. 'y -> 'y) -> 'x -> 'x.

We want to be able to introduce a universal quantification within the type of twice, and not just at the top level. And this is the problem: in our system of choice, this is not possible.

Now, there are of course systems in which this sort of quantification is perfectly legal; System F, for example, allows all this and more, for just 27 payments of $34.95! But, then again, type inference in System F is undecidable, which might burst your balloon.

Thursday, January 21, 2010

This post is brought to you by the Greek letter...

So, a long time ago I said that a Mix alpha release is coming, and it really is, I swear. Here are some examples of Mix code that the implementation can currently compile correctly.

1: This is an example of some code that cannot be handled under Hindley-Milner style inference. Specifically, notice that in the body of twice we apply f to the type of its own result. This requires that f have type 'a -> 'a, and therefore that twice have type ('a -> 'a) -> a'. Clearly in our application of twice the argument pair doesn't have such a restricted type.

twice(f)
{
    return function(x){ return f(f(x)); };
}

pair(x)
{
  return new Pair(x, x);
}

Int main(String args)
{
  var p = twice(pair)(1);
  return p.First.First;
}

2: This example shows how cool container classes are in Mix, at least to me. I love homogenous containers because they (well, the type system) take care of ensuring I don't do anything too stupid with its elements. However, in a language like C# or Java you must introduce an (often artificial) interface in order to group objects of different type but related functionality in such a container -- otherwise you just have to relegate yourself to containers of object, and live with the lack of type safety. Here Mix infers what operations are possible on all elements of the array. The error arrises because Mix infers that the array possibly contains both integers and strings, the latter of which do not implement the negation operator -operator.

Another note: arrays are enumerable (which just means that they implement GetEnumerator correctly), and enumerables implement ToString by simply concatenating the string representations of all of their elements. So as long as all of the elements of an enumerable implement ToString, the enumerable itself does as well, which is how we can call IO.print on the array.

Int main(String args)
{
  var a = new Array(10);
  for(var i = 0; i < a.Length(); i++)
  {
    a[i] = i;
  }
  a[random(10)] = "Hello!";

  IO.print(a);  //Enumerable.

  a[4] = -a[4]; //Error, strings cannot be negated.
  return 0;
}

3: Here's a funny example in which iterating to a fixed point proves critical. Essentially f gets a type that should be interpreted as "either a pointer to an integer, or a pointer to a pointer to a string, or a pointer to a pointer to a pointer to an integer, or..." Which essentially means that the only operations you can really perform on f are ToString and Get (which gets the value of a pointer).

Int main(String s)
{
  var f = new Ptr(0);
  var g = new Ptr("Foo!");
  
  for(var i = 0; i < 10; i += 1)
  {
    var temp = f;
    f = new Ptr(g);
    g = new Ptr(temp);
  };
  
  IO.print("f: " + f);
  IO.print("g: " + g);
  return 0;
}

I wanted to put a few more up, but it is time to get back to work.

Wednesday, December 30, 2009