Thursday, January 05, 2006

Wow! Last night, the most amazing thing happened to me : I learned to program.

Sorry to everyone I might have misled over the last 20 years or so by pretending I already knew how to program.

It wasn't true. I knew nothing. Really. :-)

Perhaps it's better to say, I figured out how to go up a level and meta-program. Of course, I kind of knew about meta-programming for a while. I'd done examples in Haskell of functions that create other functions (say to multiply their argument by a number which was an argument to a factory function). It looked cute and powerful, but not quite compelling.

But now I've been contemplating Lisp. (Lisp had to be in this story, of course :-) And reading Paul Graham's book.

What he says convinced me that maybe this year I'll try doing a bit of Lisp and teaching it to my students. (Previously we've just done a couple of weeks of Haskell.)

But in order to break them in to this topic, I thought I'd show them some of the ways you can do this kind of meta-programming in Python. (Which I still think is more syntactically friendly and readable for them.)

So here (more or less) is the example I sketched out for them on the board yesterday evening :

def factory(f) :

def g(l) :
return [f(x) for x in l]

return g

def double(x) :
return x * 2

doubleList = factory(double)
halveList = factory( lambda x : x / 2.0)
squareList = factory( lambda x : x * x )

l = range(10)

print doubleList(l)
print halveList(l)
print squareList(l)

The function factory takes a single argument (f) (which is presumed to be a function). It then declares an inner function g which also takes one argument (l) that's presumed to be a list. It then returns a new list using the comprehension : [f(x) for x in l]

Comprehensions are basically a handy Python short-hand for running through each element in a list and doing something with it : basically for mapping or filtering.

The important point, though, is where this function g gets the value of f. It's defined by this call of the function "factory".

At the end of factory we return g. We thus get what I think people call a closure, a combination of function plus context. Or as I put it to the students : a function based on g but customized by the parameters passed to factory.

I make three calls to this factory, each of which creates a new function. One to double every element in the list, one to halve it and one to square it. In the first case, I've created an explicit double() function and passed that as the argument to the factory. In the other two, I've just used lambda expressions to create anonymous functions; which definitely looks neater. (Although I missed out the lambdas for the students yesterday. Can't explode their brains too much in one day. ;-)

I was impressed, and so were some of them. Although I had to wait until I got home to check that this example would actually work. It looked OK but ...

Anyway ... it did work, and it's powerful and neat.

But not quite compelling. For the simple reason that if I want to double or halve a list of numbers in one line of code, Python's list comprehensions already allow me to do this.

For example :

print [(lambda x : x*3)(x) for x in l]

So, last night, I thought I'd try something a bit more interesting. Yesterday I also tought my Object Oriented Java class, and was showing a common example I often use : a simple data-structure to represent a social network.

Essentially, the program defines a Person class. Each Person has name and other details, plus a "friends" ArrayList containing references to other persons. And there are functions for adding and removing friendship and recursively gathering the whole network of all friends-of-friends(of-friends etc) of a person.

Now, I've been writing programs which represent graphs for twenty years. In everything from BBC Basic to Smalltalk to Visual Basic, Java and Python. (There are graphs in Optimaes, SdiDesk and SystemSketch for example)

And I thought I knew how to do it in a reasonable, Object Oriented way.

Which is this : I define a Node class which represents nodes. And an Arc class to represent links between them. And I keep two collections : one of nodes and one of links. In earlier, clunkier times I tried to also keep both nodes and arcs updated about each other. (Eg. nodes know what arcs touch them, arcs know what the nodes at the ends are.) But more recent programs like SdiDesk don't bother having nodes knowing arcs, just the arcs knowing their nodes.

But to simplify matters in my Java social network example, I've got rid of an Arc class altogether, Nodes just have a list of direct references to other nodes. And now I realize this actually looks sufficient for most of my graph needs. And keeps things short and sweet.

But traversing and modifying graphs has always been trickier than it should be. And I've often found my code ugly and hard to read and maintain. Have a look (or rather, don't) at Optimaes which I wrote just after leaving working with Java. It's a huge "framework" sort of approach to graphs.

Anyway, faced with the cuteness of meta-functions I wondered whether there was a way to pull the same trick with graphs. Why not a generic graph-traversing, node-gathering function factory?

So, after a fairly short amount of tweaking. Here's what I came up with.

A simple Node with two fields : object and arcs. The object carries the node's "payload". The arcs it's links to other nodes. You'll notice in the link method that arcs is actually a list of tuples containing reference to the other node, and a type for the link. I've had to define the "contains" function myself, as there doesn't seem to be a native does-list-contain-object function in Python.

class Node :

def __init__(self, o = None) :
self.object = o
self.arcs = []

def link(self, node, linkType="default") :
if not contains(self.arcs, (linkType,node)) :
self.arcs.append( (linkType,node) )
node.arcs.append( (linkType,self) )

Now, normally I'd sit down and write some sort of traversing, printing function to get visualization of the network. But here I'm going to try to do it through meta-programming. So here's my traversal factory.

def graphRunnerFactory(f) :

def g(node, visited) :
for x in node.arcs :
if not contains(visited,x[1]) :
g(x[1], visited)

return g

That's not bad is it?

The factory takes a function f. As before, the work is done in the inner function g which takes a node and a visited list. (The latter to stop the runner getting into an infinite loop going from node A to B and back again.)

Remember that the arcs is a list of tuples of (linkType, node), so x[1] is just pulling out the node from the tuple.

Once again, the work happens when we call f(x[1]). f is whatever function was passed to the factory.

You'll notice, of course, that we can call g recursively.

So now, let's have a graphPrinter :

def printNode(node) :
print node.object

graphPrinter = graphRunnerFactory(printNode)

Now I'm getting excited. From this factory I could make a function to run round and print all the nodes in the graph in three lines.

This is insane!

That's what it's going to take to make new functions to do various bits of housekeeping around the graph : create or initialize the object, serialize, render as XML, draw in a diagram, delete. Actually, I can't even read the Optimaes code anymore, it got too convoluted. But suddenly I think I can have a go at making a massive simplification to that network layer using, basically, the code I've just sketched out. If and when SdiDesk is ported to Python, this is certainly how I'll do the network diagrams. And I'm sure some meta-programming will find its way into SystemSketch (coming real soon now).

Anyway, I've never felt so powerful as a programmer. (OK, I guess "empowered" is better ;-) But I was, hypothetically empowered the moment I picked up Python. I just didn't get it before.) Or so excited about programming. All sorts of projects (yes, even more of them) that I've been avoiding because they seemed like a bit too much hard work, suddenly become feasable. How can I apply this stuff to my experiments with Finite State Machines and language parsers? What about that competitive network forming experiment I wanted to do? Will it accelerate the coming of SystemSketch and the fabled SdiServer?

Anything can happen now :-)

No comments: