Composing: python

Showing posts with label python. Show all posts

Monday, May 04, 2009

Good Zed Shaw talk (in tweets)

(Covers everything from python to music to banks and economic crash.)

Wednesday, October 22, 2008

Interestingly, this, turns out to have been my most popular post evah (I think) ... 276 hits within the last few days.

Is it all those Haskell programmers?

Update : seems to be due to Reddit.

Wednesday, March 12, 2008

Enso is now open-source. :-)

Update : Virgin Maria! and it's all Python too.

Wednesday, March 05, 2008

In light of my current dalliance with Erlang and that recent post on setting up pipelines in a functional stylee (in Python), this comparison of process-based concurrency in Erlang and Stackless Python is very cool.

Thursday, January 24, 2008

This looks like a must-read for me : Monads in Python

... and another Monad Tutorial I should read so I can understand it.

Saturday, January 19, 2008

I promised to explain that little bit of python black-magic I posted a couple of days ago. It may take a couple of posts ...

But to start with, here's some code I just found, that I wrote about 18 months ago, to create a "pipeline" to do various transformations to lines of text. Basically I want to transform wiki markup into html, but I wanted a flexible mechanism which allowed me to add or remove various transformations at runtime so I can rejig the same transformation for slightly different targets. A good example of this occurs in the original SdiDesk, where the production of HTML to be viewed in SdiDesk is *almost* the same as the production of HTML for export to a static site, except for the link tags, who's href property for internal use needs to be "About:Blank" whereas the external site needs the actual URL of the page.

Here's my original code (with a couple of example transformations) ...





class PipeNode :
    """Base class for pipeline nodes. 
       They must implement 
       process() and getName()"""
       
    def process(self, item) :
        return item

    def getName(self) : return "PipeNode"
    
class Pipeline :

    def __init__(self) :
        self.pipe = []
        self.index = 0
        self.item = None

    def addNode(self, n) :
        self.pipe.append(n)

    def load(self, item) :
        self.index = 0
        self.item = item

    def next(self) :
        self.item = self.pipe[self.index].process(self.item)
        self.index = self.index + 1

    def getCurrentItem(self) :
        return self.item

    def getCurrentIndex(self) :
        return self.index

    def getCurrentNode(self) :
        return self.pipe[self.index]

    def eop(self) :
        """End of pipe"""
        return self.index == len(self.pipe)

    def run(self) :
        while (not self.eop() ) :
            self.next()

class BlankLines (PipeNode) :

    def getName(self) : return "blankline-to-para"
    
    def process(self, item) :
        if item == '' :
            return "<p/>"
        else :
            return item

class Bullet (PipeNode) :

    def process(self, item) :
        if item[0] == '*' :
            return '<li>' + item[1:] + '</li>'
        else :
            return item
        
class LinewiseStringPipeline (Pipeline) :

    """
    Based on the generic pipeline, this cuts a long string into lines
    (separated by '\n') and processes them one by one
    """
    
    def __init__(self, sep='\n') :
        Pipeline.__init__(self)
        self.sep = sep

    def runAll(self, s) :
        if len(self.pipe) == 0 :
            return s
        build = ''
        lines = s.split(self.sep)
        for x in lines :
            self.load(x)
            self.run()
            build = build + self.getCurrentItem() + self.sep
        build = build[0:-1]
        return build

# test it
pl = LinewiseStringPipeline(r'\n')
pl.addNode( BlankLines() )
pl.addNode( Bullet() )   

page = """here

is

* some
* data"""

print pl.runAll(page)

You'll see it's all good object oriented stuff. I define a base-class generic "PipeNode" which doesn't really do anything, but acts as a definition of the interface of a section of the pipe. Its important method is process, which takes an input and delivers some kind of output.

Then there's the pipe object itself. A straight-forward list of nodes which knows how to run ie. pull the items through each of the processing nodes. It keeps track of which section of the pipe the item is in using index.

Then there's the LinewiseStringPipeline which knows how to cut a large marked up page into single lines (wiki markup is applied on a line-by-line basis), run them separately through the pipe, and then concatenates them back together at the end.

Finally there are a couple of example PipeNode filters ... one which turns blank lines into blank paragraphs, and * into list-items.

I hope it's comprehensible. It's a little over-engineered with extra methods that are not strictly necessary for the operation. (Such as admin functions I guessed would be useful.)

And it is extremely verbose.

Now compare the same thing in my new, functional style which dispenses with objects and classes but seriously uses generators and closures.



def makeSection(f) :
    def gen(source) :
        for x in source :
            yield f(x)
    return gen

  
def makeFilter(test) :
    def gen(source) :
        for x in source :
            if test(x) : yield x
    return gen

  
def pipeline(generators) :
    def p(source) :
        old = source
        for g in generators :
            new = g(old)
            old = new
        return old
    return p

  
def blankLines(item) :
    if item == '' :
        return "<p/>"
    else :
        return item

  
def bullet(item) :
    if item[0] == '*' :
            return '<li>' + item[1:].strip() + '</li>'
    else :
        return item

        
def linewisePipe(page, pipe) :
     return '\n'.join(pipe(page.split('\n')))
     

# test it
pipe = pipeline([ makeSection(blankLines) , 
                  makeSection(bullet) ])
page = """here

is

* some
* data"""

print linewisePipe(page,pipe)

So, how does it work? Instead of handling each stage of the pipe with an object with only one significant method : process, it now handles each with a single function. Except that function is, in fact, a generator (ie. it's a function who's execution frame doesn't disappear when it yields it's return value, but continues in memory, and gets resumes where it left off).

The reason for this, is that these generators take a in iterator for their argument. And only pull and process the next item from the iterator before yielding it. The use of a generator is important because once it's up and running it's the generator itself which is remembering which item is currently being processed.

Of course, writing generators is a mildly more complicated than writing functions, so I've written some "make" functions to turn ordinary functions into them. makeSection(f) takes an ordinary x -> y function and returns the generator as a closure; note how the actual processing function is bound to f at the call of makeSection.

The result of calling makeSection, then, is a new generator which a) sucks something from an iterator (called "source"), b) calls f on it, c) yields it (ie. returns it, but hangs around with the next item from the iterator in "x", which it will yield next time it's called)

To simply use one of these sections on its own you could do something like this :




s = makeSection(lambda x: x * 2)
for x in s( [1,2,3,4,5] ) :
   print x

Closure s takes an iterator as an argument (in this case, the list [1,2,3,4,5] ). It's result is also an iterator. (Which is what Generators look like from outside)

In fact the "for" loop keyword is implicitly calling next() on it every time to get a new value for x. When it does so, it pulls the next value out of s. Now, inside s, we have the body of "gen" as defined in makeSection. This is applying the function f (which doubles its argument x) on each value that comes through.

So sctions are always generator functions that are started with an iterator as input and at every step of the iteration return the next item from their source iterator transformed. The transformation itself depends on the argument given to the call to makeSection.

Filters are the same, except in this case, they only pass on items that meet the "test" criteria.

The next part of the program is to chain a number of these generators together into a pipeline. In fact, "pipeline" is another function which returns a closure. It takes a list of sections and filters and it's this list which is stored in the closure. (Be careful here, the reference to the list not the actual list content - so there's a danger of changing them after the execution of pipeline)

What comes out of pipeline is a new generator. One which takes an initial iterator as source, and on each next, sucks the next item from that source all the way through all the stages of the pipeline and spits it out.

So what are the advantages of this over my original object oriented version? It's certainly far "cleverer". To me, today, it looks more elegant. And it's much easier to start using. My specific transformation functions don't need to know anything about pipelines - they don't need to be defined as methods on new classes that are explicitly sub-classed from PipeNode. Instead each function can be written in ignorance of it's role, and then turned into a section or filter when needed. (Often just before assembling the pipe.)

Pipes are easy to assemble using the pipeline function, and I believe are very intuitive to use.

Of course, the object version could be tweaked to look more like the functional version to the outside user. The Pipeline object could be an iterator, making it possible to use it in a for loop. We could perhaps make it take a list of sections in it's constructor rather than force the programmer to write multiple explicit addNodes.

Nevertheless, I'm smitten with the new style. I'm replacing the old code with the new today.

Thursday, January 17, 2008

Don't know if I'm fully following this or understand exactly what a Ruby Fiber is ...

But it did inspire me to come up with this little bit of Python black-magic :



def makeSection(f) :
    def gen(source) :
        for x in source :
            yield f(x)
    return gen


def makeFilter(test) :
    def gen(source) :
        for x in source :
            if test(x) : yield x
    return gen

    
def pipeline(generators) :
    def p(source) :
        old = source
        for g in generators :
            new = g(old)
            old = new
        return old
    return p
    
t2 = makeSection(lambda x : x * 2)
t3 = makeSection(lambda x : x * 3 )
f1 = makeFilter(lambda x : (x>20))
f2 = makeFilter(lambda x : (x<50))

pipe = pipeline( [ t2 , t3 , f1 , f2 ] )

for x in pipe(range(10)) : print x

See if you can work it out, I'll probably go over it tomorrow ...

Friday, December 28, 2007

Nabaztag.py

And robot grace

Monday, September 10, 2007

OK, I've just released a new build of GeekWeaver.

Full story over at Smart Disorganized, but basically OPML + domain specific language = cool way of defining complex web-site.

Saturday, September 08, 2007

Maybe ... just maybe ... I'm starting to see what Python decorators are all about.

Have a look at this example :




log1 = []
log2 = []

def declogFactory(l) :
    def declog(f) :
        def new(*args, **kws) :
            l.append((f.__name__,args[0],args[1]))
            return f(*args, **kws)
        return new
    return declog

declog = declogFactory(log1)

@declog
def add(x,y) :
    return x + y

@declog
def times(x,y) :
    return x * y

print add(1,2)
print log1, log2
print times(4,5)
print log1, log2

log1 and log2 are lists representing two alternative logs

declogFactory is actually a routine which is going to return a closure which is a python decorator.

so declogFactory receives a list as an argument and returns a decorator with the variable l bound to that list.

we assign that decorator to "declog" in the main program.

but what is declog?

decorators are functions which transform other functions, and are invoked with the @ syntax before the function they transform.

So this :



@declog
def add(x,y) :
    return x + y

is applying the new decorator "declog" to the function "add".

Let's look inside the definition of the decorator



def declog(f) :
    def new(*args, **kws) :
        l.append((f.__name__,args[0],args[1]))
        return f(*args, **kws)
    return new

this decorator also takes one argument : f that will be the function to be decorated.

we, in turn define function new. All it does is append the name of the function f it receives, with the first two arguments, to the log. Then it calls f.

effectively we've wrapped a call to f inside another routine which updates the log first.

however, we don't need to do this explicitly. You'll see that, although the routines add and times have been decorated, the calls to them look exactly the same as a call to a non-decorated version.

the simple occurance of the decorator line @declog before each definition is sufficient to turn each routine into a logged routine, as we see from the output


3
[('add', 1, 2)] []
20
[('add', 1, 2), ('times', 4, 5)] []

both add and times are decorated.

note, as well, that the decorator was itself parameterized with which log. it's sufficient to change the line defining the decorator to


declog = declogFactory(log2)

to have the logging from all the functions redirected to log2 instead of log1

Tuesday, June 19, 2007

Bruce Eckel hooks Python to Flex .

I wonder if that's the answer to my problem for where to go with desktop applications like SdiDesk.