Wednesday, March 11, 2009

"Blocks," or "lambdas in a non-sexp language"

Last week there was a lot of discussion around tav's proposal to add Ruby blocks to Python. Eventually, the proposal went to python-ideas, and got ground into the dust by all of the objections. The idea has been proposed before, and has always met with strong resistance. The argument against anonymous pseudo-expression functions relies on the idea that Python already has powerful syntax for doing 90% of the things that Ruby and Scheme use blocks and lambdas for, and Guido seems to prefer it that way. In the end, I think Guido's right about blocks in Python, but in another language with different scoping rules, blocks might be a great idea.

Consider the Ruby blocks examples on the c2 wiki. The first three examples, which are iteration, callback registration, and resource management, all have distinct syntaxes in Python, while in Ruby they all use blocks:

Iteration:
# Ruby:
collection.each do |element|
  ...loop body...
end

# Python:
for element in collection:
    ...loop body...

# Ruby:
numbers = [1,2,3,4]
squares = numbers.map {|n| n*n }

# Python:
numbers = [1, 2, 3, 4]
squares = [n * n for n in numbers]

Callback registration:
# Ruby:
button.on_click do |event|
  ...callback code...
end

# Python:
# (decorators are more general, but this is a common use case
# exemplified by Django filters and tags.)
@button.on_click
def raise_dialog(event):
    ...callback code...

Resource management:
# Ruby:
File.open(filename) do |file|
  ... from file...
end

# Python:
with open(filename) as file:
    ... from file...

The consensus on the python-ideas list was that the dedicated for-loop and context-manager syntax is more readable, because no matter what object you're iterating, you have a big fat keyword on the line start telling you how the next code block is going to be executed, instead of one syntax stretching to try and cover multiple unrelated use cases. The verdict could also be interpreted as another instance of the "there should be only one way to do it" philosophy of Python. Currently, def is the only way to create a function that can contain statements, and decorators cover many of the higher-order function use cases. Introducing another syntax for those tasks goes against the grain.
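To make the decorator point concrete, here is a minimal sketch of callback registration with def plus a decorator. The Button class and its on_click and click methods are made up for illustration; they are not from any real GUI toolkit.

```python
class Button:
    """Hypothetical widget that keeps a list of click handlers."""

    def __init__(self):
        self._handlers = []

    def on_click(self, handler):
        # Used as a decorator: register the function, then hand it back
        # unchanged so the name still refers to a normal function.
        self._handlers.append(handler)
        return handler

    def click(self, event):
        # Call every registered handler and collect the results.
        return [handler(event) for handler in self._handlers]


button = Button()


@button.on_click
def raise_dialog(event):
    # A full function body: any number of statements is fine here.
    message = "clicked: %s" % event
    return message


print(button.click("ok"))
```

The handler needs a name and sits above the point of use, which is the trade-off the list was willing to accept in exchange for not adding a second function syntax.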

So if blocks aren't good for Python, where do they work?

First of all, I think blocks in Ruby are kind of broken. I've seen many people talk about the elegance of the Ruby block syntax, and I just don't buy it. Why all the punctuation and magic ampersand-arguments? What the hell is up with optional parentheses on function calls? That's friggin' crazy when you're working with function values. It's almost as bad as Common Lisp having separate namespaces for functions and values. The scoping rules are also crazy. Because there are no variable declarations, you can modify names in enclosing scopes by accident. Python deals with this by making assignment local by default and requiring the new 'nonlocal' statement to rebind a name in an enclosing scope. Also, the whole DSL craze and the role of blocks in that is just kind of strange to me. So forget that stuff. What I like about Ruby blocks is that they are an innovative way to do non-neutered lambdas in statement-oriented languages without dangling parentheses.
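As a small illustration of the Python side of that scoping point (Python 3 only; make_counter and make_broken_counter are names invented for this sketch), assignment inside a nested function creates a local unless you opt in with nonlocal:

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # explicitly rebind the enclosing function's variable
        count += 1
        return count

    return increment


def make_broken_counter():
    count = 0

    def increment():
        # Without 'nonlocal', the assignment makes 'count' local to
        # increment(), so reading it first raises UnboundLocalError.
        count += 1
        return count

    return increment


counter = make_counter()
counter()
print(counter())

try:
    make_broken_counter()()
except UnboundLocalError as error:
    print("accidental rebinding is rejected:", error)
```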

Blocks occupy this weird middle zone between functional programming and stateful languages, because in functional languages or Lisps statements are either not allowed or are parentheses-wrapped expressions that you can stick anywhere you want anyway. Blocks are especially relevant in whitespace sensitive languages like Python and Ruby, where jamming a statement into an expression is awkward grammatically. Reia is a good case study for what happens if you try to force statements into expressions. So blocks are a little innovation to move the statements out of the expression and into a following block of code. In Lisp, the trailing parenthesis would be no big deal, but in statement-oriented languages it really messes up your grammar.

So what's the point of this stupid syntax hack so you can write multi-statement lambdas in stateful languages? I think the reason that new Ruby programmers are so much in awe of blocks is that they haven't been properly exposed to first-class functions before. A lot of them are web developers, and aren't interested in those highfalutin' ideas about functional programming. I think that the key to teaching someone functional programming is lambda. Without the ability to embed executable code into an expression and pass it off to another function, you're left gesticulating wildly about how functions are values like numbers, strings, and lists. Lambda can really demonstrate that, as they say in 6.001, "the value of a lambda is a procedure." Once you've internalized that idea, you're ready for higher-order functions and the rest.

So while I think that in the existing Python ecosystem it makes sense to not have blocks, it makes it harder to teach and use higher-order functions. There's something to be said for the Ruby way of doing all of those examples above. They all use the same mechanism, and that's another kind of "there should be one obvious way to do it" in action.

In conclusion, if you're designing a new non-functional or whitespace sensitive language and you like the power of lambdas, blocks are probably a good way to express them. Patching them into Python now, however, would probably take away from the simplicity of the language.

Sunday, March 8, 2009

Automatic __repr__ and __eq__ for Data Structure Classes

For the compilers class project that I'm working on, I recently wrote a couple of classes that do simple structural equality and automatic __repr__ generation. We have a lot of simple data structure IR classes in our project, and they all need __eq__ and __repr__ for testing and debugging. Structural equality is easy; all you have to do is introspect on __dict__ and see that the attributes match recursively. Automatic __repr__ is more difficult, though, because to produce valid Python source, you have to know the order of the arguments to the instance's __init__ method. Fortunately, with the inspect module, you can call inspect.getargspec(self.__init__) and get that information. We use the simple convention for our IR nodes that the arguments to __init__ all become attributes of the same name, so you can then use the argument names and getattr to generate the reprs of the subnodes. Good times!
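Here is a minimal sketch of the idea, not the project's actual code. StructuralNode, Num, and Add are hypothetical names. It also uses inspect.signature rather than inspect.getargspec, since getargspec has been removed from recent Python 3 releases. The sketch assumes the convention described above: every __init__ argument is stored as an attribute of the same name.

```python
import inspect


class StructuralNode:
    """Hypothetical base class: structural __eq__ and generated __repr__."""

    def __eq__(self, other):
        # Equal only if both are the same concrete class with equal
        # attributes. Comparing the __dict__s recurses into subnodes
        # through their own __eq__.
        if type(self) is not type(other):
            return NotImplemented
        return self.__dict__ == other.__dict__

    def __repr__(self):
        # Parameter names of __init__ in declaration order. The method is
        # bound, so 'self' is not in the list.
        params = inspect.signature(self.__init__).parameters
        args = ", ".join(repr(getattr(self, name)) for name in params)
        return "%s(%s)" % (type(self).__name__, args)


class Num(StructuralNode):
    def __init__(self, value):
        self.value = value


class Add(StructuralNode):
    def __init__(self, left, right):
        self.left = left
        self.right = right


tree = Add(Num(1), Add(Num(2), Num(3)))
print(repr(tree))
# The repr is valid Python source, so it round-trips through eval.
assert eval(repr(tree)) == tree
```

Because the repr is valid source, a failing test prints something you can paste straight back into the test as the expected value.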

Friday, February 20, 2009

Alternatives to django-media-bundler

Apparently I didn't do a good enough job Googling for other projects when I wrote django-media-bundler. If you Google for django concatenate javascript or django minify javascript there are a number of other projects that do similar things, each with different tradeoffs:
  • django-mediacat: To use this tool, you describe your JS packages in Django models using the django-admin interface. You then configure a URL to point at the view function, and pass it the package names you want as GET arguments. mediacat then caches the resulting file in the database model, and sends the appropriate ETag and Last-Modified headers to make the browser cache the request contents. Last updated: 2008-11-02
  • django-compress: This tool can be configured to automatically regenerate its bundles by checking the file modification times of the original source files. Obviously, if you're using a content distribution network or a cookieless static domain, this feature might not work out of the box, but if you're a small-time operation, this is nice. It also allows versioning the filenames of the bundles so that when you have new bundles they will bust the browser cache. You can also use the YUI compression tools if available, and django-compress has a templatetag that will source the compressed script if compression is enabled, or the individual source files if disabled. Finally, I'd like to point out that this project looks the most mature, as it was started on 2008-04-28, and it has a well-written wiki and not just a README. Last updated: 2008-12-05
  • django-compact: Looks like it's not quite finished yet, but so far it looks like its main feature is the templatetags that will link to individual script sources or the bundle. Last updated: 2009-01-30
  • django-assets: Has Jinja 2 templatetags, and supports automatic regeneration of bundles based on file mtime. Bundles are defined inline in the templates instead of in a central configuration. This seems less good to me, because you want to keep the number of different bundles small, so that the user only has to download one script bundle. Sometimes you want more than one because a particular page has a lot of JS, but usually you want all JS to be cached after the first page load. Last updated: 2009-02-08
  • django-assetpackager: This tool supports cache busting by putting the bundle generation timestamp in the bundle filename. It also has a templatetag that will source individual scripts in debug mode and just the bundle in production mode. Last updated: 2008-06-21
  • Finally, django-media-bundler: While somewhat unrelated to concatenating and minifying JavaScript and CSS, my project supports image spriting, which is a pain to do by hand. It also employs an interesting little heuristic 2-D bin packing algorithm to try and arrange the images into a square-ish rectangle of minimal area. It has templatetags like a couple of the others, but it doesn't have cache busting. That's an important feature I'd like to add. The auto-regeneration I'm not convinced about, because it breaks down when you're not using a simple single-server setup. Also, it means that your templatetag has to do a bunch of file system calls while it's rendering the template, which isn't a terrible idea, but it feels less than perfect. Last updated: 2009-02-15
Having browsed the source trees of each of these projects, in my (biased, of course) opinion the best tools here are django-compress and django-media-bundler. django-compress is mature and has the most JS & CSS bundling features, while the media-bundler is simpler (which can be good) and has image spriting.
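Two of the features that keep coming up in that list, mtime-based regeneration and cache busting through versioned filenames, are small enough to sketch. This is an illustration under my own assumptions, not code from any of the projects above; bundle_is_stale and versioned_name are hypothetical helper names.

```python
import os


def bundle_is_stale(bundle_path, source_paths):
    """Return True if the bundle is missing or older than any source file."""
    if not os.path.exists(bundle_path):
        return True
    bundle_mtime = os.path.getmtime(bundle_path)
    # One stat call per source file: this is the file system work that a
    # templatetag would be doing on every render.
    return any(os.path.getmtime(path) > bundle_mtime for path in source_paths)


def versioned_name(filename, timestamp):
    """Put a build timestamp before the extension to bust browser caches."""
    root, ext = os.path.splitext(filename)
    return "%s.%d%s" % (root, int(timestamp), ext)


print(versioned_name("bundle.js", 1234567890))
```

A new timestamp means a new URL, so browsers fetch the new bundle even if the old one was cached with a far-future expiry.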

Saturday, February 14, 2009

django-media-bundler now supports sprites!

Over the last week I've been working on adding image spriting support to django-media-bundler. For the uninitiated, image spriting is a technique that Google, Yahoo, and other fast web sites use to speed up page load times. What these web sites do is to combine all of their small icon images into one medium size image, and then use CSS background image offsets to display each icon individually from the master. For small icon graphics, the overhead of the HTTP requests dwarfs the size of the actual image, so this speeds things up drastically.

With the help of the Python Imaging Library, I was able to read the images, measure their dimensions, and paste them together into the master image. However, given icons of arbitrary size, it's not clear what the best way to lay out the master image is. Having just taken an advanced algorithms course, I recognized this problem as a variation of the bin-packing problem. The bin-packing problem is NP-hard, but once you have a name for something, it's a lot easier to Google up some simple heuristic algorithms to solve the problem.
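To give a flavor of what such a heuristic looks like, here is a sketch of a simple "shelf" packer. To be clear, this is a generic textbook-style heuristic chosen for illustration and not necessarily the algorithm the media-bundler uses. The function name pack_shelves and the icon names are made up. It works on plain (width, height) pairs, so it doesn't need PIL.

```python
import math


def pack_shelves(sizes):
    """Place rectangles on horizontal shelves.

    sizes maps a name to (width, height) in pixels. Returns a pair: a dict
    of name -> (x, y) offsets, and the (width, height) of the whole sheet.
    """
    if not sizes:
        return {}, (0, 0)

    # Aim for a square-ish sheet: target width is about sqrt(total area),
    # but never narrower than the widest image.
    total_area = sum(w * h for w, h in sizes.values())
    widest = max(w for w, _ in sizes.values())
    target_width = max(widest, int(math.ceil(math.sqrt(total_area))))

    positions = {}
    x = y = 0
    shelf_height = 0
    sheet_width = 0
    # Tallest first, so images on the same shelf have similar heights.
    by_height = sorted(sizes.items(), key=lambda item: (-item[1][1], item[0]))
    for name, (w, h) in by_height:
        if x > 0 and x + w > target_width:
            # Shelf is full: start a new one below it.
            y += shelf_height
            x = 0
            shelf_height = 0
        positions[name] = (x, y)
        x += w
        shelf_height = max(shelf_height, h)
        sheet_width = max(sheet_width, x)
    return positions, (sheet_width, y + shelf_height)


icons = {
    "play.png": (16, 16),
    "stop.png": (16, 16),
    "logo.png": (32, 32),
    "next.png": (16, 16),
}
positions, sheet_size = pack_shelves(icons)
print(positions, sheet_size)
```

With PIL in the picture, the offsets would be the coordinates you paste each image at in the master image.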

Finally, I had to figure out how to get the sprites into the page. I found that in audio-enclave we use images in all sorts of interesting ways that make it difficult to abstract away the spriting behind a template tag. In the end I generated a set of CSS rules with the background image and offsets and decided to let the user figure out how to display the images. Working in the sprites was, for our project, more pain than it was worth, and I had to break a couple of nice CSS abstractions to force a DIV node into an element which had none before.
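For the curious, generating those CSS rules is mostly string formatting. The sketch below assumes you already know each icon's offset and size within the master image. The function name sprite_css, the "sprite-" class prefix, and the URL are assumptions for illustration, not the bundler's actual output format.

```python
def sprite_css(sprite_url, placements):
    """Build one CSS rule per icon.

    placements maps an icon filename to (x, y, width, height) within the
    master image at sprite_url.
    """
    rules = []
    for name, (x, y, w, h) in sorted(placements.items()):
        # Hypothetical naming scheme: play.png -> .sprite-play
        css_class = "sprite-" + name.rsplit(".", 1)[0]
        # Background offsets are negative: the master image is shifted up
        # and to the left so the icon shows through the element's box.
        rules.append(
            ".%s { background: url(%s) no-repeat %dpx %dpx; "
            "width: %dpx; height: %dpx; }"
            % (css_class, sprite_url, -x, -y, w, h)
        )
    return "\n".join(rules)


print(sprite_css("/media/sprites.png", {"play.png": (16, 0, 16, 16)}))
```

The element needs an explicit width and height, which is why spriting an icon that used to be a plain img tag can mean adding a new DIV.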

Anyway, implementation details aside, now audio-enclave has excellent front-end performance! Check out these Firebug net tab screenshots:

Before spriting:

After spriting:

Sunday, February 1, 2009

Why CPython Will Live On

Recently there has been a lot of interest on proggit and Hacker News in creating new language implementations on top of existing VMs like the JVM, the CLR, and the Erlang VM, BEAM. The list of language implementations targeting existing VMs that I can name off the top of my head is long: Clojure, Scala, Jython, JRuby, IronRuby, IronPython, Reia, Ioke, Boo, Fan, F#, and Fortress. I was even working on a small language side-project that had the eventual goal of targeting the JVM. This should all be old news to you if you've been paying attention to PL news, and I think it's a pretty good idea. When languages share runtimes, you end up being able to communicate between them nicely, and everyone can collaborate on writing one high-performance garbage collector and one solid JIT.

However, you can only stretch this principle so far. Reia is implemented on top of BEAM because it wants capabilities that the JVM doesn't have built-in, like lightweight processes and good fault-tolerant message passing. So what I want to talk about is that while I think Jython and IronPython are worthwhile ways to get pure Python to play nice with languages on those respective VMs, I still think CPython has a very bright future.

I realized while reading Guido's History of Python blog that one of Python's very early design decisions was to integrate well with existing systems, meaning things written in C. As Guido explains, this was a reaction on his part to his work with the ABC group, which wanted to hide all those scary systems problems away from the programmer and isolate them on some higher-level plane. While this may be good for a learning language, this limits your ability to do interesting things with code that already exists. Python solved that problem by having a relatively simple C API, especially when compared to things like JNI. Going further, the choices to use the GIL and reference counting are decisions that clearly make the life of the C extension module writer easier.

Writing C extension modules isn't exactly peaches and cream, so Greg Ewing came up with Pyrex, which was later forked into Cython. Cython is a "medium"-level Python-like language which gives you access to C primitives and allows you to call out into both Python and C with ease. The Sage Project, a project to repackage and combine Python math software, uses it extensively.

To give you an idea of what this lets you do, let's say you're writing a C++ plugin for an existing crummy Windows application and you want to make your life better by embedding Python. All you have to do is call Py_Initialize() from your plugin, and then you can start interacting with Python code. Cython makes this even easier, because you can write your DLL stubs in Cython as cdefs and just wrap it all up as a DLL. Doing this kind of thing is technically possible with the JVM. However, while Cython is easy, the JNI is a pain, and the JVM has a massive startup time penalty as compared to CPython. I'd like to link to a more in-depth explanation of this technique, but the person I know who is using it hasn't written it up online yet. If and when it does go up I'll link it.

In conclusion, until the day that C's star sets, CPython will continue to be an incredibly useful tool.

Thursday, January 29, 2009

Announcing django-media-bundler

Just a couple of days ago, I looked at the Net tab in Firebug while doing a full refresh of an audio-enclave page. Here's what I saw on our (needlessly) most JavaScript-heavy page:

As you can see, with all those external files, we were flagrantly violating three of the Yahoo best practices: minimize HTTP requests, put scripts at the bottom, and minify JavaScript and CSS. Obviously, what we wanted was to keep developing our JavaScript just as we had in separate modules, and add a build step to our deploy to concatenate and minify our JavaScript and CSS. We had heard of Rails' Asset Packager plugin, so we looked for a Django plugin that did basically the same thing. We were unable to find one, so we wrote our own, dubbing it django-media-bundler, and threw it up on GitHub. Here are the Firebug results of enabling bundling, minification, and deferred JavaScript on that page:

Using django-media-bundler

The media-bundler is a reusable app, so to install it all you have to do is download the source and add it to INSTALLED_APPS. Describe the JavaScript and CSS bundles you would like to create as explained in the media_bundle.default_settings module. By default, deferring is enabled, and bundling is disabled when settings.DEBUG is True to assist debugging. You can override those values in your settings module.

To source your scripts, instead of writing
<script type="text/javascript" src="/url/myscript.js"></script>
<link rel="stylesheet" type="text/css" href="/url/mystyle.css"/>
you put {% load bundler_tags %} at the top of your template and write
{% javascript "js_bundle_name" "myscript.js" %}

{% css "css_bundle_name" "mystyle.css" %}

In your base template you should put
{% deferred_content %}
wherever you want to load the scripts in production, typically at the bottom of the page. We recommend putting a second section after your body.

And that's it! As a future goal, we'd like to help automate the horizontal spriting of PNG icons. We would have a {% sprite "sprite_bundle" "icon_name.png" %} tag that automatically generates a div with a background and offset. Alternatively, it might be nice to run the script and CSS files through the template preprocessor to allow them to access the urlresolver so we don't have any more hardcoded URLs or janky inline template JavaScript. Happy hacking!

Announcing Audio-Enclave

This January during MIT's Independent Activities Period I've been working hard on my dormitory floor's music server software. We operate a rack-mount server that plays music in our showers and in our lounges. The server, dubbed "nice-rack", is a hub where all of us come together and share our tastes in music. Using the web interface we wrote, anyone can upload and play music on the server, and anyone can dequeue anyone else's music. Naturally, the server is a flashpoint for arguments about what constitutes good or even tolerable music.

Recently, the entire web interface was rewritten to use Django, and the backend was rewritten using GStreamer. The project has reached the point in its life where we, the contributing residents of East Campus Second West, think that other people might find the code useful, so we've put the code under a BSD license and moved it to Google Code:

Audio-enclave should be useful to anyone who wants to share a communal sound system. Personally, I have vague notions of using audio-enclave as an input to my family's living room stereo, so that we can play music from our collection on the stereo without CDs or laptops that have to be on, open, and plugged in.

Eventually, we intend to support some kind of remote API so that we can write little mobile phone apps to control the playback.