Sunday, February 1, 2009

Why CPython Will Live On

Recently there has been a lot of interest on proggit and Hacker News in creating new language implementations on top of existing VMs like the JVM, the CLR, and the Erlang VM, BEAM. The list of language implementations targeting existing VMs that I can name off the top of my head is long: Clojure, Scala, Jython, JRuby, IronRuby, IronPython, Reia, Ioke, Boo, Fan, F#, and Fortress. I was even working on a small language side-project that eventually had the goal of targeting the JVM. This should all be old news to you if you've been paying attention to PL news, and I think it's a pretty good idea. When languages share runtimes, you end up being able to communicate between them nicely, and everyone can collaborate on writing one high-performance garbage collector and one solid JIT.

However, you can only stretch this principle so far. Reia is implemented on top of BEAM because it wants capabilities that the JVM doesn't have built in, like lightweight processes and good fault-tolerant message passing. So what I want to talk about is that while I think Jython and IronPython are worthwhile ways to get pure Python to play nice with languages on those respective VMs, I still think CPython has a very bright future.

I realized while reading Guido's History of Python blog that one of Python's very early design decisions was to integrate well with existing systems, meaning things written in C. As Guido explains, this was a reaction on his part to his work with the ABC group, which wanted to hide all those scary systems problems away from the programmer and isolate them on some higher-level plane. While this may be good for a learning language, it limits your ability to do interesting things with code that already exists. Python solved that problem by having a relatively simple C API, especially when compared to things like JNI. Going further, the choices to use the GIL and reference counting are decisions that clearly make the life of the C extension module writer easier.
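To make the reference counting point a bit more concrete, here is a minimal sketch in plain Python. It doesn't touch the C API at all; it only uses sys.getrefcount() to peek at the same per-object counter that extension code adjusts with Py_INCREF and Py_DECREF. The Payload class and the variable names are made up for illustration.

```python
import sys


class Payload(object):
    """Hypothetical stand-in for any object handed to a C extension."""
    pass


obj = Payload()

# getrefcount() reports one extra reference for its own argument,
# so only the differences between readings are meaningful.
base = sys.getrefcount(obj)

alias = obj                      # a second name bumps the count by one
after_alias = sys.getrefcount(obj)

del alias                        # dropping the name brings it back down
after_del = sys.getrefcount(obj)

print(after_alias - base, after_del - base)
```

The bookkeeping is simple and deterministic: every new owner adds one, every released owner subtracts one, and the object goes away when the count hits zero. That's the whole contract a C extension author has to honor.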

Writing C extension modules isn't exactly peaches and cream, so Greg Ewing came up with Pyrex, which was later forked into Cython. Cython is a "medium"-level Python-like language that gives you access to C primitives and allows you to call out into both Python and C with ease. The Sage Project, a project to repackage and combine Python math software, uses it extensively.

To give you an idea of what this lets you do, let's say you're writing a C++ plugin for an existing crummy Windows application and you want to make your life better by embedding Python. All you have to do is call Py_Initialize() from your plugin, and then you can start interacting with Python code. Cython makes this even easier, because you can write your DLL stubs in Cython as cdefs and just wrap the whole thing up as a DLL. Doing this kind of thing is technically possible with the JVM. However, while Cython is easy, the JNI is a pain, and the JVM has a massive startup time penalty compared to CPython. I'd like to link to a more in-depth explanation of this technique, but the person I know who is using it hasn't written it up online yet. If and when it does go up I'll link it.
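A real embedding host would be written in C or C++, so take the following as a rough sketch rather than the plugin technique itself. It uses ctypes.pythonapi to call, from inside an already running interpreter, two of the C API entry points a host would use after the initialization call (spelled Py_Initialize() in the C API): Py_IsInitialized() and PyRun_SimpleStringFlags(). The embedded_result name is made up for illustration.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the interpreter we're running in.
api = ctypes.pythonapi

# A host calls Py_Initialize() first. Here the interpreter is already up,
# so Py_IsInitialized() should report a nonzero value.
api.Py_IsInitialized.restype = ctypes.c_int
initialized = api.Py_IsInitialized()

# PyRun_SimpleStringFlags(code, flags) runs source text in the __main__
# module and returns 0 on success. Passing None sends NULL for the flags.
api.PyRun_SimpleStringFlags.argtypes = [ctypes.c_char_p, ctypes.c_void_p]
api.PyRun_SimpleStringFlags.restype = ctypes.c_int
status = api.PyRun_SimpleStringFlags(b"embedded_result = 6 * 7", None)

# Read back what the "embedded" code left behind in __main__.
import __main__
value = __main__.embedded_result
print(initialized != 0, status, value)
```

In an actual plugin the same two steps happen in C: initialize the interpreter once, then hand it strings or call into imported modules. There's no separate VM process to launch and no heavyweight bridge layer in between.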

In conclusion, until the day that C's star sets, CPython will continue to be an incredibly useful tool.