Friday, March 22, 2013

Change the Future - one small slice of PyCon US 2013

I'm currently kicking back in Red Hat's Mountain View office (I normally work from the Brisbane office in Australia) after a lovely lunch with some of the local Red Hatters, unwinding a bit and reflecting on an absolutely amazing week at PyCon US 2013 just down the road in Santa Clara.

For me, it started last Wednesday with the Python Language Summit, an at-least-annual-sometimes-biannual get together of the developers of several major Python implementations, including CPython (the reference interpreter), PyPy, Jython and IronPython. Even with a full day, there were still a lot of interesting topics we didn't get to and will be thrashing out on the mailing lists as usual. However, good progress was made on a few of the more controversial items, and there are definitely exciting developments in store for Python 3.4 (due in early 2014, probably shortly after PyCon in Montreal if past history is anything to go by).

Thursday was a real eye-opener for me. While I did have to duck out at one point for a meeting with a couple of the other CPython developers, I spent most of it helping out at the second of the Young Coders tutorials run by Katie Cunningham and Barbara Shaurette. These tutorials were conducted using Raspberry Pis with rented peripherals, and the kids attending received both the Pi they were using and a couple of introductory programming books.

Watching the class, and listening to Katie's and Barbara's feedback on what they need from us in the core, substantially changed my perspective on what IDLE can (and, I think, should) become. Roger Serwy (the creator of IdleX, a version of IDLE with various improvements) has now been granted access to the CPython repo to streamline the process of fixing the reference implementation, and we're working on plans to make the behaviour of IDLE more consistent across all currently supported Python versions (including Python 2.7). (Some aspects of this, especially Roger's involvement, are similar to what happened years ago for Python 2.3 when Kurt B. Kaiser, the PSF's treasurer, shepherded the reintegration of the IDLEfork project and its major enhancements to IDLE back into the reference IDLE implementation in the Python standard library.)

Friday saw the start of the conference proper, with inspirational keynotes from Jesse Noller (conference chair and PSF board member) on helping to change the future by changing the way we introduce the next generation to the computers that are now an ever-present aspect of our lives, and from Eben Upton (co-founder of the Raspberry Pi Foundation), on how the Pi came to be the educational project it is today, and some thoughts on how it might evolve in the future.

Jesse's keynote included the announcement that every attendee (all 2500 of them) would be receiving a free Raspberry Pi, and that any Pis that attendees didn't want to claim would be redistributed to various educational groups and programs. Not only that, but Jesse also announced http://raspberry.io/, a new site for sharing Raspberry Pi based projects and resources, as well as a "Raspberry Pi Hack Lab" running for the duration of the conference, where attendees could hook their Pis up to a keyboard and monitor, as well as experiment with various bits and pieces of electronics donated by one of the conference sponsors. Richard Jones also stepped up to run some additional short introductory PyGame tutorials in the lab (he had run a full 3 hour session on PyGame as part of the paid tutorials on the Wednesday and Thursday prior to the conference).

One key personal theme for the conference revolved around the fact that I've volunteered to be Guido's delegate in making the final decisions on how we reshape Python's packaging ecosystem in the lead up to the Python 3.4 release. I'll be writing quite a bit more on that topic over the coming weeks, so here I'll just note that it started with proposing some changes to the Python Enhancement Proposal process at the language summit on the Wednesday, and continued through the announcement of the coming setuptools/distribute merger on Thursday, the "packaging and distribution" mini-summit I organised for developers on the Friday night, the "Directions in Packaging" Q&A panel we conducted on the Saturday afternoon, and some wonderful discussions with Simeon Franklin on his blog regarding the way the current packaging and distribution issues detract from Python's beginner friendliness, then on into various interesting discussions, proposals and development at the sprints in the days following the conference.

Unfortunately, I didn't actually get to meet Simeon in person, even though I had flagged his poster as one I really wanted to go see during the poster session. Instead, I spent that time at the Red Hat booth in the PyCon Jobs Fair. The Jobs Fair is a wonderful idea from the conference organisers that, along with the Expo Hall, recognises the multi-role nature of PyCon: as a community conference for sharing and learning (through the summits, scheduled talks, lightning talks, poster session, open spaces, paid tutorials, Young Coders sessions, Raspberry Pi hack lab, and sprints), as a way for sponsors to advertise their services to developers (through the Expo Hall and sponsor tutorials) and as a way for sponsors to recruit new developers (through the Jobs Fair). PyCon has long involved elements of all of these things (albeit perhaps not at the scale achieved this year), but having the separate Expo Hall and Jobs Fair helps keep sales and recruitment activity from bleeding into the community parts of the conference, while still giving sponsors a suitable opportunity to connect with the development community.

Both at the Jobs Fair and during the rest of the conference, I was explaining to anyone that was willing to listen what I see as Red Hat's role in bridging the vast gulf between open source software enthusiasts (professionals and amateurs alike) and people for whom software is merely a tool that either helps (hopefully) or hinders (unfortunately far too often) them in spending time on their actual job/project/hobby/etc.

I also spent a lot of time talking to people about my actual day job. I'm the development lead for Beaker, one of the test systems at Red Hat, and while it is very good at what it does (full stack integration testing from hardware, through the OS and up into application software), it also needs to integrate well with other systems like autotest and OpenStack if we're going to avoid pointlessly reinventing a lot of very complicated wheels. Learning more about what those projects are currently capable of makes it easier for me to prioritize the things we work on, and make suitable choices about Beaker's overall architecture.

At the sprints, in addition to working on CPython and some packaging related questions, I also took the opportunity to catch up with the Mailman 3 developers - the open source world needs an email/web forum gateway that at least isn't actively awful, and the combination of Mailman 3 with the hyperkitty archiver is shaping up to be positively wonderful.


I didn't spend the entire conference weekend talking to people - I actually got to go see a few talks as well. All of the talks I attended were excellent, but some particular personal highlights were Mike Bayer's deep dive into SQLAlchemy's session behaviour, the panel on the Boston Python Workshop and a number of other BPW inspired education and outreach events, Mel Chua's whirlwind tour of educational psychology, Lynn Root's educational projects for new coders (with accompanying website), Dave Malcolm's follow-up on his efforts with static analysis of all of the CPython extensions in Fedora, and Dave Beazley's ventures into automated home manufacturing of wooden toys (and destruction of laptop hard drives). There were plenty of other talks that looked interesting but I unfortunately didn't get to (one of the few downsides of having so many impromptu hallway conversations). All the PyCon US 2013 talks should be showing up on pyvideo.org as the presenters give the thumbs up, and the presentation slides are also available, so it's worth trawling through the respective lists for the topics that interest you.

In the midst of all that, Van Lindberg (PSF chairman) revealed the first public draft of the redesigned python.org (I was one of the members of the review committee that selected Project Evolution, RevSys and Divio as the drivers of this initial phase of the redesign process), and also announced the successful resolution of the PSF's trademark dispute in the EU.

This was only my second PyCon in North America (I've been to all three Australian PyCons, and attended PyCon India last year) and the first since I joined Red Hat. Meeting old friends from around the world, meeting other Pythonistas that I only knew by reputation or through Twitter and email, and meeting fellow Red Hatters that I had previously only met through IRC and email was a huge amount of fun. Attending the PyLadies charity auction, visiting the Computer History Museum with Guido van Rossum, Ned Deily and Dwayne Litzenberger (from Dropbox), chatting with Stephen Turnbull about promoting the adoption of open source and open source development practices in Japan, and getting to tour a small part of the Googleplex were just a few of the interesting bonus events from the week (and now I have a few days vacation to do the full tourist thing here in SFO).

I'm still on an adrenaline high, and there are at least a dozen different reasons why. If everything above isn't enough, there were a few other exciting developments happening behind the scenes that I can't go into yet. Fortunately, the details of those should become public over the next few weeks so I won't need to contain myself too long.

This week was intense, but awesome. All the organisers, volunteers and sponsors that played a part in bringing it together should be proud :)

Friday, March 15, 2013

Python Language Summit - PyCon US 2013

My notes from the PyCon US 2013 Python Language Summit are up on ReadTheDocs.

Courtesy of Vinay Sajip, I've also found out how to integrate DISQUS comments into my Python Notes pages, so feedback can happen directly over there :)

Wednesday, February 27, 2013

A Sliding Scale of Freedom

SpiderOak's launch of Crypton prompted an interesting discussion on Twitter between myself and a few others. This mostly involved some fairly common "open source" versus "free software" objections to the use of the AGPL for the open source project as a marketing tactic to drive sales of commercial licenses for SpiderOak. That conversation prompted me to post the following:


Myself, I'm lazy, so I'm a fan of permissive licensing - this blog is CC0, and the open source stuff I write and license entirely myself uses the Simplified BSD License (which only has 2 clauses in it, and is pretty much limited to disclaiming warranties and saying "Hey, I wrote this"). Those license choices accurately reflect the effort I'm prepared to put into enforcing the legal rights I receive by default under current copyright regimes: absolutely none.

However, I'm not dependent on that software or this blog for my livelihood - they're a hobby, something I do because I want to, not because I need to. My lack of concern about these matters is a luxury and a privilege, because I don't need to worry about where my next meal is coming from - I have a stable job for that, with an employer I thoroughly respect and greatly enjoy working for.

Plenty of people and organizations around the world have gained value from my hobby (and will likely gain more in the future), and the pay-off I see personally is purely in terms of immediate enjoyment, long term reputation gain, and the opportunity to meet and become friends with interesting people I would never have encountered otherwise.

That means it saddens me when companies that are making their software freely available to the world are derided for not being open enough when they make the strategic decision to employ a dual licensing model, and also choose to use the GPL or AGPL to create an enforced commons on the open source side, thus making the commercial offering more attractive. They get accused of wanting to "exploit" the developers that might choose to participate in their project, because the sponsoring company controls the copyrights and can issue commercial licenses, while the third party developers "only" get to use (and customise, and redistribute) the software for free.

Being able to categorically deny such accusations is definitely one of the advantages of a "license in = license out" model for a sponsored project, where the original sponsor quickly becomes bound by the same license obligations as everyone else, but dual licensing is still several orders of magnitude better than keeping a solution proprietary.

There are many potential consumers who will consider being able to use software as more important than being able to redistribute it under a more permissive or closed license, and even for those that eventually decide they want a commercial license, dual licensing allows true "try before you buy" evaluation (since even the AGPL doesn't really kick in if you're not making your service available to the general public over the internet). Even the most ardent GPL detractors are also likely happy to use GPL software when it meets their needs, whether that's in the form of an OS (Linux), or cryptographic software (GPG), etc.

The strategic fears that lead many companies taking their hesitant first steps into the open source arena to favour copyleft licenses over permissive ones shouldn't be dismissed lightly. I'm young enough that I only caught the tail end of the proprietary Unix wars (mostly through antiquated platform specific cruft in the CPython code base), but I personally lay a lot of the credit for Linux avoiding the fragmented state of AIX/IRIX/Tru64/HP-UX/Solaris at the feet of the GPL. The legal strength of the GPL means that competitors with no reason to trust each other at the strategic level can still collaborate effectively at a technical level (up to a point, anyway).

The free software world is still a minnow in the overall software development picture, the vast majority of which is still bespoke intranet deployments. Even when those deployments are based on free or open source software, it's hardly likely to be used as a selling point to those customers. Regardless of high profile tech companies like Google and Amazon, the "cloud" is still in its infancy, and it is going to be a long time before many organisations are willing to trust cloud providers with their data. In the meantime, the likes of Microsoft, Oracle and IBM continue to make money hand over fist. Red Hat may be huge by open source company standards, and have some high profile customers, but we still have a long way to go before we're even close to matching the proprietary giants in scale and ubiquity.

The battle to convince people that sharing leads to better software is not over by any means. It still needs to be fought, and fought hard, until paying for proprietary software rather than certified open source software is an unusual aberration rather than the norm that it still is today.

The friendly fire often directed by advocates of permissive licensing against those that choose to enforce an open commons to assuage understandable fears is not helpful in that broader fight. We should be celebrating the fact that another company has taken a step towards open development, rather than lamenting the fact they didn't travel all the way from proprietary to permissive licensing in one flying leap.

Wednesday, November 21, 2012

PyCon India 2012

Inspired by Noufal Ibrahim's recent article on the general state of the Python community in India, I've finally written this belated report on my recent India trip :)

At the end of October, I had the good fortune to attend PyCon India 2012 in Bangalore. Sankarshan Mukhopadhyay (from Red Hat's Pune office) suggested I submit some talk proposals a few months earlier, and I was able to combine a trip to attend the conference with visits to the Red Hat offices in Bangalore and Pune. It's always good to finally get to associate IRC nicks and email addresses with people that you've actually met in person! While Sankarshan unfortunately wasn't able to make it to the conference himself, I did get to meet him when I visited Pune, and Kushal Das and Ramakrishna Reddy (also fellow Red Hatters) took great care of me while I was over there (including a weekend trip out from Pune to see the Ajanta and Ellora caves - well worth the visit, especially if you're from somewhere like Australia with no human-built structures more than a couple of hundred years old!)

While I wasn't one of the keynote speakers (David Mertz gave the Saturday keynote, and Jacob Kaplan-Moss gave an excellent "State of the Python Web" keynote on Sunday), I did give a couple of talks - one on the new features in the recent Python 3.3 release, along with a longer version of the Path Dependent Development talk that I had previously presented at PyCon AU in August. Both seemed to go over reasonably well, and people liked the way Ryan Kelly's "playitagainsam" and "playitagainsam-js" tools allowed me to embed some demonstration code directly in the HTML5 presentation for the Python 3.3 talk.

Aside from giving those two talks, this was a rather different conference for me, as I spent a lot more time in the hallway chatting with people than I have at other Python conferences. It was interesting to see quite a few folks making the assumption that because I'm a core developer, I must be an expert on all things Python, when I'm really a relative novice in many specific application areas. Fortunately, I was able to pass the many web technology related questions on to Jacob, so people were still able to get good answers to their questions, even when I couldn't supply them myself. I also got to hear about some interesting projects people are working on, such as an IVRS utility for mothers to call to find out about required and recommended vaccinations for their newborn children (I alluded to this previously in my post about my perspective on Python's future prospects).

One thing unfortunately missing from the PyCon India schedule was the target experience level for the talks, so I did end up going to a couple of talks that, while interesting and well presented introductions to the topic, didn't actually tell me anything I didn't already know. Avoiding any chance of that outcome is one of the reasons I really like attending "How we are using Python" style talks, and my favourite talk of the conference (aside from Jacob's keynote) was actually the one from Noufal Ibrahim and Anand Chitipothu on rewriting the Wayback Machine's archiving system. (The other major reason I like attending such talks is that knowing I played a part, however small, in making these things possible is just plain cool.)

While the volunteers involved put in a lot of effort and the conference was well attended and well worth attending, the A/V handling at the conference does still have room for improvement, as the videos linked above indicate. I've sent a few ideas to the organisers about reaching out to the PSF board for assistance and suggestions on that front. Hopefully they'll look into that a bit more for next year, as I think producing high quality talk recordings can act as excellent advertising for tech conferences in subsequent years, but doing that effectively requires a lot of preparation work both before and during the conference. There are some good resources for this now in the Python community at least in Australia and the US, so I'm hopeful that the PSF will be able to play a part in transferring that knowledge and experience to other parts of the world and we'll start seeing more and more Python conferences with recordings of a similar calibre to those from PyCon US and PyCon AU.

Tuesday, October 02, 2012

Python's Future: A Global Perspective

Is Python's future currently at risk? (TLDR: No)


Calvin Spealman recently posted his thoughts on various aspects of where he sees computing in general heading, and his concerns about where Python may fit in that future.

I think his concerns are somewhat valid as far as specific market segments go, but they overstate the case when it comes to "the future of Python", because his article takes a very narrow view of the computing field.

Smartphones and tablets are the new desktop (although the desktop won't go away, it will become limited to power users with demands for precision control and complex workflows). Python has long been relatively weak on the desktop when it comes to redistributing applications, due to the need to get the interpreter installed before it can be used. Microsoft's redistribution restrictions on their C runtime have made this all the more difficult when it comes to Windows.

We also made a fairly major misstep when we failed to appropriately advertise the addition of directory and zipfile execution support in Python 2.6 (bundle your code with all its dependencies except Python into a directory or zipfile, add a __main__.py file, and the Python interpreter will execute it as if it were a script. With a zip file, you can even add a shebang line to the front and flag it as executable, and a POSIX shell will pass it to Python automatically if you run it directly. I haven't tried it, but the py launcher shipped with 3.3 should also handle such files). While we later went back and added the appropriate notice to the What's New in Python 2.6 documentation, and updated the command line guide in the documentation, this capability still isn't widely known.
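To make that a bit more concrete, here's a minimal sketch of building and running such an archive from Python 3 itself. The helper names (build_executable_zip, run_demo) and the hello.pyz filename are just my own placeholders for illustration, and the shebang line assumes a POSIX system with python3 on the PATH.

```python
import os
import stat
import subprocess
import sys
import tempfile
import zipfile


def build_executable_zip(target_path, main_source,
                         shebang="#!/usr/bin/env python3\n"):
    """Write a zip archive containing __main__.py, prefixed by a shebang line.

    Hypothetical helper for illustration only.
    """
    with open(target_path, "wb") as f:
        # Zip archives are located from the end of the file, so the
        # interpreter still finds the archive after the leading shebang.
        f.write(shebang.encode("ascii"))
        with zipfile.ZipFile(f, "w") as archive:
            archive.writestr("__main__.py", main_source)
    # Flag the file as executable so a POSIX shell can run it directly.
    mode = os.stat(target_path).st_mode
    os.chmod(target_path, mode | stat.S_IXUSR)
    return target_path


def run_demo():
    """Build a tiny archive and execute it with the current interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        app = build_executable_zip(
            os.path.join(tmp, "hello.pyz"),
            'print("hello from a zip file")\n',
        )
        result = subprocess.run(
            [sys.executable, app],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
```

Any dependencies would simply be written into the same archive alongside __main__.py, since the archive itself ends up on sys.path when it is executed.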

The complaints about dynamic language overhead on mobile devices don't hold much water for me. Smartphones now are more powerful than desktops were less than a decade ago, and Mozilla's Boot2Gecko project holds a lot of potential. While battery technology doesn't advance as fast as computing technology, Moore's Law is leveraged in the mobile space to allow more to be done with less power, reproducing the desktop (and server!) trajectories where dynamic languages were initially derided as too slow, until the hardware caught up to get them to the point of being "fast enough".

However, Python's real strengths have long been server side technology, software development by non-programmers and as an embedded scripting engine for trusted plugins (rather than those that need to be strictly isolated from, for example, a core game engine or the host OS). And in those areas, it's still powering ahead.

Widespread adoption requires being taken for granted

Install a Linux distro. Which dynamic language interpreters are pre-installed? If you're using Debian or Fedora, it will be Python and Perl. The presence of those two can pretty much be taken for granted. Ruby probably won't be there, and a standalone JavaScript interpreter certainly won't be.

Apple have expressed their support for Python by building tools that rely on it (with, as far as I know, Python being the only dynamic language interpreter shipped as part of Mac OS X. Update: I'm told Apple ship Perl and Ruby as well), and Microsoft ship their Python Tools for Visual Studio bundle. Google, of course, famously chose Python as the only dynamic language supported on their App Engine platform (and they currently employ Guido van Rossum and a number of other Python core developers).

gcc and gdb both let you write plugins, and your language choices are C/C++ or Python (plus Lisp in the gcc case). Many other infrastructure level tools are going the same way. Fedora's infrastructure is almost entirely written in Python, as is OpenStack.

If you're into multimedia development, Python will be a core part of your toolset, and Python is the key open source competitor to proprietary toolsets in the scientific community. The Natural Language Toolkit is a hugely powerful resource for many data mining applications, and Python is entwined deeply into the core of the financial sector as well.

Also, just as many years ago a lot of formal education programs switched from C and C++ (or Pascal or Ada, etc) to Java for introductory programming courses, many are now switching to Python, pushing Java into the role of an enterprise language used only for large and complex applications where the development overhead can be justified to some degree. Businesses are getting to the point where they can choose Python as part of their technology base while being assured of a future pool of recruits that already know the language.

Informal education programs are also favouring Python as the first "real world" application language that people are introduced to. OLPC chose Python, as did Raspberry Pi. Readability counts.

The Python Africa Tour has attracted quite a bit of interest, and I believe Africa plays host to its first PyCon later this year (in South Africa). Every other continent now hosts multiple PyCons each year in different countries and regions.

Only one kind of client

Things are substantially more competitive on the web service front, with Rails and Django going head-to-head, and Node.js attempting to play the "you can use the same language on the frontend and the backend!" card.

As far as Node.js goes, I'm firmly convinced that if Node.js was going to be a hugely popular server side framework, Twisted would have taken over the world by now. Callback based programming is just plain hard for most humans to wrap their heads around (often even harder than threaded programming) - hence the popularity of greenlets and gevent in the Python world, which permit the use of asynchronous IO capabilities with a threading-like programming style. The ongoing efforts around tweaking generator syntax and capabilities in Python core development could legitimately be summarised as "make it possible to write Twisted code in a way that doesn't hurt people's brains quite so much and without relying on the magical stack-switching assembly code needed for greenlets".
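To illustrate the difference in style, here's a toy sketch comparing the two approaches. Everything in it is made up for the purposes of the example: fetch and run_loop stand in for a real event loop (they are not Twisted or gevent APIs), and drive is a bare-bones trampoline rather than anything you'd use in production. It relies on generators being able to return a value, which needs Python 3.3 or later.

```python
from collections import deque

# Queue of (callback, value) pairs standing in for a real event loop.
_pending = deque()


def fetch(key, callback):
    """Pretend to start a non-blocking lookup; the result is delivered later."""
    _pending.append((callback, key.upper()))


def run_loop():
    """Deliver queued results until nothing is left to do."""
    while _pending:
        callback, value = _pending.popleft()
        callback(value)


def greet_with_callbacks(done):
    """Callback style: each step is nested inside the previous one."""
    def got_first(first):
        def got_second(second):
            done(first + " " + second)
        fetch("world", got_second)
    fetch("hello", got_first)


def greet_with_generator():
    """Generator style: the same two steps read top to bottom."""
    first = yield "hello"
    second = yield "world"
    return first + " " + second


def drive(gen, done):
    """Resume the generator each time a value it asked for arrives."""
    def step(value=None):
        try:
            request = gen.send(value)
        except StopIteration as finished:
            done(finished.value)
        else:
            fetch(request, step)
    step()


def demo():
    results = []
    greet_with_callbacks(results.append)
    drive(greet_with_generator(), results.append)
    run_loop()
    return results
```

Both versions produce the same answer, but only the generator version keeps the logic in a single flat function, with the callback plumbing hidden away inside drive.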

In this space, Python's strength really lies in its ability to step away from traditional web technologies. Want to talk over a serial port to a piece of lab equipment or radio modem? Sure, we can do that. Want to talk to telco gear through a custom C extension? Sure, we have a wide range of tools to support that, too, along with some great Asterisk bindings. Python also has many web framework options, like Pyramid and Flask, that let you more easily be selective in your choice of components than Django does.

This is important, because I just spent the past weekend here at PyCon India. While smartphones are popular amongst the largely urban professionals that make up the web development community and those they regularly associate with, they're still only available to a vanishingly small percentage of the global population. Much of the rest of the world doesn't even have access to a desktop computer let alone a smartphone. What they do have though are ordinary mobile phones (aka cellphones, for any Americans in the audience).

Added to that is the fact that the majority of the world's population is illiterate - they can understand spoken instructions, and are sufficiently numerate to press numbers on a keypad to operate an Interactive Voice Response System, but they won't be operating a smartphone any time soon, even if one was available to them.

The interfaces and language capabilities you need to reach *that* audience look nothing like those you can use to reach the smartphone toting crowd.

And that's before we even get into the potential long term implications of verbal and tactile interfaces like Siri and Baxter.

No reason to relax

All that said, while Python's future is looking very, very bright from where I sit, that's no reason to relax and assume that future is assured. Python is far from perfect, and the same can be said for the ecosystem around us.

Jacob's Sunday keynote at PyCon India spoke about the need for Python's web community to work on embracing the real time web, and lowering the barriers to entry to providing network-based realtime interactivity in Python-based web applications. It's likely any such efforts will require an update to the WSGI standards to support a streaming IO component, in addition to the current request/response model.

Tools like Kivy, which aim to make it easier to write mobile applications in Python, are also an important part of extending Python's reach into areas where it is currently weak.

The recent 3.3 release included several elements aimed at making things easier for beginners (especially those on Windows), including improved error messages, an option to modify PATH in the Windows installer and the Python launcher, while the entire Python 3 series is aimed at embracing Unicode as part of the core of the language, allowing it to better reach beyond its original audience of users whose native alphabets could be expressed within the constraints of ASCII or an 8-bit encoding.

3.3 also took some of the first steps in improving the "out of the box" packaging dependency management experience, by integrating virtual environment support and namespace packages (along with making empty __init__.py files optional).
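As a rough illustration of the namespace package change, the sketch below splits one package across two separate directories, neither of which contains an __init__.py file, and imports a module from each half. The demo_ns package name and the root_alpha/root_beta directory names are invented for the example.

```python
import importlib
import os
import sys
import tempfile


def demo_namespace_package():
    """Import two halves of one package that live in separate directories."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = []
        for part, value in (("alpha", 1), ("beta", 2)):
            root = os.path.join(tmp, "root_" + part)
            # Each root holds a "demo_ns" directory with no __init__.py.
            pkg_dir = os.path.join(root, "demo_ns")
            os.makedirs(pkg_dir)
            with open(os.path.join(pkg_dir, part + ".py"), "w") as f:
                f.write("VALUE = {}\n".format(value))
            roots.append(root)
        sys.path[:0] = roots
        try:
            importlib.invalidate_caches()
            alpha = importlib.import_module("demo_ns.alpha")
            beta = importlib.import_module("demo_ns.beta")
            return alpha.VALUE, beta.VALUE
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            for root in roots:
                sys.path.remove(root)
            for name in ("demo_ns", "demo_ns.alpha", "demo_ns.beta"):
                sys.modules.pop(name, None)
```

Prior to 3.3, getting the same effect required boilerplate __init__.py files in every portion of the package.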

Concurrency is a problem where the overall Python ecosystem has many more options than those provided by the CPython interpreter implementation. We do offer plenty of interesting tools, especially for embarrassingly parallel problems that fit nicely into the concurrent.futures execution model. The GIL does cause problems for particular workloads, and switching to Jython or IronPython to take advantage of the free-threaded JVM and CLR implementations isn't always going to be an option. I've written far more extensively on that topic, though, so I won't repeat that here.
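For the embarrassingly parallel case, the concurrent.futures model looks something like the following sketch. The word_count and count_all functions are just placeholders I've invented for the example; the point is that each unit of work is independent of the others.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """An independent unit of work: no state shared with other calls."""
    return len(text.split())


def count_all(texts, executor_cls=ThreadPoolExecutor, max_workers=4):
    """Farm each text out to a worker and collect results in input order."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(word_count, texts))
```

Passing concurrent.futures.ProcessPoolExecutor as executor_cls keeps exactly the same calling code while moving the work into separate processes, which is the usual way to sidestep the GIL for CPU bound tasks in CPython.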

We should also look at ways of making it easier for other languages to interoperate with Python without an intervening C interface. Perhaps Python should ship a pycall script like this one, that makes it easy to invoke Python functions directly in a pipeline or from another application (passing JSON data in via stdin, and receiving JSON data back via stdout). Conversely, better shell integration is always worth exploring.
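Purely as an illustration of the idea, a pycall style helper could be as simple as the sketch below. This is my own hypothetical take, not an existing tool: the "module:function" target syntax and the {"args": [...], "kwargs": {...}} request format are assumptions I've made up for the example.

```python
import importlib
import io
import json
import sys


def pycall(target, stdin=sys.stdin, stdout=sys.stdout):
    """Call "module:function" with JSON arguments read from stdin.

    Hypothetical request format: {"args": [...], "kwargs": {...}}.
    The JSON encoded return value is written to stdout.
    """
    module_name, _, func_name = target.partition(":")
    func = getattr(importlib.import_module(module_name), func_name)
    request = json.load(stdin)
    result = func(*request.get("args", []), **request.get("kwargs", {}))
    json.dump(result, stdout)
    stdout.write("\n")


def demo():
    """Exercise pycall with in-memory streams instead of a real pipeline."""
    out = io.StringIO()
    pycall("math:hypot", io.StringIO('{"args": [3, 4]}'), out)
    return json.loads(out.getvalue())
```

Wrapped in a small command line script that passes sys.argv[1] as the target, something like this would let any language that can spawn a process and speak JSON call into Python.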

And, of course, our journey in rebuilding the Unicode infrastructure is ongoing. Python 3.4 is likely to bring improvements in the ability to switch the encoding of a stream "mid-flight", as well as restoring some convenience APIs for the non-Unicode related uses of the encoding and decoding methods in Python 2.

So yes, there are plenty of areas where Python can, and should, and probably will, improve. But we shouldn't lose sight of the fact that many of the problems with Python (like binary distribution, dependency management and concurrency) are problems with software development generally, so there's nowhere for people to go that will magically make those issues disappear (or else they come at the price of losing out on many of Python's other advantages, or committing to a particular platform, or some other downside).

We're a conservative community by nature - we generally don't like blazing trails when it comes to language design. Instead, we're happy to let others rush ahead, letting them figure out where the pitfalls are, while we see what we can learn from their experience and integrate into Python's syntax, standard libraries, or the Python Package Index.

Wednesday, July 11, 2012

Volunteer developed free-threaded cross platform virtual machines?

Since writing my Python 3 Q & A, including some thoughts on why the CPython GIL isn't likely to go away any time soon, I've been pondering the question of free-threaded cross platform virtual machines for dynamic languages. Specifically, I've been trying to think of any examples of such that are driven almost entirely by volunteer based development.

A brief VM survey


The JVM and Dalvik have plenty of full time developers, and the CLR provided by Microsoft not only has full time developers, but also isn't cross platform.
Mono's core development was funded directly by first Ximian, then Novell and now Xamarin, and since the CLR is free-threaded, free-threading support would have been a requirement from the start.

However, if we switch over to the dynamic language specific VM side, the reference implementations for both Python and Ruby use a Global Interpreter Lock to ease maintenance and maximise speed of execution in the original single-threaded scripting use case. This means neither can scale to multiple cores without using either multiple processes and some form of inter-process communications, or else invoking code that doesn't need to hold the interpreter lock (e.g. C extensions for CPython).
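As a rough illustration of the multiple-process route (a sketch, not a benchmark), the following uses the standard library's concurrent.futures module to spread a deliberately CPU bound function across worker processes. The function names are made up for this example, and the arguments and results are pickled behind the scenes, which is the inter-process communication cost mentioned above.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Deliberately CPU bound: count primes below limit by trial division."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def count_in_processes(limits, workers=4):
    """Run each count in its own worker process, each with its own GIL."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Arguments and results cross the process boundary via pickling.
        return list(pool.map(count_primes, limits))
```

Calling count_in_processes([200000] * 4) from under an `if __name__ == "__main__":` guard should keep several cores busy at once, whereas running the same four counts in threads within a single CPython process would have them take turns holding the GIL.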

Both Python and Ruby have JVM and CLR implementations that are free-threaded (Jython, JRuby, IronPython, IronRuby), since they can take advantage of the cross platform threading primitives in the underlying corporate sponsored VM.

Rubinius, with Engine Yard's backing, is creating a free-threaded Ruby interpreter in the form of Rubinius 2.0. In my opinion, they've done something smart by avoiding the Win32 API entirely and just writing POSIX code, leaving the task of dealing with Microsoft's idiosyncratic approach to OS interfaces as a problem for the MinGW developers. Unfortunately (from the point of view of this particular problem), CPython long ago adopted the approach of treating Windows as a first class native build target, rather than requiring the use of a third party POSIX compatibility layer.

PyPy is heading in a different direction, focusing on making Software Transactional Memory a viable approach to concurrency in Python, without the well-known data corruption and deadlock pitfalls of thread-based concurrency.

Lua doesn't support native threading in the core VM at all - it just has a couple of GIL hooks that are no-ops by default, but can be overridden to implement a GIL.

Perl 5 supports threads using the subinterpreter model - by default, all state is thread local and you have to take explicit steps to make it visible to other threads. Perl also warns that using threads may lead to segfaults when using non-thread-safe modules.

Parrot (and thus Perl 6) has a rather ambitious concurrency model, but I have no idea how well it works in practice. With Perl 6 still in development, are there any documented production deployments?

JavaScript doesn't support full shared memory threading, only Web Workers. Since objects have to be serialised for inter-thread communication, the model is closer to lightweight processes than it is to shared memory threading.

Whither CPython?


CPython doesn't have any full time developers assigned these days - the PSF budget doesn't stretch that far (yet!), and the companies that use Python (including PSF sponsor members) are generally (with a couple of notable exceptions) more interested in paying people to build applications with the versions that exist now rather than paying them directly to build better versions for use in the future. That's not to say companies don't contribute code (we see plenty of corporate contributions in the form of upstream patches from Linux distro vendors like Red Hat and Canonical, as well as major users like CCP Games, and companies have sponsored particular development activities via the PSF, such as Dave Murray's work on email enhancements that landed in 3.3), just that they don't tend to pay developers to think about future directions for Python in general.


Even when the PythonLabs team (IIRC, Guido van Rossum, Tim Peters, Barry Warsaw, Jeremy Hylton, Fred Drake, maybe some others) were being funded by Digital Creations/Zope Corporation:
  • it still wasn't full time for any of them
  • multi-core machines were still rare back then
  • DC/Zope is focused on web applications, which are far more likely to be IO bound than CPU bound
In more recent years, and this is the first of the exceptions I mentioned earlier, we had Google paying Guido to spend 20 hours a week guiding the development of Python 3, but that was all about fixing the Unicode model rather than improving multi-core support.

The other exception was the Google funded Unladen Swallow effort, which aimed to bring an LLVM based JIT to CPython. While that effort did result in many improvements to LLVM, and the creation of an excellent benchmark suite for long running Python processes (much of which is still used by PyPy to this day), it ultimately failed in its original aim.

Formalising and enhancing subinterpreters

Given the high compatibility risks with existing threaded Python code and especially the risk of segfaults in C extensions that come with making CPython truly free-threaded, the Perl 5 subinterpreter model actually looks like the most promising way forward to me. With that approach, all code execution within a given interpreter is still serialised as normal, while a new communication mechanism would allow data to be safely passed between interpreters.

Since it isn't exposed at the Python level, many developers don't realise that CPython already supports the use of subinterpreters to provide some degree of isolation between different pieces of code. The Apache mod_wsgi module uses this feature to provide some isolation between different WSGI applications running on the same Apache instance.

Unfortunately, there are currently quite a few quirks and limitations with this feature, which is why it has never been elevated to a formal part of the language specification and exposed at the Python level. In addition, the GIL is part of the state that is still shared, so exposing the feature as it exists today wouldn't help at all with concurrency goals.

That leads to my personal recommendation to anyone that would like to see better thread-based concurrency support in CPython:
  • Create a CPython fork (either by importing directly from http://hg.python.org/cpython, or by forking the BitBucket mirror).
  • Make the subinterpreter support compatible with the PyGILState APIs (Graham Dumpleton and I will actually be discussing this aspect at PyConAU next month, so I'd suggest talking to Graham before doing anything on this part)
  • Create a two-tiered locking scheme, where each interpreter (including the main interpreter) has a Subinterpreter Lock that is used to protect the main eval loop, while the Global Interpreter Lock remains in place to protect state that is shared between interpreters
  • Use the subinterpreter lock in preference to the GIL to protect most Python code evaluation
  • Design a mechanism for passing objects between interpreters without serialising or copying them. The CLR application domain design may provide some inspiration here.
This is by no means an easy project, but it's the one I see as having the greatest potential for allowing CPython to exploit multiple cores effectively without requiring serialisation of data. I'll also note that whatever mechanism is designed for that last bullet point may potentially translate to efficient communication between local processes via memory mapped files.
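The two-tiered locking idea in the list above can be modelled in a few lines of ordinary Python. This is purely a toy: the "interpreters" are plain objects rather than real CPython subinterpreters, every name is hypothetical, and since the whole thing runs under the real GIL it shows only which lock guards which state, not any actual parallel speedup.

```python
import threading

# Toy model only: every name here is hypothetical, not a CPython API.
global_lock = threading.Lock()  # stands in for the GIL (cross-interpreter state)
shared_registry = {}            # state shared between all interpreters


class ToyInterpreter:
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()  # the per-interpreter "Subinterpreter Lock"
        self.namespace = {}           # state private to this interpreter

    def run(self, func, *args):
        """Evaluate code while holding only this interpreter's own lock."""
        with self.lock:
            return func(self.namespace, *args)

    def publish(self, key, value):
        """Touch shared state: the only place the global lock is taken."""
        with global_lock:
            shared_registry[key] = value


def bump(namespace, times):
    """Work that only touches interpreter-private state."""
    for _ in range(times):
        namespace["count"] = namespace.get("count", 0) + 1
    return namespace["count"]


def worker(interp, times):
    total = interp.run(bump, times)     # guarded by the subinterpreter lock
    interp.publish(interp.name, total)  # guarded by the global lock


interpreters = [ToyInterpreter("a"), ToyInterpreter("b")]
threads = [threading.Thread(target=worker, args=(interp, 1000))
           for interp in interpreters]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(shared_registry)
```

The point of the split is that the bulk of the work (the bump calls) never contends on the global lock, which is only held for the brief moment that shared state is updated.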

But no, I'm not going to write it. Given what I work on (task automation and IO bound web and CLI applications), I don't need it personally or professionally, and it's too big a project to realistically attempt as a hobby/language PR exercise.

If you're interested in funding efforts to make something like this happen for Python 3.4 (likely coming in early-mid 2014), but don't know how to go about finding developers to work on it, then it's worth getting in touch with the PSF board. If you want better thread-based concurrency support in Python and are a Red Hat customer, it also wouldn't hurt to mention it via the appropriate channels :)

Update: Added JavaScript to the VM survey.

Tuesday, July 03, 2012

The title of this blog

This article in praise of taking the time for idleness does a good job of articulating some of the reasons behind the title of this blog. I'm very jealous of my idle time - I don't like it when I have things planned in advance night after night, week after week. I want my downtime to just do whatever seems interesting at the time, and I don't function well if I find it necessary to go without it for an extended period.

Being bored and being lazy are widely seen as things to be avoided. However, it all depends on how you look at them.

Boredom is largely a sign of incredible luxury - a time when the world is placing no immediate demands on us, so we have to come up with some means of satisfying our innate desire to be doing something. Being bored means we're not busy obtaining food, or water, or shelter, or defending ourselves (or our food/water/shelter) from attackers, or otherwise pursuing the basic necessities of survival. It's an opportunity to play - maybe to explore (and change!) the world around us, maybe to explore fictional worlds created by others, maybe to create fictional worlds of our own, or to teach others about the real world.

The negative view on being lazy often rests on unstated assumptions (even fears) about the purpose of life: "Make more of yourself!", "Do something with your life!", "Leave your mark on the world!". When you get right down to it though, nobody (and I mean nobody) knows the meaning of life. We don't really know why it's better to get out of bed each morning and face the world - we just choose to believe that life is better than non-life, and engaging with the world is better than ignoring it. We create all sorts of stories we tell ourselves to justify our reasons for rejecting nihilism (to the point of killing each other over our choice of stories), but it ultimately comes down to a decision that the only life we know we have is this one, so we may as well do what we can to try and enjoy it while we're here. Once we make that decision, and our basic survival needs are taken care of, everything beyond that point is optional and what we pursue will depend on what we're taught to perceive as valuable.

If you look at the developed world, massive sections of it are aimed at giving people something to do when they're bored because their basic survival needs are taken care of more efficiently than they are by subsistence farming or hunter-gathering. This idle time may be spent creating new things, or consuming those things previously created by others. Some people see efficiency gains as a way to do more work in the same amount of time, but it's equally possible to exploit those gains to do the same amount of work in less time, leaving more time to be idle, and hence bored, and hence looking for other things to do. Is the former choice always better than the latter lazy choice? I don't believe so.

Retreating from the deep philosophical questions and getting back to the more mundane question of the blog title, I do own another domain that redirects to this one, and thus have occasionally tinkered with the idea of rebranding the site as Curious Efficiency. This would put a more traditionally "positive" spin on the concepts of idle investigation and elimination of unnecessary work mentioned in the blurb. However, I find the questions raised by the negative forms more intriguing, and thus the current title remains. That said, if I ever get around to using my own domain for my primary email address, it will definitely be curiousefficiency rather than boredomandlaziness :)