Author Archives: david

Tiny speed-ups for Python code

Here’s a bunch of examples looking at micro-optimisations in Python code.

Testing if a variable matches 1 or none of 2 values

You have a variable and want to test if it is any of 2 values. Should you use a test for membership with a tuple? Or a test for membership with a set? Or just use 2 comparisons with a logical or?

Answer: use in with a tuple.

if value in ('foo', 'bar'):

Caveats: for complex objects the result may differ, since it depends on how expensive the object’s equality comparison is (used by the tuple test and by or) compared with computing its hash (used by the set test).
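One way to check this kind of claim on your own machine is the standard timeit module. This is a minimal sketch, not the original benchmark: the helper name time_stmt, the sample values and the iteration count are made up for illustration, and the numbers will vary with the interpreter version.

```python
import timeit

def time_stmt(stmt, setup="value = 'bar'", number=200000):
    """Return the best of three timings (in seconds) for stmt."""
    return min(timeit.repeat(stmt, setup=setup, repeat=3, number=number))

# The three spellings discussed above.
candidates = {
    'tuple': "value in ('foo', 'bar')",
    'set': "value in {'foo', 'bar'}",
    'or': "value == 'foo' or value == 'bar'",
}

timings = {name: time_stmt(stmt) for name, stmt in candidates.items()}

# Print fastest first.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print('%-5s %.4f' % (name, seconds))
```

Try changing the setup so that value matches the first item, or matches nothing, before drawing conclusions.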

Fastest way to copy a list

Given a list of objects (already in memory), what’s the fastest way to make a second list of the same objects? The built-in list() or a list comprehension?

Answer: use the built-in list().

source = ['foo', 'bar']
dest = list(source)
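
Whichever spelling you pick, the result is a shallow copy: a new list holding references to the same objects. A small sketch to make that concrete (the sample data is invented):

```python
source = [['foo'], ['bar']]

by_builtin = list(source)
by_comprehension = [item for item in source]
by_slice = source[:]

for dest in (by_builtin, by_comprehension, by_slice):
    # A new list object each time...
    assert dest is not source
    # ...but the items are the very same objects, not copies.
    assert all(a is b for a, b in zip(dest, source))

# Mutating a shared item is visible through every copy.
source[0].append('baz')
print(by_builtin[0])  # ['foo', 'baz']
```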

Checking if any item in a tuple is an empty string

Given a tuple of values, is any of them the empty string? This was prompted by some crazy code that used any() with a generator expression and a test.

Answer: use in, don’t use any() with a generator expression and a test.

if '' in ('foo', 'bar', 'baz'):

Caveats: not sure if this would still hold true for a really long tuple, and maybe depends on the position of the empty string in the tuple (when there is one).
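
The caveat about length and position can be probed with timeit. The sketch below is an experiment under arbitrary assumptions rather than the original measurement: the tuple sizes, the helper name compare and the iteration count are all made up.

```python
import timeit

def compare(values, number=1000):
    """Time both spellings against the same tuple; returns (in_time, any_time)."""
    env = {'values': values}
    in_time = timeit.timeit("'' in values", globals=env, number=number)
    any_time = timeit.timeit("any(v == '' for v in values)",
                             globals=env, number=number)
    return in_time, any_time

cases = [
    ('short', ('foo', 'bar', 'baz')),
    ('hit first', ('',) + ('x',) * 1000),
    ('hit last', ('x',) * 1000 + ('',)),
    ('miss', ('x',) * 1001),
]

for label, values in cases:
    # Both spellings must agree on the answer before the timing means anything.
    assert ('' in values) == any(v == '' for v in values)
    in_time, any_time = compare(values)
    print('%-10s in=%.5f any=%.5f' % (label, in_time, any_time))
```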

Reading test fixtures from disk versus memory

Is it faster to read a test file from disk or construct it in memory?

Answer: memory.

import io

def test_file_contents():
    source = io.StringIO('foo')

    assert source.read() == 'foo'

Updating a dictionary with 1 key and value

What’s the fastest way to set 1 key / value in an existing dictionary?

Answer: use direct assignment, it is a lot faster. Don’t use dict.update().

value = {}
value['foo'] = 'bar'

Caveats: I see code using update for a single key a lot, it makes me go spare. You will miss out on annoying me.
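
Part of the reason is visible in the bytecode: assignment compiles to a single subscript-store instruction, while update() has to build a temporary dictionary and make a method call. A sketch using the standard dis module (the helper name opnames is made up, and opcode names other than STORE_SUBSCR vary between CPython versions, so treat the printed listings as illustrative):

```python
import dis

def opnames(source):
    """Return the list of opcode names CPython generates for a statement."""
    code = compile(source, '<example>', 'exec')
    return [instruction.opname for instruction in dis.get_instructions(code)]

assign_ops = opnames("value['foo'] = 'bar'")
update_ops = opnames("value.update({'foo': 'bar'})")

print(assign_ops)
print(update_ops)
```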

I think it is useful to be aware of what does and does not perform well in the CPython implementation. Given 2 ways to do the same thing, one should prefer the more efficient approach, right? Having said that, sometimes the more efficient approach is harder to read, so may not necessarily be better.

Jinja2 templates and Bottle

Although Bottle’s built-in mini-template language is remarkably useful, I nearly always prefer to use Jinja2 templates because the syntax is very close to Django’s template syntax (which I am more familiar with) and because the Bottle template syntax for filling in blocks from a parent template is a bit limiting (but that’s kind of the point).

Bottle provides a nice jinja2_view decorator that makes it easy to use Jinja2 templates, but it isn’t that obvious how to configure the template environment for auto-escaping and default context, etc.

(The rest of this relates to Bottle version 0.11 and Jinja2 version 2.7.)

Template paths

Bottle’s view decorator takes an optional template_lookup keyword argument. The default is to look in the current working directory and in a ‘views’ directory, i.e. template_lookup=('.', './views/').

You can override the template path like so:

from bottle import jinja2_view, route

@route('/', name='home')
@jinja2_view('home.html', template_lookup=['templates'])
def home():
    return {'title': 'Hello world'}

Which will load templates/home.html.

Most likely you will want to use the same template path for every view, which can be done by wrapping jinja2_view:

import functools
from bottle import jinja2_view, route

view = functools.partial(jinja2_view, template_lookup=['templates'])

@route('/', name='home')
@view('home.html')
def home():
    return {'title': 'Hello world'}

@route('/foo', name='foo')
@view('foo.html')
def foo():
    return {'title': 'Foo'}

That would have loaded templates/home.html and templates/foo.html.

Another way of setting a global template path for the view decorator is to fiddle with Bottle’s global default template path:

from bottle import TEMPLATE_PATH, jinja2_view, route

TEMPLATE_PATH[:] = ['templates']

@route('/', name='home')
@jinja2_view('home.html')
def home():
    return {'title': 'Hello world'}

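N.B. I used TEMPLATE_PATH[:] to update the global template path in place rather than re-assigning it with TEMPLATE_PATH = ['templates'], because re-assigning would only rebind the name in this module and leave Bottle’s own list unchanged.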
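
The distinction matters because Bottle’s own code holds a reference to the same list object. Here is a stdlib-only sketch of the difference, with a hypothetical library_path standing in for bottle.TEMPLATE_PATH and lookup standing in for code inside the library:

```python
# Stand-in for a module-level list such as bottle.TEMPLATE_PATH (hypothetical).
library_path = ['.', './views/']

def lookup():
    """Stand-in for library code that reads the shared list."""
    return list(library_path)

# What "from bottle import TEMPLATE_PATH" gives you: a second name
# for the same list object.
imported = library_path

# Slice assignment mutates the shared list in place, so the library sees it.
imported[:] = ['templates']
print(lookup())  # ['templates']

# Plain assignment only rebinds the local name; the library's list is untouched.
imported = ['elsewhere']
print(lookup())  # ['templates']
```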

Template defaults

Bottle has a useful url() function to generate URLs in your templates using named routes. But it isn’t in the template context by default. You can modify the default context on the Jinja2 template class provided by Bottle:

from bottle import Jinja2Template, url

Jinja2Template.defaults = {
    'url': url,
    'site_name': 'My blog',
}

Jinja2 version 2.7 by default does not escape variables. This surprises me, but it is easy to configure a template environment to auto-escape variables.

from bottle import Jinja2Template

Jinja2Template.settings = {
    'autoescape': True,
}

Any of the Jinja2 environment keyword arguments can go in this settings dictionary.

Using your own template environment

Bottle’s template wrappers make a new instance of a Jinja2 template environment for each template (although if two views use the same template then they will share the compiled template and its environment).

You can avoid this duplication of effort by creating the Jinja2 template environment yourself, however this approach means you also need to write your own view decorator to use the custom template environment. No biggie.

Setting up a global Jinja2 template environment to look for templates in a “templates” directory:

from bottle import url
import jinja2

env = jinja2.Environment(
    loader=jinja2.FileSystemLoader('templates'),
    autoescape=True,
)
env.globals.update({
    'url': url,
    'site_name': 'My blog',
})

You then need a replacement for Bottle’s view decorator that uses the previously configured template environment:

import functools
from bottle import route

# Assuming env has already been defined in the module's scope.
def view(template_name):
    def decorator(view_func):
        @functools.wraps(view_func)
        def wrapper(*args, **kwargs):
            response = view_func(*args, **kwargs)

            if isinstance(response, dict):
                template = env.get_or_select_template(template_name)
                return template.render(**response)
            else:
                return response

        return wrapper

    return decorator

@route('/', name='home')
@view('home.html')
def home():
    return {'title': 'Hello world'}


It’s easy to customize the template environment for Jinja2 with Bottle and keep compatibility with Bottle’s own view decorator, but at some point you may decide it is more efficient to bypass things and set up a custom Jinja2 environment.

Bottle is nice like that.

Grouping URLs in Django routing

One of the things I liked (and still like) about Django is that request routing is configured with regular expressions. You can capture positional and named parts of the request path, and the request handler will be invoked with the captured strings as positional and/or keyword arguments.

Quite often I find that the URL patterns repeat a lot of the regular expressions with minor variations for different but related view functions. For example, suppose you want CRUD-style URLs for a particular resource; you would write a URL configuration looking something like:

from django.conf.urls import url, patterns

urlpatterns = patterns('myapp.views',
    url(r'^(?P<slug>[-\w]+)/$', 'detail'),
    url(r'^(?P<slug>[-\w]+)/edit/$', 'edit'),
    url(r'^(?P<slug>[-\w]+)/delete/$', 'delete'),
)

The detail, edit and delete view functions (defined in myapp.views) all take a slug keyword argument, so one has to repeat that part of the regular expression for each URL.

When you have more complex routing configurations, repeating the (?P<slug>[-\w]+)/ portion of each route can be tedious. Wouldn’t it be nice to declare that a bunch of URL patterns all start with the same capturing pattern and avoid the repetition?

It would be nice.

I want to be able to write a URL pattern that defines a common base pattern that the nested URLs extend:

from django.conf.urls import url, patterns, group
from myapp.views import detail, edit, delete

urlpatterns = patterns('',
    group(r'^(?P<slug>[-\w]+)/',
        url(r'^$', detail),
        url(r'^edit/$', edit),
        url(r'^delete/$', delete),
    ),
)

Of course there is no group function defined in Django’s django.conf.urls module. But if there were, it would function like Django’s include but act on locally declared URLs instead of a separate module’s patterns.

It happens that this is trivial to implement! Here it is:

from django.conf.urls import url, patterns, RegexURLResolver
from myapp.views import detail, edit, delete

def group(regex, *args):
    return RegexURLResolver(regex, args)

urlpatterns = patterns('',
    group(r'^(?P<slug>[-\w]+)/',
        url(r'^$', detail),
        url(r'^edit/$', edit),
        url(r'^delete/$', delete),
    ),
)

This way the detail, edit and delete view functions still get invoked with a slug keyword argument, but you don’t have to repeat the common part of the regular expression for every route.
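
The reason this works is that a resolver matches its own regex against the start of the path, then hands the unmatched remainder to its nested patterns, merging the captured groups along the way. The toy model below imitates that behaviour with the re module only. The resolve function and the tuple-based pattern format are inventions for illustration, not Django’s implementation.

```python
import re

def resolve(path, patterns, captured=None):
    """Return (view_name, kwargs) for the first pattern matching path, else None."""
    captured = dict(captured or {})
    for regex, target in patterns:
        match = re.match(regex, path)
        if match is None:
            continue
        kwargs = dict(captured, **match.groupdict())
        if isinstance(target, list):
            # Nested group: resolve the rest of the path against the children.
            found = resolve(path[match.end():], target, kwargs)
            if found is not None:
                return found
        else:
            return target, kwargs
    return None

urlpatterns = [
    (r'^(?P<slug>[-\w]+)/', [
        (r'^$', 'detail'),
        (r'^edit/$', 'edit'),
        (r'^delete/$', 'delete'),
    ]),
]

print(resolve('my-post/', urlpatterns))       # ('detail', {'slug': 'my-post'})
print(resolve('my-post/edit/', urlpatterns))  # ('edit', {'slug': 'my-post'})
```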

There is a problem: it won’t work if you want to use a module prefix string (the first argument to patterns(...)). You either have to give a full module string, or use the view objects directly. So you can’t do this:

urlpatterns = patterns('myapp.views',
    # Doesn't work.
    group(r'^(?P<slug>[-\w]+)/',
        url(r'^$', 'detail'),
    ),
)

Personally I don’t think this is much of an issue since I prefer to use the view objects, and if you are using class-based views you will likely be using the view objects anyway.

I don’t know if “group” is a good name for this helper function. Other possibilities: “prefix”, “local”, “prepend”, “buxtonize”. You decide.

Testing with Django, Haystack and Whoosh

The problem: you want to test a Django view for results of a search query, but Haystack will be using your real query index, built from your real database, instead of an index built from your test fixtures.

Turns out you can generalise this for any Haystack back-end by replacing the haystack.backend module with the simple back-end.

from myapp.models import MyModel
from django.test import TestCase
from haystack.query import SearchQuerySet
import haystack

class SearchViewTests(TestCase):
    fixtures = ['test-data.json']

    def setUp(self):
        self._haystack_backend = haystack.backend
        haystack.backend = haystack.load_backend('simple')

    def tearDown(self):
        haystack.backend = self._haystack_backend

    def test_search(self):
        results = SearchQuerySet().all()
        assert results.count() == MyModel.objects.count()

My first attempt at this made changes to the project settings and did HAYSTACK_WHOOSH_STORAGE = "ram" which works but was complicated because then you have to re-build the index with the fixtures loaded, except the fixtures don’t get loaded in TestCase.setUpClass, so the choice was to load the fixtures myself or to re-index for each test. And it was specific to the Whoosh back-end of course.
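
The swap-and-restore pattern itself is independent of Haystack. Below is a sketch using unittest’s addCleanup, which restores the original even when setUp fails part-way and tearDown is never called. fake_haystack is a stand-in object invented for this example, not the real library.

```python
import types
import unittest

# Hypothetical stand-in for a module with a swappable global back-end.
fake_haystack = types.SimpleNamespace(backend='whoosh')

class SwapBackendTests(unittest.TestCase):
    def setUp(self):
        original = fake_haystack.backend
        # Registered before the swap, so the original is restored
        # however the test ends.
        self.addCleanup(setattr, fake_haystack, 'backend', original)
        fake_haystack.backend = 'simple'

    def test_uses_simple_backend(self):
        self.assertEqual(fake_haystack.backend, 'simple')
```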

(This is for Django 1.4 and Haystack 1.2.7. In my actual project I get to deploy on Python 2.5. Ain’t I lucky? On a fucking PowerMac G5 running OS X Server 10.5 for fuck sacks.)

Optimizing queries in Haystack results

My Adobe software updates app (which uses Haystack + Django to provide a search feature) has a very inefficient search results template, where for each search result the template links back to the update’s related product page.

The meat of the search results template looks something like this:

{% for result in page.object_list %}
<div class="search-result">
    <a href="{{ result.object.get_absolute_url }}">{{ result.object }}</a>
    <a href="{% url "product_page" result.object.product.slug %}">{{ result.object.product }}</a>
</div>
{% endfor %}

The reverse URL lookup triggers a separate SQL query to find the related product object’s slug field for each object in the results list, and that slows down the page response significantly.

For a regular queryset you would tell Django to fetch the related objects in one go when populating the template context in order to avoid the extra queries, but in this case page.object_list is generated by Haystack. So how to tell Haystack to use select_related() for the queryset?

It is easy. When you register a model to be indexed with Haystack for searching, you have to define a SearchIndex model, and you can also override the read_queryset() method that is used by Haystack to get a Django queryset:

from haystack import indexes, site
from myapp.models import MyModel

class MyModelIndex(indexes.SearchIndex):
    # Indexed fields declared here
    def get_model(self):
        return MyModel

    def read_queryset(self):
        return self.model.objects.select_related()

site.register(MyModel, MyModelIndex)

And that solved it for me. Shaves three quarters off the execution time.

PS This all pertains to Django 1.4 and Haystack 1.2.7.

PPS Also pertains to a version of my Adobe software updates page that I haven’t deployed quite yet.

How to fix “ghost” files in the Finder

Sometimes the Mac Finder can get its knickers in a twist about files that you ought to be able to open just fine but Finder says no. You may see a message that says “Item XYZ is used by Mac OS X and can’t be opened.”

This can happen if the Finder is in the middle of a copy and the source disk is suddenly disconnected. During a copy the Finder sets a special type/creator code on files, and when the copy completes the proper type/creator code is restored. But if the copy is interrupted then sometimes instead of recovering gracefully the Finder leaves the files with their temporary attributes.

So if you do have these “ghost” files, here is a bit of command-line magic to remove the Finder’s temporary attributes (you would need to change the “/path/to/files” bit):

mdfind -onlyin /path/to/files -0 "kMDItemFSTypeCode==brok && kMDItemFSCreatorCode==MACS" | xargs -0 -n1 xattr -d com.apple.FinderInfo

(This is a single command, all on one line, which you type in Terminal.)

What this does is use the Spotlight tool to find only files that have a Mac file type of “brok” and a creator code set to “MACS”. Then for each file we remove the Finder attributes (which means the Finder reverts to using the file-name’s extension when deciding how to open a file).

The -onlyin flag is used to restrict the search to a particular folder and its sub-folders, a belt and braces approach to making sure we only fix the files we are interested in.
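
Before deleting anything it may be worth listing what the query actually matches. Here is a dry-run sketch in Python that only runs the mdfind half of the pipeline (macOS only). The function names are made up for this example.

```python
import subprocess

QUERY = 'kMDItemFSTypeCode==brok && kMDItemFSCreatorCode==MACS'

def mdfind_command(folder):
    """Build the argument list for the same Spotlight search, NUL-separated."""
    return ['mdfind', '-onlyin', folder, '-0', QUERY]

def find_ghost_files(folder):
    """Run mdfind (macOS only) and return the matching paths as a list."""
    output = subprocess.check_output(mdfind_command(folder))
    return [path.decode('utf-8') for path in output.split(b'\0') if path]

# Shows the command that would be run; nothing is executed or modified here.
print(mdfind_command('/path/to/files'))
```

Calling find_ghost_files('/path/to/files') on a Mac returns the list of files the xattr step would touch.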

This post was suggested by @hobbsy, who had probs with RAW pictures he had copied across disks.

Although the way I showed him how to do this originally was needlessly convoluted because I forgot what one can do with mdfind.

Although the way I showed him how to do this was so much simpler because it was wrapped in a drag-and-drop Mac application created using Sveinbjörn Þórðarson’s Platypus, so no need to type the wrong thing into the shell and accidentally delete everything.

A context manager for files or file-like objects

I usually design my Python programs so that if a program needs to read or write to a file, the functions will take a filename argument that can be either a path string or a file-like object already open for reading / writing.

(I think I picked up this habit from Mark Pilgrim’s Dive Into Python, in particular chapter 10 about scripts and streams.)

This has the great advantage of making tests easier to write. Instead of having to create dummy temporary files on disk I can wrap strings in StringIO() and pass that instead.

But the disadvantage is I then have a bit of boiler-plate at the top of the function:

def read_something(filename):
    # Tedious but not heinous boiler-plate
    if isinstance(filename, basestring):
        filename = open(filename)

    return filename.read()

The other drawback is that code doesn’t close the file it opened. You could have filename.close() before returning but that will also close file-like objects that were passed in, which may not be what the caller wants. I think the decision whether to close the file belongs to the caller when the argument is a file-like object.

You could set a flag when opening the file, and then close the file afterwards if the flag is set, but that is yet more boiler-plate and quite ugly.

So here is a context manager which behaves like open(). If the argument is a string it handles opening and closing the file cleanly. If the argument is anything else then it just hands the object back and leaves closing it to the caller.

class open_filename(object):
    """Context manager that opens a filename and closes it on exit, but does
    nothing for file-like objects.
    """
    def __init__(self, filename, *args, **kwargs):
        self.closing = kwargs.pop('closing', False)
        if isinstance(filename, basestring):
            # We opened it, so we are responsible for closing it.
            self.fh = open(filename, *args, **kwargs)
            self.closing = True
        else:
            self.fh = filename

    def __enter__(self):
        return self.fh

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.closing:
            self.fh.close()

        # Returning False lets any exception from the block propagate.
        return False

And then you use it like this:

from io import StringIO

file1 = StringIO(u'The quick brown fox...')
file2 = 'The quick brown fox'

with open_filename(file1) as fh1, open_filename(file2) as fh2:
    foo, bar = fh1.read(), fh2.read()

If you always want the file to be closed on leaving the block you use the closing keyword argument set to True (the default of False means the file will only be closed if it was opened by the context manager).

file1 = StringIO(u'...jumps over the lazy dog.')
assert file1.closed == False

with open_filename(file1, closing=True) as fh:
    foo = fh.read()

assert file1.closed == True
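
The class above targets Python 2 (note basestring). For what it’s worth, here is how the same idea might look on Python 3 using contextlib.contextmanager. This is a sketch under that assumption, and the name open_filename3 is made up to avoid clashing with the original.

```python
import contextlib
import io

@contextlib.contextmanager
def open_filename3(filename, *args, closing=False, **kwargs):
    """Open path strings and close them on exit; pass file-like objects through."""
    if isinstance(filename, str):
        fh = open(filename, *args, **kwargs)
        closing = True  # we opened it, so we close it
    else:
        fh = filename
    try:
        yield fh
    finally:
        if closing:
            fh.close()

stream = io.StringIO('The quick brown fox...')
with open_filename3(stream) as fh:
    text = fh.read()

# The caller's object is left open because closing defaults to False.
assert not stream.closed
```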

Today is my brother’s birthday. If I had asked him what he wanted for a present I am pretty certain he would have asked for a blog post about closing files in a computer programming language.

Custom template folders with Flask

Someone was asking on Flask’s IRC channel #pocoo about sharing templates across more than one app but allowing each app to override the templates (pretty much what Django’s TEMPLATE_DIRS setting is for). One way of doing this would be to customise the Jinja2 template loader.

Here’s a trivial Flask app that searches for templates first in the default folder (‘templates’ in the same folder as the app) and then in an extra folder.

import flask
import jinja2

app = flask.Flask(__name__)
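my_loader = jinja2.ChoiceLoader([
    app.jinja_loader,
    # Placeholder path for the extra (shared) template folder.
    jinja2.FileSystemLoader('/path/to/templates'),
])
app.jinja_loader = my_loader

@app.route('/')
def home():
    return flask.render_template('home.html')

if __name__ == "__main__":
    app.run()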

The only thing special here is creating a new template loader and then assigning it to the jinja_loader attribute on the Flask application. ChoiceLoader will search for a named template in the order of the loaders, stopping on the first match. In this example I re-used the loader that is created by default for an app, which is roughly like FileSystemLoader('/path/to/app/templates'). There are all kinds of other exciting template loaders available.
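
The lookup order is what makes overriding work: the first loader that has the template wins. Here is a toy model of that rule in plain Python. Dictionaries stand in for template folders, so this is not Jinja2 code, and choose_template is a made-up name.

```python
def choose_template(name, loaders):
    """Return the source from the first loader that knows the template name."""
    for loader in loaders:
        if name in loader:
            return loader[name]
    raise LookupError(name)

app_templates = {'home.html': 'app-specific home'}
shared_templates = {'home.html': 'shared home', 'about.html': 'shared about'}

# App folder first, shared folder second, as in the Flask example.
search_order = [app_templates, shared_templates]

print(choose_template('home.html', search_order))   # app-specific home
print(choose_template('about.html', search_order))  # shared about
```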

I really like the fact that Flask and Bottle’s APIs are so similar. Next I want Flask to include Bottle’s template wrapping decorator by default (there’s a recipe in the Flask docs) and for both of them to re-name it @template.

Inspecting your routes in Bottle

Marcel Hellkamp recently added a small feature to Bottle that makes it easy to inspect an application’s routes and determine if a particular route is actually for a mounted sub-application.

(Bottle is a small module written in Python for making websites.)

Route objects (items in the app.routes list) now have extra information when the route was created by mounting one app on another, in the form of a new key mountpoint in route.config.

Here’s a trivial app with another app mounted on it:

import bottle

app1 = bottle.Bottle()

@app1.route('/')
def app1_home(): return "Hello World from App1"

app2 = bottle.Bottle()

@app2.route('/')
def app2_home(): return "Hello World from App2"

app1.mount(prefix='/app2/', app=app2)

And a utility function that returns a generator of prefixes and routes:

def inspect_routes(app):
    for route in app.routes:
        if 'mountpoint' in route.config:
            prefix = route.config['mountpoint']['prefix']
            subapp = route.config['mountpoint']['target']

            # Recurse into the mounted app, prepending this mount's prefix.
            for prefixes, subroute in inspect_routes(subapp):
                yield [prefix] + prefixes, subroute
        else:
            yield [], route

Finally, inspecting all the routes (including mounted sub-apps) for the root Bottle object:

for prefixes, route in inspect_routes(app1):
    abs_prefix = '/'.join(part for p in prefixes for part in p.split('/'))
    print abs_prefix, route.rule, route.method, route.callback
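
The recursion can be tried without installing Bottle. The sketch below is written for Python 3 and uses throw-away stand-in objects that carry only the attributes the generator reads (routes, config, rule). The fake_route helper and the rule strings are invented, and a real mounted route’s rule will look different.

```python
from types import SimpleNamespace

def inspect_routes(app):
    """Yield (prefixes, route) pairs, descending into mounted sub-apps."""
    for route in app.routes:
        if 'mountpoint' in route.config:
            prefix = route.config['mountpoint']['prefix']
            subapp = route.config['mountpoint']['target']
            for prefixes, subroute in inspect_routes(subapp):
                yield [prefix] + prefixes, subroute
        else:
            yield [], route

def fake_route(rule, config=None):
    """Hypothetical stand-in for a Bottle route object."""
    return SimpleNamespace(rule=rule, config=config or {})

inner = SimpleNamespace(routes=[fake_route('/')])
outer = SimpleNamespace(routes=[
    fake_route('/'),
    fake_route('/app2/', {'mountpoint': {'prefix': '/app2/', 'target': inner}}),
])

for prefixes, route in inspect_routes(outer):
    print(prefixes, route.rule)
```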

This new feature is sure to revolutionise everything.

SharpZipLib and Mac redux

I wrote a blog about generating Mac-compatible zip files with SharpZipLib, the conclusion of which was to disable Zip64 compatibility. It was wrong, wrong I tell you.

The better solution is to just set the size of each file you add to the archive. That way you can keep Zip64 compatibility and Mac compatibility.

I owe this solution to the excellent SharpZipLib forum, which covered this problem a while ago, but which I missed when I wrote the earlier blog.

Here’s an updated version of the zip tool in C# that makes Macs happy without annoying anyone else:

using System;
using System.IO;
using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.Zip;

public class ZipTool
{
    public static void Main(string[] args)
    {
        if (args.Length != 2) {
            Console.WriteLine("Usage: ziptool <input file> <output file>");
            return;
        }

        using (ZipOutputStream zipout = new ZipOutputStream(File.Create(args[1]))) {
            byte[] buffer = new byte[4096];
            string filename = args[0];


            //  Set the size before adding it to the archive, to make your
            //  Mac-loving hippy friends happy.
            ZipEntry entry = new ZipEntry(Path.GetFileName(filename));
            FileInfo info = new FileInfo(filename);
            entry.DateTime = info.LastWriteTime;
            entry.Size = info.Length;
            zipout.PutNextEntry(entry);

            using (FileStream fs = File.OpenRead(filename)) {
                int sourceBytes;
                do {
                    sourceBytes = fs.Read(buffer, 0, buffer.Length);
                    zipout.Write(buffer, 0, sourceBytes);
                } while (sourceBytes > 0);