Category Archives: Blog

Merry Christmas

Thank you to the varied shop staff in London this past week for being without exception polite, enthusiastic and helpful when I asked for help. Especially the girl in [HMV Bond Street][hmv] who spelt [Zappa][zappa] as “zapper” – she was cute. It made tedious shopping joyful.

Let’s do it again next year! Love Dave

[zappa]: http://www.zappa.com/
[hmv]: http://hmv.com

Context managers

I was re-writing the excellent [watchedinstall][watchedinstall] tool and needed to simplify a particularly gnarly chunk of code that required three sub-processes to be started and then killed after invoking another process. It occurred to me I could make these into context managers.

Previously the code was something like…

    start(program1)
    try:
        start(program2)
    except:
        stop(program1)
        raise

    try:
        start(program3)
    except:
        stop(program2)
        stop(program1)
        raise

    try:
        mainprogram()
    finally:
        stop(program3)
        stop(program2)
        stop(program1)

Of course that could have been written with nested try / except / else / finally blocks as well. I did start that way, but the result was not much shorter and almost incomprehensible.

[With context managers][ctxt] the whole thing was written as…

    # from __future__ import with_statement, Python 2.5

    with start(program1):
        with start(program2):
            with start(program3):
                mainprogram()

So much more comprehensible! Here’s the implementation of the context manager (using the `contextlib.contextmanager` decorator for a triple word score):

    import contextlib
    import os
    import signal
    import subprocess

    @contextlib.contextmanager
    def start(program_args):
        prog = subprocess.Popen(program_args)
        if prog.poll(): # Anything other than None or 0 is BAD
            raise subprocess.CalledProcessError(prog.returncode, program_args[0])

        try:
            yield
        finally:
            if prog.poll() is None:
                os.kill(prog.pid, signal.SIGTERM)

For bonus points I might have used [`contextlib.nested()`][ctxtlib] to put the three `start()` calls on one line but then what would I do for the rest of the day?
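For anyone on a newer Python: `contextlib.nested()` was deprecated and later removed, and `contextlib.ExitStack` (Python 3.3 and up) covers the same ground. What follows is only a sketch of how the three helpers might be stacked that way. It is a variation on the context manager above rather than the code I actually used: this `start()` yields the process object and uses `Popen.terminate()`, and `run_with_helpers` is a made-up name.

```python
import contextlib
import subprocess
import sys


@contextlib.contextmanager
def start(program_args):
    """Start a helper process and make sure it is stopped on exit."""
    prog = subprocess.Popen(program_args)
    try:
        yield prog
    finally:
        if prog.poll() is None:   # still running, so ask it to stop
            prog.terminate()
            prog.wait()


def run_with_helpers(helper_commands, main):
    """Start every helper, call main(procs), then stop helpers in reverse order."""
    with contextlib.ExitStack() as stack:
        procs = [stack.enter_context(start(cmd)) for cmd in helper_commands]
        return main(procs)


if __name__ == "__main__":
    # Hypothetical helpers: three Python processes that just sleep.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    run_with_helpers([sleeper] * 3, lambda procs: print(len(procs), "helpers running"))
```

If starting the second or third helper raises, the stack still unwinds the ones already started, which is what the original pile of try / except blocks was trying to achieve.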

[watchedinstall]: http://bitbucket.org/ptone/watchedinstall/
[ctxt]: http://docs.python.org/library/stdtypes.html#typecontextmanager
[ctxtlib]: http://docs.python.org/library/contextlib.html

Snow Leopard: a reactionary writes

Things I like about [Mac OS X version 10.6][sl]:

(Mac OS X 10.6 is also known as Snow Leopard, although I dislike Apple’s use of the operating system codename in their publicity material because it leads to conversations where people talk about “Leopard” and “Tiger” and one has to stop for a second to translate those to actual operating system versions and no-one is ever going to refer to [Mac OS X 10.3 as Panther][panther] these days, let alone [10.2 being Jagwire][jaguar] or heaven forbid [Puma][puma] and [Cheetah][cheetah]. What are the chances I’ll have to look up the codename for 10.5 by the time we reach 10.10? Version numbers are not so evocative but are less confusing than codenames. This doesn’t mean I will stop naming hard disks after Mac OS codenames – my desktop has [Veronica, Gershwin, Harmony and Sonata][codenames] connected at the moment, with [Copland][copland] and [Pink][pink] sitting on the shelf as appropriate…)

Things I like about Mac OS X Snow Leopard:

- Apple’s drivers for my Epson all-in-one printer / scanner actually work. Epson’s drivers for the same printer / scanner only worked if you never used the scanner and promised to attend church more often.
- Significantly snappier.
- QuickTime Player’s minimal interface.

Things I dislike about Mac OS X Snow Leopard:

- By default the Finder does not show internal disks on the Desktop.
- The Finder [ignores type / creator codes][typecreator] on files.

Everything else in 10.6 is good. However it strikes me that the de-emphasizing of old-style Mac metadata (type / creator codes) and the default of not showing your computer’s hard drive icon on the desktop are evidence of the triumph of old-school Next-ies within Apple.

I think the decision to cover-up the hierarchical filesystem is a bad thing.

P.S. Wouldn’t it have been awesome if, having released Mac OS X Cheetah, Apple had continued with naming their releases after other famous Hollywood animal actors? Why they stopped naming releases after [disappointing Sylvester Stallone movies][sly] is beyond me – most any version of System 7 could have been named [Lock Up][lockup].

[sl]: http://www.apple.com/macosx/
[leopard]: http://www.apple.com/support/leopard/
[panther]: http://www.apple.com/support/panther/
[jaguar]: http://www.apple.com/support/jaguar/
[puma]: http://en.wikipedia.org/wiki/Mac_OS_X_v10.1
[cheetah]: http://en.wikipedia.org/wiki/Mac_OS_X_v10.0
[codenames]: http://www.mackido.com/CodeNames/MacOSSoftware.html
[copland]: http://lowendmac.com/orchard/05/1108.html
[pink]: http://lowendmac.com/orchard/05/1026.html
[lockup]: http://www.imdb.com/title/tt0097770/
[sly]: http://www.imdb.com/title/tt0118887/
[typecreator]: http://arstechnica.com/staff/fatbits/2009/09/metadata-madness.ars

Serving custom Django admin media in development

I’ve just discovered [Django][django]’s development server always serves admin media. This is tremendously useful because it means you don’t need to configure a static serve view in your project `urls.py` during development.

However, what bit me was that I wanted to use a customised set of admin media. I had configured a view for the `ADMIN_MEDIA_PREFIX` path and was going batty trying to work out why Django was ignoring it. It used to be that as long as you had `DEBUG = False` in `settings.py` then the development server did not try to help serve the admin media automatically.

[Changeset 6075][6075] added a switch to the runserver command for over-riding the admin media directory.

    python manage.py runserver --adminmedia /path/to/custom/media

That change was made more than two years ago. It is [right there in the documentation][docs]. A little bit of magic that wasted fifteen minutes of my frantic schedule (except for the fact I do not have a frantic schedule).

[django]: http://www.djangoproject.com/
[6075]: http://code.djangoproject.com/changeset/6075
[docs]: http://docs.djangoproject.com/en/dev/ref/django-admin/#djadminopt---adminmedia

I am very bad at writing tests

… but I _think_ I might be getting a little better.

At least these days when I am writing some script (almost certainly in [Python][python]) I start out by intending to write tests. I usually fail because I haven’t learnt to think in terms of writing code that can be easily tested.

[Mark Pilgrim][pilgrim]’s [Dive Into Python][dive] has great stuff on how to approach a problem by [defining the tests first and gradually filling in the code][divetest] that satisfies the test suite. One day I may be able to work like that, until then I work by writing a concise docstring, then stubbing out the function. Once the function is in a state where it might actually return a meaningful result I can play with it in the Python interpreter and start adding useful [doctests][doctest] to the [docstring][docstring].

What really helps is to break the logic out into tiny pieces where ideally each piece returns the result of transforming the input (which I think is known as a [functional approach][functional]). By doing this I can have tests for most of the code, and the functions with a lot of conditional logic, which are harder to write tests for, will at least rely on sub-routines that are themselves well tested.
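Here is a made-up illustration of what I mean (the function names are invented for this example, not taken from any real script). Two tiny functions each transform their input and carry their own doctests; the third has the conditional logic but leans on the two tested pieces.

```python
import doctest


def normalise_name(raw):
    """Collapse runs of whitespace and title-case a name.

    >>> normalise_name("  ada   lovelace ")
    'Ada Lovelace'
    """
    return " ".join(raw.split()).title()


def initials(name):
    """Return the initials of an already-normalised name.

    >>> initials("Ada Lovelace")
    'AL'
    """
    return "".join(word[0] for word in name.split())


def describe(raw):
    """The function with the branching, built from the tested pieces.

    >>> describe(" ada   lovelace")
    'Ada Lovelace (AL)'
    >>> describe("   ")
    'anonymous'
    """
    name = normalise_name(raw)
    if not name:
        return "anonymous"
    return "%s (%s)" % (name, initials(name))


if __name__ == "__main__":
    doctest.testmod()  # silent when every doctest passes
```

Running the file directly exercises every docstring example; `python -m doctest -v` on the file shows each one as it runs.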

I can dream.

[python]: http://www.python.org/
[pilgrim]: http://diveintomark.org/
[dive]: http://www.diveintopython.org/
[divetest]: http://diveintopython.org/unit_testing/stage_1.html
[doctest]: http://docs.python.org/library/doctest.html
[functional]: http://en.wikipedia.org/wiki/Functional_programming
[docstring]: http://www.python.org/dev/peps/pep-0257/

The hidden depths of Adobe CS4

[Adobe][adobe]’s installers and updaters for the Creative Suite are amazingly bad. The updaters actually create a hidden directory `/Applications/.AdobePatchFiles` and store what I assume are the old versions of the files that get updated. Almost a gigabyte of data on my system!

What the fuck is wrong with [these guys][oobe]?

Not certain which is worse, that the updaters created a folder in `/Applications` that clearly belongs somewhere in `/Library/Application Support` (if it should exist at all) or that they made it hidden.

You can delete it.

[adobe]: http://www.adobe.com/
[oobe]: http://blogs.adobe.com/OOBE/

Crazy Acrobat installers love Python

Looking through the updaters for [Adobe Acrobat][acrobat] 9 for Mac I came across a bunch of scripts written in [Python][python]. My favourite was called `FindAndKill.py`:

    #!/usr/bin/python
    """
    Search for and kill app.
    """
    import os, sys
    import commands
    import signal

    def main():
        if len(sys.argv) != 2:
            print 'Missing or too many arguments.'
            print 'One argument and only one argument is required.'
            print 'Pass in the app name to find and kill (i.e. "Safari").'
            return 0

        psCmd = '/bin/ps -x -c | grep ' + sys.argv[1]
        st, output = commands.getstatusoutput( psCmd )

        if st == 0:
            appsToKill = output.split('\n')
            for app in appsToKill:
                parts = app.split()
                killCmd = 'kill -s 15 ' + parts[0]
                #print killCmd
                os.system( killCmd )

    if __name__ == "__main__":
        main()

(You can [download the Acrobat 9.1.3 update][acrobat913] and find this script at `Acrobat 9 Pro Patch.app/Contents/Resources/FindAndKill.py`.)

Was the author not aware of the `killall` command for sending a kill signal to a named process? The [`killall` man page][mankillall] says it appeared in [FreeBSD 2.1, which was released in November 1995][fbsd]. Adobe CS4 was [released about 13 years later][cs4]. How is it that Adobe’s product managers approve these things for release?
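For comparison, a rough sketch of how little is needed if the script simply hands the job to `killall`. This is my own illustration and not anything shipped by Adobe; the function names and the `/usr/bin/killall` path are assumptions. Passing the arguments as a list also avoids gluing user input into a shell command, which the original does.

```python
import subprocess
import sys


def killall_command(app_name, signal_name="TERM"):
    """Build the argument list for killall (path assumed to be /usr/bin/killall)."""
    return ["/usr/bin/killall", "-" + signal_name, app_name]


def find_and_kill(app_name):
    """Send SIGTERM to every process named app_name; returns killall's exit status."""
    return subprocess.call(killall_command(app_name))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit('Pass in the app name to find and kill (e.g. "Safari").')
    sys.exit(find_and_kill(sys.argv[1]))
```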

What is particularly galling about Adobe’s Acrobat 9 updaters is that they seem to re-implement so much of what the Apple installer application does, even down to their use of gzipped cpio archives for the payload.

[acrobat913]: http://www.adobe.com/support/downloads/detail.jsp?ftpID=4538
[acrobat]: http://www.adobe.com/products/acrobatpro/
[python]: http://www.python.org
[mankillall]: http://www.manpagez.com/man/1/killall/
[fbsd]: http://www.freebsd.org/releases/2.1R/announce.html
[cs4]: http://www.adobe.com/aboutadobe/pressroom/pressreleases/200809/092308AdobeCS4Family.html

Migrating a Filemaker database to Django

At work we have several [Filemaker Pro][fmp] databases. I have been slowly working through these, converting them to Web-based applications using [the Django framework][django]. My primary motive is to replace an overly-complicated Filemaker setup running on four Macs with a single 2U rack-mounted server running [Apache][apache] on [FreeBSD][fbsd].

At some point in the process of re-writing each database for use with Django I have needed to convert all the records from Filemaker to Django. There exist good [Python][python] libraries for [talking to Filemaker][pyfmp] but they rely on the XML Web interface, meaning that you need Filemaker running and set to publish the database on the Web while you are running an import.

In my experience [Filemaker’s built-in XML publishing interface][fmpxml] is too slow when you want to migrate tens of thousands of records. During development of a Django-based application I find I frequently need to re-import the records as the new database schema evolves – doing this by communicating with Filemaker is tedious when you want to re-import the data several times a day.

So my approach has been to export the data from Filemaker as XML using [Filemaker’s FMPXMLRESULT][fmpxmlresult] format. The Filemaker databases at work are _old_ (Filemaker 5.5) and perhaps things have improved in more recent versions but Filemaker 5/6 is a very poor XML citizen. When using the FMPDSORESULT format (which has been dropped from more recent versions) it will happily generate invalid XML all over the shop. The FMPXMLRESULT format is better but even then it will emit invalid XML if the original data happens to contain funky characters.

So here is [filemaker.py, a Python module for parsing an XML file produced by exporting to FMPXMLRESULT][dave] format from Filemaker.

To use it you create a sub-class of the `FMPImporter` class and over-ride the `FMPImporter.import_node` method. This method is called for each row of data in the XML file and is passed an XML node instance for the row. You can convert that node to a more useful dictionary where keys are column names and values are the column values. You would then convert the data to your Django model object and save it.

A trivial example:

    import filemaker

    class MyImporter(filemaker.FMPImporter):
        def import_node(self, node):
            node_dict = self.format_node(node)
            print node['RECORDID'], node_dict

    importer = MyImporter(datefmt='%d/%m/%Y')
    filemaker.importfile('/path/to/data.xml', importer=importer)

The `FMPImporter.format_node` method converts values to an appropriate Python type according to the Filemaker column type. Filemaker’s `DATE` and `TIME` types are converted to Python [`datetime.date`][dtdate] and [`datetime.time`][dttime] instances respectively. `NUMBER` types are converted to Python `float` instances. Everything else is left as strings, but you can customize the conversion by over-riding the appropriate methods in your sub-class (see the source for the appropriate method names).

In the case of Filemaker `DATE` values you can pass the `datefmt` argument to your sub-class to specify the date format string. See Python’s [time.strptime documentation][strptime] for the complete list of the format specifiers.

The code uses [Python’s built-in SAX parser][pysax] so that it is efficient when importing huge XML files (the process uses a constant 15 megabytes for any size of data on my Mac running Python 2.5).
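To show why SAX keeps memory flat, here is a stripped-down sketch of the idea for current Python. It is not the code in filemaker.py: the class name `RowHandler` and its callback signature are invented for illustration, and it assumes the usual FMPXMLRESULT layout of `FIELD` elements inside `METADATA` followed by `ROW` / `COL` / `DATA` elements. Only one row is held in memory at a time.

```python
import datetime
import xml.sax


class RowHandler(xml.sax.ContentHandler):
    """Call on_row(record_id, row_dict) once per <ROW> element."""

    def __init__(self, on_row, datefmt="%d/%m/%Y"):
        super().__init__()
        self.on_row = on_row
        self.datefmt = datefmt
        self.fields = []       # (name, type) pairs collected from <METADATA>
        self.record_id = None
        self.values = None     # column values for the row being read
        self.text = None       # character buffer, only set inside <DATA>

    def startElement(self, name, attrs):
        if name == "FIELD":
            self.fields.append((attrs["NAME"], attrs["TYPE"]))
        elif name == "ROW":
            self.record_id = attrs["RECORDID"]
            self.values = []
        elif name == "COL":
            self.values.append(None)
        elif name == "DATA":
            self.text = []

    def characters(self, content):
        if self.text is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "DATA":
            # Keep only the first <DATA> of a column (repeating fields are ignored).
            if self.values[-1] is None:
                self.values[-1] = "".join(self.text)
            self.text = None
        elif name == "ROW":
            row = {}
            for (field, fmp_type), value in zip(self.fields, self.values):
                row[field] = self.convert(value, fmp_type)
            self.on_row(self.record_id, row)
            self.values = None

    def convert(self, value, fmp_type):
        if not value:
            return value       # leave empty or missing values untouched
        if fmp_type == "NUMBER":
            return float(value)
        if fmp_type == "DATE":
            return datetime.datetime.strptime(value, self.datefmt).date()
        return value
```

Something like `xml.sax.parse('/path/to/data.xml', RowHandler(print))` would then print each record as it is read, without ever building the whole document in memory.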

Fortunately I haven’t had to deal with Filemaker’s repeating fields so I have no idea how the code works on repeating fields. Please let me know if it works for you. Or not.

[Download filemaker.py][dave]. This code is released under a 2-clause BSD license.

[dave]: http://reliablybroken.com/b/wp-content/uploads/2009/11/filemaker.py
[strptime]: http://docs.python.org/library/time.html#time.strftime
[fmp]: http://www.filemaker.com/
[django]: http://www.djangoproject.com/
[apache]: http://httpd.apache.org/
[fbsd]: http://www.freebsd.org/
[python]: http://www.python.org/
[pyfmp]: http://code.google.com/p/pyfilemaker/
[fmpxml]: http://www.filemaker.com/support/technologies/xml
[fmpxmlresult]: http://www.filemaker.com/help/html/import_export.16.30.html#1029660
[dtdate]: http://docs.python.org/library/datetime.html#date-objects
[dttime]: http://docs.python.org/library/datetime.html#time-objects
[pysax]: http://docs.python.org/library/xml.sax.html

Network users and Mac 10.5 archive and install

When upgrading a Mac from Mac OS X 10.4 (Tiger) to 10.5 (Leopard), remember that network accounts are _not_ included if you do an archive and install and choose to migrate existing users. If a network account had its home folder at `/Users/jbloggs` then it will have been moved to `/Previous Systems.localized/2009-11-06_0346/Users/jbloggs` (although the date portion will be the date that you did your install).

This applies to [network accounts which authenticate against Active Directory and do not have a mobile account][kb].

Why my place of work used to set up Macs with the option to create a mobile account at login turned off is a mystery to me.

[kb]: http://docs.info.apple.com/article.html?path=ServerAdmin/10.5/en/c7od45.html

What is wrong with www.saatchi-design.co.uk

[Saatchi & Saatchi Design][ssd] recently updated their Web site and in doing so made some poor choices.

Multiple addresses for the same pages
-------------------------------------

As well as saatchi-design.co.uk, SSD have registered saatchi-design.com (which they prefer to use when emailing – I don’t know why they don’t just use one domain name for everything). Previously anyone visiting [saatchi-design.com][1], [www.saatchi-design.com][2], [saatchi-design.co.uk][3] and [www.saatchi-design.co.uk][ssd] was automatically re-directed to [www.saatchi-design.co.uk][ssd]. This encouraged a single address for any page on the site and reduced the chances of duplicate entries in search engine results.

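The updated site no longer issues these re-directs, so every page is now reachable at several addresses. To fix this they need to configure the site to issue the appropriate re-directs to visiting clients. For the Apache server you can achieve this with [mod_rewrite][].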
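The rule itself is tiny. As an illustration only (this is a Python WSGI wrapper, not mod_rewrite configuration, and the function name is made up), the logic amounts to: if the request arrived on any host other than the preferred one, answer with a permanent re-direct to the same path on the preferred host.

```python
CANONICAL_HOST = "www.saatchi-design.co.uk"


def canonical_host_middleware(app, canonical_host=CANONICAL_HOST):
    """Wrap a WSGI app so that requests to any other host name get a 301."""

    def wrapper(environ, start_response):
        # Ignore any port number and compare host names case-insensitively.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != canonical_host:
            location = "http://%s%s" % (canonical_host, environ.get("PATH_INFO", "/"))
            query = environ.get("QUERY_STRING")
            if query:
                location += "?" + query
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        return app(environ, start_response)

    return wrapper
```

A 301 rather than a 302 matters here because it tells search engines to fold the duplicate addresses into the canonical one.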

All site content hidden in Flash movies
---------------------------------------

While Adobe Flash is the preferred format for video on the Web, it is a terrible format for the bulk of the content on Saatchi & Saatchi Design’s site. Simple pages with a picture and explanatory text are perfect for HTML, and many of the page transition effects can be achieved with a little JavaScript.

Because everything is in Flash, Google cannot index the site and it does not display on an iPhone.

Bad URLs
--------

A consequence of how the site has been implemented is the lack of a proper URL structure. Instead one arrives at pseudo-unique URLs when navigating the site. In strict terms all the varied content is a single page. The URLs they expose use fragment identifiers, leaving the visitor with addresses like [http://www.saatchi-design.co.uk/#/brand-strategy/][4] and [http://www.saatchi-design.co.uk/#/us/purpose/][5]. Although better than nothing, these URLs again ruin Google’s view of the site – the varied pages are treated as a single page.

Broken links
------------

The structure of the site changed with the update, which is not surprising considering how much more content there is now. But in changing the structure the site broke all the existing links. Where Google used to return a couple of dozen results for the various pages in the previous site design, now it returns a page where only the first result actually links to the site, the other results returning an entirely unhelpful generic Apache 404 page. [All those broken links!][deadlinks]

What ought to happen is the site should be configured so that out-of-date links are either re-directed to the appropriate page or to the front page when no equivalent page exists. Again one can use mod_rewrite for this.

HTTP caching-hostile resources
------------------------------

The updated site consists of a single Flash movie, and this in turn fetches picture and text resources from the server as needed, so that a visitor does not need to download the entire site before she can see the first page. However the Web server does not send `Last-Modified` or `ETag` headers with the response. If it did then the client could use them to check if the content has changed since the last request rather than having to fetch the complete response every time.

Using [HTTP with suitable caching and expiry headers][rfc2616] would save bandwidth costs for the site host and visitors. More importantly it would reduce the site load time for returning visitors because many page elements could be served from the browser’s cache rather than having to be re-fetched.

If the site *were* accessible from an iPhone then caching would be a useful technique for improving the visitor’s experience.
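To make the caching point concrete, here is a small sketch of the server side of a conditional request. The function name and the choice of a SHA-1 hash of the body as the `ETag` are my own assumptions for illustration; a real server such as Apache derives its validators differently. The server labels each response with validators, and when the client sends the `ETag` back in `If-None-Match` it gets a short `304 Not Modified` instead of the full body.

```python
import hashlib
from email.utils import formatdate


def conditional_response(body, mtime, request_headers):
    """Return (status, headers, body) for a resource, honouring If-None-Match."""
    etag = '"%s"' % hashlib.sha1(body).hexdigest()
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(mtime, usegmt=True),  # HTTP date format
    }
    if request_headers.get("If-None-Match") == etag:
        # The client already has this exact content: send headers only.
        return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    picture = b"pretend these are the bytes of a picture"
    status, headers, _ = conditional_response(picture, 0, {})
    print(status, headers["ETag"])
    status, _, payload = conditional_response(picture, 0, {"If-None-Match": headers["ETag"]})
    print(status, len(payload))
```

The second request costs a few hundred bytes of headers rather than the whole picture, which is exactly the saving a returning visitor would see.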

[1]: http://saatchi-design.com
[2]: http://www.saatchi-design.com
[3]: http://saatchi-design.co.uk
[4]: http://www.saatchi-design.co.uk/#/brand-strategy/
[5]: http://www.saatchi-design.co.uk/#/us/purpose/
[ssd]: http://www.saatchi-design.co.uk
[rfc2616]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
[mod_rewrite]: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html
[deadlinks]: http://www.google.com/search?q=site%3Awww.saatchi-design.co.uk