Tag Archives: python

XPath bug in old versions of ElementTree

I figured out why my XML parsing code works fine using the pure-Python ElementTree XML parsing module but fails when using the speedy and memory-optimized cElementTree XML parsing module.

The XPath 1.0 specification says '.' is short-hand for 'self::node()', selecting a node itself.

Parsing an XML document and selecting the context node with ElementTree in Python 2.5:

>>> from xml.etree import ElementTree
>>> ElementTree.VERSION
>>> doc = "<Root><Example>BUG</Example></Root>"
>>> node1 = ElementTree.fromstring(doc).find('./Example')
>>> node1
<Element Example at 10e0ed8c0>
>>> node1.find('.')
<Element Example at 10e0ed8c0>
>>> node1.find('.') == node1
True

See how the result of node1.find('.') is the node itself? As it should be.

Parsing an XML document and selecting the context node with cElementTree in Python 2.5:

>>> from xml.etree import cElementTree
>>> doc = "<Root><Example>BUG</Example></Root>"
>>> node2 = cElementTree.fromstring(doc).find('./Example')
>>> node2
<Element 'Example' at 0x10e0e3660>
>>> node2.find('.')
>>> node2.find('.') == node2
False

Balls. The result of node2.find('.') is None.

However! I have a kludgey work-around that works whether you use ElementTree or cElementTree. Use './' instead of '.':

>>> node1.find('./')
<Element Example at 10e0ed8c0>
>>> node1.find('./') == node1
True
>>> node2.find('./')
<Element 'Example' at 0x10e0e3660>
>>> node2.find('./') == node2
True

Kludgey because './' is not a valid XPath expression.

So we are back on track. Also works for Python 2.6 which has the same version of ElementTree.

Fortunately Python 2.7 got a new version of ElementTree and the bug is fixed:

>>> from xml.etree import ElementTree
>>> ElementTree.VERSION
>>> doc = "<Root><Example>BUG</Example></Root>"
>>> node3 = ElementTree.fromstring(doc).find('./Example')
>>> node3
<Element 'Example' at 0x107257210>
>>> node3.find('.')
<Element 'Example' at 0x107257210>
>>> node3.find('.') == node3
True

However! They also fixed my kludgey work-around:

>>> node3.find('./')
>>> node3.find('./') == node3
False

So I can’t code something that works for all three versions. This is annoying. I was hoping to just replace ElementTree with the C version, which makes my code run in one third the time (the XML parts of it run in one tenth the time). And I cannot install any compiled modules: the code can only rely on Python 2.5’s standard library.
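One possible way around it, offered as a sketch rather than something tested on every release mentioned above, is to keep '.' away from find() altogether. The helper name find_compat is made up for illustration; it handles the two spellings of "this node" itself and passes everything else to find():

```python
from xml.etree import ElementTree


def find_compat(node, path):
    """Return node itself for '.' or './', else defer to node.find().

    find_compat is a hypothetical helper, not part of ElementTree.
    """
    if path in ('.', './'):
        return node
    return node.find(path)


doc = "<Root><Example>BUG</Example></Root>"
node = ElementTree.fromstring(doc).find('./Example')
same = find_compat(node, '.')           # the node itself
missing = find_compat(node, 'Nothing')  # None, exactly as find() reports it
```

The same function should work with cElementTree elements too, since it only relies on find() for ordinary paths.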

Styling your Excel data with xlwt

This post is about how to create styles in Excel spreadsheets with the most excellent xlwt for Python. The documentation for xlwt (version 0.7.2) is a little sketchy on how to use formatting. So here goes…

To apply formatting to a cell you pass an instance of the xlwt.XFStyle class as the fourth argument to the xlwt.Worksheet.write method. The best way to create an instance is to use the xlwt.easyxf helper, which takes a string that specifies the formatting for a cell.

The other thing about using styles is you should only make one instance of each, then pass that same style object every time you want to apply it to a cell.

An example which uses a few styles:

import xlwt

styles = dict(
    bold = 'font: bold 1',
    italic = 'font: italic 1',
    # Wrap text in the cell
    wrap_bold = 'font: bold 1; align: wrap 1;',
    # White text on a blue background
    reversed = 'pattern: pattern solid, fore_color blue; font: color white;',
    # Light orange checkered background
    light_orange_bg = 'pattern: pattern fine_dots, fore_color white, back_color orange;',
    # Heavy borders
    bordered = 'border: top thick, right thick, bottom thick, left thick;',
    # 16 pt red text
    big_red = 'font: height 320, color red;',
)

The font height is measured in twentieths of a point, so 20 = 1 pt and 320 = 16 pt text.

book = xlwt.Workbook()
sheet = book.add_sheet('Style demo')

for idx, k in enumerate(sorted(styles)):
    style = xlwt.easyxf(styles[k])
    sheet.write(idx, 0, k)
    sheet.write(idx, 1, styles[k], style)

# Write the workbook to disk (the filename is only an example)
book.save('style-demo.xls')


It isn’t included with the current distribution on the cheese shop, but there is a useful Excel spreadsheet demonstrating cell patterns in the source repository.

You can find the complete list of possible cell formats by reading the source for xlwt.Style.
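If you would rather think in points than in height units, a tiny helper can do the arithmetic from the 20 = 1 pt rule. This is only a sketch: font_height and font_style are made-up names, not part of xlwt, and the result is just a string you could hand to xlwt.easyxf.

```python
def font_height(points):
    """Convert a size in points to the height value used in a style string.

    Applies the 20-units-per-point rule. font_height is a hypothetical
    helper, not part of xlwt.
    """
    return int(round(points * 20))


def font_style(points, color):
    """Build a style string such as 'font: height 320, color red;'."""
    return 'font: height %d, color %s;' % (font_height(points), color)


big_red = font_style(16, 'red')
```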

Class-based views for Bottle

I’m not convinced this is actually a good idea, but I have an approach for using class-based views as handlers for a route with Bottle.

(If you were mad keen on Django’s shift to class-based views you might reckon life wouldn’t be complete with a Bottle-based application until you employ classes for views. However Bottle’s use of decorators for tying URLs to views means it is less a natural fit than the same thing in Django.)

The problem is that you can’t just decorate a method in your class with bottle.route: the decorator runs while the class body is being defined, so you are telling Bottle to use the method before it has been bound to an instance of that class.

So although I wish it did, the following example will not work:

import bottle

class ViewClass(object):
    @bottle.route('/')
    def home_view(self):
        return "My home page."

obj = ViewClass()

Running that will lead to errors about not enough arguments passed to the view method of your ViewClass instance.

Instead you need to register the route right after the object is created. This can be done in the class’s __new__ method:

import bottle

class ViewClass(object):
    def __new__(cls, *args, **kwargs):
        obj = super(ViewClass, cls).__new__(cls, *args, **kwargs)
        # Register the route now that there is an instance to bind to
        bottle.route('/')(obj.home_view)
        return obj

    def home_view(self):
        return "My home page."

obj = ViewClass()

It works. It isn’t that pretty. You could achieve exactly the same thing by explicitly passing the obj.home_view method to bottle.route after the instance is created. The advantage to doing this in the __new__ method is it will happen automatically whenever ViewClass is instantiated.
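To see the bound versus unbound difference without Bottle in the way, here is a sketch with a stand-in routing table. The route function and ROUTES dictionary below are invented for the demonstration and are not Bottle's implementation, and the registration happens in __init__ rather than __new__ simply because it is shorter.

```python
# A stand-in for a routing table. This is NOT Bottle's implementation, just
# enough to show what a route decorator ends up holding on to.
ROUTES = {}


def route(path):
    """Hypothetical route decorator: remember the callable for a path."""
    def register(func):
        ROUTES[path] = func
        return func
    return register


class ViewClass(object):
    def __init__(self):
        # Register the bound method, so self is already filled in.
        route('/bound')(self.home_view)

    def home_view(self):
        return "My home page."

    # Decorating in the class body registers the plain function instead.
    @route('/unbound')
    def other_view(self):
        return "Never reached without an instance."


obj = ViewClass()
bound_result = ROUTES['/bound']()      # works: no arguments needed

try:
    ROUTES['/unbound']()               # fails: nothing supplies self
    unbound_error = None
except TypeError as exc:
    unbound_error = exc
```

The second call fails with a TypeError about a missing argument, which is the same kind of complaint described earlier.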

And if you go down this path then you should be aware of threads. Hey! Nice threads! Also I have a cold.

Running Django on Mac

These are semi-detailed steps for installing all the bits to host a Django application on Mac OS X. Tested on 10.5, should work perfectly on 10.6.

Use MacPorts: relatively easy to install and the best thing is everything is contained in a directory that you can be confident won’t eff up Apple’s stuff and won’t be effed up by Apple’s stuff.

Install Xcode

You need the compiler and bits that are installed with Xcode. If you can’t find your Mac install discs (Xcode is included with every new Mac but not installed) you can download it from Apple’s developer website. Registration is required but is free.

The current version of Xcode is easy to find, while older versions are available in the downloads section under “Developer Tools”. Xcode version 3.1.4 is the last version that will work for Mac OS X 10.5 systems.

Install MacPorts

MacPorts has a nice pkg installer. You can also build it from source.

curl -O http://distfiles.macports.org/MacPorts/MacPorts-1.9.1-10.5-Leopard.dmg
hdiutil attach MacPorts-1.9.1-10.5-Leopard.dmg
sudo installer -pkg /Volumes/MacPorts-1.9.1/MacPorts-1.9.1.pkg -target /
hdiutil detach /Volumes/MacPorts-1.9.1

If for some reason MacPorts cannot fetch updates you may need to pull updates by hand.

Check your $PATH after installing ports to make sure /opt/local/bin is in there. If it isn’t you can do export PATH=/opt/local/bin:/opt/local/sbin:${PATH} to fix things, and even add that line to ~/.profile so that bash picks it up every time (assuming you haven’t switched your shell).

Install software

The port command is used to manipulate the MacPorts installation. Use it to build and install the various bits we need. This takes a while, especially on old PowerPC machines. Make it more exciting by adding the --verbose flag. Exciting!

sudo port install python26
sudo port install apache2
sudo port install mysql5-server
sudo port install mod_python26
sudo port install py26-mysql
sudo port install py26-django
sudo port install py26-south

And if you want to hook Django into a network directory then you almost certainly want to use LDAP.

sudo port install py26-ldap

Cool kids these days say use mod_wsgi instead of mod_python for hosting Python applications with Apache, but I am not cool (and on 20 September 2010 I couldn’t persuade mod_wsgi to build from MacPorts on a clean installation).

Configuring and starting MySQL

UPDATED: commenter matea pointed to Jason Rowland’s MySQL on Mac posting that includes steps to secure a default installation, so I’ve updated this section with the appropriate steps.

I always seem to be the only person who cares about non-English visitors… anyway, so I want to have MySQL use UTF8 for everything. Edit the configuration so it does. As root, create a configuration at /opt/local/var/db/mysql5/my.cnf with these lines:

[mysqld]
init-connect = 'SET NAMES utf8'
character-set-server = utf8
collation-server = utf8_general_ci
skip-networking

[client]
default-character-set = utf8

One thing about the line skip-networking in the configuration file is that it means MySQL will not listen to any network clients, including connections to 127.0.0.1. Instead clients should connect to localhost or they should specify the path to the socket that MySQL uses for communication. If your MySQL “client” is a Django instance running on the same host then that should not be a problem.
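On the Django side, pointing at the socket might look something like the settings fragment below. Treat it as a sketch: the database name, user and password are placeholders, and the socket path is an assumption about where the MacPorts build puts it, so check your own installation. A HOST value that begins with a slash is taken to be a Unix socket path rather than a host name.

```python
# Fragment of a Django settings.py (Django 1.2 or later style).
# The database name, user, password and socket path are placeholders.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'myproj',
        'USER': 'myproj',
        'PASSWORD': 'change-me',
        # A HOST beginning with a slash is treated as a path to a Unix socket.
        # This particular path is an assumption; adjust it to your system.
        'HOST': '/opt/local/var/run/mysql5/mysqld.sock',
        'PORT': '',
    }
}
```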

Now initialize the database and start the server. (The use of -w in the second line tells launchctl to have the database daemon start at boot. If you don’t want to have MySQL running at boot use -F to force start just this one time instead of every time.)

sudo -u mysql /opt/local/bin/mysql_install_db5
sudo launchctl load -w /Library/LaunchDaemons/org.macports.mysql5.plist

And let’s check that the server is up and configured right.

/opt/local/bin/mysql5 -e "SHOW variables LIKE '%char%'"

You should see a table showing that the character set for the client and server is set to utf8.

Now run the secure installation script for MySQL. This will ask you to set a password for MySQL’s root account (the administrator) and ask whether to remove the test database and anonymous user access (you should do both):


Thaz better.

Configuring Postgresql instead of MySQL

If you want to use Postgres instead of MySQL then you need a couple different packages out of ports.

sudo port install postgresql84-server
sudo port install py26-psycopg2

Did you know Apple’s management tools use Postgres? Is true.

Configuring Apache to serve a Django project

Let’s suppose your Django project lives under /Library/WebServer/example.com/myproj, and the project’s settings file is /Library/WebServer/example.com/myproj/settings.py. Here’s how to configure Apache with mod_python to serve your project.

Create a separate site configuration for Apache in /Library/WebServer/example.com/site.conf.

<Location "/">
    SetHandler python-program
    PythonHandler django.core.handlers.modpython
    SetEnv DJANGO_SETTINGS_MODULE myproj.settings
    PythonOption django.root /
    PythonDebug On
    PythonPath "['/Library/WebServer/example.com'] + sys.path"
</Location>

<Directory /Library/WebServer/example.com>
    Order deny,allow
    Allow from all
</Directory>

Of course once everything is hunky dory you will go back and edit the site configuration so that it says PythonDebug Off.

And finally tell Apache to use mod_python and read the site configuration. Edit /opt/local/apache2/conf/httpd.conf and add a line at the end of the modules like:

LoadModule python_module modules/mod_python.so

And then a line like:

Include /Library/WebServer/example.com/site.conf

Now fire up Apache:

sudo launchctl load -w /Library/LaunchDaemons/org.macports.apache2.plist

MacPorts has a convenient shortcut for this:

sudo port load apache2

You also want to save Apache a little grief by pre-compiling the Python source files for the project:

/opt/local/bin/python2.6 -m compileall /Library/WebServer/example.com

Hope this helps.

Bottle’s view decorator and default variables

Bottle’s @view decorator provides a simple way to designate a template to render an HTML page. Your view function just has to return a dictionary, and its contents can be accessed from the template using the '{{ name }}' syntax.

The @view decorator can also take keyword arguments. These are treated as default template variables – if the dictionary returned by your view function doesn’t have a key for one of the keyword arguments then the template will use the value passed into the decorator, like so:

from bottle import view

@view('default.html', author='David Buxton')
def home():
    return {'title': 'Home page'}

That would render any instance of '{{ author }}' as 'David Buxton'. And then you can have another view function that overrides the keywords by returning a different value in the dictionary:

from bottle import view

@view('default.html', author='David Buxton')
def music():
    return {'title': 'Thalassocracy', 'author': 'Frank Black'}

And at that point I wonder what is the advantage of using keyword arguments with @view: you have to decorate each function separately, and if you want to override a keyword in your return dictionary then it would be easier not to specify the keyword in the first place.

Thus the real point of using keywords with the @view function is only apparent if you curry the @view decorator with keywords first so that you can re-use the curried decorator and avoid repeating yourself.

Someday I will re-write the previous sentence. Until then, sorry.

Instead of passing a default author each time as in the examples above, let’s make a new @view decorator (using Python’s functools module) and then use that on each view function:

import functools
from bottle import view

view = functools.partial(view, author='David Buxton')

@view('default.html')
def home():
    return {'title': 'Home page'}

@view('default.html')
def music():
    return {'title': 'Thalassocracy', 'author': 'Frank Black'}

The new decorator means you get the default keyword arguments wherever you use @view while permitting any function to override those defaults in the dictionary it returns.
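If you want to watch the defaults being merged without installing anything, the sketch below swaps in a stand-in for the decorator. The view function defined here is hypothetical and is NOT Bottle's code: instead of rendering a template it just returns the template name and the final variables, which is enough to see functools.partial doing its job.

```python
import functools


def view(template_name, **defaults):
    """Stand-in for a view decorator (hypothetical, NOT Bottle's code).

    Merges the defaults with the dict returned by the wrapped function and
    returns the template name alongside the final variables.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            variables = dict(defaults)
            variables.update(func(*args, **kwargs))
            return template_name, variables
        return wrapper
    return decorator


# Curry the decorator once, then reuse it everywhere.
view = functools.partial(view, author='David Buxton')


@view('default.html')
def home():
    return {'title': 'Home page'}


@view('default.html')
def music():
    return {'title': 'Thalassocracy', 'author': 'Frank Black'}
```

Calling home() picks up the default author, while music() keeps the author it returned itself.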

And if you wanted to get really lazy you could even pass in a template name when wrapping the decorator with functools.partial, however you would not be able to use your wrapped decorator to change the template name because it is a positional argument (like what it explains here in the functools documentation). You would also have to call the decorator with no arguments like '@defaultview()'. So forget I mentioned it.

I’m not saying you are lazy.

Django-style routing for Bottle

Bottle provides the @route decorator to associate URL paths with view functions. This is very convenient, but if you are a Django-reject like me then you may prefer having all your URLs defined in one place, the advantage being it is easy to see at a glance all the different URLs your application will match.

Updated: I have re-written this post and the example to make it simpler following Marcel Hellkamp’s comments (Marcel is the primary author of Bottle). My original example was needlessly complicated.

It is possible to have a Django-style urlpatterns stanza with a Bottle app. Here’s how it can work:

from bottle import route

# Assuming your *_page view functions are defined above somewhere
urlpatterns = (
    # (path, func, name)
    ('/', home_page, 'home'),
    ('/about', about_page, 'about'),
    ('/contact', contact_page, 'contact'),
)

for path, func, name in urlpatterns:
    route(path, name=name)(func)

Here we run through a list where each item is a triple of URL path, view function and a name for the route. For each we simply call the route method and then invoke it with the function object. Not as flexible as using the decorator on a function (because the @route decorator can take additional keyword arguments) but at least you can have all the routes in one place at the end of the module.
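One way to claw back some of that flexibility is to carry a dictionary of keyword arguments in each entry instead of just a name. The sketch below uses a stand-in route function that only records what it is given; it copies the call shape route(path, **options)(func) but is not Bottle's code, and the name and method options are assumed to be keywords your real route function accepts.

```python
# Stand-in route function that records what it is given. It mimics the
# call shape route(path, **options)(func) but is NOT Bottle's code.
REGISTERED = []


def route(path, **options):
    def register(func):
        REGISTERED.append((path, func.__name__, options))
        return func
    return register


def home_page():
    return "Home"


def contact_page():
    return "Contact"


urlpatterns = (
    # (path, func, options)
    ('/', home_page, {'name': 'home'}),
    ('/contact', contact_page, {'name': 'contact', 'method': 'POST'}),
)

for path, func, options in urlpatterns:
    route(path, **options)(func)
```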

Then again if you have so many routes that you need to keep them in a pretty list you probably aren’t writing the simple application that Bottle was intended for.

(This was tested with Bottle’s 0.8 and 0.9-dev branches.)

More Python features that I really like

Another thing that makes using Python pleasing is decorators. A decorator is a wrapper for a function (or method) that takes a function (or method) as an argument and returns a new function (or…) which is then bound to the name for the original function.

The newly-decorated function can then do things like checking the called arguments before invoking the original un-decorated function.

Django provides decorators for authentication so that you can wrap a view function with a check for client credentials before deciding whether to return the original response or a deny access.

In this manner Django’s authentication decorators encourage orthogonal code: the logic for displaying a view is separated from the logic for deciding whether you should be permitted to see the view’s output. By keeping them separate, it becomes simpler to re-use the authentication logic and apply it to other views.

Suppose you have a view that accepts a Django request object and checks whether the user is signed in:

def administration_page(request):
    if request.user.is_authenticated():
        return HttpResponse("Welcome, dear user.")
    else:
        return HttpResponseRedirect("/signin/")

With a decorator you can simplify and clarify things:

@login_required
def administration_page(request):
    return HttpResponse("Welcome, dear user.")

For older versions of Python (pre 2.4) which don’t understand the @ operator one must explicitly decorate the view function like so:

def administration_page(request):
    return HttpResponse("Welcome, dear administrator.")

administration_page = login_required(administration_page)

Note in the example that the original administration_page function is passed to the decorator. The @ syntax in the first example makes that implicit but the two are equivalent.

The implementation of a decorator is interesting. It takes the function itself as an argument and returns a new function which does the actual checking. Here is how the decorator used above might do its stuff:

def login_required(view_function):
    def decorated_function(request):
        if request.user.is_authenticated():
            return view_function(request)
        else:
            return HttpResponseRedirect("/signin/")

    return decorated_function

The actual implementation of Django’s login_required decorator is considerably less idiotic. Python’s functools module has helpers for writing well-behaved decorators.
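As a rough idea of what "well-behaved" means, here is the toy decorator again using functools.wraps so the decorated function keeps its own name and docstring. FakeUser and FakeRequest are made-up stand-ins so the example runs without Django, and the redirect is a plain string rather than a real HttpResponseRedirect.

```python
import functools


class FakeUser(object):
    """Minimal stand-in for request.user (hypothetical, for the demo only)."""
    def __init__(self, signed_in):
        self.signed_in = signed_in

    def is_authenticated(self):
        return self.signed_in


class FakeRequest(object):
    """Minimal stand-in for a Django request object."""
    def __init__(self, signed_in):
        self.user = FakeUser(signed_in)


def login_required(view_function):
    @functools.wraps(view_function)  # copy __name__, __doc__ and friends
    def decorated_function(request, *args, **kwargs):
        if request.user.is_authenticated():
            return view_function(request, *args, **kwargs)
        return "redirect:/signin/"  # placeholder for HttpResponseRedirect
    return decorated_function


@login_required
def administration_page(request):
    """Show the administration page."""
    return "Welcome, dear user."
```

Without functools.wraps the decorated view would report its name as decorated_function, which makes tracebacks and introspection harder to read.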

Because functions in Python are themselves objects the decorator can accept a function reference, construct a new function that checks for authentication and then return a reference to that new function.


(Simples gets less simples when you want to write a decorator that accepts configuration arguments because you then need either another layer of nested function definitions or a class whose instances can be called directly, but I’m going to ignore you for a bit and wow is that Concorde…?)
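For the curious, the extra layer of nesting looks roughly like this. It is a sketch only: the redirect_to argument is invented for the example and is not the signature of Django's own decorator, and the Request and Anonymous classes are stand-ins so it runs by itself.

```python
import functools


def login_required(redirect_to="/signin/"):
    """Decorator factory: the outer call takes the configuration."""
    def decorator(view_function):
        @functools.wraps(view_function)
        def decorated_function(request, *args, **kwargs):
            if request.user.is_authenticated():
                return view_function(request, *args, **kwargs)
            return "redirect:" + redirect_to  # placeholder response
        return decorated_function
    return decorator


class Anonymous(object):
    """Hypothetical user who is never signed in."""
    def is_authenticated(self):
        return False


class Request(object):
    """Hypothetical stand-in for a Django request with an anonymous user."""
    user = Anonymous()


@login_required(redirect_to="/accounts/login/")
def administration_page(request):
    return "Welcome, dear user."


response = administration_page(Request())
```

Note the decorator is now called with parentheses, because the outer function has to run first to produce the real decorator.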

Split a file on any character in Python

I need to split a big text file on a certain character. I expect I am being thick about this, but split doesn’t quite do what I want because it includes the matching line, whereas I want to split right on the matching character.

My Python answer:

def readlines(filename, endings, chunksize=4096):
    """Returns a generator that splits on lines in a file with the given
    line ending.
    """
    # Despite its name, filename must be an open file object.
    line = ''
    while True:
        buf = filename.read(chunksize)
        if not buf:
            # End of file: emit whatever is left over, then stop.
            yield line
            break

        line = line + buf

        while endings in line:
            idx = line.index(endings) + len(endings)
            yield line[:idx]
            line = line[idx:]

if __name__ == "__main__":
    import sys, os

    FORMFEED = chr(12) # ASCII 12
    basename = os.path.basename(sys.argv[1])
    for num, data in enumerate(readlines(open(sys.argv[1]), endings=FORMFEED)):
        filename = basename + '-' + str(num)
        open(filename, 'wb').write(data)

This is also useful when reading data exported from some old-fashioned Mac application like Filemaker 5 where the line-endings are ASCII 13 not ASCII 10.
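Here is roughly how that looks for carriage-return line endings, using an in-memory file so nothing touches the disk. It is a sketch: split_on is a made-up name for a compact variant of the readlines function, and unlike readlines it skips an empty final piece.

```python
import io


def split_on(fileobj, ending, chunksize=4096):
    """Yield pieces of fileobj, each ending with the given string.

    split_on is a hypothetical name. Any text after the last ending is
    yielded as the final piece if it is not empty.
    """
    pending = ''
    while True:
        buf = fileobj.read(chunksize)
        if not buf:
            break
        pending += buf
        while ending in pending:
            idx = pending.index(ending) + len(ending)
            yield pending[:idx]
            pending = pending[idx:]
    if pending:
        yield pending


CR = chr(13)  # ASCII 13, the old Mac line ending
export = io.StringIO(u"first record" + CR + u"second record" + CR + u"tail")
records = list(split_on(export, CR, chunksize=5))
```

The tiny chunksize is there only to show that an ending can straddle two reads without anything getting lost.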

This post was inspired by Lotus Notes version 8.5, which is so advanced that to save a message in a file on disk you have to export it as structured text. And if you want to save a whole bunch of messages as individual files you must forget that drag-and-drop was introduced with System 7, that would be too obvious.

Django AdminForm objects and templates

I can’t find documentation for the context of a Django admin template. In particular, where is the form and how does one access the fields? This post describes the template context for a generic admin model for Django 1.1.

Django uses an instance of ModelAdmin (defined in django.contrib.admin.options) to handle the request for a model object add / change view in the admin site. ModelAdmin.add_view and ModelAdmin.change_view are responsible for populating the template context when rendering the add object and change object pages respectively.

Here are the keys common to add and change views:

  • title, ‘Add ‘ or ‘Change ‘ + your model class’ _meta.verbose_name
  • adminform is an instance of AdminForm
  • is_popup, a boolean which is true when _popup is passed as a request parameter
  • media is an instance of django.forms.Media
  • inline_admin_formsets is a list of InlineAdminFormSet objects
  • errors is an instance of AdminErrorList
  • root_path is the root_path attribute of the AdminSite object
  • app_label is your model class’ _meta.app_label attribute

The way that Django renders a form in the admin view is to iterate over the adminform instance and then iterate over each Fieldset, which in turn yields AdminField instances. All I want to do is lay out the form fields, ignoring the fieldset groupings which may or may not be defined in the model’s ModelAdmin.fieldsets attribute.

This turns out to be easy once you know how. The regular form is an attribute of the adminform object. So if your model has a field named “king_of_pop” you can refer to the form field in your template like so:

{{ adminform.form.king_of_pop.label_tag }}: {{ adminform.form.king_of_pop }}

Or if you want to save your finger tips you can use the with template tag:

{% with adminform.form as f %}
{{ f.king_of_pop.label_tag }}: {{ f.king_of_pop }}
{% endwith %}

Delving through the Django source while I tried to understand all of this I was struck by how Python defines hook functions for iteration and accessing attributes. Half of Python’s attraction is in how easy it is from the program author’s point of view to treat objects as built-in types like lists, dicts, etc.; the other half is the responsibility of the author of a Python module to encourage that same ease of use by implementing the related iteration protocols. It is harder to write a good Python module than it is to write a good Python program that uses a good module.
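To make the hooks concrete, here is a small made-up class that supports looping, indexing and attribute access by implementing the matching special methods. FieldGroup is hypothetical and has nothing to do with Django's own classes; it only shows the sort of work a module author takes on so that callers can treat an object like a built-in.

```python
class FieldGroup(object):
    """Hypothetical wrapper that behaves like a built-in container."""

    def __init__(self, name, fields):
        self.name = name
        self._fields = dict(fields)
        self._order = [key for key, _ in fields]

    def __iter__(self):
        # Lets callers write: for field in group
        for key in self._order:
            yield self._fields[key]

    def __getitem__(self, key):
        # Lets callers write: group['king_of_pop']
        return self._fields[key]

    def __len__(self):
        return len(self._order)

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails, so that
        # group.king_of_pop works the way a template expects.
        fields = self.__dict__.get('_fields', {})
        if key in fields:
            return fields[key]
        raise AttributeError(key)


group = FieldGroup('person', [('king_of_pop', 'Michael'),
                              ('queen_of_soul', 'Aretha')])
```

With that in place, list(group), group['king_of_pop'] and group.queen_of_soul all do what a caller would guess.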

Using MacPorts behind a firewall

I failed to persuade MySQLdb to build on a Mac OS X Server 10.5.8 install using the system Python + MySQL installation. So I turned to MacPorts where I know I can get Django + all the bits working without much hassle (but with much patience).

The next problem was that MacPorts couldn’t update because rsync was blocked by the corporate access policy. Fortunately plain HTTP is permitted outbound. Here’s how to use a local ports tree.

Install MacPorts using the disk image for 10.5.

curl -O http://distfiles.macports.org/MacPorts/MacPorts-1.8.2-10.5-Leopard.dmg
hdiutil attach MacPorts-1.8.2-10.5-Leopard.dmg
sudo installer -pkg /Volumes/MacPorts-1.8.2/MacPorts-1.8.2.pkg -target /
hdiutil detach /Volumes/MacPorts-1.8.2

If the MacPorts install directories are not in your $PATH environment, you can add them to your .profile. This change will not take effect until you start a new terminal session.

(Updated to keep variables as-is as suggested by commenter Bruce).

cat >> ~/.profile <<\EOF
export PATH=/opt/local/bin:/opt/local/sbin:${PATH}
EOF

After you have installed MacPorts, create a directory for the ports tree and check it out using Subversion.

sudo mkdir -p /opt/local/var/macports/sources/svn.macports.org/trunk/dports
cd /opt/local/var/macports/sources/svn.macports.org/trunk/dports
sudo svn co http://svn.macports.org/repository/macports/trunk/dports/ .

N.B. In the last line beginning svn co ... the trailing directory separator is significant!

Now tell MacPorts to use the local checkout rather than rsync. Edit /opt/local/etc/macports/sources.conf and add a new line to the end with the path to the ports tree, then comment out the previous line that uses rsync. Here are the last lines from my configuration:

#rsync://rsync.macports.org/release/ports/ [default]
file:///opt/local/var/macports/sources/svn.macports.org/trunk/dports/ [default]

Finally you must create an index for the tree (otherwise you will see messages saying “Warning: No index(es) found!”).

cd /opt/local/var/macports/sources/svn.macports.org/trunk/dports
sudo portindex

Now go do great things.