
A context manager for files or file-like objects

I usually design my [Python][python] programs so that if a program needs to read or write to a file, the functions will take a *filename* argument that can be either a path string or a file-like object already open for reading / writing.

(I think I picked up this habit from [Mark Pilgrim’s Dive Into Python][dive], in particular [chapter 10 about scripts and streams][chap10].)

This has the great advantage of making tests easier to write. Instead of having to create dummy temporary files on disk I can wrap strings in [`StringIO()`][stringio] and pass that instead.

But the disadvantage is I then have a bit of boiler-plate at the top of the function:

def read_something(filename):
    # Tedious but not heinous boiler-plate
    if isinstance(filename, basestring):
        filename = open(filename)

    return filename.read()

The other drawback is that this code doesn’t close the file it opened. You could call `filename.close()` before returning but that will also close file-like objects that were passed in, which may not be what the caller wants. I think the decision whether to close the file belongs to the caller when the argument is a file-like object.

You could set a flag when opening the file, and then close the file afterwards if the flag is set, but that is yet more boiler-plate and quite ugly.
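For comparison, the flag approach might look roughly like this. It is only a sketch, written for Python 3 (so `str` stands in for `basestring`), and the `opened_here` name is made up for illustration.

```python
import io


def read_something(filename):
    # Remember whether this function opened the file itself.
    opened_here = isinstance(filename, str)
    fh = open(filename) if opened_here else filename
    try:
        return fh.read()
    finally:
        # Only close what we opened; the caller owns anything passed in.
        if opened_here:
            fh.close()


stream = io.StringIO('dummy contents')
contents = read_something(stream)
```

It works, but every function that accepts a filename needs the same five or six lines.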

So here is [a context manager][ctxt] which behaves like `open()`. If the argument is a string it handles opening and closing the file cleanly. If the argument is anything else it is handed back unchanged, and closing it is left to the caller.

class open_filename(object):
    """Context manager that opens a filename and closes it on exit, but does
    nothing for file-like objects.
    """
    def __init__(self, filename, *args, **kwargs):
        self.closing = kwargs.pop('closing', False)
        if isinstance(filename, basestring):
            self.fh = open(filename, *args, **kwargs)
            self.closing = True
        else:
            self.fh = filename

    def __enter__(self):
        return self.fh

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.closing:
            self.fh.close()

        return False

And then you use it like this:

from io import StringIO

file1 = StringIO(u'The quick brown fox…')
file2 = 'The quick brown fox'

with open_filename(file1) as fh1, open_filename(file2) as fh2:
    foo, bar = fh1.read(), fh2.read()

If you always want the file to be closed on leaving the block you use the *closing* keyword argument set to `True` (the default of `False` means the file will only be closed if it was opened by the context manager).

file1 = StringIO(u'…jumps over the lazy dog.')
assert file1.closed == False

with open_filename(file1, closing=True) as fh:
    foo = fh.read()

assert file1.closed == True
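The same behaviour can probably be had from a generator wrapped in `contextlib.contextmanager`. The sketch below assumes Python 3 (so it tests for `str` rather than `basestring`), and the name `opened` is hypothetical.

```python
import contextlib
import io


@contextlib.contextmanager
def opened(path_or_file, *args, **kwargs):
    """Yield an open file for a path string; pass file-like objects through."""
    closing = kwargs.pop('closing', False)
    if isinstance(path_or_file, str):
        fh = open(path_or_file, *args, **kwargs)
        closing = True  # we opened it, so we close it
    else:
        fh = path_or_file
    try:
        yield fh
    finally:
        if closing:
            fh.close()


buf = io.StringIO('The quick brown fox')
with opened(buf) as fh:
    text = fh.read()
```

The `try`/`finally` around the `yield` plays the role of `__exit__`, so the file is closed even if the block raises.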

Today is my brother’s birthday. If I had asked him what he wanted for a present I am pretty certain he would have asked for a blog post about closing files in a computer programming language.

[dive]: http://www.diveintopython.net/
[python]: http://www.python.org/
[stringio]: http://docs.python.org/library/io.html
[ctxt]: http://docs.python.org/library/stdtypes.html#typecontextmanager
[chap10]: http://www.diveintopython.net/scripts_and_streams/index.html

Custom template folders with Flask

Someone was asking on [Flask][flask]’s IRC channel [#pocoo][irc] about sharing templates across more than one app but allowing each app to override the templates (pretty much what [Django’s TEMPLATE_DIRS setting][django] is for). One way of doing this would be to customise the Jinja2 template loader.

Here’s a trivial Flask app that searches for templates first in the default folder (‘templates’ in the same folder as the app) and then in an extra folder.

import flask
import jinja2

app = flask.Flask(__name__)
my_loader = jinja2.ChoiceLoader([
    app.jinja_loader,
    jinja2.FileSystemLoader('/path/to/extra/templates'),
])
app.jinja_loader = my_loader

@app.route('/')
def home():
    return flask.render_template('home.html')

if __name__ == "__main__":
    app.run()

The only thing special here is creating a new template loader and then assigning it to [the `jinja_loader` attribute on the Flask application][attr]. [`ChoiceLoader`][choice] will search for a named template in the order of the loaders, stopping on the first match. In this example I re-used the loader that is created by default for an app, which is roughly like `FileSystemLoader('/path/to/app/templates')`. There are [all kinds of other exciting template loaders available][loaders].
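The "first match wins" search order can be pictured with a standard-library sketch. This is not Jinja2's code, only an illustration of the lookup rule; `find_template` and the two throwaway folders are made up for the demonstration.

```python
import os
import tempfile


def find_template(name, folders):
    """Return the path of `name` in the first folder that has it, else None."""
    for folder in folders:
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Two temporary folders stand in for the app's own templates and the extras.
app_dir = tempfile.mkdtemp()
extra_dir = tempfile.mkdtemp()
for folder, names in ((app_dir, ['home.html']),
                      (extra_dir, ['home.html', 'about.html'])):
    for template_name in names:
        with open(os.path.join(folder, template_name), 'w') as fh:
            fh.write('placeholder')

home = find_template('home.html', [app_dir, extra_dir])    # found in app_dir
about = find_template('about.html', [app_dir, extra_dir])  # falls through to extra_dir
```

Listing the app's own folder first is what lets an app override a shared template of the same name.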

I really like the fact that Flask and Bottle’s APIs are so similar. Next I want Flask to include [Bottle’s template wrapping decorator][view] by default (there’s [a recipe in the Flask docs][templated]) and for both of them to re-name it `@template`.

[flask]: http://flask.pocoo.org/
[irc]: http://flask.pocoo.org/community/irc/
[attr]: http://flask.pocoo.org/docs/api/#flask.Flask.jinja_loader
[choice]: http://jinja.pocoo.org/docs/api/#jinja2.ChoiceLoader
[loaders]: http://jinja.pocoo.org/docs/api/#loaders
[django]: https://docs.djangoproject.com/en/dev/ref/templates/api/#loading-templates
[templated]: http://flask.pocoo.org/docs/patterns/viewdecorators/#templating-decorator
[view]: http://bottlepy.org/docs/stable/api.html#bottle.view

Inspecting your routes in Bottle

Marcel Hellkamp [recently added a small feature][3] to Bottle that makes it easy to inspect an application’s routes and determine if a particular route is actually for a mounted sub-application.

([Bottle is a small module written in Python for making websites][4].)

Route objects (items in the `app.routes` list) now have extra information when the route was created by mounting one app on another, in the form of a new key `mountpoint` in `route.config`.

Here’s a trivial app with another app mounted on it:

import bottle

app1 = bottle.Bottle()

@app1.route('/')
def app1_home(): return "Hello World from App1"

app2 = bottle.Bottle()
@app2.route('/')
def app2_home(): return "Hello World from App2"

app1.mount(prefix='/app2/', app=app2)

And a utility function that returns a generator of prefixes and routes:

def inspect_routes(app):
    for route in app.routes:
        if 'mountpoint' in route.config:
            prefix = route.config['mountpoint']['prefix']
            subapp = route.config['mountpoint']['target']

            for prefixes, route in inspect_routes(subapp):
                yield [prefix] + prefixes, route
        else:
            yield [], route

Finally, inspecting all the routes (including mounted sub-apps) for the root Bottle object:

for prefixes, route in inspect_routes(app1):
    abs_prefix = '/'.join(part for p in prefixes for part in p.split('/'))
    print abs_prefix, route.rule, route.method, route.callback
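One wrinkle: with more than one level of mounting, a plain `'/'.join` over the split parts leaves runs of slashes (`['/a/', '/b/']` comes out as `/a///b/`). A small helper that drops the empty parts avoids that. The name `join_prefixes` is hypothetical, and returning an empty string for routes that are not mounted is my assumption about what is wanted.

```python
def join_prefixes(prefixes):
    """Join mount prefixes such as ['/a/', '/b/'] into '/a/b/'."""
    # Splitting '/a/' on '/' gives ['', 'a', '']; keep only the real parts.
    parts = [part for p in prefixes for part in p.split('/') if part]
    return ('/' + '/'.join(parts) + '/') if parts else ''


single = join_prefixes(['/app2/'])
nested = join_prefixes(['/a/', '/b/'])
```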

This new feature is sure to revolutionise everything.

[3]: https://github.com/bottlepy/bottle/commit/9b24401605e0470388a65c80a0964cba2bf64caf
[4]: http://bottlepy.org/

SharpZipLib and Mac redux

I wrote a blog post about [generating Mac-compatible zip files with SharpZipLib][1], the conclusion of which was to disable Zip64 compatibility. It was wrong, *wrong* I tell you.

The better solution is to just set the size of each file you add to the archive. That way you can keep Zip64 compatibility and Mac compatibility.

I owe this solution to the excellent SharpZipLib forum, [which covered this problem a while ago][2], but which I missed when I wrote the earlier post.

Here’s an updated version of the zip tool in C# that makes Macs happy without annoying anyone else:

using System;
using System.IO;
using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.Zip;

public class ZipTool
{
    public static void Main(string[] args)
    {
        if (args.Length != 2) {
            Console.WriteLine("Usage: ziptool ");
            return;
        }

        using (ZipOutputStream zipout = new ZipOutputStream(File.Create(args[1]))) {
            byte[] buffer = new byte[4096];
            string filename = args[0];

            zipout.SetLevel(9);

            // Set the size before adding it to the archive, to make your
            // Mac-loving hippy friends happy.
            ZipEntry entry = new ZipEntry(Path.GetFileName(filename));
            FileInfo info = new FileInfo(filename);
            entry.DateTime = info.LastWriteTime;
            entry.Size = info.Length;
            zipout.PutNextEntry(entry);

            using (FileStream fs = File.OpenRead(filename)) {
                int sourceBytes;
                do {
                    sourceBytes = fs.Read(buffer, 0, buffer.Length);
                    zipout.Write(buffer, 0, sourceBytes);
                } while (sourceBytes > 0);
            }

            zipout.Finish();
            zipout.Close();
        }
    }
}

[1]: http://reliablybroken.com/b/2011/11/sharpziplib-and-mac-os-x/
[2]: http://community.sharpdevelop.net/forums/p/4982/18649.aspx#18649

SharpZipLib and Mac OS X

TL;DR When creating zip archives with SharpZipLib disable Zip64 format if you care about Mac compatibility.

Extra TL;DR [When creating zip archives with SharpZipLib make sure you set the file size][redux], disabling Zip64 is neither here nor there.

A project I am working on sources a zip archive from a Web service, extracts the XML file from the zip and then does silly amounts of processing of the data in that XML to produce a new XML file which is returned to the user.

But the sourced zip archive cannot be opened using [Python’s zipfile module][zipfile], and when saved on a Mac the archive cannot be opened using the built-in Archive Utility.app. If one double-clicks the zip, Archive Utility.app just compresses it again and sticks “.cpgz” on the end of the file name.
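A minimal sketch of the kind of check that shows up the problem from Python: try to open the bytes with `zipfile` and verify each member. It assumes Python 3 (where the exception is named `zipfile.BadZipFile`), and `can_open_zip` is a hypothetical helper name. It only reports whether the standard library accepts an archive, not why a given one is rejected.

```python
import io
import zipfile


def can_open_zip(data):
    """Return True if `data` (bytes) opens as a zip and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


# Build a small archive in memory to exercise the check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as archive:
    archive.writestr('data.xml', '<root/>')

good = can_open_zip(buf.getvalue())
bad = can_open_zip(b'this is not a zip archive')
```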

Fortunately the developer of the Web service is very helpful (the service is written in C# and runs on Windows) and although he didn’t know why Macs were having problems (the built-in Windows zip tool can handle the archive fine) he showed me the code that creates the zip file.

Turns out they were using [SharpZipLib, an open-source library for C#][sharpziplib]. And it turns out SharpZipLib creates archives using [Zip64 format][zip64] by default.

The fix was to disable Zip64 when creating the archive. Here’s a trivial command-line program that creates a zip and disables Zip64 for Mac compatibility:

using System;
using System.IO;
using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.Zip;

public class ZipTool
{
    public static void Main(string[] args)
    {
        if (args.Length != 2) {
            Console.WriteLine("Usage: ziptool ");
            return;
        }

        using (ZipOutputStream zipout = new ZipOutputStream(File.Create(args[1]))) {
            byte[] buffer = new byte[4096];
            string filename = args[0];

            zipout.SetLevel(9);

            // Disable Zip64 for Mac compatibility
            zipout.UseZip64 = UseZip64.Off;

            ZipEntry entry = new ZipEntry(Path.GetFileName(filename));
            entry.DateTime = File.GetLastWriteTime(filename);
            zipout.PutNextEntry(entry);

            using (FileStream fs = File.OpenRead(filename)) {
                int sourceBytes;
                do {
                    sourceBytes = fs.Read(buffer, 0, buffer.Length);
                    zipout.Write(buffer, 0, sourceBytes);
                } while (sourceBytes > 0);
            }

            zipout.Finish();
            zipout.Close();
        }
    }
}

The disadvantage of disabling Zip64 is you cannot create archives larger than 4 gigabytes, nor can you add files larger than 4 gigabytes before compression.

The advantage of disabling Zip64 is you make me and all my Mac-using hippy friends happy. In concrete terms, disabling Zip64 makes it more likely I will buy you a drink.

Hi Gaston!

Also many thanks to the maintainer of NAnt in [MacPorts][macports], who responded to [my bug report][bug] and pushed an updated NAnt to MacPorts extremely quickly. Although SharpZipLib doesn’t officially support [Mono][mono], it builds without a hitch using the “build-net-2.0” target if you hack the `SharpZlib.build` NAnt script like so:




Also thanks to the Mono project! Saved me having to fire up a Windows virtual machine to figure out this problem.

[zipfile]: http://docs.python.org/library/zipfile.html
[sharpziplib]: http://www.icsharpcode.net/opensource/sharpziplib/
[zip64]: http://en.wikipedia.org/wiki/ZIP_(file_format)#ZIP64
[macports]: http://www.macports.org/
[bug]: http://trac.macports.org/ticket/32097
[mono]: http://www.mono-project.com/
[redux]: http://reliablybroken.com/b/2011/12/sharpziplib-and-mac-redux/

Widths & Heights with xlwt + Python

This article about using xlwt to generate Excel in Python reminded me I needed to see exactly how to set column widths (the xlwt documentation doesn’t cover it).

Let’s create a new Excel workbook and add a sheet:

>>> import xlwt
>>> book = xlwt.Workbook(encoding='utf-8')
>>> sheet = book.add_sheet('sheeeeeet')

We need to get a column in order to set its width. You do that by calling col() on the sheet, passing the column’s index as the only argument (or row() for accessing rows):

>>> sheet.col(0)    # First column

>>> sheet.row(2)    # Third row

The index is zero-based. You can fetch a column even if you have not written to any cell in that column (this applies equally to rows).

Columns have a property for setting the width. The value is an integer specifying the size measured in 1/256 of the width of the character ‘0’ as it appears in the sheet’s default font. xlwt creates columns with a default width of 2962, roughly equivalent to 11 characters wide.

>>> first_col = sheet.col(0)
>>> first_col.width = 256 * 20              # 20 characters wide (-ish)
>>> first_col.width
5120

For rows, the height is determined by the style applied to the row or any cell in the row. (In fact rows also have a property called height but it doesn’t do what you want.) To set the height of the row itself, create a new style with a font height:

>>> tall_style = xlwt.easyxf('font:height 720;') # 36pt
>>> first_row = sheet.row(0)
>>> first_row.set_style(tall_style)

Setting the style on the row does not change the style of the cells in that row.
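The two unit conventions used above (column widths in 1/256 of a character, font heights in 1/20 of a point) are easy to mix up, so a pair of tiny helpers may be handy. The function names are made up for illustration; the arithmetic is just the multiplication shown in the examples.

```python
def chars_to_width(chars):
    """Column width value for roughly `chars` characters of the default font."""
    return int(256 * chars)


def width_to_chars(width):
    """Approximate number of characters for a column width value."""
    return width / 256.0


def points_to_height(points):
    """Font height value for a size given in points."""
    return int(20 * points)


twenty_chars = chars_to_width(20)        # same as 256 * 20 above
default_chars = width_to_chars(2962)     # the default width, in characters
thirty_six_pt = points_to_height(36)     # same as the 720 used in easyxf above
```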

There is no obvious way to set a default width and height for all columns and rows. An instance of xlwt.Worksheet.Worksheet has properties for col_default_width and row_default_height but changing those does not actually change the defaults.

The problem is that new columns are always created with an explicit width, while rows take their height from the style information.

My first attempt at setting defaults set the width on every column and the height on every row. It works, but creates 65,536 unnecessary empty row objects.

A slightly better approach is to set the width on every column and to set the font height on the default style record in the workbook:

import itertools
import xlwt

book = xlwt.Workbook(encoding='utf-8')
sheet = book.add_sheet('sheeeeeet')

col_width = 256 * 20                        # 20 characters wide

try:
    for i in itertools.count():
        sheet.col(i).width = col_width
except ValueError:
    pass

default_book_style = book.default_style
default_book_style.font.height = 20 * 36    # 36pt

book.save('example.xls')

Here I used itertools.count() and wrapped the loop in a try block so I can forget exactly how many columns are permitted. When the loop tries to access a bad index it will throw ValueError and the loop will exit.

You mustn’t replace the default style on an instance of xlwt.Workbook.Workbook; you have to update the property of the existing style (to ensure you are changing the first style record). Unfortunately there is no way to set a default column width (as of xlwt version 0.7.2) so the brute-force method of setting every column will have to do. It isn’t so bad since there are only 256 columns.

Talking of widths and heights, have you heard "Widths & Heights" by Magic Arm? Is good.

Date variables in InDesign

Interesting [InDesign][indesign] problem: the format for a modification date variable changes per document.

*(This post describes a problem using Adobe InDesign CS4 but applies just as well to CS5 and CS5 and a half.)*

Suppose you have a text frame containing the [file modification date variable][variable], created using *Type* → *Text Variables* → *Insert Variable* → *Modification Date*, which displays as “14 September 2011 7:50 PM” (on my system). Now open an existing document that was created on a different system and paste in the text frame containing the date variable. Depending on what system created the other document, the date displays using a different format, e.g. as “September 14, 2011 19:50”.

It appears that the date format is determined by the document in which it is placed, rather than by the date format in use when the variable was created.

Workaround: define a new / custom text variable that uses the file modification time but with your own explicit format, then insert your custom variable instead of the pre-defined “Modification Date” variable.

In my brief testing this custom variable and its format is preserved when pasting the text frame into other documents that were created on other systems.

But now I want to know exactly how the format is chosen for the built-in “Modification Date” variable. I am guessing that when a story with a variable is pasted into a document the format is determined by the format of an existing variable of the same name, and if there is no existing variable of that name then InDesign brings in the new variable definition (along with its format) from the clipboard.

[But why models?][models]

No, that’s not what I mean… why the default date format? I tried changing the date formats in System Preferences. Doesn’t seem that InDesign picks it up from there. I tried trashing the *Adobe InDesign* preferences folder, changing the date format in System Preferences and launching InDesign again. InDesign is still using the original format, so doesn’t get it from the user’s preferences. I had a look through `~/Library/Preferences/com.adobe.InDesign.plist` but nothing date-related in there.

Perhaps it is set by the local-domain `/Library/Preferences/*`. Perhaps it is set by the built-in preferences of your installed language version of Creative Suite. Perhaps…

So I gave up. I will leave the investigation for some day when I am younger and it is more important to understand how Adobe’s InDesign picks the format for the built-in date modified variable. The workaround works around.

Suite!

[indesign]: http://www.adobe.com/products/indesign.html
[variable]: http://help.adobe.com/en_US/indesign/cs/using/WS6A9BE096-77B2-4721-9736-797C4912B6C9a.html
[models]: http://www.youtube.com/watch?v=ZkuCPYf16xI

Free software FTW! Updated filetimes.py

[Two years ago][old] (flippin’ heck it seems like only yesterday) I wrote about converting between Unix timestamps and Windows timestamps using Python. In that post I linked to my very simple implementation of a module that provides converting back and forth between the formats.

A few weeks ago I received an e-mail from Timothy Williams with changes to my module so that it preserves the fractions of a second in the conversion. How sweet is that?!?!! Exclamation mark question mark exclamation mark cedilla diaeresis em-dash full stop king of punctuation.

It is fantastic that not only did someone find my code useful but also that they were generous enough to take the time to improve it and give the changes back to me. I love tasty, delicious free software and the people like Tim who make it tastier and more delicious.

So here is the new version of [filetimes.py incorporating Tim’s fixes][ft].
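For anyone who just wants the gist of the conversion, here is a rough sketch. It is not the code in filetimes.py and the function names are made up. It relies on the fact that a Windows FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC, which puts the Unix epoch at 116444736000000000.

```python
# 100-nanosecond intervals between 1601-01-01 and 1970-01-01 (both UTC).
EPOCH_AS_FILETIME = 116444736000000000
HUNDREDS_OF_NANOSECONDS = 10000000


def unix_to_filetime(timestamp):
    """Convert a Unix timestamp (seconds, may be fractional) to a FILETIME integer."""
    return EPOCH_AS_FILETIME + int(round(timestamp * HUNDREDS_OF_NANOSECONDS))


def filetime_to_unix(filetime):
    """Convert a FILETIME integer to a Unix timestamp, keeping the fraction."""
    return (filetime - EPOCH_AS_FILETIME) / float(HUNDREDS_OF_NANOSECONDS)


epoch = unix_to_filetime(0)
later = unix_to_filetime(1.5)
back = filetime_to_unix(later)
```

Float timestamps for present-day dates only resolve to a few hundred nanoseconds, so the last digit or two of a FILETIME produced this way should not be trusted.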

[old]: http://reliablybroken.com/b/2009/09/working-with-active-directory-filetime-values-in-python/
[ft]: http://reliablybroken.com/b/wp-content/filetimes.py

XPath bug in old versions of ElementTree

I figured out why my XML parsing code works fine using the [pure-Python ElementTree XML parsing module][elementtree] but fails when using [the speedy and memory-optimized cElementTree XML parsing module][celementtree].

[The XPath 1.0 specification][xpath] says `'.'` is short-hand for `'self::node()'`, selecting a node itself.

Parsing an XML document and selecting the context node with ElementTree in Python 2.5:

>>> from xml.etree import ElementTree
>>> ElementTree.VERSION
'1.2.6'
>>> doc = “BUG
>>> node1 = ElementTree.fromstring(doc).find('./Example')
>>> node1

>>> node1.find('.')

>>> node1.find('.') == node1
True

See how the result of `node1.find('.')` is the node itself? [As it should be][selfnode].

Parsing an XML document and selecting the context node with cElementTree in Python 2.5:

>>> from xml.etree import cElementTree
>>> doc = “BUG
>>> node2 = cElementTree.fromstring(doc).find('./Example')
>>> node2

>>> node2.find('.')
>>> node2.find('.') == node2
False

Balls. The result of `node2.find('.')` is `None`.

However! I have a kludgey work-around that works whether you use ElementTree or cElementTree. Use `'./'` instead of `'.'`:

>>> node1.find('./')

>>> node1.find('./') == node1
True
>>> node2.find('./')

>>> node2.find('./') == node2
True

*Kludgey because `'./'` is not a valid XPath expression.*

So we are back on track. Also works for Python 2.6 which has the same version of ElementTree.

Fortunately Python 2.7 got a new version of ElementTree and the bug is fixed:

>>> from xml.etree import ElementTree
>>> ElementTree.VERSION
'1.3.0'
>>> doc = “BUG
>>> node3 = ElementTree.fromstring(doc).find('./Example')
>>> node3

>>> node3.find('.')

>>> node3.find('.') == node3
True

However! They also fixed my kludgey work-around:

>>> node3.find('./')
>>> node3.find('./') == node3
False

So I can’t write a single expression that works for all three versions. This is annoying. I was hoping to just replace ElementTree with the C version, which makes my code run in one third of the time (the XML parts of it run in one tenth of the time). And I cannot install any compiled modules: the code can only rely on Python 2.5’s standard library.
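One possible way round it, if the path is known at the point of the call, is to stop asking the library about `'.'` at all and special-case it in a wrapper. This is only a sketch: `find_compat` is a made-up name, the document below is a hypothetical stand-in rather than the one from my session, and I have only reasoned about it against the behaviour described above.

```python
from xml.etree import ElementTree


def find_compat(node, path):
    """Like node.find(path), but always treat '.' as the node itself."""
    if path == '.':
        return node
    return node.find(path)


# Hypothetical document for illustration.
doc = "<Root><Example>BUG</Example></Root>"
root = ElementTree.fromstring(doc)
example = find_compat(root, './Example')
same = find_compat(example, '.')
```

It does nothing for a `'.'` buried inside a longer expression, so it is a narrower fix than a working `find()`.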

[celementtree]: http://effbot.org/zone/celementtree.htm
[elementtree]: http://effbot.org/zone/element-index.htm
[xpath]: http://www.w3.org/TR/xpath/
[selfnode]: http://www.w3.org/TR/xpath/#path-abbrev

Lion: Spotlight still broken

I’ve installed [Mac OS X 10.7][lion] (upgrading from 10.6) and was very interested to see how the new search tokens feature would work in Spotlight. But on my Mac it doesn’t. Here’s the result of a search for files whose name contains the text “david buxton”:

Note how none of the files in that list has a name containing the text “david buxton”.

Perhaps deleting the Spotlight index would fix things. Meh.

[lion]: http://www.apple.com/pr/library/2011/06/06Mac-OS-X-Lion-With-250-New-Features-Available-in-July-From-Mac-App-Store.html