Nginx and WordPress

My [Nginx][nginx] and [WordPress][wp] configuration on [Debian Linux 5 (Lenny)][debian]. This has Nginx as the Web server using the fastcgi module to talk to php-cgi processes that run WordPress with pretty URLs.

The virtual private server I installed this on (from [John Companies][jc]) has a 256 megabyte slice, so I figured a regular Apache + mod_php setup might be in trouble seeing as one of the sites I am running gets several thousand visits a day. Up to now I have always used Apache with mod_php to run WordPress, and anyway it is fun to learn how unfamiliar software works ([Lotus Notes][notes] excepted).

On a side note, [SysV run-levels][sysv] and /etc/rcX.d directories are needlessly clever. [sysv-rc-conf][sysvrcconf] makes editing those easy.

    server {
        listen 80;
        server_name example.com;

        root /home/david/example.com;
        index index.php index.html index.htm;
        error_page 500 502 503 504 /50x.html;

        location / {

        }

        location = /50x.html {
            root /var/www/nginx-default;
        }

        # deny access to .htaccess files, if Apache's document root
        # concurs with nginx's one
        location ~ /\.ht {
            deny all;
        }

        location /b/ {
            if (!-e $request_filename) {
                rewrite ^(.+)$ /b/index.php?q=$1 last;
            }
        }

        location ~ \.php$ {
            fastcgi_pass 127.0.0.1:9000;
            fastcgi_index index.php;
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        }
    }

    # Rewrite www.example.com to example.com
    server {
        listen 80;
        server_name www.example.com;
        rewrite ^ http://example.com$request_uri?;
    }

This configuration is for WordPress installed under http://example.com/b/ on my server.
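
To make the pretty-URL rule in the `location /b/` block concrete, here is a rough Python sketch of what it does to a request. The function name and the `exists` callable are made up for illustration; `exists` stands in for nginx's `-e` test, which is true for an existing file, directory or symlink.

```python
import re


def rewrite_pretty_url(uri, exists):
    """Return the URI that would be handed on for a request under /b/.

    `exists` is a hypothetical callable standing in for nginx's `-e`
    test on $request_filename.
    """
    if uri.startswith('/b/') and not exists(uri):
        # Same pattern and replacement as the rewrite directive.
        return re.sub(r'^(.+)$', r'/b/index.php?q=\1', uri)
    return uri


# A pretty URL with no matching file is routed to WordPress...
print(rewrite_pretty_url('/b/2009/11/hello-world/', lambda uri: False))
# ...while a request for a real file is left alone.
print(rewrite_pretty_url('/b/wp-content/style.css', lambda uri: True))
```

Note that the whole original URI, including the `/b/` prefix, ends up in the `q` parameter.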

[nginx]: http://nginx.org/
[wp]: http://wordpress.org/
[jc]: http://www.johncompanies.com/
[notes]: http://www.ibm.com/software/lotus/products/notes/
[sysvrcconf]: http://sysv-rc-conf.sourceforge.net/
[sysv]: http://www.redhat.com/docs/manuals/linux/RHL-9-Manual/ref-guide/s1-boot-init-shutdown-sysv.html
[debian]: http://wiki.debian.org/DebianLenny

Split a file on any character in Python

I need to split a big text file on a certain character. I expect I am being thick about this, but [`split`][split] doesn’t quite do what I want because it includes the matching line, whereas I want to split right on the matching character.

My Python answer:

    def readlines(fileobj, endings, chunksize=4096):
        """Returns a generator that splits on lines in a file with the given
        line-ending. The file must be opened in binary mode.
        """
        line = b''
        while True:
            buf = fileobj.read(chunksize)
            if not buf:
                # End of file: yield whatever is left over, if anything.
                if line:
                    yield line
                break

            line = line + buf

            while endings in line:
                idx = line.index(endings) + len(endings)
                yield line[:idx]
                line = line[idx:]

    if __name__ == "__main__":
        import sys, os

        FORMFEED = b'\x0c' # ASCII 12
        basename = os.path.basename(sys.argv[1])
        # Read and write bytes so the data passes through unchanged.
        with open(sys.argv[1], 'rb') as infile:
            for num, data in enumerate(readlines(infile, endings=FORMFEED)):
                filename = basename + '-' + str(num)
                with open(filename, 'wb') as outfile:
                    outfile.write(data)

This is also useful when reading data exported from some old-fashioned Mac application like [Filemaker 5][filemaker] where the line-endings are ASCII 13 not ASCII 10.
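
As a rough illustration of the carriage-return case, here is a self-contained sketch written for Python 3. The `split_on` helper is a renamed copy of the same chunked approach, and the sample data is invented; a real export would come from a file opened in binary mode.

```python
import io


def split_on(fileobj, ending, chunksize=4096):
    """Yield chunks of a binary file, each one ending with `ending`."""
    pending = b''
    while True:
        buf = fileobj.read(chunksize)
        if not buf:
            # End of file: yield any unterminated remainder.
            if pending:
                yield pending
            break
        pending += buf
        while ending in pending:
            idx = pending.index(ending) + len(ending)
            yield pending[:idx]
            pending = pending[idx:]


CR = b'\r'  # ASCII 13
# Invented sample data standing in for a real export file.
export = io.BytesIO(b'Name\tPhone\rAlice\t555-0100\rBob\t555-0101')
# A tiny chunk size forces records to span more than one read.
records = [line.rstrip(CR) for line in split_on(export, CR, chunksize=8)]
print(records)
```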

This post was inspired by [Lotus Notes][lotus] version 8.5, which is so advanced that to save a message in a file on disk you have to export it as structured text. And if you want to save a whole bunch of messages as individual files you must forget that [drag-and-drop was introduced with System 7][mactech]; that would be too obvious.

[filemaker]: http://www.filemaker.com/support/downloads/downloads_prev_versions.html
[split]: http://developer.apple.com/Mac/library/documentation/Darwin/Reference/ManPages/man1/split.1.html
[lotus]: http://www-01.ibm.com/software/lotus/products/notes/
[mactech]: http://www.mactech.com/articles/mactech/Vol.10/10.06/DragAndDrop/index.html

Django AdminForm objects and templates

I can’t find documentation for the context of a Django admin template. In particular, where is the form and how does one access the fields? This post describes the template context for a generic admin model for [Django 1.1][django11].

Django uses an instance of `ModelAdmin` (defined in [`django.contrib.admin.options`][options]) to handle the request for a model object add / change view in the admin site. `ModelAdmin.add_view` and `ModelAdmin.change_view` are responsible for populating the template context when rendering the add object and change object pages respectively.

Here are the keys common to add and change views:

– **title**, `'Add '` or `'Change '` + your model class’ `_meta.verbose_name`
– **adminform** is an instance of `AdminForm`
– **is_popup**, a boolean which is true when `_popup` is passed as a request parameter
– **media** is an instance of [`django.forms.Media`][media]
– **inline_admin_formsets** is a list of [`InlineAdminFormSet`][inlineset] objects
– **errors** is an instance of [`AdminErrorList`][errors]
– **root_path** is the `root_path` attribute of the `AdminSite` object
– **app_label** is your model class’ `_meta.app_label` attribute

The way that Django renders a form in the admin view is to iterate over the `adminform` instance and then iterate over each [`Fieldset`][fieldset], which in turn yields [`AdminField`][adminfield] instances. All I want to do is lay out the form fields, ignoring the fieldset groupings which may or may not be defined in the model’s `ModelAdmin.fieldsets` attribute.

This turns out to be easy once you know how. The regular form is an attribute of the `adminform` object. So if your model has a field named “`king_of_pop`” you can refer to the form field in your template like so:

    {{ adminform.form.king_of_pop.label_tag }}: {{ adminform.form.king_of_pop }}

Or if you want to save your finger tips you can use the [`with` template tag][with]:

    {% with adminform.form as f %}
    {{ f.king_of_pop.label_tag }}: {{ f.king_of_pop }}
    {% endwith %}

Delving through the Django source while I tried to understand all of this I was struck by how [Python defines hook functions for iteration and accessing attributes][hooks]. Half of Python’s attraction is in how easy it is from the program author’s point of view to treat objects as built-in types like lists, dicts, etc.; the other half is the responsibility of the author of a Python module to encourage that same ease of use by implementing the related iteration protocols. It is harder to write a good Python module than it is to write a good Python program that uses a good module.
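
To show what those hooks look like from the module author's side, here is a minimal sketch. `FieldGroup` is a made-up class, only loosely modelled on the admin helpers, and is not Django code; the point is which special methods make an object behave like a built-in container.

```python
class FieldGroup(object):
    """A hypothetical container that behaves like a list and a dict."""

    def __init__(self, name, fields):
        self.name = name
        self._fields = dict(fields)
        self._order = [key for key, _ in fields]

    def __iter__(self):
        # Called by `for` loops and by list().
        for key in self._order:
            yield self._fields[key]

    def __len__(self):
        # Called by len().
        return len(self._order)

    def __getitem__(self, key):
        # Called for group['key'] lookups.
        return self._fields[key]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        fields = self.__dict__.get('_fields', {})
        try:
            return fields[name]
        except KeyError:
            raise AttributeError(name)


group = FieldGroup('people', [('king_of_pop', 'Michael'),
                              ('queen_of_soul', 'Aretha')])
print(list(group))
print(group['king_of_pop'], group.queen_of_soul, len(group))
```

Django's own form classes use `__getitem__` in a similar way to look up a bound field by name, which is part of what makes `adminform.form.king_of_pop` work in a template.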

[django11]: http://code.djangoproject.com/browser/django/tags/releases/1.1
[options]: http://code.djangoproject.com/browser/django/tags/releases/1.1/django/contrib/admin/options.py#L175
[fieldset]: http://code.djangoproject.com/browser/django/tags/releases/1.1/django/contrib/admin/helpers.py#L50
[adminfield]: http://code.djangoproject.com/browser/django/tags/releases/1.1/django/contrib/admin/helpers.py#L82
[with]: http://docs.djangoproject.com/en/dev/ref/templates/builtins/#with
[media]: http://docs.djangoproject.com/en/dev/topics/forms/media/
[inlineset]: http://code.djangoproject.com/browser/django/tags/releases/1.1/django/contrib/admin/helpers.py#L102
[errors]: http://code.djangoproject.com/browser/django/tags/releases/1.1/django/contrib/admin/helpers.py#L198
[hooks]: http://docs.python.org/reference/datamodel.html#emulating-container-types

Using MacPorts behind a firewall

I failed to persuade [MySQLdb][mysqldb] to build on a [Mac OS X Server 10.5.8][1058] install using the system [Python][python] + [MySQL][mysql] installation. So I turned to [MacPorts][macports] where I know I can get [Django][django] + all the bits working without much hassle (but with much patience).

The next problem was that MacPorts couldn’t update because [rsync][rsync] was blocked by the corporate access policy. Fortunately plain HTTP is permitted outbound. Here’s how to use a local ports tree.

Install MacPorts using the disk image for 10.5.

    curl -O http://distfiles.macports.org/MacPorts/MacPorts-1.8.2-10.5-Leopard.dmg
    hdiutil attach MacPorts-1.8.2-10.5-Leopard.dmg
    sudo installer -pkg /Volumes/MacPorts-1.8.2/MacPorts-1.8.2.pkg -target /
    hdiutil detach /Volumes/MacPorts-1.8.2

If the MacPorts install directories are not in your $PATH environment, you can add them to your `.profile`. This change will not take effect until you start a new terminal session.

*(Updated to keep variables as-is as suggested by commenter Bruce).*

    cat >> ~/.profile <<\EOF
    PATH=/opt/local/bin:/opt/local/sbin:${PATH}
    MANPATH=/opt/local/share/man:${MANPATH}
    EOF

After you have installed MacPorts, create a directory for the ports tree and check it out using [Subversion][svn].

    sudo mkdir -p /opt/local/var/macports/sources/svn.macports.org/trunk/dports
    cd /opt/local/var/macports/sources/svn.macports.org/trunk/dports
    sudo svn co http://svn.macports.org/repository/macports/trunk/dports/ .

N.B. In the last line beginning `svn co ...` the trailing directory separator is significant!

Now tell MacPorts to use the local checkout rather than rsync. Edit `/opt/local/etc/macports/sources.conf` and add a new line to the end with the path to the ports tree, then comment out the previous line that uses rsync. Here are the last lines from my configuration:

    #rsync://rsync.macports.org/release/ports/ [default]
    file:///opt/local/var/macports/sources/svn.macports.org/trunk/dports/ [default]

Finally you must create an index for the tree (otherwise you will see messages saying "Warning: No index(es) found!").

    cd /opt/local/var/macports/sources/svn.macports.org/trunk/dports
    sudo portindex

Now go do great things.

[mysqldb]: http://mysql-python.sourceforge.net/MySQLdb.html
[macports]: http://www.macports.org/
[1058]: http://www.apple.com/server/macosx/
[mysql]: http://www.mysql.com/
[python]: http://www.python.org/
[django]: http://www.djangoproject.com/
[rsync]: http://samba.anu.edu.au/rsync/
[svn]: http://subversion.tigris.org/

Confused of Wapping

Trying to get my head straight about what it means to modify a file and what it means to modify a folder for a desktop operating system. Mac OS X’s behaviour feels intuitively wrong, but it turns out to be much harder than I expected to nail down exactly why it is wrong.

In general the implementation of filesystem metadata on OS X has been two steps back with respect to the old Mac OS ways. (The one step forward has been the rich metadata provided by the Finder in terms of previews, performance on folders with hundreds of files, and the exposing of file content metadata with the interface for finding files.)

I feel that Mac OS X is shifting away from the concept of files existing on a filesystem without providing a suitable alternative.

watchedinstall is useful

Very satisfying to use [watchedinstall][wi] at work the other day to see exactly what a tricksy meta-package was doing during installation. Now that I [fixed a stupid bug involving dtrace][bug], watchedinstall works a treat for recording exactly what goes where.

Many thanks to [Preston Holmes][ptone] for releasing watchedinstall in the first place.

My goal is to replace the functionality of the fsevents helper application with a [dtrace][dtrace] script that can list filesystem changes. A single Python script would be simpler to install and use – you wouldn’t need to install it at all, just run it from the directory you downloaded it to. No effing about with setting PATH environment variables, no worry about compiling a C program for whatever architecture.

Hey Esther!

[wi]: http://bitbucket.org/davidbuxton/watchedinstall/
[bug]: http://bitbucket.org/davidbuxton/watchedinstall/changeset/d97aaae628c3/
[ptone]: http://www.ptone.com/
[dtrace]: http://www.sun.com/bigadmin/content/dtrace/

AJAX-ified result paging is not good

An annoying trend in Web design: using [AJAX][ajax] to load results when there is more than one page.

Apple does this for their [search results][airport]. Netgear does this when [searching their knowledge base][netgear]. Microsoft does this for their [Mactopia discussion forums][microsoft]. All three ostensibly good, clean designs fail to consider what the hell a visitor wants in the first place, which is to see the next damn page of results.

The first problem with using AJAX to load results is that the browser view does not change when the new results are loaded. Suppose you have read the first ten results, you scroll to the last result on the page and the first result scrolls up and out of view. Then you click the link for the next page of results. The fancy AJAX loader replaces the existing list of results with the next page’s list of results, but does not move the view, leaving you staring at the last result on the second page when what you want is to see the first result of the second page, so you have to scroll back to the top of the page.

The script to load the results should scroll the view so the first result of the subsequent page is visible – I have yet to see an example of this behaviour.

The second problem is that the URL does not change between one page and the next, which means you cannot bookmark any page other than the first. URLs and hyperlinks are the very stuff of the Web; it is mad not to make use of them.

My guess is that the Web designer in each of these cases was so pleased by the effect of updating the visitor’s view of a page without changing the browser location that she figured it was an improvement over the established technique of passing query parameters in a URL.

It is not. Please go back to the old-fashioned use of a query parameter to indicate the offset into a list of results.
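
For what it is worth, the old-fashioned approach is only a few lines of code. This is a minimal Python 3 sketch; the parameter names `q`, `offset` and `limit` and the example URL are my own choices for illustration, not anything those sites use.

```python
from urllib.parse import urlencode


def page_url(base, query, offset, per_page=10):
    """Build a bookmarkable URL for one page of search results."""
    params = {'q': query, 'offset': offset, 'limit': per_page}
    return base + '?' + urlencode(params)


def results_page(results, offset, per_page=10):
    """Return the slice of results that the URL above describes."""
    return results[offset:offset + per_page]


all_results = ['result %d' % n for n in range(1, 26)]
print(page_url('http://example.com/search', 'airport', 10))
print(results_page(all_results, 10))
```

Every page has its own URL, so the back button, bookmarks and links pasted into an email all behave as a visitor expects.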

[netgear]: http://kb.netgear.com/app/answers/list/
[airport]: http://support.apple.com/downloads/#airport
[microsoft]: http://www.officeformac.com/ProductForums/Entourage/
[ajax]: http://en.wikipedia.org/wiki/Ajax_(programming)

ModelForms good for importing too

If you have exported data from one database in plain text format and you want to import it to [Django][django], you should use a [`ModelForm` class][modelform] to do a lot of the heavy lifting for you.

A suitable `ModelForm` for your Django model will consume each row and do the conversion of each field to an appropriate Python type. Much simpler than explicitly converting each value yourself before creating a new model instance.

Suppose you have a model for an address book entry and its associated `ModelForm` (this works for Django 1.1):

    # myapp/models.py
    from django.db import models
    from django import forms

    class Contact(models.Model):
        first_name = models.CharField(max_length=100)
        second_name = models.CharField(max_length=100)
        telephone = models.CharField(max_length=50, blank=True)
        email = models.EmailField(blank=True)

    class ContactForm(forms.ModelForm):
        class Meta:
            model = Contact
Here’s a script to run through a comma-separated list of contacts where each line looks something like “Smits, Jimmy, [email protected], 555-1234”:

    from myapp.models import ContactForm

    # Map columns to fields, adjusting the order as necessary
    column_map = (
        'second_name',
        'first_name',
        'email',
        'telephone',
    )

    for line in open('comma-separated-data.txt'):
        # Pair each stripped value with its field name.
        row = dict(zip(column_map, (field.strip() for field in line.split(','))))
        form_obj = ContactForm(row)
        try:
            # save() raises ValueError if the data does not validate.
            form_obj.save()
        except ValueError:
            for k, v in form_obj.errors.items():
                # Non-field errors use a key that is not in the row.
                print k, row.get(k, ''), ', '.join(map(unicode, v))

If a line doesn’t validate the script prints the validation errors and moves to the next line. If your data has columns you want to ignore then just name them in the `column_map` – the form class will ignore extra keys in the dictionary.
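
The column-mapping step can be tried on its own without Django. Here is a small Python 3 sketch that uses the standard `csv` module rather than `line.split(',')`, so that quoted values containing commas survive. The extra `notes` column and the sample line are invented for illustration.

```python
import csv
import io

# The last name does not match any model field, so a form would ignore it.
column_map = ('second_name', 'first_name', 'email', 'telephone', 'notes')


def rows_from_csv(fileobj, columns):
    """Yield one dictionary per line, keyed by the given column names."""
    reader = csv.reader(fileobj, skipinitialspace=True)
    for values in reader:
        yield dict(zip(columns, (value.strip() for value in values)))


sample = io.StringIO(
    'Smits, Jimmy, jimmy@example.com, 555-1234, "actor, producer"\n'
)
rows = list(rows_from_csv(sample, column_map))
print(rows[0])
```

Each dictionary is what you would pass to the form class in place of `row`.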

[django]: http://www.djangoproject.com
[modelform]: http://docs.djangoproject.com/en/dev/topics/forms/modelforms/

Removing printing restrictions in 10.5

I am trying to write a good one-liner for removing all restrictions on printing for Mac OS X 10.5. I had thought that [`sed`][sed] would be perfect for this, but I can’t arrive at a simple syntax for appending new lines that works well when pasted into a terminal window. Here’s what I ended up with:

    perl -p -0 -i.bak -e 's/(Policy default).*(Policy)/$1>\n<Limit All>\nOrder deny,allow\nAllow from all\n<\/Limit>\n<\/$2/s' /private/etc/cups/cupsd.conf

Rather brutal, it just guts the default policy and replaces it with the following:

    <Policy default>
    <Limit All>
    Order deny,allow
    Allow from all
    </Limit>
    </Policy>
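
The same idea can be sketched in Python, which is easier to test against a sample string before touching the real file. This assumes the default policy is delimited by `<Policy default>` and `</Policy>`, and that the replacement is a single `<Limit All>` block allowing everyone. The sample configuration is invented and much shorter than a real `cupsd.conf`.

```python
import re

NEW_POLICY = (
    '<Policy default>\n'
    '<Limit All>\n'
    'Order deny,allow\n'
    'Allow from all\n'
    '</Limit>\n'
    '</Policy>'
)


def open_default_policy(conf_text):
    """Replace the body of the default policy with an allow-all limit."""
    # DOTALL lets `.` cross line boundaries; `.*?` stops at the first
    # closing tag rather than the last one in the file.
    pattern = re.compile(r'<Policy default>.*?</Policy>', re.DOTALL)
    return pattern.sub(NEW_POLICY, conf_text, count=1)


# Invented sample, not a real cupsd.conf.
sample = (
    'LogLevel warn\n'
    '<Policy default>\n'
    '<Limit CUPS-Add-Modify-Printer>\n'
    'AuthType Default\n'
    'Require user @SYSTEM\n'
    'Order deny,allow\n'
    '</Limit>\n'
    '</Policy>\n'
    '# end\n'
)
print(open_default_policy(sample))
```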

Greg Neagle has [a useful article about printing in the enterprise][mactech]. Apple suggests [adding the network group to the local lpadmin group][ht3511], but points out that mobile users would need to be added individually. In my case most accounts are mobile accounts and we trust everyone to manage print queues on a Mac, so removing all restrictions is acceptable.

[mactech]: http://www.mactech.com/articles/mactech/Vol.24/24.06/2406MacEnterprise-LeopardPrinting/index.html
[ht3511]: http://support.apple.com/kb/HT3511
[sed]: http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man1/sed.1.html

Notes on Radmind’s checksum

It would be nice to do a [pure-Python][python] implementation of [Radmind][radmind]’s fsdiff output for [watchedinstall][watchedinstall], which consists of several white-space separated fields describing the filename’s attributes and an optional checksum for the file.

These are notes on how Radmind generates checksums for files on [Mac OS X][macosx].

The [fsdiff format is documented][manfsdiff], however for files with Mac Finder info or a resource fork the checksum is for an [AppleSingle][applesingle]-encoded representation of the file, which means a Python implementation needs to produce an equivalent AppleSingle-encoded byte stream for the file. Bummer.

Python 2.6 on Mac OS X includes a [(deprecated) applesingle module][applesinglemod] that can read the format but cannot write it (and the module has been removed for Python 3). Therefore a pure Python implementation of Radmind’s checksum has to implement a compatible AppleSingle encoding routine too.

Radmind’s fsdiff command is written in C, which I can just about get the gist of, but I am missing something because my attempts at emulating Radmind’s checksums are wrong.

The meat of Radmind’s checksum is the [`do_acksum()` function in `cksum.c`][cksum]. The algorithm appears to be as follows:

1. Initialize a digest using the default cipher ([MD5][md5] I think).
2. Add the AppleSingle header, consisting of a magic number and version number and some padding.
3. Add the AppleSingle entry table, which has 3 entries for the Finder info, the resource fork info and the data fork info (in that order). Each entry is 12 bytes – an unsigned long for the entry type, an unsigned long for an offset into the file where the data will start and an unsigned long for the data length.
4. Add the Finder info data.
5. Add the resource fork data.
6. Add the data fork data.
7. Return a base64 encoded version of the final digest.
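
The steps above can be sketched in Python 3 as follows. This is my reading of the algorithm and of the AppleSingle specification (big-endian fields, entry IDs 9, 2 and 1 for Finder info, resource fork and data fork), not a verified match for fsdiff's output. The function names are made up, and the digest name is a parameter because it is not certain which one fsdiff uses.

```python
import base64
import hashlib
import struct

AS_MAGIC = 0x00051600
AS_VERSION = 0x00020000
# Entry IDs as I read them in the AppleSingle specification.
AS_DATA_FORK = 1
AS_RESOURCE_FORK = 2
AS_FINDER_INFO = 9
FINDER_INFO_LEN = 32
ENTRY_LEN = 12


def applesingle_header(rsrc_len, data_len):
    """Return the header plus a three-entry table as bytes."""
    # Magic number, version, 16 filler bytes, entry count: 26 bytes.
    header = struct.pack('>II16xH', AS_MAGIC, AS_VERSION, 3)
    # The first entry's data starts right after the entry table.
    offset = len(header) + 3 * ENTRY_LEN
    table = b''
    for entry_id, length in ((AS_FINDER_INFO, FINDER_INFO_LEN),
                             (AS_RESOURCE_FORK, rsrc_len),
                             (AS_DATA_FORK, data_len)):
        table += struct.pack('>III', entry_id, offset, length)
        offset += length
    return header + table


def applesingle_checksum(finder_info, rsrc, data, algorithm='sha1'):
    """Digest the encoded stream and return it base64 encoded."""
    digest = hashlib.new(algorithm)
    digest.update(applesingle_header(len(rsrc), len(data)))
    digest.update(finder_info)
    digest.update(rsrc)
    digest.update(data)
    return base64.b64encode(digest.digest()).decode('ascii')


print(applesingle_checksum(b'\0' * FINDER_INFO_LEN, b'', b'hello'))
```

This version takes the fork contents as byte strings for simplicity, so it holds whole files in memory.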

Because the entry table in the AppleSingle header specifies data offsets and lengths you need to know the size of the Finder info data (always 32 bytes) and the size of the resource fork and the size of the data fork before you pass that data to the digest function.

So a working Python implementation needs to know the size of the resource fork and data fork before feeding that same data to the digest. It seems to me that this requirement might imply huge memory allocations while slurping file data – my wrong attempt tried counting bytes and later feeding the same data to the digest in manageable chunks.
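
One way to avoid slurping is to ask the filesystem for the sizes first and only then stream the data. Here is a minimal Python 3 sketch of that pattern for a single file. The helper names are made up, and the 4-byte length prefix merely stands in for the entry table. On Mac OS X the resource fork should be readable through a path like `file/..namedfork/rsrc`, but that is not exercised here.

```python
import hashlib
import os
import struct


def feed_file(digest, path, chunksize=64 * 1024):
    """Feed a file to a digest in chunks; return the bytes consumed."""
    total = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunksize)
            if not chunk:
                break
            digest.update(chunk)
            total += len(chunk)
    return total


def digest_with_size_prefix(path, algorithm='sha1'):
    """Hash a length field followed by the file's data."""
    # The size is known before any data is read.
    size = os.path.getsize(path)
    digest = hashlib.new(algorithm)
    # Stand-in for the entry table, which needs the length up front.
    digest.update(struct.pack('>I', size))
    consumed = feed_file(digest, path)
    if consumed != size:
        raise IOError('%s changed size while being read' % path)
    return digest.hexdigest()
```

Memory use is bounded by `chunksize` however large the file is.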

Anyway…

Advice much appreciated. The workaround is to leave it to fsdiff to generate the checksum and parse the value from the output.

David

P.S. I still intend running [A/UX 3.0.1][aux] on my Centris 660av one day.

Update: using my eyes and brains and the `fsdiff -V` command I was able to read the fsdiff man page and deduce the preferred checksum cipher is actually sha1. My code is still wrong.

[radmind]: http://rsug.itd.umich.edu/software/radmind/
[python]: http://www.python.org/
[macosx]: http://www.apple.com/macosx/
[manfsdiff]: http://linux.die.net/man/1/fsdiff
[applesingle]: http://users.phg-online.de/tk/netatalk/doc/Apple/v2/AppleSingle_AppleDouble.pdf
[applesinglemod]: http://www.python.org/doc/2.6.2/library/undoc.html#module-applesingle
[md5]: http://www.openssl.org/docs/crypto/md5.html
[aux]: http://www.aux-penelope.com/
[cksum]: http://radmind.cvs.sourceforge.net/viewvc/radmind/radmind/cksum.c
[watchedinstall]: http://bitbucket.org/ptone/watchedinstall/