It would be nice to have a [pure-Python][python] implementation of [Radmind][radmind]’s fsdiff output for [watchedinstall][watchedinstall]. Each line of that output consists of several whitespace-separated fields describing a file’s attributes, plus an optional checksum for the file.
These are notes on how Radmind generates checksums for files on [Mac OS X][macosx].
The [fsdiff format is documented][manfsdiff]. However, for files with Mac Finder info or a resource fork, the checksum is computed over an [AppleSingle][applesingle]-encoded representation of the file, so a Python implementation needs to produce an equivalent AppleSingle-encoded byte stream. Bummer.
Python 2.6 on Mac OS X includes a [(deprecated) applesingle module][applesinglemod] that can read the format but cannot write it (and the module has been removed in Python 3). Therefore a pure-Python implementation of Radmind’s checksum has to implement a compatible AppleSingle encoding routine too.
Radmind’s fsdiff command is written in C, which I can just about get the gist of, but I am missing something because my attempts at emulating Radmind’s checksums are wrong.
The meat of Radmind’s checksum is the [`do_acksum()` function in `cksum.c`][cksum]. The algorithm appears to be as follows:
1. Initialize a digest using the default cipher ([MD5][md5] I think).
2. Add the AppleSingle header, consisting of a magic number and version number and some padding.
3. Add the AppleSingle entry table, which has 3 entries: the Finder info, the resource fork and the data fork (in that order). Each entry is 12 bytes: a 32-bit unsigned integer for the entry type, a 32-bit unsigned integer for the offset into the file where the data will start, and a 32-bit unsigned integer for the data length.
4. Add the Finder info data.
5. Add the resource fork data.
6. Add the data fork data.
7. Return a base64 encoded version of the final digest.
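
Steps 2 and 3 can be sketched with the `struct` module. This is a minimal sketch, not a verified match for Radmind: the magic number, version, entry IDs and big-endian byte order come from the AppleSingle specification, while the all-zero 16-byte filler and the name `applesingle_prefix` are my assumptions. If Radmind writes something else into the filler, the checksum will differ.

```python
import struct

AS_MAGIC = 0x00051600    # AppleSingle magic number
AS_VERSION = 0x00020000  # AppleSingle version 2
AS_HEADER_LEN = 26       # magic (4) + version (4) + filler (16) + entry count (2)
AS_ENTRY_LEN = 12        # three 32-bit big-endian unsigned integers
FINFO_LEN = 32           # Finder info is always 32 bytes

# Entry IDs as defined by the AppleSingle specification.
AS_ID_DATA, AS_ID_RSRC, AS_ID_FINFO = 1, 2, 9


def applesingle_prefix(rsrc_len, data_len, filler=b"\0" * 16):
    """Return header + entry table bytes (hypothetical helper).

    Entries are ordered Finder info, resource fork, data fork.
    The zero filler is an assumption, not confirmed against Radmind.
    """
    header = struct.pack(">II16sH", AS_MAGIC, AS_VERSION, filler, 3)
    # The first entry's data starts right after the header and entry table.
    offset = AS_HEADER_LEN + 3 * AS_ENTRY_LEN
    table = b""
    for entry_id, length in (
        (AS_ID_FINFO, FINFO_LEN),
        (AS_ID_RSRC, rsrc_len),
        (AS_ID_DATA, data_len),
    ):
        table += struct.pack(">III", entry_id, offset, length)
        offset += length
    return header + table
```

With three entries the prefix is always 62 bytes, so the Finder info starts at offset 62, the resource fork at 94 and the data fork at 94 plus the resource fork length.
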
Because the entry table in the AppleSingle header specifies data offsets and lengths, you need to know the size of the Finder info data (always 32 bytes), the size of the resource fork and the size of the data fork before you pass any of that data to the digest function.
So a working Python implementation needs to know the size of the resource fork and data fork before feeding that same data to the digest. It seems to me that this requirement might imply huge memory allocations while slurping file data – my wrong attempt tried counting bytes and later feeding the same data to the digest in manageable chunks.
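
The sizes do not have to come from reading the data. A minimal sketch, assuming the fork sizes are taken from the filesystem first (for example `os.path.getsize()` for the data fork; on Mac OS X the resource fork is reachable as `path/..namedfork/rsrc`), the header and entry table have already been built as `prefix`, and the forks are then streamed in fixed-size chunks. The function name `stream_digest` and its parameters are hypothetical, and this has not been checked against fsdiff output.

```python
import base64
import hashlib
import io


def stream_digest(prefix, finfo, rsrc, data, algorithm="sha1",
                  chunk_size=64 * 1024):
    """Digest prefix + Finder info + both forks without loading them whole.

    prefix -- header and entry table bytes, built from sizes known up front
    finfo  -- the 32 bytes of Finder info
    rsrc, data -- binary file-like objects for the resource and data forks
    """
    digest = hashlib.new(algorithm)
    digest.update(prefix)
    digest.update(finfo)
    for stream in (rsrc, data):
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    # Base64-encode the raw digest bytes, as in step 7 of the list above.
    return base64.b64encode(digest.digest()).decode("ascii")


# Usage with in-memory stand-ins for the two forks.
example = stream_digest(b"\0" * 62, b"\0" * 32,
                        io.BytesIO(b"resource"), io.BytesIO(b"data"))
```

Memory use is bounded by `chunk_size` however large the forks are. The catch is that a file which changes between the size lookup and the read will produce an entry table that disagrees with the bytes actually digested.
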
Advice much appreciated. The workaround is to leave it to fsdiff to generate the checksum and parse the value from the output.
P.S. I still intend to run [A/UX 3.0.1][aux] on my Centris 660av one day.
Update: using my eyes and brains and the `fsdiff -V` command I was able to read the fsdiff man page and deduce the preferred checksum cipher is actually sha1. My code is still wrong.