os.walk and UnicodeDecodeError

My Python program was raising a `UnicodeDecodeError` when using [`os.walk`][oswalk] to traverse a directory containing files and folders with UTF-8 encoded names. What baffled me was that the exact same program had been working perfectly on the exact same hardware just minutes earlier.

It turned out that the difference lay in how the program was started: by me, as root, from an interactive bash shell, versus by init as part of the boot sequence (on a [Debian Lenny][lenny] system). When started interactively, the locale was set to `en_GB.UTF-8`, so names on the filesystem were assumed to be UTF-8 encoded. When started by init, the locale's character encoding was ASCII, so the non-ASCII bytes in those names could not be decoded.
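
The underlying failure can be reproduced without touching the filesystem. The sketch below is only an illustration, not the code from my program: `raw_name` and `try_decode` are made-up names, and the sample filename is an assumption. It shows that the same UTF-8 bytes decode cleanly as UTF-8 but not as ASCII, and it prints the encoding Python has picked up from the environment.

```python
import sys

# Hypothetical filename as it would be stored on disk: UTF-8 encoded bytes.
raw_name = u"caf\u00e9".encode("utf-8")


def try_decode(raw, encoding):
    """Decode raw bytes with the given encoding, reporting any failure as text."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return "UnicodeDecodeError: %s" % exc.reason


if __name__ == "__main__":
    # The encoding Python derives from the environment for filenames.
    print("filesystem encoding: %r" % sys.getfilesystemencoding())
    print("decoded as utf-8:    %r" % try_decode(raw_name, "utf-8"))
    print("decoded as ascii:    %r" % try_decode(raw_name, "ascii"))
```

Running this once from the interactive shell and once from the init script should show whether the two environments report different filesystem encodings. Newer Python versions handle undecodable filenames differently, so the exact symptom may not match what I saw.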

The fix, as described in the article [*Python: how is sys.stdout.encoding chosen?*][codemonk], was to wrap my program in a shell script that sets the `LC_CTYPE` environment variable before launching it:

    #!/bin/sh
    export LC_CTYPE='en_GB.UTF-8'
    /path/to/program.py
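
To confirm that the wrapper has taken effect, the program itself can check its encoding at startup and fail early with a clear message. This is a minimal sketch and not part of the original fix; `is_utf8` and `require_utf8_filesystem` are hypothetical helper names.

```python
import codecs
import sys


def is_utf8(encoding_name):
    """Return True if encoding_name is UTF-8 under any of its aliases."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except (LookupError, TypeError):
        # Unknown codec name, or no encoding reported at all (None).
        return False


def require_utf8_filesystem():
    """Exit with an explanatory message unless filenames are treated as UTF-8."""
    encoding = sys.getfilesystemencoding()
    if not is_utf8(encoding):
        sys.exit("Filesystem encoding is %r, expected UTF-8; "
                 "check LC_CTYPE in the startup script." % encoding)


if __name__ == "__main__":
    require_utf8_filesystem()
    print("Filesystem encoding is UTF-8.")
```

Note that if `LC_ALL` is set in the environment, it takes precedence over `LC_CTYPE`, so a check like this can also catch the case where the wrapper's setting is being overridden.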

[codemonk]: http://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/
[lenny]: http://www.debian.org/
[oswalk]: http://docs.python.org/library/os.html#os.walk