os.walk and UnicodeDecodeError

June 9, 2009

My Python program was raising a UnicodeDecodeError when using os.walk to traverse a directory containing files and folders with UTF-8 encoded names. What baffled me was that the exact same program had been working perfectly on the exact same hardware just minutes earlier.

It turns out the difference was in how the program was started: by me, as root, from an interactive bash shell, versus by init as part of the boot sequence (on a Debian Lenny system). When started interactively, the locale was en_GB.UTF-8, so names on the filesystem were assumed to be UTF-8 encoded. When started by init, the locale's encoding was ASCII (presumably because no locale was set in that environment, leaving the default C locale), so any non-ASCII byte in a name could not be decoded.
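One way to confirm this kind of difference is to print the locale-related settings from inside the program and compare the output of an interactive run with a run started by init. The following is only a minimal sketch: the helper name describe_encoding_environment is made up for illustration, and it is written for a current Python 3 rather than the interpreter I was using at the time.

```python
import locale
import os
import sys


def describe_encoding_environment(environ=None):
    """Collect the settings that influence how filenames are decoded.

    `environ` defaults to the real process environment; passing a dict
    makes the function easy to exercise in isolation.
    """
    environ = os.environ if environ is None else environ
    return {
        # Locale variables, in the order POSIX consults them for LC_CTYPE.
        "LC_ALL": environ.get("LC_ALL"),
        "LC_CTYPE": environ.get("LC_CTYPE"),
        "LANG": environ.get("LANG"),
        # Encoding Python uses to convert filenames to and from text.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding implied by the current locale settings.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in describe_encoding_environment().items():
        print("%s = %r" % (key, value))
```

If the three environment variables are all unset in the boot-time run but not in the interactive one, the locale is the likely culprit.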

The fix, as described in the article "Python: how is sys.stdout.encoding chosen?", was to wrap my program in a shell script that sets the LC_CTYPE environment variable before launching it:

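#!/bin/sh
# Force a UTF-8 character type so filenames are decoded as UTF-8,
# regardless of the environment the script is started from.
export LC_CTYPE='en_GB.UTF-8'
/path/to/program.py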
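An alternative that does not depend on the environment at all is to walk the tree using byte strings and decode the names explicitly. This is not the fix I used, just a sketch of the idea. It assumes Python 3 and that the names on disk really are UTF-8, and walk_decoded is a hypothetical helper name.

```python
import os


def walk_decoded(top, encoding="utf-8"):
    """Yield the path of every file under `top` as text.

    The tree is walked using bytes, so os.walk never has to guess an
    encoding; each path is then decoded explicitly. Bytes that are not
    valid in `encoding` become U+FFFD instead of raising an error.
    """
    top_bytes = top if isinstance(top, bytes) else os.fsencode(top)
    for dirpath, dirnames, filenames in os.walk(top_bytes):
        for name in filenames:
            full_path = os.path.join(dirpath, name)  # still bytes here
            yield full_path.decode(encoding, "replace")


if __name__ == "__main__":
    for path in walk_decoded("."):
        print(path)
```

The trade-off is that a decoded path containing replacement characters can no longer be passed back to open(), so keep the original bytes around if the files need to be read afterwards.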