On Jun 2, 2008, at 6:34 AM, Allan Odgaard wrote:
On 2 Jun 2008, at 15:17, Alexey Blinov wrote:
Hmm... little test give me that: […] so... IMO print is better. Isn't it?
Actually both versions are incomplete.
To work with UTF-8 strings written to stdout in Python you need to:
1. Declare the source code to be UTF-8 (done with the encoding comment).
2. Declare the string itself to be a unicode string (done with the u- prefix).
3. Set the output stream to be UTF-8 (done by wrapping stdout in a codec-aware writer).
If step 3 is omitted, the encoding of stdout will be taken from the environment, so often it will still work.
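A quick way to see what the environment gave you is to look at the stream's encoding attribute. This is only a rough sketch: the helper name stdout_encoding is made up for illustration, and the "unknown" fallback is my own assumption for streams that report no encoding (for example when output is piped or captured).

```python
import sys

def stdout_encoding():
    # Hypothetical helper: report the encoding attached to stdout,
    # falling back to "unknown" when the stream has none set.
    return getattr(sys.stdout, "encoding", None) or "unknown"

current = stdout_encoding()
sys.stdout.write("stdout encoding: " + current + "\n")
```

If this reports something other than UTF-8, step 3 is what makes the output predictable.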
The final script ends up being:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import codecs

a = u"æble"
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
print a
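To see what the codec-aware writer in step 3 actually does, here is a small sketch that wraps an in-memory byte stream instead of stdout, so the encoded bytes can be inspected. The names buf, writer and encoded are just for illustration; io.BytesIO stands in for a byte-oriented output stream, which is an assumption about how stdout behaves in this setup.

```python
# -*- coding: utf-8 -*-
import codecs
import io

buf = io.BytesIO()                        # stand-in for a byte-oriented stdout
writer = codecs.getwriter('utf-8')(buf)   # StreamWriter that encodes on write
writer.write(u"æble")                     # unicode in, UTF-8 bytes out
encoded = buf.getvalue()                  # the raw bytes that reached the stream
```

The four-character string comes out as five bytes, because æ takes two bytes in UTF-8.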
This is the clearest, most concise description I've found of what the heck a person needs to do to get Python Unicode working! Happily in Python 3.0 ALL strings will be unicode and we'll be able to forget about all of this (though likely we'll have to deal with a whole new set of problems).