[TxMt] Python unicode error (was: r8839 (Python)) [reposting]
Alex Ross
ajross at cs.pdx.edu
Mon Jun 2 15:50:32 UTC 2008
On Jun 2, 2008, at 6:34 AM, Allan Odgaard wrote:
> On 2 Jun 2008, at 15:17, Alexey Blinov wrote:
>
> >> Hmm... a little test gives me this:
>> […]
>> so... IMO print is better. Isn't it?
>
> Actually both versions are incomplete.
>
> To work with UTF-8 strings written to stdout in Python you need to:
>
> 1. Declare the source code to be UTF-8 (done with the encoding
> comment).
> 2. Declare the string itself to be a unicode string (done with the u-
> prefix).
> 3. Set the output stream to be UTF-8 (done by wrapping stdout in a
> codec-aware writer).
>
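> If step 3 is omitted, the encoding of stdout will be taken from the
> environment, so it will often still work (typically when printing to
> a terminal with a UTF-8 locale). It can fail with a
> UnicodeEncodeError when the output is piped or redirected, because
> Python then falls back to ASCII.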
>
> The final script ends up being:
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> import sys
> import codecs
>
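> a = u"æble"  # unicode literal (step 2)
> # Wrap stdout so unicode strings are encoded as UTF-8 on output (step 3)
> sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
> print a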
This is the clearest, most concise description I've found of what the
heck a person needs to do to get Python Unicode working! Happily, in
Python 3.0 ALL strings will be Unicode and we'll be able to forget
about all of this (though likely we'll have to deal with a whole new
set of problems).
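
For comparison, here is a minimal sketch of how the same idea might look in Python 3, assuming a UTF-8 source file (the Python 3 default). The helper name write_text is made up for illustration, and an in-memory byte buffer stands in for stdout so the result can be inspected. It suggests that steps 1 and 2 go away, but the output encoding (step 3) still matters:

```python
import io

def write_text(text, encoding):
    """Write text through a text stream with the given encoding; return the bytes."""
    raw = io.BytesIO()  # stands in for the byte-level stdout (assumption for the demo)
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()

a = "æble"  # no u-prefix needed: every str is Unicode in Python 3

# With a UTF-8 stream, the non-ASCII character is encoded as two bytes.
utf8_bytes = write_text(a, "utf-8")

# With an ASCII stream (roughly what a missing step 3 can amount to),
# the same write fails.
try:
    write_text(a, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is one way to apply the same setting to the real stdout.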