[TxMt] Python unicode error (was: r8839 (Python)) [reposting]

Alex Ross ajross at cs.pdx.edu
Mon Jun 2 15:50:32 UTC 2008


On Jun 2, 2008, at 6:34 AM, Allan Odgaard wrote:

> On 2 Jun 2008, at 15:17, Alexey Blinov wrote:
>
>> Hmm... a little test gives me this:
>> […]
>> so... IMO print is better. Isn't it?
>
> Actually both versions are incomplete.
>
> To work with UTF-8 strings written to stdout in Python you need to:
>
> 1. Declare the source code to be UTF-8 (done with the encoding  
> comment).
> 2. Declare the string itself to be a unicode string (done with the u- 
> prefix).
> 3. Set the output stream to be UTF-8 (done by wrapping stdout in a  
> codec-aware writer).
>
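> If step 3 is omitted, the encoding of stdout is taken from the
> environment, so it often still works in a terminal. When stdout is a
> pipe or a file, Python 2 falls back to ASCII and the print fails with
> a UnicodeEncodeError.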
>
> The final script ends up being:
>
>    #!/usr/bin/env python
>    # -*- coding: utf-8 -*-
>
>    import sys
>    import codecs
>
>    a = u"æble"
>    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
>    print a

This is the clearest, most concise description I've found of what the
heck a person needs to do to get Python Unicode working!  Happily, in
Python 3.0 ALL strings will be Unicode and we'll be able to forget
about all of this (though likely we'll have to deal with a whole new
set of problems).
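
For anyone who wants to check step 3 without depending on their terminal
settings, here is a minimal sketch. It assumes you swap the real stdout
for an in-memory byte buffer; the helper name write_utf8 and the
variables buf and encoded are made up for illustration and are not part
of Allan's script. It only shows that the codecs writer turns the
unicode string into UTF-8 bytes regardless of the environment.

```python
# -*- coding: utf-8 -*-
import codecs
import io


def write_utf8(text, raw_stream):
    """Hypothetical helper: wrap a byte stream in a UTF-8 writer and write text."""
    # Same wrapping as in the quoted script, applied to any byte stream.
    writer = codecs.getwriter('utf-8')(raw_stream)
    writer.write(text)
    writer.write(u"\n")
    return raw_stream


# Assumption: a BytesIO buffer stands in for a byte-oriented stdout,
# so the result does not depend on the terminal or locale.
buf = io.BytesIO()
write_utf8(u"æble", buf)
encoded = buf.getvalue()  # the raw bytes that would have reached stdout
```

The "æ" should come out as the two bytes 0xC3 0xA6. As far as I know,
Python 3.7 and later also let you do sys.stdout.reconfigure(encoding='utf-8')
instead of wrapping stdout by hand, but I have not tried that from inside
TextMate.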

