[TxMt] Python unicode error (was: r8839 (Python)) [reposting]

Walter Dörwald walter at livinglogic.de
Mon Jun 2 15:11:42 UTC 2008


Hans-Joerg Bibiko wrote:
> 
>>
>> On 2 Jun 2008, at 15:40, Allan Odgaard wrote:
>> To work with UTF-8 strings written to stdout in Python you need to:
>>
>> 1. Declare the source code to be UTF-8 (done with the encoding comment).
>> 2. Declare the string itself to be a unicode string (done with the 
>> u-prefix).
>> 3. Set the output stream to be UTF-8 (done by wrapping stdout in a 
>> codec-aware writer).
>>
>> If step 3 is omitted, Python takes the encoding of stdout from the 
>> environment, so it will often still work.
>>
>> The final script ends up being:
>>
>>    #!/usr/bin/env python
>>    # -*- coding: utf-8 -*-
>>
>>    import sys
>>    import codecs
>>
>>    a = u"æble"
>>    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
>>    print a
> 
> Just to clarify:
> If I write a new Python script, its header should look something like this:
> 
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
> 
> import sys
> import codecs
> 
> sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
> sys.stdin  = codecs.getreader('utf-8')(sys.stdin)
> ....
> 
> and then I no longer need unicode(foo, 'UTF-8') or foo.encode('UTF-8')?

Exactly: sys.stdin.read() will return unicode strings (decoded from UTF-8), 
and sys.stdout.write() will accept unicode strings (and encode them as UTF-8).
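A minimal sketch of what the two wrappers do, written so it runs under Python 2.6+ and Python 3 alike. It assumes in-memory io.BytesIO buffers as stand-ins for the raw byte streams behind stdin and stdout, so nothing touches the real terminal; the sample text and variable names are illustrative only. (Under Python 3, sys.stdout is already a text stream, so the byte stream you would wrap there is sys.stdout.buffer.)

```python
# -*- coding: utf-8 -*-
import codecs
import io

# Assumption: in-memory byte buffers stand in for the raw stdin/stdout streams.
raw_in = io.BytesIO(u"æble\n".encode("utf-8"))
raw_out = io.BytesIO()

# Same wrapping as in the script header quoted above.
reader = codecs.getreader("utf-8")(raw_in)
writer = codecs.getwriter("utf-8")(raw_out)

text = reader.read()          # a unicode string, decoded from UTF-8 bytes
writer.write(text.upper())    # unicode in, UTF-8 encoded bytes out
writer.flush()

encoded = raw_out.getvalue()  # the bytes that would reach the real stdout

# Without a UTF-8 writer, a stream that falls back to ASCII (for example,
# Python 2's stdout when it is a pipe) cannot represent "æ".
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The script body then only ever handles unicode objects; decoding and encoding happen once, at the stream boundary.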

Servus,
    Walter


