[TxMt] Python unicode error (was: r8839 (Python)) [reposting]

Allan Odgaard throw-away-2 at macromates.com
Mon Jun 2 13:34:07 UTC 2008

On 2 Jun 2008, at 15:17, Alexey Blinov wrote:

> Hmm... little test give me that:
> […]
> so... IMO print is better. Isn't it?

Actually both versions are incomplete.

To work with UTF-8 strings written to stdout in Python you need to:

  1. Declare the source code to be UTF-8 (done with the encoding
     comment at the top of the file).
  2. Declare the string itself to be a unicode string (done with the
     u-prefix).
  3. Set the output stream to be UTF-8 (done by wrapping stdout in a
     codec-aware writer).

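If step 3 is omitted, the encoding of stdout is taken from the
environment, so it will often still work (for example in a terminal
with a UTF-8 locale). It fails with a UnicodeEncodeError when no
encoding can be determined, such as when stdout is a pipe.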
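
As a rough illustration of why the environment matters, the sketch below encodes the same unicode string once as UTF-8 and once as ASCII, which is what Python 2 falls back to when it cannot determine an encoding for stdout. The names `utf8_bytes` and `ascii_failed` are made up for this example and are not part of the original script.

```python
# -*- coding: utf-8 -*-
a = u"æble"

# What a UTF-8 environment effectively does with the string on output.
utf8_bytes = a.encode('utf-8')

# What an ASCII fallback does: the non-ASCII character cannot be encoded.
try:
    a.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

With the ASCII codec the encode step raises, which is the same error an unwrapped print would hit in that situation.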

The final script ends up being:

     #!/usr/bin/env python
     # -*- coding: utf-8 -*-

     import sys
     import codecs

     a = u"æble"
     # Wrap stdout so unicode strings are encoded as UTF-8 on output.
     sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
     print a
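
To see what the wrapper from step 3 does without depending on the terminal, here is a minimal sketch that wraps an in-memory byte buffer instead of the real stdout. The `io.BytesIO` stand-in and the names `buf`, `writer` and `encoded` are assumptions for illustration only.

```python
# -*- coding: utf-8 -*-
import codecs
import io

buf = io.BytesIO()                       # stand-in for a byte-oriented stdout
writer = codecs.getwriter('utf-8')(buf)  # codec-aware wrapper, as in step 3
writer.write(u"æble\n")                  # unicode goes in, UTF-8 bytes come out
encoded = buf.getvalue()                 # raw bytes that reached the stream
```

Because the encoding is fixed by the wrapper, the bytes written are the same regardless of the locale the script runs under.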

