On 2 Jun 2008, at 15:40, Allan Odgaard wrote:
To work with UTF-8 strings written to stdout in Python, you need to:
1. Declare the source code to be UTF-8 (done with the encoding comment).
2. Declare the string itself to be a unicode string (done with the u prefix).
3. Set the output stream to be UTF-8 (done by wrapping stdout in a codec-aware writer).
If step 3 is omitted, the encoding of stdout is taken from the environment, so it will often still work (for example, when printing to a UTF-8 terminal).
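To show why steps 2 and 3 are separate, here is a minimal sketch comparing the unicode string with its UTF-8 bytes. The variable names are illustrative, and the non-ASCII character is written as an escape so the snippet should run unchanged on Python 2.6+ and 3.3+.

```python
# A unicode string (step 2) counts characters; its UTF-8 encoding
# (what the writer in step 3 produces) counts bytes.
text = u"\u00e6ble"             # the unicode string u"æble"
encoded = text.encode("utf-8")  # the bytes that end up on stdout

print(len(text))     # 4 characters
print(len(encoded))  # 5 bytes: "æ" takes two bytes in UTF-8
print(repr(encoded))
```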
The final script ends up being:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import sys
    import codecs

    a = u"æble"
    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
    print a
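The writer returned by codecs.getwriter('utf-8') wraps a byte stream (which is what sys.stdout is in Python 2) and encodes each unicode string before passing it on. A minimal sketch, assuming an in-memory io.BytesIO as a stand-in for stdout so the result can be inspected:

```python
import codecs
import io

# Stand-in for sys.stdout: an in-memory byte stream (assumption, used
# only so the encoded output can be checked without a terminal).
raw = io.BytesIO()

# Same wrapping as in the script: the writer encodes unicode to UTF-8.
writer = codecs.getwriter("utf-8")(raw)
writer.write(u"\u00e6ble\n")  # u"æble" followed by a newline

print(repr(raw.getvalue()))  # the UTF-8 bytes that reached the stream
```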
Just to clarify: if I write a new Python script, its header should look like this:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import sys
    import codecs

    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
    sys.stdin = codecs.getreader('utf-8')(sys.stdin)
    ....
and then I no longer need unicode(foo, 'UTF-8') or foo.encode('UTF-8'), right?
Thanks, --Hans
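The reader side of the question can be sketched the same way: codecs.getreader('utf-8') wraps a byte stream and hands back already-decoded unicode lines, which is the decoding step unicode(foo, 'UTF-8') would otherwise do by hand. This sketch assumes an in-memory io.BytesIO as a stand-in for sys.stdin; it only covers the wrapped streams, not bytes obtained from other sources.

```python
import codecs
import io

# Stand-in for sys.stdin: UTF-8 bytes as they might arrive from a pipe
# (in-memory stream, an assumption so the example is self-contained).
raw_in = io.BytesIO(b"\xc3\xa6ble\n")

reader = codecs.getreader("utf-8")(raw_in)
line = reader.readline()  # returned already decoded to unicode

print(len(line))  # 5: four characters plus the newline
```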