Hi,
sorry for breaking the rules! I lost the thread.
On 6 Feb 2008, at 22:26, Alexander John Ross wrote:
• Add Unicode support to PyMate / ScriptMate.
Changed:
U trunk/Bundles/Python.tmbundle/Support/PyMate/pymate.rb
U trunk/Bundles/Python.tmbundle/Support/PyMate/tmhooks.py
U trunk/Support/lib/scriptmate.rb
I got this ticket http://macromates.com/ticket/show?ticket_id=502C2FDD and after some experiments I think the problem is that PyMate does not pick up on the encoding provided by the user.
For example a script like this will error:
# coding: utf-8
print("æble")
I do not know whether this helps to solve the problem, but I just figured out that 'print' is the problem. If I'm using 'sys.__stdout__.write' instead it works.
#!/usr/bin/env python
# encoding: utf-8

import sys
import os

a = u"æble"
sys.__stdout__.write( a.encode("raw_unicode_escape") )
Cheers,
--Hans
Hmm... little test give me that:
------------------------------------
nilcolor$ cat test.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import os

a = u"æble"
sys.__stdout__.write( a.encode("raw_unicode_escape") + "\n" )
print a
nilcolor$ python test.py
?ble
æble
nilcolor$
-------------------------------------
so... IMO print is better. Isn't it?
Alexey Blinov
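A possible reading of the '?ble' line, sketched under the assumption that the terminal decodes program output as UTF-8: 'raw_unicode_escape' emits U+00E6 as the single byte 0xE6, which is not valid UTF-8 on its own, whereas the UTF-8 encoding of the same character is two bytes. The variable names below are illustrative only.

```python
# -*- coding: utf-8 -*-
# Compare the bytes produced by the two encodings for the same text.
text = u"æble"

latin1_like = text.encode("raw_unicode_escape")  # U+00E6 becomes the single byte 0xE6
utf8 = text.encode("utf-8")                      # U+00E6 becomes the two bytes 0xC3 0xA6

print(repr(latin1_like))
print(repr(utf8))

# A consumer that expects UTF-8 cannot decode the lone 0xE6 byte.
try:
    latin1_like.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8)
```

A terminal typically shows such an undecodable byte as a placeholder character, which would match the '?' in the session above.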
On Mon, Jun 2, 2008 at 2:24 AM, Hans-Jörg Bibiko bibiko@eva.mpg.de wrote:
[...]
On 2 Jun 2008, at 15:17, Alexey Blinov wrote:
Hmm... little test give me that:
[…]
so... IMO print is better. Isn't it?
I forgot to mention that I used PyMate for it. If I open that Python script in TM and press APPLE+R, I get this:
______
PyMate r8839 running Python 2.4.2 (/usr/bin/env python)
untitled

æble
æble
Program exited.
_______
If you want to use that script from the command line, please try this one:
_______
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import os

a = u"æble"
sys.__stdout__.write( a.encode("raw_unicode_escape") + "\n" )
print a.encode("raw_unicode_escape")
________

python foo.py > out.txt
mate out.txt
and you will see in both cases that it works.
Cheers,
--Hans
On 2 Jun 2008, at 15:17, Alexey Blinov wrote:
Hmm... little test give me that: […] so... IMO print is better. Isn't it?
Actually both versions are incomplete.
To work with UTF-8 strings written to stdout in Python you need to:
1. Declare the source code to be UTF-8 (done with the encoding comment).
2. Declare the string itself to be a unicode string (done with the u-prefix).
3. Set the output stream to be UTF-8 (done by wrapping stdout in a codec-aware writer).
If step 3 is omitted, the encoding of stdout will be taken from the environment, so often it will still work.
The final script ends up being:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import codecs

a = u"æble"
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
print a
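The wrapping in step 3 can be tried without touching the real stdout. The following is a minimal sketch, not part of the original message: 'utf8_writer' is a hypothetical helper name, and an in-memory byte stream stands in for stdout so the encoded bytes can be inspected. The 'buffer' lookup is an assumption added so the same helper also works on Python 3, where text streams expose their byte stream under that attribute.

```python
# -*- coding: utf-8 -*-
import codecs
import io


def utf8_writer(stream):
    """Return a writer that encodes text as UTF-8 before handing it to `stream`.

    Python 3 text streams expose the underlying byte stream as `.buffer`;
    on Python 2, sys.stdout accepts bytes directly, so the stream is used as is.
    """
    byte_stream = getattr(stream, "buffer", stream)
    return codecs.getwriter("utf-8")(byte_stream)


# An in-memory byte stream stands in for stdout here.
sink = io.BytesIO()
out = utf8_writer(sink)
out.write(u"æble\n")          # text goes in
encoded = sink.getvalue()     # UTF-8 bytes come out
print(repr(encoded))
```

In a script, the equivalent of the line in the message above would be 'sys.stdout = utf8_writer(sys.stdout)'.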
On 2 Jun 2008, at 15:34, Allan Odgaard wrote:
[...] To work with UTF-8 strings written to stdout in Python you need to:
1. Declare the source code to be UTF-8 (done with the encoding comment).
2. Declare the string itself to be a unicode string (done with the u-prefix).
3. Set the output stream to be UTF-8 (done by wrapping stdout in a codec-aware writer).
If step 3 is omitted, the encoding of stdout will be taken from the environment, so often it will still work. [...]
I should have added that this is actually desired, i.e. to have Python convert the output to whatever encoding the environment is using.
Most users, though, do not have LC_CTYPE properly set up (but they should fix that!).
In Leopard, Terminal will set LANG automatically. Unfortunately, it is sometimes set to an invalid value (rdar://5564288), and IMO they should only set LC_CTYPE.
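To see what Python has derived from the environment, the stream and locale encodings can be inspected directly. This is only a sketch: 'describe_output_encoding' is a made-up helper name, and the values it reports depend on the interpreter version and on how LC_ALL, LC_CTYPE and LANG are set.

```python
import locale
import sys


def describe_output_encoding(stream=None):
    """Report the encodings Python picked up for a stream and for the locale.

    The stream value may be None, for example when output is redirected
    on older interpreters or when the stream is a custom object.
    """
    if stream is None:
        stream = sys.stdout
    return {
        "stream": getattr(stream, "encoding", None),
        "locale": locale.getpreferredencoding(False),
    }


info = describe_output_encoding()
print(info)
```

Running the script once as is and once with LC_CTYPE set to a UTF-8 locale (for example en_US.UTF-8, if that locale is installed) is a quick way to check whether the environment is the cause of an encoding problem.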
On 2 Jun 2008, at 15:40, Allan Odgaard wrote:
[...]
The process of learning will never come to an end (fortunately) ;) Thanks a lot!!
--Hans
On 2 Jun 2008, at 15:40, Allan Odgaard wrote:
[...]
Just for clarification: if I write a new Python script, my header should look like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import codecs

sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
sys.stdin = codecs.getreader('utf-8')(sys.stdin)
....
and then I do not need unicode(foo, 'UTF-8') and foo.encode('UTF-8') (?)
Thanks, --Hans
Hans-Joerg Bibiko wrote:
[...]
and then I do not need unicode(foo, 'UTF-8') and foo.encode('UTF-8') (?)
Exactly: sys.stdin.read() will return unicode strings and sys.stdout.write() will accept unicode strings.
Servus, Walter
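The round trip described here can be sketched with in-memory byte streams standing in for stdin and stdout (an assumption made so the example does not depend on the terminal). The reader hands back decoded text and the writer accepts text, so no explicit unicode() or encode() calls appear in between.

```python
# -*- coding: utf-8 -*-
import codecs
import io

# Byte streams standing in for stdin and stdout.
raw_in = io.BytesIO(u"æble\n".encode("utf-8"))
raw_out = io.BytesIO()

reader = codecs.getreader("utf-8")(raw_in)
writer = codecs.getwriter("utf-8")(raw_out)

text = reader.read()          # decoded text, not raw bytes
writer.write(text.upper())    # text goes in, UTF-8 bytes are written out

result = raw_out.getvalue()
print(repr(result))
```

Data that arrives from other sources, such as files opened without a codec, is not covered by these two wrappers and still needs decoding.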
Only for clarification: If I write a new python script my head should be à la:
[...]
and then I do not need unicode(foo, 'UTF-8') and foo.encode('UTF-8') (?)
Thanks, --Hans
Clear and good description! Thanks Hans! It's worth storing in my DevonThink db *)
Alexey
On Jun 2, 2008, at 6:34 AM, Allan Odgaard wrote:
[...]
To work with UTF-8 strings written to stdout in Python you need to:
[...]
This is the clearest, most concise description I've found of what the heck a person needs to do to get Python Unicode working! Happily in Python 3.0 ALL strings will be unicode and we'll be able to forget about all of this (though likely we'll have to deal with a whole new set of problems).
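For comparison, a small sketch of the Python 3 behaviour mentioned above. It reflects the str/bytes split of released Python 3 versions rather than anything stated in the thread: text literals are Unicode without a u-prefix, and bytes are a separate type that only appears when encoding explicitly.

```python
# -*- coding: utf-8 -*-
text = "æble"                    # str: a sequence of Unicode code points
data = text.encode("utf-8")      # bytes: produced only by explicit encoding

print(type(text).__name__, len(text))
print(type(data).__name__, len(data))

round_trip = data.decode("utf-8")
print(round_trip == text)
```

The encoding of stdout is still taken from the environment in Python 3, so the locale settings discussed earlier in the thread remain relevant.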