Hi,
I found a small bug. Since this is part of a bundle, the macromates website asked me to report it here.
Summary: sort does not work with the euro symbol, and some other non-Latin characters.
Steps to reproduce
1. Open a new text file.
2. Add text with a euro symbol, e.g.
   aaa €
   bbb ¥
   ccc $
   ddd £
3. Sort the file with F5 (Text > Sorting > Sort lines in document).
I get the error:
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `AAA \302\202\254' and `BBB ¥'.
Most non-Latin characters work fine. Just this one fails. If I save the file and simply run "sort test.txt", all is fine. Both in my shell and in TextMate, "echo $LC_ALL" returns "en_GB.UTF-8".
Typing the following script in TextMate and "run script" works fine:
echo "ccc\naaa€\nbbb" | sort
What would be different for the "Sort lines in document" (which simply calls "sort") and the above script?
Any clue? Is this reproducible by others?
Regards, Freek Dijkstra
On 8/21/09 7:58 AM, in article 4A8EB617.3030900@macfreek.nl, "Freek Dijkstra" public@macfreek.nl wrote:
Steps to reproduce
1. Open a new text file.
2. Add text with a euro symbol, e.g.
   aaa €
   bbb ¥
   ccc $
   ddd £
3. Sort the file with F5 (Text > Sorting > Sort lines in document).
I get the error:
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `AAA \302\202\254' and `BBB ¥'.
Most non-Latin characters work fine. Just this one fails. If I save the file and simply run "sort test.txt", all is fine. Both in my shell and in TextMate, "echo $LC_ALL" returns "en_GB.UTF-8".
I tried your "steps to reproduce" and couldn't reproduce. :) The text sorted fine for me. I changed "aaa" to "fff" to make sure it really *was* sorting and it was.
On my machine, $LC_ALL has no value. However, I notice that in my environment (type "set" at the command line), LANG=en_US.UTF-8, and in my TM shell environment, LC_CTYPE=en_US.UTF-8. Perhaps it is one of these that you need to set.
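A quick way to compare the two environments is to print the locale variables mentioned above in both the terminal and a TextMate "run script" window. This is only a sketch: the function name show_locale_vars is made up for illustration, and LC_COLLATE is included on the assumption that it may matter for sort as well.

```shell
#!/bin/sh
# Hypothetical helper: print the locale-related variables that influence sort.
# When LC_ALL is set it overrides the individual LC_* variables, which in turn
# override LANG.
show_locale_vars() {
    for var in LANG LC_CTYPE LC_COLLATE LC_ALL; do
        # Indirect lookup of the variable named in $var (empty if unset).
        eval "value=\${$var-}"
        printf '%s=%s\n' "$var" "${value:-(unset or empty)}"
    done
}

show_locale_vars
```

Running the standard "locale" command in both places gives a similar picture, including the effective value of each category.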
Also make sure that in TM's Advanced pref pane the default file encoding is set to UTF-8...?
Just an idea. Sorry if it doesn't help. m.
On 21 Aug 2009, at 16:58, Freek Dijkstra wrote:
I found a small bug. Since this is part of a bundle, the macromates website asked me to report it here.
Summary: sort does not work with the euro symbol, and some other non-Latin characters. […]
This is a limitation of the locale files included with Leopard.
Snow Leopard has more complete locale files, so here it works as expected.
What would be different for the "Sort lines in document" (which simply calls "sort") and the above script?
We call sort with ‘-f’ to “fold case” (so A < b < C etc.). This is why sort needs to reference the locale info, which is not required if you don’t do any case folding.
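To illustrate what ‘-f’ changes, here is a small sketch comparing the two orderings. It pins LC_ALL=C so the result is predictable, which also means it does not reproduce the Leopard failure itself; the variable names are only for illustration.

```shell
#!/bin/sh
# Same input, sorted without and with case folding.
# LC_ALL=C is an assumption made here to get a predictable byte-wise order.
plain=$(printf 'b\nC\nA\n' | LC_ALL=C sort)       # uppercase sorts before lowercase
folded=$(printf 'b\nC\nA\n' | LC_ALL=C sort -f)   # case is ignored while comparing

printf 'plain:\n%s\n\nfolded:\n%s\n' "$plain" "$folded"
```

The plain sort gives A, C, b, while the folded sort gives A, b, C.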
I am not sure we need to address this, given that the problem is fixed in Snow Leopard and the input causing it should be rare. The only fix I can think of would be to test whether sort fails and, if so, call it again without case folding. Hmm… I will probably end up spending a few minutes seeing if I can do a quick fallback fix :)
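A rough sketch of what such a fallback could look like, not the actual bundle command: the function name sort_with_fallback and the use of temporary files are assumptions made for illustration. The input is buffered so it can be sorted a second time, and the folded result is only printed if sort exits successfully.

```shell
#!/bin/sh
# Hypothetical fallback: try a case-folding sort first and, if sort reports
# an error (e.g. "Illegal byte sequence"), sort again without -f.
sort_with_fallback() {
    input=$(mktemp) || return 1
    output=$(mktemp) || { rm -f "$input"; return 1; }

    # Buffer stdin so the data can be read twice.
    cat > "$input"

    if sort -f "$input" > "$output" 2>/dev/null; then
        cat "$output"
    else
        # Case folding failed; fall back to a plain sort.
        sort "$input"
    fi
    status=$?

    rm -f "$input" "$output"
    return "$status"
}
```

It reads from standard input, e.g. printf 'ccc\nAAA\nbbb\n' | sort_with_fallback. The trade-off is that the fallback ordering is no longer case-insensitive, so "B" may sort before "a" when the first attempt fails.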