Re: [TxMt] New "Unicode" bundle in the Review trunk

2 Jun 2008


Hans-Jörg Bibiko wrote:
...
On 02.06.2008, at 00:04, Walter Dörwald wrote:
...
Here's another patch (against the current version). It shows both the 
codepoint and the name.
BTW, you don't have to use a regular expression to split a string into 
characters, simply iterating through it does the trick:
Index: Commands/Show Unicode Names.tmCommand
-for a in re.compile("(?um)(.)").split(unicode(sys.stdin.read(), "UTF-8")):
-    if (len(a)==1) and (a != '\n'):
-        res = a + " : " + unicodedata.name(a, "U+%04X" % ord(a))
+for a in unicode(sys.stdin.read(), "UTF-8"):
+    if a != '\n':
+        res = u"%s : U+%04X" % (a, ord(a))
+        name = unicodedata.name(a, None)
+        if name:
+            res += u" : %s" % name
         print res.encode("UTF-8")</string>
 <key>fallbackInput</key>
 <string>character</string>

Thanks! Just committed to the trunk.
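
For readers on a current Python, the same loop can be sketched in Python 3, where str is already Unicode and plain iteration yields one character at a time. This is only an illustration of the patch's idea, not the code committed to the bundle; the helper name describe_chars is made up for the example.

```python
import unicodedata


def describe_chars(text):
    """Return one 'char : U+XXXX : NAME' line per character, skipping newlines.

    describe_chars is a hypothetical helper used for illustration only.
    """
    lines = []
    for ch in text:  # plain iteration, no regular expression needed
        if ch == "\n":
            continue
        res = "%s : U+%04X" % (ch, ord(ch))
        name = unicodedata.name(ch, None)  # None for characters without a name
        if name:
            res += " : %s" % name
        lines.append(res)
    return lines


if __name__ == "__main__":
    print("\n".join(describe_chars("a\nb")))
```

Characters without a name in the database (control characters, for instance) still get a line with their code point, which is the behaviour the patch was after.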
...
...
...
Furthermore it would be great if this script could display all
information there is in the Python Unicode database, i.e. stuff like
   unicodedata.category()
   unicodedata.bidirectional()
   unicodedata.decimal()
Yes. I have such a script in Perl which also shows info about
Unicode
...
code points etc.
I just added a prototype of 'Show Unicode Properties' to the bundle.
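
A minimal sketch of what such a property lookup could collect, using only the unicodedata functions mentioned in this thread. It is written for Python 3 and is not the bundle's actual 'Show Unicode Properties' script; char_properties is a hypothetical name.

```python
import unicodedata


def char_properties(ch):
    """Collect a few unicodedata properties for a single character.

    char_properties is a hypothetical helper, not the bundle's script.
    """
    return {
        "codepoint": "U+%04X" % ord(ch),
        "name": unicodedata.name(ch, None),          # None if unnamed
        "category": unicodedata.category(ch),        # e.g. "Lu", "Nd"
        "bidirectional": unicodedata.bidirectional(ch),
        "decimal": unicodedata.decimal(ch, None),    # None if not a decimal digit
    }


if __name__ == "__main__":
    for key, value in char_properties("7").items():
        print("%s: %s" % (key, value))
```

Passing a default as the second argument to name() and decimal() avoids the ValueError these functions raise for characters that lack the property.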
...
Another problem: Using Ctrl-Shift-U as the shortcut hides the "Convert 
To Lowercase" command.
Yes. This was a bad key combo. I changed it temporarily to CTRL+OPT+APPLE+U.
BTW: Can Python handle Unicode code points outside the BMP, i.e. greater 
than U+FFFF? I tried it out and found that Python uses UTF-16 internally.
At least the Python that ships with the OS uses 2-byte Unicode characters 
with partial UTF-16 support:
Python 2.5.2 (r252:60911, Apr  8 2008, 18:54:00)
[GCC 3.3.5 (Debian 1:3.3.5-13)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
...
...
...
>>> import sys
>>> sys.maxunicode
65535
The size of a Unicode character is specified at compile time with the 
--enable-unicode option, so you *could* compile a wide Python with:
./configure --enable-unicode=ucs4
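
A script can check which kind of build it is running on before relying on characters outside the BMP. The following is a small sketch; is_wide_build is a hypothetical helper name. A narrow build reports 65535 for sys.maxunicode, a wide build reports 1114111.

```python
import sys


def is_wide_build():
    """True if one string element can hold any code point up to U+10FFFF."""
    return sys.maxunicode > 0xFFFF


if __name__ == "__main__":
    kind = "wide" if is_wide_build() else "narrow"
    print("%s build, sys.maxunicode = %d" % (kind, sys.maxunicode))
```

Since Python 3.3 the distinction no longer exists and sys.maxunicode is always 1114111, so the check only matters for older interpreters such as the 2.5 discussed here.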
...
But e.g. UCS hex: 20000 ; UTF-16: D840 DC00 .
I can print that character to TM but unicodedata fails because it 
expects one character, not two (?)
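
The relation between the code point and its two UTF-16 code units in the example above can be worked out with a few lines of arithmetic. This is a sketch of the standard UTF-16 surrogate calculation; to_surrogates is a hypothetical helper name.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)     # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits of the offset
    return high, low


if __name__ == "__main__":
    # U+20000 is the code point from the example above.
    print(" ".join("%04X" % unit for unit in to_surrogates(0x20000)))  # D840 DC00
```

On a narrow build a string holding U+20000 therefore has length 2, which is why a function that expects exactly one character rejects it.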
There are some spots in the Python code base where in narrow builds 
surrogate pairs are interpreted properly as characters outside the BMP, 
but unicodedata isn't one of them (so it's not actually real UTF-16 
throughout). There's an open issue on the Python bugtracker about that:
http://bugs.python.org/issue1706460
So there are two options:
1) Apple starts compiling its Python with --enable-unicode=ucs4
2) Python gets fixed so that surrogate pairs can be passed to 
unicodedata functions.
I think I might give 2) a try.
Servus,
    Walter
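
To illustrate the idea behind option 2, here is a rough sketch of combining a surrogate pair into one code point before asking unicodedata for the name. It is not the fix discussed on the bugtracker; it is written for Python 3, and from_surrogates and name_of_pair are hypothetical helper names.

```python
import unicodedata


def from_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def name_of_pair(high, low):
    """Look up the name of the character encoded by a surrogate pair."""
    cp = from_surrogates(high, low)
    # chr() accepts the full range on Python 3. On a narrow Python 2 build
    # unichr() rejects values above 0xFFFF, which is the limitation at issue.
    return unicodedata.name(chr(cp), None)


if __name__ == "__main__":
    print(name_of_pair(0xD840, 0xDC00))
```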
