On 02.06.2008, at 00:04, Walter Dörwald wrote:
Here's another patch (against the current version). It shows both the codepoint and the name.
BTW, you don't have to use a regular expression to split a string into characters; simply iterating through it does the trick:
Index: Commands/Show Unicode Names.tmCommand
-for a in re.compile("(?um)(.)").split(unicode(sys.stdin.read(), "UTF-8")):
-    if (len(a)==1) and (a != '\n'):
-        res = a + " : " + unicodedata.name(a, "U+%04X" % ord(a))
+for a in unicode(sys.stdin.read(), "UTF-8"):
+    if a != '\n':
+        res = u"%s : U+%04X" % (a, ord(a))
+        name = unicodedata.name(a, None)
+        if name:
+            res += u" : %s" % name
         print res.encode("UTF-8")</string>
 <key>fallbackInput</key>
 <string>character</string>
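For anyone who wants to try the idea outside the bundle, here is a minimal standalone sketch of the same loop. It is written for Python 3 (the patch itself is Python 2), and the function name describe_chars is made up for illustration; it is not part of the bundle.

```python
import unicodedata

def describe_chars(text):
    """Return one 'char : U+XXXX : NAME' line per character, skipping newlines.

    Hypothetical helper for illustration only, not the bundle's code.
    """
    lines = []
    for ch in text:  # iterating a str yields one character at a time
        if ch == "\n":
            continue
        line = "%s : U+%04X" % (ch, ord(ch))
        name = unicodedata.name(ch, None)  # None when the character has no name
        if name:
            line += " : %s" % name
        lines.append(line)
    return lines

print("\n".join(describe_chars("a\n€")))
```

Characters without a name in the database (control characters such as tab, for example) simply get the codepoint and nothing else.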
Thanks! Just committed to the trunk.
Furthermore, it would be great if this script could display all the information available in the Python Unicode database, i.e. things like unicodedata.category(), unicodedata.bidirectional() and unicodedata.decimal().
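To make the suggestion concrete, here is a rough sketch of how those lookups could be collected for a single character. It assumes Python 3; the function name unicode_properties and the dictionary layout are invented for this example and say nothing about how the bundle command is actually written.

```python
import unicodedata

def unicode_properties(ch):
    """Collect a few unicodedata properties for one character.

    Hypothetical helper; the selection of properties is only an example.
    """
    return {
        "codepoint": "U+%04X" % ord(ch),
        "name": unicodedata.name(ch, None),           # None if unnamed
        "category": unicodedata.category(ch),         # e.g. 'Ll', 'Nd'
        "bidirectional": unicodedata.bidirectional(ch),
        "decimal": unicodedata.decimal(ch, None),     # None if not a decimal digit
    }

for key, value in unicode_properties("5").items():
    print("%s: %s" % (key, value))
```

Passing a default to name() and decimal() avoids the ValueError they would otherwise raise for characters that lack the property.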
Yes. I have such a script in Perl which also shows info about Unicode code points etc.
I just added a prototype of 'Show Unicode Properties' to the bundle.
Another problem: Using Ctrl-Shift-U as the shortcut hides the "Convert To Lowercase" command.
Yes. This was a bad key combo. I changed it temporarily to CTRL+OPT+APPLE+U.
BTW: Can Python handle Unicode code points which are specified in Unicode pane B, meaning greater than U+FFFF? I tried it out and found that Python uses UTF-16 internally. But take e.g. UCS hex: 20000; UTF-16: D840 DC00. I can print that character to TM, but unicodedata fails because it expects one character, not two (?)
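As a rough illustration of the two-code-unit issue: on a Python 2 build that stores strings as UTF-16, a character above U+FFFF is held as a surrogate pair, so the string has length 2, which is probably why unicodedata rejects it. The sketch below, written for Python 3.3 or later (where each string element is a full code point), shows the arithmetic that maps the pair D840 DC00 back to the code point 20000. The helper name combine_surrogates is made up for this example.

```python
import unicodedata

def combine_surrogates(high, low):
    """Map a UTF-16 surrogate pair to the code point it encodes.

    Hypothetical helper for illustration; raises ValueError on invalid input.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

cp = combine_surrogates(0xD840, 0xDC00)
ch = chr(cp)  # a single character on Python 3.3+
print("U+%04X" % cp, len(ch), unicodedata.name(ch, "<no name>"))
```

On a build that splits such characters in two, the same arithmetic could be applied to the two ord() values before any lookup, although unicodedata on that build would still need a single-character argument.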
Servus,
--der Hans