Re: [TxMt] New "Unicode" bundle in the Review trunk

2 Jun 2008


      On 02.06.2008, at 00:04, Walter Dörwald wrote:
...
Here's another patch (against the current version). It shows both  
the codepoint and the name.
BTW, you don't have to use a regular expression to split a string  
into characters, simply iterating through it does the trick:
Index: Commands/Show Unicode Names.tmCommand
-for a in re.compile("(?um)(.)").split(unicode(sys.stdin.read(), "UTF-8")):
-    if (len(a)==1) and (a != '\n'):
-        res = a + " : " + unicodedata.name(a, "U+%04X" % ord(a))
+for a in unicode(sys.stdin.read(), "UTF-8"):
+    if a != '\n':
+        res = u"%s : U+%04X" % (a, ord(a))
+        name = unicodedata.name(a, None)
+        if name:
+            res += u" : %s" % name
         print res.encode("UTF-8")</string>
 <key>fallbackInput</key>
 <string>character</string>

Thanks! Just committed to the trunk.
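
For readers on a current interpreter, here is a minimal sketch of the same idea in Python 3 (the patch itself is Python 2). The function name describe() and the sample string are made up for illustration; they are not part of the bundle. It also checks that the old regex split and plain iteration give the same characters.

```python
import re
import unicodedata

def describe(text):
    """Return one 'char : U+XXXX : NAME' line per non-newline character."""
    lines = []
    for ch in text:  # iterating over a str yields one character at a time
        if ch == "\n":
            continue
        line = "%s : U+%04X" % (ch, ord(ch))
        name = unicodedata.name(ch, None)  # None if the character has no name
        if name:
            line += " : %s" % name
        lines.append(line)
    return lines

sample = "a\u00f6\n\u20ac"  # hypothetical input: a, o-umlaut, newline, euro sign

# The regex split from the old version also returns empty strings between
# matches; once those are filtered out, the result equals plain iteration.
via_regex = [c for c in re.compile("(?um)(.)").split(sample) if len(c) == 1]
assert via_regex == list(sample)

for line in describe(sample):
    print(line)
```

Characters without a name in the database (control characters, for example) simply get the codepoint and no name, as in the patch.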
...
...
...
Furthermore it would be great if this script could display all
information there is in the Python Unicode database, i.e. stuff
like
...
...
   unicodedata.category()
   unicodedata.bidirectional()
   unicodedata.decimal()
Yes. I have such a script in Perl which also shows info about
Unicode
...
code points etc.
Just added a prototype of 'Show Unicode Properties' to the bundle.
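
As a rough idea of what such a lookup could return, here is a small Python 3 sketch built only on the unicodedata functions named above. It is not the bundle's actual 'Show Unicode Properties' command; the function name properties() and the dictionary layout are assumptions for illustration.

```python
import unicodedata

def properties(ch):
    """Collect a few unicodedata properties for a single character."""
    return {
        "codepoint": "U+%04X" % ord(ch),
        "name": unicodedata.name(ch, None),        # None if unnamed
        "category": unicodedata.category(ch),      # general category, e.g. "Ll"
        "bidirectional": unicodedata.bidirectional(ch),
        "decimal": unicodedata.decimal(ch, None),  # None for non-digits
    }

for ch in "a7":
    print(properties(ch))
```

Passing a default as the second argument to name() and decimal() avoids the ValueError these functions raise for characters that have no such property.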
...
Another problem: Using Ctrl-Shift-U as the shortcut hides the  
"Convert To Lowercase" command.
Yes. This was a bad key combo. I changed it temporarily to
CTRL+OPT+APPLE+U
BTW: Can Python handle Unicode codepoints which are specified in  
Unicode plane B, meaning greater than U+FFFF? I tried it out. I found  
out that Python uses UTF-16 internally.
But e.g. UCS hex: 20000 ; UTF-16: D840 DC00 .
I can print that character to TM, but unicodedata fails because it  
expects one character, not two (?)
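
The arithmetic behind the D840 DC00 example can be sketched as follows. This is Python 3, where a string is a sequence of code points and chr() accepts values above U+FFFF; on a Python 2 build that stores strings as UTF-16, the same character has length 2, which matches the failure described above. The helper name combine_surrogates() is made up for illustration.

```python
import unicodedata

def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

cp = combine_surrogates(0xD840, 0xDC00)
print("U+%04X" % cp)               # U+20000
print(unicodedata.name(chr(cp)))   # CJK UNIFIED IDEOGRAPH-20000
```

With the pair combined into one code point, unicodedata gets the single character it expects.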
Servus,
--der Hans
