[TxMt] New "Unicode" bundle in the Review trunk

Hans-Jörg Bibiko bibiko at eva.mpg.de
Sun Jun 1 23:09:01 UTC 2008


On 02.06.2008, at 00:04, Walter Dörwald wrote:
> Here's another patch (against the current version). It shows both  
> the codepoint and the name.
>
> BTW, you don't have to use a regular expression to split a string  
> into characters, simply iterating through it does the trick:
>
> Index: Commands/Show Unicode Names.tmCommand
> -for a in re.compile("(?um)(.)").split(unicode(sys.stdin.read(), "UTF-8")):
> -     if (len(a)==1) and (a != '\n'):
> -          res = a + " : " + unicodedata.name(a, "U+%04X" % ord(a))
> +for a in unicode(sys.stdin.read(), "UTF-8"):
> +     if a != '\n':
> +          res = u"%s : U+%04X" % (a, ord(a))
> +          name = unicodedata.name(a, None)
> +          if name:
> +              res += u" : %s" % name
>            print res.encode("UTF-8")</string>
>  	<key>fallbackInput</key>
>  	<string>character</string>
Thanks! Just committed to the trunk.
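
For reference, the same loop can be sketched as a small standalone function. This is only an illustration, not the committed command: it assumes Python 3 (where str is already Unicode, so the unicode() call goes away), and the function name describe() is made up for this example.

```python
import unicodedata


def describe(text):
    """Return one 'char : U+XXXX : NAME' line per character, skipping newlines.

    Hypothetical helper for illustration; the bundle command itself reads
    stdin and prints instead of returning a list.
    """
    lines = []
    for ch in text:  # iterating a string yields its characters directly
        if ch == "\n":
            continue
        line = "%s : U+%04X" % (ch, ord(ch))
        # name() returns the default (None here) for unnamed characters
        name = unicodedata.name(ch, None)
        if name:
            line += " : %s" % name
        lines.append(line)
    return lines


if __name__ == "__main__":
    for line in describe("a\n\u00e4"):
        print(line)
```

Characters without a name in the database (a tab, for instance) simply get the codepoint and no name part.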

> >> Furthermore it would be great if this script could display all
> >> information there is in the Python Unicode database, i.e. stuff like
> >>
> >>    unicodedata.category()
> >>    unicodedata.bidirectional()
> >>    unicodedata.decimal()
> > Yes. I have such a script in Perl which also shows up info about Unicode
> > code points etc.
I just added a prototype of 'Show Unicode Properties' to the bundle.
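
A rough sketch of what such a property lookup could look like with the unicodedata functions mentioned above. This is not the bundle's prototype, just an illustration assuming Python 3; the function name properties() and the dictionary keys are invented for the example.

```python
import unicodedata


def properties(ch):
    """Collect a few Unicode database properties for a single character.

    Hypothetical helper; the keys are chosen for this example only.
    """
    return {
        "codepoint": "U+%04X" % ord(ch),
        # name() and decimal() raise ValueError without a default,
        # so pass None for characters that lack the property
        "name": unicodedata.name(ch, None),
        "category": unicodedata.category(ch),
        "bidirectional": unicodedata.bidirectional(ch),
        "decimal": unicodedata.decimal(ch, None),
    }


if __name__ == "__main__":
    for key, value in properties("5").items():
        print("%s : %s" % (key, value))
```

Passing an explicit default to name() and decimal() keeps the lookup from raising on characters that do not define those properties.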


> Another problem: Using Ctrl-Shift-U as the shortcut hides the  
> "Convert To Lowercase" command.
Yes. This was a bad key combo. I changed it temporarily to
CTRL+OPT+APPLE+U.

BTW: Can Python handle Unicode code points which lie in the higher
Unicode planes, meaning greater than U+FFFF? I tried it out and found
that Python uses UTF-16 internally.
For example, U+20000 is stored as the UTF-16 surrogate pair D840 DC00.
I can print that character in TM, but unicodedata fails, apparently
because it expects one character rather than two (?)
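
The relation between the two forms is plain arithmetic, sketched below. This assumes Python 3, where a string holds whole code points and the lookup works directly; on a build that stores UTF-16 code units the character would arrive as two units, which would explain the failure. The function name combine_surrogates() is made up for this example.

```python
import unicodedata


def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into a single code point.

    Hypothetical helper for illustration only.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = combine_surrogates(0xD840, 0xDC00)
    print("U+%04X" % cp)
    # chr() gives a one-character string here, so name() accepts it
    print(unicodedata.name(chr(cp), "(no name)"))
```

With the pair from above, D840 DC00 comes out as U+20000 again.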

Servus,

--der Hans

