[TxMt] New "Unicode" bundle in the Review trunk
bibiko at eva.mpg.de
Sun Jun 1 23:09:01 UTC 2008
On 02.06.2008, at 00:04, Walter Dörwald wrote:
> Here's another patch (against the current version). It shows both
> the codepoint and the name.
> BTW, you don't have to use a regular expression to split a string
> into characters, simply iterating through it does the trick:
> Index: Commands/Show Unicode Names.tmCommand
> -for a in re.compile("(?um)(.)").split(unicode(sys.stdin.read(),
> - if (len(a)==1) and (a != '\n'):
> - res = a + " : " + unicodedata.name(a, "U+%04X" % ord(a))
> +for a in unicode(sys.stdin.read(), "UTF-8"):
> + if a != '\n':
> + res = u"%s : U+%04X" % (a, ord(a))
> + name = unicodedata.name(a, None)
> + if name:
> + res += u" : %s" % name
> print res.encode("UTF-8")</string>
Thanks! Just committed to the trunk.
> >> Furthermore it would be great if this script could display all
> >> information there is in the Python Unicode database, i.e. stuff like
> >> unicodedata.category()
> >> unicodedata.bidirectional()
> >> unicodedata.decimal()
> > Yes. I have such a script in Perl which also shows up info about
> > code points etc.
Just added to the bundle a prototype of 'Show Unicode Properties'
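For illustration, here is a minimal sketch of what such a properties lookup could look like. It is not the committed command. The function name describe() and the dict layout are made up for this example, and it is written for Python 3; under the Python 2.x discussed in this thread you would pass a unicode object instead.

```python
import unicodedata

def describe(ch):
    """Collect a few Unicode database properties for one character.

    Hypothetical helper for illustration only; the key names are arbitrary.
    """
    return {
        "codepoint": "U+%04X" % ord(ch),
        # name() and decimal() raise ValueError without a default,
        # so pass None for characters that have no such property.
        "name": unicodedata.name(ch, None),
        "category": unicodedata.category(ch),
        "bidirectional": unicodedata.bidirectional(ch),
        "decimal": unicodedata.decimal(ch, None),
    }

if __name__ == "__main__":
    for ch in "A7":
        print(describe(ch))
```

Passing a default to name() and decimal() keeps the loop from stopping at the first unnamed or non-decimal character.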
> Another problem: Using Ctrl-Shift-U as the shortcut hides the
> "Convert To Lowercase" command.
Yes. This was a bad key combo. I changed it temporarily to CTRL+OPT
BTW: Can Python handle Unicode code points beyond the Basic
Multilingual Plane, meaning greater than U+FFFF? I tried it out and
found that Python uses UTF-16 internally, at least in the build I have here.
For example, UCS hex: 20000 ; UTF-16: D840 DC00.
I can print that character to TM, but unicodedata fails because it
expects one character, not two (?)
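
The arithmetic behind the D840 DC00 example can be sketched as follows. This only recovers the code point from the two UTF-16 code units using plain integers. Whether unicodedata can then look it up depends on the Python build, which this sketch does not address. The function names are made up for this example.

```python
def combine_surrogates(high, low):
    """Return the code point encoded by a UTF-16 surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def codepoints(units):
    """Yield code points from a sequence of UTF-16 code units (ints)."""
    units = list(units)
    i = 0
    while i < len(units):
        u = units[i]
        is_pair = (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                   and 0xDC00 <= units[i + 1] <= 0xDFFF)
        if is_pair:
            # High surrogate followed by a low one: consume both units.
            yield combine_surrogates(u, units[i + 1])
            i += 2
        else:
            # Ordinary BMP unit (or an unpaired surrogate, passed through).
            yield u
            i += 1

if __name__ == "__main__":
    print("U+%04X" % combine_surrogates(0xD840, 0xDC00))
```

With a loop like this, a script would see one item per code point instead of two separate surrogate halves.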