[TxMt] New "Unicode" bundle in the Review trunk
Walter Dörwald
walter at livinglogic.de
Tue Jun 3 15:28:00 UTC 2008
Walter Dörwald wrote:
> Hans-Jörg Bibiko wrote:
>
>> On 02.06.2008, at 00:04, Walter Dörwald wrote:
>>> Here's another patch (against the current version). It shows both the
>>> codepoint and the name.
>>> [...]
Here are some further suggestions for the current bundle version:
To get the UTF-8 bytes of a character, you're doing the following:
print " UTF-8 : " + " ".join(repr(char.encode("UTF-8")).split('\\x')).lstrip("' ").rstrip("'").upper()
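This only works for characters with a codepoint >= 128. For ASCII characters, repr() shows the character itself instead of a \x escape, so the output is the (uppercased) character rather than its byte value. The following code should work better: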
print " UTF-8 : %s" % " ".join("%02X" % ord(c) for c in char.encode("UTF-8"))
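For reference, here is a minimal Python 3 sketch of the same idea (the bundle code above is Python 2). The helper name utf8_hex is made up for illustration and is not part of the bundle. The important point is to iterate over the encoded bytes, not over the character itself; in Python 3, iterating over a bytes object yields integers, so no ord() is needed.

```python
def utf8_hex(ch):
    """Return the UTF-8 bytes of ch as space-separated, upper-case hex pairs.

    Hypothetical helper for illustration only.
    """
    # Encode first: iterating over the str itself would give code points,
    # not UTF-8 bytes. "%02X" zero-pads bytes below 0x10 (e.g. "\n" -> "0A").
    return " ".join("%02X" % b for b in ch.encode("utf-8"))


for ch in ["A", "\u00e4", "\u20ac"]:
    print("  UTF-8 : %s" % utf8_hex(ch))
# prints:
#   UTF-8 : 41
#   UTF-8 : C3 A4
#   UTF-8 : E2 82 AC
```

Unlike the repr()-based version, this gives the same kind of output for ASCII and non-ASCII characters alike.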
Furthermore the code:
decomp = unicodedata.decomposition(char).lstrip(' ').rstrip(' ')
can be simplified to:
decomp = unicodedata.decomposition(char).strip()
(strip() strips from both ends, and with no argument it strips all whitespace by default.)
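A small Python 3 sketch of how the stripped decomposition string could then be split into its optional formatting tag and its code points. The helper name decomposition_parts is hypothetical, not something the bundle defines; only unicodedata.decomposition() is the real standard-library call.

```python
import unicodedata


def decomposition_parts(ch):
    """Split unicodedata.decomposition(ch) into (tag, list of hex code points).

    Hypothetical helper for illustration only. The tag is something like
    '<compat>' or '<fraction>' and is empty for canonical decompositions;
    characters without a decomposition yield ('', []).
    """
    fields = unicodedata.decomposition(ch).strip().split()
    tag = ""
    if fields and fields[0].startswith("<"):
        tag = fields.pop(0)
    return tag, fields


for ch in ["\u00e4", "\u00bd", "A"]:
    # Here strip() gives the same result as lstrip(' ').rstrip(' ').
    d = unicodedata.decomposition(ch)
    assert d.strip() == d.lstrip(" ").rstrip(" ")
    print("U+%04X" % ord(ch), decomposition_parts(ch))
```

For U+00E4 this yields a canonical decomposition into "0061" and "0308" (a plus combining diaeresis); for U+00BD the "<fraction>" tag is separated from the code points.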
Hope that helps.
Servus,
Walter