[TxMt] New "Unicode" bundle in the Review trunk

Walter Dörwald walter at livinglogic.de
Tue Jun 3 15:28:00 UTC 2008


Walter Dörwald wrote:

> Hans-Jörg Bibiko wrote:
> 
>> On 02.06.2008, at 00:04, Walter Dörwald wrote:
>>> Here's another patch (against the current version). It shows both the 
>>> codepoint and the name.
>>> [...]
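For reference, a codepoint-plus-name display can be sketched with the standard `unicodedata` module. This is a minimal Python 3 sketch, not the code from the patch mentioned above. The function name `codepoint_and_name` is hypothetical, and it assumes the selected character is available as a one-character string.

```python
import unicodedata

def codepoint_and_name(char):
    # Hypothetical helper: format a single character as "U+XXXX NAME".
    # unicodedata.name() raises ValueError for characters without a name
    # (e.g. control characters) unless a default is supplied, as done here.
    return "U+%04X %s" % (ord(char), unicodedata.name(char, "<no name>"))

print(codepoint_and_name("\u00e9"))  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
```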

Here are a few more suggestions for the current bundle version:

To get the UTF-8 bytes of a character, you're doing the following:

    print "  UTF-8         : " + " ".join(repr(char.encode("UTF-8")).split('\\x')).lstrip("' ").rstrip("'").upper()

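This only works for characters with a codepoint >= 128. For ASCII characters, repr() shows the character itself rather than a \x escape, so the output is the uppercased character instead of its byte value (e.g. "A" instead of "61" for "a"). The following code should work better: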

    print "  UTF-8         : %s" % " ".join("%02X" % ord(c) for c in char.encode("UTF-8"))
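The same byte-by-byte formatting in Python 3, where `str.encode()` returns a `bytes` object and iterating over it yields integers, can be sketched as follows. The helper name `utf8_hex` is hypothetical; `%02X` is used so that bytes below 0x10 (such as a tab) still print as two digits.

```python
def utf8_hex(char):
    # Encode to UTF-8 and format every resulting byte as two uppercase hex digits.
    return " ".join("%02X" % byte for byte in char.encode("utf-8"))

for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    # prints 61, C3 A9, E2 82 AC and F0 9F 98 80 on separate lines
    print("  UTF-8         : %s" % utf8_hex(ch))
```

On Python 3.8 and later, `char.encode("utf-8").hex(" ").upper()` produces the same string.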

Furthermore the code:

    decomp = unicodedata.decomposition(char).lstrip(' ').rstrip(' ')

can be simplified to:

    decomp = unicodedata.decomposition(char).strip()

(strip() strips from both ends and stripping all whitespace is the 
default when no argument is given.)
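A minimal Python 3 sketch of how the stripped decomposition string might then be used: it splits off the optional `<tag>` that marks a compatibility decomposition and turns the hex codepoints back into characters. The function name `describe_decomposition` and its return shape are hypothetical, chosen only for illustration.

```python
import unicodedata

def describe_decomposition(char):
    # strip() with no argument removes leading/trailing whitespace of any kind.
    decomp = unicodedata.decomposition(char).strip()
    if not decomp:
        return None  # the character has no decomposition mapping
    fields = decomp.split()
    # A leading "<tag>" (e.g. <compat>, <fraction>) marks a compatibility decomposition.
    tag = fields[0] if fields[0].startswith("<") else None
    codepoints = fields[1:] if tag else fields
    return tag, "".join(chr(int(cp, 16)) for cp in codepoints)

print(describe_decomposition("\u00bd"))  # ('<fraction>', '1⁄2')
```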

Hope that helps.

Servus,
    Walter



