[TxMt] UTF-8 line breaks
Hans-Joerg Bibiko
bibiko at eva.mpg.de
Wed Oct 18 08:55:31 UTC 2006
> I've come across this a couple of times now: a file which looks
> fine in TextEdit or BBEdit loses its line breaks when opened in
> TextMate.
>
> Looking at the problem file in HexEdit, the following Hex is being
> used for linebreaks: E2 80 A8, which appears to be a UTF-8
> linebreak, based on some preliminary googling. Saving a short UTF-8
> file in TM, the linebreaks are: 0A (\n).
>
> Would it be possible for TM to support this format of linebreaks,
> at least on read?
>
This is an interesting issue.
Unicode defines two characters for this purpose:
U+2028 (UTF-8: E2 80 A8) : LINE SEPARATOR (LS)
U+2029 (UTF-8: E2 80 A9) : PARAGRAPH SEPARATOR (PS)
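
As a quick check of the byte sequences above, here is a minimal Python
sketch. The helper name utf8_hex is made up for this illustration; it
only encodes each code point as UTF-8 and prints the bytes in hex.

```python
# The two Unicode separator characters discussed above.
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as an uppercase hex string."""
    return ch.encode("utf-8").hex().upper()


print("U+2028 ->", utf8_hex(LS))  # E280A8
print("U+2029 ->", utf8_hex(PS))  # E280A9
```
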
With the advent of word processing software that wraps lines
automatically, having a special character to signal the beginning of
a new line became superfluous.
Complicating things further, some programs allow explicit line breaks
within a paragraph. Because they already used the newline to signal a
new paragraph, they had to invent a new character to signal a line
break within a paragraph (Word uses 0x0B for this).
To solve that problem, Unicode includes these two characters.
My suggestion, as Jeremy mentioned, would be for TM to support these
characters when reading a file.
Letting the user save a document with these characters would be a
problem, because many programs don't recognize them.
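
To illustrate what "support on read" could look like, here is a rough
Python sketch. It is not how TM works internally; the function name
normalize_unicode_breaks is hypothetical, and mapping both LS and PS
to a single "\n" is an assumption (an editor might prefer to turn PS
into a blank line instead).

```python
def normalize_unicode_breaks(data: bytes) -> str:
    """Decode UTF-8 bytes and turn LS/PS into plain newlines.

    Assumption: both U+2028 and U+2029 become a single "\n".
    """
    text = data.decode("utf-8")
    return text.replace("\u2028", "\n").replace("\u2029", "\n")


# A buffer that uses E2 80 A8 (LS) and E2 80 A9 (PS) as breaks.
raw = b"one\xe2\x80\xa8two\xe2\x80\xa9three"
print(normalize_unicode_breaks(raw).split("\n"))
```

Since only the in-memory text is changed, saving would then write 0A
as usual, which avoids producing files other programs can't read.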
Cheers,
Hans