[TxMt] UTF-8 line breaks

Hans-Joerg Bibiko bibiko at eva.mpg.de
Wed Oct 18 08:55:31 UTC 2006


> I've come across this a couple of times now: a file which looks  
> fine in TextEdit or BBEdit loses its line breaks when opened in  
> TextMate.
>
> Looking at the problem file in HexEdit, the following Hex is being  
> used for linebreaks: E2 80 A8, which appears to be a UTF-8  
> linebreak, based on some preliminary googling. Saving a short UTF-8  
> file in TM, the linebreaks are: 0A (\n).
>
> Would it be possible for TM to support this format of linebreaks,  
> at least on read?
>

This is a nice issue.

Unicode defines two characters for this purpose:
U+2028 (UTF-8: E2 80 A8): LINE SEPARATOR (LS)
U+2029 (UTF-8: E2 80 A9): PARAGRAPH SEPARATOR (PS)
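
The byte sequences for these two code points can be checked quickly in Python. This is just a small sketch for verification; the helper name utf8_hex is made up for this example.

```python
# The two Unicode separator characters discussed in this thread.
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated uppercase hex."""
    # bytes.hex() with a separator argument needs Python 3.8 or later.
    return ch.encode("utf-8").hex(" ").upper()


if __name__ == "__main__":
    for name, ch in (("LS", LS), ("PS", PS)):
        print(name, utf8_hex(ch))
```

The first value should match the E2 80 A8 sequence seen in HexEdit.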

With the advent of word processing software that wraps lines
automatically, a special character to signal the beginning of a new
line became superfluous.
To complicate things further, some programs allow explicit line
breaks within a paragraph. Because they already used the newline
character to signal a new paragraph, they had to invent a separate
character to signal a line break within a paragraph. (Word does it
with 0x0B.)

To solve that problem, Unicode includes these two characters.

My suggestion would be, as Jeremy mentioned, that TM support these
characters when reading a file.
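
To illustrate what "support on read" could mean, here is a minimal sketch in Python. It is not how TM works internally; the function names are hypothetical, and mapping both LS and PS to a single "\n" is my assumption (one could argue PS should become a blank line instead).

```python
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR


def normalize_line_breaks(text: str) -> str:
    # Assumption: treat both separators as an ordinary line feed.
    return text.replace(LS, "\n").replace(PS, "\n")


def read_text_normalized(path: str) -> str:
    # newline="" keeps any existing CR/LF bytes untouched, so only
    # the two Unicode separators are rewritten here.
    with open(path, encoding="utf-8", newline="") as f:
        return normalize_line_breaks(f.read())
```

The file on disk is left unchanged; only the text handed to the editor is normalized, and saving would then write plain 0A line breaks.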

Allowing the user to save a document with these characters would
be a problem, because many programs do not recognize them.

Cheers,

Hans


