[TxMt] UTF-8 BOM problem

Allan Odgaard throw-away-1 at macromates.com
Thu Sep 28 19:33:45 UTC 2006


On 28/9/2006, at 13:44, Hans-Joerg Bibiko wrote:

> [...] But I don't know if it would be too difficult to implement  
> this within TextMate. I could imagine that some users don't know  
> about the issue of BOM.

I do indeed think that a BOM in UTF-8 files is misguided at best [1]  
(which is why you can’t enable it in TM).

The only reason TM preserves a BOM is that some users actually do  
rely on it [2] -- so preserving it seemed like the best compromise:  
I do not want to endorse BOMs or even acknowledge that they have a  
valid role in UTF-8 files, but neither do I want to screw up files  
where the user went out of his way to place a BOM there.

I do realize, though, that there is a problem for users who get their  
hands on BOM-infested files and want an easy way to cleanse them. I am  
leaning toward bringing up a “warning” when encountering a UTF-8 file  
with a BOM, which would then let the user choose to get rid of it.
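
As a rough illustration of the check such a warning would need, here is
a minimal Python sketch. It is not TextMate's code; the function names
has_utf8_bom and strip_utf8_bom are made up for this example. It only
relies on the fact that a UTF-8 BOM is the three bytes EF BB BF at the
very start of the file.

```python
import codecs

# The UTF-8 encoding of U+FEFF: the bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw file contents start with a UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the contents without a leading BOM; other data is untouched."""
    if has_utf8_bom(data):
        return data[len(UTF8_BOM):]
    return data
```

An editor could call has_utf8_bom on load to decide whether to warn,
and strip_utf8_bom only if the user asks for the BOM to be removed.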



[1] UTF-8 exists so that files can use Unicode but still be backwards  
compatible. A BOM destroys some of this backward compatibility: e.g.  
the shell won’t pick up the shebang line of a script if it has a BOM,  
and grep’ing through multiple files with BOMs will produce “bad”  
output if something on the first line of one of the files is matched.
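
The shebang case can be seen at the byte level. The Python sketch below
assumes the usual behaviour that a script is recognised only if its
first two bytes are "#!"; the helper name starts_with_shebang is
hypothetical and merely mimics that check.

```python
import codecs


def starts_with_shebang(data: bytes) -> bool:
    """Mimic the usual check: the file must begin with the bytes '#!'."""
    return data[:2] == b"#!"


script = b"#!/bin/sh\necho hello\n"

# The same script saved with a UTF-8 BOM: EF BB BF now precedes '#!'.
script_with_bom = codecs.BOM_UTF8 + script

plain_ok = starts_with_shebang(script)          # '#!' is at offset 0
bom_ok = starts_with_shebang(script_with_bom)   # '#!' is pushed to offset 3
```

The grep problem has the same cause: the first line of such a file
begins with the three BOM bytes, so they come along with any match on
that line.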

[2] The users I have spoken with who rely on them do so only because  
they do not send the proper encoding in the HTTP headers and then  
assume that the user agent will still treat the received file as  
UTF-8 on seeing the BOM.




