[TxMt] UTF-8 BOM problem
Allan Odgaard
throw-away-1 at macromates.com
Thu Sep 28 19:33:45 UTC 2006
On 28/9/2006, at 13:44, Hans-Joerg Bibiko wrote:
> [...] But I don't know, if it would be too difficult to implement
> this within TextMate. I could imagine that some users don't know
> about the BOM issue.
I do indeed think that a BOM in UTF-8 files is misguided at best [1]
(which is why you can’t enable it in TM).
The only reason TM preserves an existing BOM is that some users actually do
rely on it [2]. Preserving it seemed like the best compromise: I do not
want to endorse BOMs, or even acknowledge that they have a valid role in
UTF-8 files, but I also do not want to screw up files where the user
went out of his way to place one.
I do realize, though, that this is a problem for users who get their hands on
BOM-infested files and want an easy way to cleanse them. I am leaning
toward showing a “warning” when opening a UTF-8 file with a
BOM, which would let the user choose to remove it.
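For anyone who wants to cleanse such files in the meantime, here is a minimal sketch in Python of detecting and dropping a leading UTF-8 BOM (the bytes EF BB BF). The function name strip_utf8_bom is made up for illustration; this is not how TM handles it internally.

```python
import codecs
from typing import Tuple

def strip_utf8_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return (data without a leading UTF-8 BOM, whether a BOM was found).

    Hypothetical helper for illustration only.
    """
    # codecs.BOM_UTF8 is the three-byte sequence b"\xef\xbb\xbf".
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False
```

A caller could read a file in binary mode, pass the bytes through this function, and write the result back only when the second value is true, so that files without a BOM are left untouched.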
[1] UTF-8 exists so that files can use Unicode and still be backwards
compatible. A BOM destroys some of this backward compatibility:
the shell won’t pick up the shebang line of a script that starts with a
BOM, and grepping through multiple files with BOMs will produce
“bad” output if something on the first line of one of the files is
matched.
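To illustrate the shebang case, here is a rough sketch in Python. It assumes the usual Unix behaviour of recognising an interpreter line only when the file's first two bytes are "#!"; the helper name has_visible_shebang is made up for this example.

```python
import codecs

def has_visible_shebang(data: bytes) -> bool:
    """True if the very first two bytes are '#!' (hypothetical helper)."""
    return data[:2] == b"#!"

script = "#!/bin/sh\necho hello\n".encode("utf-8")

# Prepending a BOM pushes "#!" to byte offset 3, so it is no longer
# at the start of the file.
script_with_bom = codecs.BOM_UTF8 + script

plain_ok = has_visible_shebang(script)          # shebang at offset 0
bom_ok = has_visible_shebang(script_with_bom)   # BOM bytes come first
```

The same three leading bytes are what show up as junk at the start of a matched first line when grepping across several files.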
[2] The users I have spoken with who rely on BOMs do so only
because they do not send the proper encoding in the HTTP headers and
then assume that the user agent will still treat the received file as
UTF-8 on sight of the BOM.
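As a sketch of the alternative, the encoding can be declared in the Content-Type header so that no BOM is needed. The example below is a minimal WSGI application in Python; the name app and the sample body are assumptions for illustration, and the same header can be sent from any server setup.

```python
def app(environ, start_response):
    """Minimal WSGI app that declares UTF-8 in the HTTP headers."""
    # The body is plain UTF-8 with no BOM prepended.
    body = "caf\u00e9\n".encode("utf-8")
    start_response("200 OK", [
        # The charset parameter tells the user agent how to decode the body.
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

With the charset stated in the header, the user agent does not have to guess the encoding from the first bytes of the file.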