On 28/9/2006, at 13:44, Hans-Joerg Bibiko wrote:
[...] But I don't know whether this would be too difficult to implement within TextMate. I could imagine that some users don't know about the BOM issue.
I do indeed think that a BOM in UTF-8 files is misguided at best [1] (which is why you can’t enable it in TM).
The only reason TM preserves a BOM is that some users actually do rely on it [2]. Preserving it seemed like the best compromise: I do not want to endorse BOMs, or even acknowledge that they have a valid role in UTF-8 files, but at the same time I do not want to screw up files where the user went out of his way to place a BOM there.
I do realize, though, that there is a problem for users who get their hands on BOM-infested files and want an easy way to cleanse them. I am leaning toward bringing up a “warning” when TM encounters a UTF-8 file with a BOM, which would then let the user choose to get rid of it.
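For anyone who wants to cleanse such files in the meantime, here is a minimal sketch in Python of the check involved. It is not TM's code, and the helper names (has_utf8_bom, strip_utf8_bom) are made up for illustration. The UTF-8 BOM is the three bytes EF BB BF at the very start of the file.

```python
import codecs

# codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf".

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (hypothetical helper)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM; other data is returned unchanged."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Reading a file in binary mode and passing its contents through strip_utf8_bom before writing it back removes the BOM and leaves the rest of the bytes untouched. Python's "utf-8-sig" codec does the same thing while decoding, if you want text rather than bytes.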
[1] UTF-8 exists so that files can use Unicode but still be backwards compatible, and a BOM destroys some of this backward compatibility. For example, the shell won’t pick up the shebang line of a script if it has a BOM, and grepping through multiple files with BOMs will produce “bad” output if something on the first line of one of the files is matched.
[2] The users I have spoken with who rely on BOMs do so only because they do not send the proper encoding in the HTTP headers, and then assume that the user agent will still treat the received file as UTF-8 at the sight of the BOM.
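As an illustration of the alternative mentioned in [2], here is a rough sketch using Python's standard http.server module. It declares the encoding in the Content-Type header instead of relying on a BOM. The handler name, the sample body, and the port are assumptions made for the example, not anything a particular server setup requires.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# The charset parameter tells the user agent how to decode the body.
CONTENT_TYPE = "text/html; charset=utf-8"

def encode_body(text: str) -> bytes:
    # The plain "utf-8" codec never writes a BOM ("utf-8-sig" would).
    return text.encode("utf-8")

class Utf8Handler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves one BOM-free UTF-8 page."""

    def do_GET(self):
        body = encode_body("<p>smørrebrød</p>")
        self.send_response(200)
        self.send_header("Content-Type", CONTENT_TYPE)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8000) -> None:
    # Blocks until interrupted; the port number is arbitrary.
    HTTPServer(("127.0.0.1", port), Utf8Handler).serve_forever()
```

Calling serve() and requesting the page should show the charset in the response headers, so the user agent has no need to guess the encoding from the first bytes of the file.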