When I try to open files that have extended characters (e.g. a curly apostrophe) and are saved with "Western - Windows" encoding, TextMate 2 gives an "Unknown Encoding" error with a hex dump of the file and asks for the encoding. If I choose "Western - Windows", TextMate opens the file successfully. (The encoding preference is still at its default, "Unicode - UTF-8".) Also, if I have a project containing these files, Find In Project fails because TextMate sees them as binary files.
Is there any way to have TextMate autodetect the encoding so that opening and searching files works as expected? Thanks,
Trevor
On 04/01/2012, at 07.45, Trevor Harmon wrote:
[…] Is there any way to have TextMate autodetect the encoding so that opening and searching files works as expected? Thanks,
The thing is, it’s impossible to do auto-detection with 100% certainty.
UTF-8 though comes very close to being 100% auto-detectable, so TM2 tests for that, and lets the user do the hard work when it is not UTF-8, to remind users that they really should be using UTF-8.
What you can do is set a file’s encoding via Apple’s encoding attribute or via .tm_properties — but really, use UTF-8!
TM2 is likely to introduce some heuristics down the road, and the current sheets during opening should also be considered provisional; still, that is no excuse not to use UTF-8.
So bottom line: if you want to use cutting-edge alpha builds, don’t use legacy encodings from last century ;)
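To illustrate why UTF-8 is close to auto-detectable, here is a minimal Python sketch. It is not TextMate's actual detection code, and the function name `looks_like_utf8` is made up for this example. It relies only on the fact that a strict UTF-8 decoder rejects malformed byte sequences, which is what text in a legacy single-byte encoding almost always produces once it contains non-ASCII characters.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict mode raises on malformed sequences
    except UnicodeDecodeError:
        return False
    return True


# A curly apostrophe (U+2019) is three bytes in UTF-8,
# but the single byte 0x92 in CP1252 ("Western - Windows").
utf8_bytes = "it’s".encode("utf-8")
cp1252_bytes = "it’s".encode("cp1252")

print(looks_like_utf8(utf8_bytes))    # True
print(looks_like_utf8(cp1252_bytes))  # False: 0x92 is a lone continuation byte
```

Note that pure ASCII also passes this test, and that a failed test only says "not UTF-8"; it does not say which legacy encoding the file uses, which is the part that needs guessing.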
On Jan 3, 2012, at 7:24 PM, Allan Odgaard wrote:
On 04/01/2012, at 07.45, Trevor Harmon wrote:
[…] Is there any way to have TextMate autodetect the encoding so that opening and searching files works as expected? Thanks,
The thing is, it’s impossible to do auto-detection with 100% certainty.
UTF-8 though comes very close to being 100% auto-detectable, so TM2 tests for that, and lets the user do the hard work when it is not UTF-8, to remind users that they really should be using UTF-8.
I realize encoding detection is a guessing game, but it looks like TM2 doesn't even try to guess. It simply gives up and bugs the user (when opening) or treats the file as an opaque binary (when searching). I work with a project that contains many non-UTF-8 files, so I'm constantly seeing the "Unknown Encoding" dialog, and the Find In Project command is less than useful.
What you can do is set a file’s encoding via Apple’s encoding attribute or via .tm_properties — but really, use UTF-8!
That was the first thing I tried. I put:
encoding = "Western - Windows"
in the project's .tm_properties file, but it seemed to have no effect.
I also changed the Encoding setting in the Preferences, but that too had no effect. (Does it only pertain to saving?)
So bottom line: if you want to use cutting-edge alpha builds, don’t use legacy encodings from last century ;)
In this case I am working with a team of Windows developers, so I'm inheriting a lot of Windows code with proprietary encoding.
Is there some way to force an encoding to be used if the UTF-8 detection fails? I think that would be a perfectly fine solution for me, if I could get it to work.
Thanks,
Trevor
On 04/01/2012, at 13.42, Trevor Harmon wrote:
[…] it looks like TM2 doesn't even try to guess
Correct. A bad guess can do more harm than good, and trust me, I have talked with plenty of users who managed to mess up their files by not knowing the importance of correcting a bad guess. So the current solution, while definitely provisional, I consider “the right direction”, i.e. it makes the user aware that he is walking on thin ice. Ideally, though, the modal dialog sheet should just be a top bar à la Chrome’s “This page appears to be in Thai, do you want to translate it?”, with some heuristics to make a better default choice and possibly some options to make choices sticky.
All but UTF-8 being legacy though, this is polish that comes after more pressing matters ;)
[…] I put:
encoding = "Western - Windows"
in the project's .tm_properties file, but it seemed to have no effect.
See x-man-page://iconv_open/3 for possible encodings. For “Western - Windows” you need to use CP1252.
I also changed the Encoding setting in the Preferences, but that too had no effect. (Does it only pertain to saving?)
This is one of the settings that just has no effect, period.
[…] Is there some way to force an encoding to be used if the UTF-8 detection fails? I think that would be a perfectly fine solution for me, if I could get it to work.
Setting encoding in the .tm_properties file will fix it for regular file opens, but the folder search code presently skips all of this, so I am afraid you will still see “binary file” there. For now I strongly advise against using the folder search with anything but UTF-8 + LF.
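Until folder search honours the encoding setting, one possible workaround is to search outside the editor. The following is a rough Python sketch of the "try UTF-8, otherwise force a given encoding" idea asked about above. It is not part of TextMate; the function names and the `cp1252` default are assumptions chosen for this thread.

```python
from pathlib import Path


def read_text_with_fallback(path, fallback="cp1252"):
    """Decode a file as UTF-8, or as the fallback encoding if that fails."""
    raw = Path(path).read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Forced fallback; bytes undefined in the fallback become U+FFFD.
        return raw.decode(fallback, errors="replace")


def search_folder(root, needle, fallback="cp1252"):
    """Return (path, line number, line) for every line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        text = read_text_with_fallback(path, fallback)
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line))
    return hits
```

For example, `search_folder("src", "it’s")` would list matches in both UTF-8 and CP1252 files under a hypothetical `src` folder. The sketch makes no attempt to skip genuinely binary files, so it is best pointed at a folder of source code only.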
On Jan 4, 2012, at 11:16 AM, Allan Odgaard wrote:
[…] I put:
encoding = "Western - Windows"
in the project's .tm_properties file, but it seemed to have no effect.
See x-man-page://iconv_open/3 for possible encodings. For “Western - Windows” you need to use CP1252.
Thanks, that worked. With an encoding specified in .tm_properties, I no longer get the encoding selection popup, which is a relief. I still see "(binary)" in the find window (as you pointed out), but I can live with that. Now I just need to convince the team I work with to get rid of Windows encodings once and for all...
Trevor
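For anyone in the same position who does get the go-ahead to drop Windows encodings, a one-off conversion could look roughly like the Python sketch below. It assumes every non-UTF-8 file really is CP1252, which only the team can confirm; the function name `convert_to_utf8` is made up for this example. Run it on a copy or under version control first.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="cp1252"):
    """Re-encode one file as UTF-8 in place; return True if it was rewritten."""
    path = Path(path)
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8 (or plain ASCII): leave untouched
    except UnicodeDecodeError:
        pass
    # Raises UnicodeDecodeError if the bytes are not valid in source_encoding,
    # which is a hint that the assumed encoding is wrong for this file.
    text = raw.decode(source_encoding)
    # Working on bytes keeps the existing line endings (CRLF or LF) as they are.
    path.write_bytes(text.encode("utf-8"))
    return True
```

Files that are already UTF-8 are skipped, so running the conversion twice should not double-encode anything.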