We've discovered (naturally in an unpleasant way) that TM (1.5.10) will translate CR (0x0d) in a file to LF (0x0a) on save - even when the Line Endings setting "Use for existing files as well" is not checked. This seems like a bug, as it seems the settings are to leave existing files alone with respect to potential line ending characters. Is this a known issue?
Thanks much,
-eric
On 2012-01-06 13:15, Eric Hall wrote:
We've discovered (naturally in an unpleasant way) that TM (1.5.10) will translate CR (0x0d) in a file to LF (0x0a) on save - even when the Line Endings setting "Use for existing files as well" is not checked. This seems like a bug, as it seems the settings are to leave existing files alone with respect to potential line ending characters. Is this a known issue?
I don't see that behavior here. To demonstrate, I create a short file with CR terminators. Dump with 'od' to see that the terminators are plain CRs.
macduff:/tmp$ od -a foo.txt
0000000   f   o   o  cr   b   a   r  cr
0000010
Edit and save that file in TextMate. Dump again. Terminators are still CR.
macduff:/tmp$ od -a foo.txt
0000000   l   o   r   e   m  cr   f   o   o  cr   b   a   r  cr   i   p
0000020   s   u   m  cr
0000024
I can change the file to LF or CRLF terminators, but only if I explicitly tell TextMate to do so in the "Save As..." dialog.
macduff:/tmp$ od -a foo.txt
0000000   l   o   r   e   m  cr  nl   f   o   o  cr  nl   b   a   r  cr
0000020  nl   i   p   s   u   m
0000026
Is it possible that your original file started with mixed line endings? In that case I think TM will save the file with your default line ending rather than trying to preserve the inconsistencies. Here's a file with mixed line endings: first CRLF, then CR, then LF.
macduff:/tmp$ od -a foo.txt
0000000   l   o   r   e   m  cr  nl   f   o   o  cr   b   a   r  nl   i
0000020   p   s   u   m
0000024
Edit it in TextMate to add one line and save. Now line endings are all LF.
macduff:/tmp$ od -a foo.txt
0000000   l   o   r   e   m  nl   f   o   o  nl   b   a   r  nl   i   p
0000020   s   u   m  nl   d   o   l   o   r
0000031
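For anyone who would rather not read 'od' dumps by eye, the check can be scripted. What follows is only a rough sketch in Python; the names count_line_endings and is_mixed are made up for illustration and have nothing to do with TextMate itself.

```python
import re
from collections import Counter

# Match CRLF before the single characters so that a CR followed by LF
# is counted once, as CRLF, and not as a CR plus an LF.
_TERMINATOR = re.compile(rb"\r\n|\r|\n")
_NAMES = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def count_line_endings(data: bytes) -> Counter:
    """Count how often each kind of line terminator occurs in data."""
    return Counter(_NAMES[m.group()] for m in _TERMINATOR.finditer(data))


def is_mixed(data: bytes) -> bool:
    """True if more than one kind of terminator is present."""
    return len(count_line_endings(data)) > 1
```

Reading a file with open(path, "rb").read() and passing the bytes to is_mixed should flag a file like the mixed example above, while a file that uses only CR is reported as consistent.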
On Fri, Jan 06, 2012 at 03:03:12PM -0500, Steve King wrote:
[snip]
Is it possible that your original file started with mixed line endings? In that case I think TM will save the file with your default line ending rather than trying to preserve the inconsistencies. Here's a file with mixed line endings: first CRLF, then CR, then LF.
Yes, this is the case here (sorry, I'd meant to mention that) - we have files with LF line endings, a few wound up with CRs in the lines due to the data involved. Thus we've got text files with LF line endings and a CR (one or more) in the middle of some of the lines.
My simple example (note the cr after the colon):
% od -a CRtoNL.txt
0000000   T   h   i   s  sp   i   s  sp   t   h   e  sp   t   e   s   t
0000020   :  cr  sp  sp   a   n   d  sp   s   h   o   u   l   d  sp   a
0000040   l   l  sp   b   e  sp   o   n   e  sp   l   i   n   e  nl
0000057
Open the above with TextMate and save (no need to add another line):
% od -a CRtoNLTMSaved.txt
0000000   T   h   i   s  sp   i   s  sp   t   h   e  sp   t   e   s   t
0000020   :  nl  sp  sp   a   n   d  sp   s   h   o   u   l   d  sp   a
0000040   l   l  sp   b   e  sp   o   n   e  sp   l   i   n   e  nl
0000057
-eric
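A file in this state can be spotted before it is opened in an editor. The snippet below is a small sketch and not a TextMate feature; stray_cr_offsets is a made-up name, and it assumes the files are meant to be LF (or CRLF) terminated, so any CR not followed by LF counts as data. On a file that really is CR-terminated it would flag every line.

```python
import re


def stray_cr_offsets(data: bytes) -> list:
    """Return the byte offset of every CR that is not followed by LF."""
    return [m.start() for m in re.finditer(rb"\r(?!\n)", data)]
```

The offsets are zero-based byte positions, so they can be compared against the (octal) offsets in an 'od -a' dump of the same file.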
On 2012-01-06 15:25, Eric Hall wrote:
Yes, this is the case here (sorry, I'd meant to mention that) - we have files with LF line endings, a few wound up with CRs in the lines due to the data involved. Thus we've got text files with LF line endings and a CR (one or more) in the middle of some of the lines.
Ah, I see. The CR isn't actually a line terminator here; you want it treated as just another character. I'm not sure I'd call this a bug; rather, it's something that's outside of TextMate's scope. As it's first and foremost a *text* editor, Allan probably decided that the use case of accidentally-messed-up line endings would trump the case of actually wanting to preserve carriage returns. A pity it doesn't have something like Vim's binary (-b) mode, though.
On Fri, Jan 06, 2012 at 04:16:45PM -0500, Steve King wrote:
Ah, I see. The CR isn't actually a line terminator here, you want it treated as just another character. I'm not sure I'd call this a bug; rather, it's something that's outside of TextMate's scope. As it's first and foremost a *text* editor, Allan probably decided that the use case of accidentally-messed-up line endings would trump the case of actually wanting to preserve carriage returns. A pity it doesn't have something like Vim's binary (-b) mode, though.
If it weren't for the 'Use for existing files as well' preference, I'd fully agree. Because of that setting (which I have unchecked), I'm thinking TextMate should not alter line terminating characters (unless, of course, that setting is on).
-eric
On 07/01/2012, at 04.25, Eric Hall wrote:
[…] I'm thinking TextMate should not alter line terminating characters
The problem is that there are 3 ways a file can be line-terminated (ignoring unicode and messed up files):
* CR
* LF
* CRLF
Ideally TM would see what was used during load and then insert that whenever you create new lines. As you can imagine, though, that would make a lot of code far more complex, since it would have to “abstract” identifying and inserting newlines while working with text. So instead TM converts the newlines to LF on load and back to whatever was used on save, and all code can then safely assume the buffer uses LF.
Initially TM would test the newlines used and then only convert those, but a few users then had issues with stray CR or CRLF which were skipped, so it was deemed better to just convert all types to LF and convert back to the one with the highest frequency.
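To make the round trip described above concrete, here is a rough sketch in Python. It is not TextMate's actual code: the names load and save, the default parameter, and the rule that ties go to the default terminator are all assumptions made for illustration.

```python
import re
from typing import Tuple

_TERMINATOR = re.compile(rb"\r\n|\r|\n")


def load(data: bytes, default: bytes = b"\n") -> Tuple[bytes, bytes]:
    """Return (buffer using LF only, terminator to write back on save)."""
    counts = {b"\r\n": 0, b"\r": 0, b"\n": 0}
    for m in _TERMINATOR.finditer(data):
        counts[m.group()] += 1
    # Pick the most frequent terminator; ties (and files with no
    # terminators at all) fall back to the default. This tie rule is
    # an assumption, not something confirmed for TextMate.
    dominant = max(counts, key=lambda t: (counts[t], t == default))
    # Every CRLF and every lone CR becomes LF in the buffer.
    buffer = re.sub(rb"\r\n|\r", b"\n", data)
    return buffer, dominant


def save(buffer: bytes, terminator: bytes) -> bytes:
    """Write the LF-only buffer back out with the chosen terminator."""
    return buffer.replace(b"\n", terminator)
```

Under these assumptions a file that consistently uses CR survives the round trip unchanged, whereas a lone CR inside an LF-terminated file comes back as LF, which matches the behaviour reported earlier in the thread. Once everything has been converted to LF, the buffer no longer records which of its newlines used to be a CR.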
2.0 does a slightly better job: for example, running `printf 'foo\r\nbar\n'|mate` in Terminal results in a literal <CR> being shown. There’s still some room for improvement (and it will probably eventually throw up a warning if it looks like the file is “messed up”).