Whoa!
I just wanted to do some XML-wrangling with TM and opened my usual accounting file, which contains ca. 6000 lines of XML.
I work with it all the time and have never experienced any problems. Except for right now, when I did a search and replace with regular expressions, simply removing all line breaks, i.e. replacing '\n' with nothing.
TM has been maxing out the CPU for ten minutes now (Dual 1 GHz G4) and doesn't let me do anything. I've got ca. 15 projects open and am praying that they're saved.
How can this be avoided in the future? Is TM really only usable for small files such as XHTML templates or code? That'd be a real shame...
Can anybody shed any light on this or give some advice on what other product to use for the 'heavy lifting' stuff?
best regards,
tom
Tom Lazar wrote:
I work with it all the time and have never experienced any problems. Except for right now, when I did a search and replace with regular expressions, simply removing all line breaks, i.e. replacing '\n' with nothing.
Confirmed (on a circa 1500-line XML file: spinning ball for about 30 seconds for a regex replace of "\n" with ''). It seems to be somehow tied to the fact that we're searching for "\n", as any other search, even with regex turned on, seems to be at least an order of magnitude quicker. For instance, I searched for "a" and replaced it with "<??>" (i.e. something unlikely to occur in the file) with regex turned on; that query has four times as many matches and returns in a couple of seconds.
Can anybody shed any light on this or give some advice on what other product to use for the 'heavy lifting' stuff?
Would you believe "TextMate"?
Text => Filter through command... => (Input: Document, Output: Replace Selection)
perl -pe 's/\n//g'
Pretty much instant. Undo is also virtually instant, should you not like what it's done to your document! While the "g" isn't strictly necessary in this instance it is in general if you're wanting to replace all occurrences.
For more generic regex substitutions just throw your "Find" and "Replace" regexes between the leaning matchsticks. For case insensitive matching you can use
perl -pe 's/\n//gi'
instead. Not ideal, but infinitely faster than the builtin (for now).
Cheers, Paul
Confirmed the slowness here as well. I thought it might be due to the substitution resulting in very long lines, but I also tried a search and replace for "\n" replaced by itself, and that takes quite some time too. Interestingly enough, using the "sum" button to count the number of matches is instant. Weird...
Haris
About 60% of the time is spent in oniguruma::find()… so I guess we both know which component is to blame. I was expecting that somehow TM's bad handling of long lines would get the blame; looks like I was wrong (time for a change :)
Here are the results of my Shark session -- it is the time profile of TM trying to replace \n with an underscore. (Shark is included with the CHUD tools, available via ADC or on your Xcode CD/DVD/whatever.)
ciao,
domenico
On Jan 17, 2006, at 2:56, Domenico Carbotta wrote:
About 60% of the time is spent in oniguruma::find()… so I guess we both know which component is to blame. I was expecting that somehow TM's bad handling of long lines would get the blame; looks like I was wrong (time for a change :)
You could still be kind of right. If Allan is re-analyzing the first line of the file (for folding, syntax highlighting or some other purpose) for each newline removed, and if he is using oniguruma for this, then he is going to use oniguruma on the order of n^2 times (where n is the length of the file). That might explain the behavior. We'll see when he comes back from vacation :-)
On Jan 17, 2006, at 1:10 AM, Paul McCann wrote:
Would you believe "TextMate"?
Text => Filter through command... => (Input: Document, Output: Replace Selection)
perl -pe 's/\n//g'
Pretty much instant. Undo is also virtually instant, should you not like what it's done to your document! While the "g" isn't strictly necessary in this instance it is in general if you're wanting to replace all occurrences.
For more generic regex substitutions just throw your "Find" and "Replace" regexes between the leaning matchsticks. For case insensitive matching you can use
perl -pe 's/\n//gi'
instead. Not ideal, but infinitely faster than the builtin (for now).
Thank you so much, Paul! That was *exactly* what I needed to keep things moving over here! Phew ;)
Instinctively, I'd also say that the bottleneck is somewhere inside the regex engine, since the problem exists just as well when I turn off all syntax coloring by choosing "Text plain".
@Allan: if this is really narrowed down to searching for \n in large files, you might consider checking for that on submit of the search form and then utilizing the perl command that Paul suggested.
best regards,
tom
On 17/01/2006, at 10.18, Tom Lazar wrote:
Instinctively, I'd also say that the bottleneck is somewhere inside the regex engine, since the problem exists just as well when I turn off all syntax coloring by choosing "Text plain".
You do not turn off syntax highlighting by choosing "Text plain". You just get the syntax highlighter for plain text. E.g. try writing "[NEW]" (it should be highlighted bold and blue with the default theme).
But I'm not sure that the syntax highlighting is the problem. I just think Allan is doing something to the first line every time a line is appended to it. Even a simple linear search of the line would be a problem...
I strongly doubt that the regex-engine is to blame, but I'm sure Allan is going to enlighten us when he comes back :-)
On 17/01/2006, at 11:36, Benny Kjær Nielsen wrote:
But I'm not sure that the syntax highlighting is the problem. I just think Allan is doing something to the first line every time a line is appended to it. Even a simple linear search of the line would be a problem... I strongly doubt that the regex-engine is to blame, but I'm sure Allan is going to enlighten us when he comes back :-)
Yes; for instance, if spell-checking is enabled, lines more than 250-ish characters long bring my machine down, so if you have that enabled, it could make a difference.
-- Sune.
Sounds like a classic case of the regexp engine doing backtracking.
The regexp engine searches forward until it matches the regexp. Then it begins searching backward from the end of the file, until the regexp matches as much as possible, which in this case is only one character. This is called greedy matching.
So the first thing you can do is make the match lazy. Tell the regexp engine it's okay to do as little as possible. You do that by adding a ? after the repeat modifier.
\n+?
That says match 1 or more newlines, but don't knock yourself out.
The second thing you can do is make the regexp so the engine doesn't have to search backward.
\n[^\n]
So that says "match a newline followed by something that isn't a newline." When the regexp engine sees that it has matched the whole pattern to begin with, it stops. In experiments I just did, this was the fastest, even faster than the lazy modifier.
If you have blank lines in your file (\n followed by \n), you can use this regexp: \n+[^\n]
"Match one or more newlines, followed by a non-newline."
So it seems TextMate's regexp is working as it should.
Regards, JJ
On 16-Jan-2006, at 18:51, Tom Lazar wrote:
Whoa!
I just wanted to do some XML-wrangling with TM and opened my usual accounting file, which contains ca. 6000 lines of XML.
I work with it all the time and have never experienced any problems. Except for right now, when I did a search and replace with regular expressions, simply removing all line breaks, i.e. replacing '\n' with nothing.
TM has been maxing out the CPU for ten minutes now (Dual 1 GHz G4) and doesn't let me do anything. I've got ca. 15 projects open and am praying that they're saved.
How can this be avoided in the future? Is TM really only usable for small files such as XHTML templates or code? That'd be a real shame...
Can anybody shed any light on this or give some advice on what other product to use for the 'heavy lifting' stuff?
best regards,
tom
For new threads USE THIS: textmate@lists.macromates.com (threading gets destroyed and the universe will collapse if you don't) http://lists.macromates.com/mailman/listinfo/textmate
--- Help everyone. If you can't do that, then at least be nice.
On 18/01/2006, at 16:47, John Johnson wrote:
Sounds like a classic case of the regexp engine doing backtracking.
Yes, but then again his simple regex, replacing \n, doesn't have any + or *, so it shouldn't be a problem.
The regexp engine searches forward until it matches the regexp. Then it begins searching backward from the end of the file, until the regexp matches as much as possible.
I'm pretty sure it doesn't do this, but rather searches forward until it finds the largest match. But regardless of that, in this case there is only one character to match.
-- Sune.
Nevertheless, the workarounds to eliminate backtracking that I gave sped up the search and replace dramatically.
Regards, JJ
On 19-Jan-2006, at 18:23, Sune Foldager wrote:
On 18/01/2006, at 16:47, John Johnson wrote:
Sounds like a classic case of the regexp engine doing backtracking.
Yes, but then again his simple regex, replacing \n, doesn't have any + or *, so it shouldn't be a problem.
The regexp engine searches forward until it matches the regexp. Then it begins searching backward from the end of the file, until the regexp matches as much as possible.
I'm pretty sure it doesn't do this, but rather searches forward until it finds the largest match. But regardless of that, in this case there is only one character to match.
-- Sune.
On 17/1/2006, at 0:51, Tom Lazar wrote:
[...] I work with it all the time and have never experienced any problems. Except for right now, when I did a search and replace with regular expressions, simply removing all line breaks, i.e. replacing '\n' with nothing.
Replacements which merge lines are slower than they ought to be. I haven't looked closely into the problem, but there likely is one.
I use the Filter Through Command… solution myself for these situations ;)