[TxMt] TM tokenizer is taking 100% CPU for long while using 100-200KB text files with single line
Adam Strzelecki
ono at java.pl
Wed Feb 27 14:59:06 UTC 2008
Hello,
> But in TextMate the syntax highlighter (and more) is line-based and
> works with regular expressions and not a precompiled lexer/parser, so,
> yes, the line length does matter.
I thought length doesn't matter ;)
Anyway, I'm not convinced :) I agree that a precompiled lexer/parser is
simply faster, but I don't see why a regexp tokenizer should run 1000x
slower on a file with a single 200,000-character line than on a file
with 20,000 lines of 10 characters each (while both files are exactly
the same size... ouch, the faster one is slightly bigger because of
the extra \n ;P)
I hope TM compiles its regular expressions just once and, moreover,
executes a single regexp merged from all the patterns rather than
running every regular expression separately.
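
To make the "compile once, merge everything" idea concrete, here is a
minimal Python sketch. The rule names and patterns are made up for
illustration, and this says nothing about how TM is actually
implemented. Note also that Python's re module is a backtracking
engine, so the sketch shows the merging idea only, not a guaranteed
automaton-style running time.

```python
import re

# Hypothetical token rules; names and patterns are illustrative only.
TOKEN_RULES = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("STRING", r'"[^"\\]*(?:\\.[^"\\]*)*"'),
    ("SPACE", r"\s+"),
    ("OTHER", r"."),  # fallback: any single character
]

# Merge all rules into one alternation of named groups and compile it
# exactly once, at import time.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES),
    re.DOTALL,
)


def tokenize(text):
    """Yield (rule_name, lexeme) pairs in one left-to-right pass."""
    pos = 0
    while pos < len(text):
        # OTHER matches any character, so a match always exists and
        # always consumes at least one character.
        m = MASTER.match(text, pos)
        yield m.lastgroup, m.group()
        pos = m.end()
```

Calling list(tokenize('x = 42')) walks the string once and reports
which rule matched at each position, without looping over the
individual patterns.
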
Given that assumption, and the fact that a compiled regexp is an
automaton similar to the ones in precompiled parsers & lexers, line
length shouldn't, IMHO, matter.
So if it does, there's room for optimization.
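
A rough way to see the argument: if the tokenizer makes a single pass
and never rescans, the number of match steps depends on the total
number of characters, not on where the line breaks fall. The Python
sketch below counts match calls for the two files described above. The
pattern, function names and sample text are assumptions made up for
this illustration; it does not model TM's actual engine.

```python
import re

# One precompiled pattern: a word, a run of whitespace, or any single
# other character.
TOKEN = re.compile(r"\w+|\s+|.", re.DOTALL)


def count_match_calls(text):
    """Tokenize text in one pass and return the number of match calls."""
    pos = 0
    calls = 0
    while pos < len(text):
        pos = TOKEN.match(text, pos).end()
        calls += 1
    return calls


def count_match_calls_per_line(text):
    """Same work, but fed to the tokenizer one line at a time."""
    return sum(count_match_calls(line)
               for line in text.splitlines(keepends=True))


# A single 200,000-character line...
one_line = "foo(bar); " * 20000
# ...versus 20,000 lines of 10 characters, each followed by "\n".
many_lines = "foo(bar); \n" * 20000

if __name__ == "__main__":
    print(count_match_calls(one_line))
    print(count_match_calls_per_line(many_lines))
```

Both calls report the same number of match steps (six tokens per
10-character chunk), which is the behaviour one would expect from a
tokenizer whose cost does not depend on line length.
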
Cheers,
--
Adam Strzelecki |: nanoant.com :|