[TxMt] TM tokenizer is taking 100% CPU for long while using 100-200KB text files with single line

Adam Strzelecki ono at java.pl
Wed Feb 27 14:59:06 UTC 2008


Hello,

> But in TextMate the syntax highlighter (and more) is line-based and
> works with regular expressions and not a precompiled lexer/parser, so,
> yes, the line length does matter.

I thought length doesn't matter ;)

Anyway, I'm not convinced :) I agree that a precompiled lexer/parser is
simply faster, but I don't see why a regexp tokenizer should work 1000x
slower on a file that has a single 200'000-character line than on one
with 20000 x 10-character lines (while both files are the same size...
ouch, the faster one is actually slightly bigger because of the extra
\n ;P).

I hope TM compiles its regular expressions just once and, moreover,
executes a single regexp merged from all of them, rather than running
every regular expression separately.
If that is so, and given that a compiled regexp is an automaton similar
to the one in precompiled parsers & lexers, line length shouldn't
IMHO matter.
So if it does, there's room for optimization.
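
A minimal sketch of the idea, in Python. The rule names and patterns are
made up for illustration, and this is not TextMate's actual code. Note
also that Python's re module is a backtracking matcher rather than a
true DFA, so it only shows the "compile once, merge, scan in one pass"
shape, where the work depends on the number of characters and not on
where the line breaks fall.

```python
import re

# Hypothetical token rules, for illustration only (not a TextMate grammar).
RULES = [
    ("number", r"\d+"),
    ("word", r"[A-Za-z_]+"),
    ("space", r"[ \t\n]+"),
    ("other", r"."),  # fallback so the scan never stalls on unknown input
]

# Compile once: a single alternation with one named group per rule.
MERGED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES),
    re.DOTALL,
)


def tokenize(text):
    """Yield (rule_name, lexeme) pairs in one left-to-right pass."""
    for match in MERGED.finditer(text):
        # lastgroup is the name of the alternative that matched.
        yield match.lastgroup, match.group()


def count_tokens(text):
    """Count how many tokens each rule produced."""
    counts = {}
    for name, _ in tokenize(text):
        counts[name] = counts.get(name, 0) + 1
    return counts


# One 200,000-character line versus 20,000 lines of 10 characters each.
one_long_line = "abc 123 x " * 20000
many_short_lines = "abc 123 x \n" * 20000

if __name__ == "__main__":
    print(len(one_long_line), count_tokens(one_long_line))
    print(len(many_short_lines), count_tokens(many_short_lines))
```

Both inputs go through the same single scan and produce the same token
counts; the second one is just 20,000 characters longer because of the
newlines.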

Cheers,
-- 
Adam Strzelecki |: nanoant.com :|



