On 27/5/2006, at 1:54, Timothy Reaves wrote:
[...] It is still BUILT upon RegEx, and runs a RegEx engine, and therefore, is going to be much slower
FYI, roughly half the time spent by the parser is spent in CoreFoundation, because I currently use NSArray and NSDictionary to represent the grammar. So regular expressions are not the culprit you make them out to be.
But when speaking of the parser and the other text editors you bring up: the performance difference is a) psychological, because TextMate does not redraw the first page of the document before the entire document has been parsed, b) because TextMate does something very different from the editors you mention, and c) because it generally has more complex grammars (try putting PHP in an HTML heredoc in a Ruby script embedded in ERb tags in an HTML file -- unlike other text editors, TextMate handles this gracefully).
As Jonathan said, this is a parser. It parses your document into a tree and maintains that tree when you edit the text. This tree preserves information obtained during parsing, and styling is done using a style sheet with a selector-based rule system which works on this tree.
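To make the tree-plus-selectors idea concrete, here is a minimal Python sketch. It is not TextMate's implementation: the `Node` class, the helper functions, the specificity rule (more selector parts wins) and the sample scope names and colors are all assumptions made for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a (hypothetical) parse tree: a named scope over a text range."""
    scope: str
    start: int
    end: int
    children: list[Node] = field(default_factory=list)


def scope_stack_at(node, offset, stack=()):
    """Return the scope names enclosing `offset`, outermost first."""
    stack = stack + (node.scope,)
    for child in node.children:
        if child.start <= offset < child.end:
            return scope_stack_at(child, offset, stack)
    return stack


def name_matches(part, scope):
    """`string` matches `string` and `string.quoted`, but not `stringy`."""
    return scope == part or scope.startswith(part + ".")


def selector_matches(selector, stack):
    """Descendant selector: its parts must match scopes in order, gaps allowed."""
    parts = selector.split()
    i = 0
    for scope in stack:
        if i < len(parts) and name_matches(parts[i], scope):
            i += 1
    return i == len(parts)


def style_for(stack, rules):
    """Pick the style of the most specific matching selector (assumed rule:
    more parts wins, then the longer selector text)."""
    matching = [sel for sel in rules if selector_matches(sel, stack)]
    if not matching:
        return None
    best = max(matching, key=lambda sel: (len(sel.split()), len(sel)))
    return rules[best]


# A hand-built tree: an HTML file with embedded PHP containing a string.
root = Node("text.html", 0, 40, [
    Node("source.php.embedded", 5, 30, [
        Node("string.quoted.double.php", 12, 20),
    ]),
])

# A tiny "style sheet": selector -> style.
RULES = {
    "text.html": "black",
    "string": "green",
    "source.php string": "orange",
}

print(style_for(scope_stack_at(root, 15), RULES))
```

The point of the sketch is that the style sheet never inspects the text itself. It only queries the tree the parser maintains, which is why the same tree can also drive non-styling behavior.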
So your comparison with other editors makes little sense. TextMate has a powerful declarative grammar system which allows the user to describe how his code should be parsed, and then to assign functionality and apply preferences to user-defined semantic units based on that parse. Coincidentally, this system is also used for styling. It is the performance of this system you seem to be criticizing when you state that other editors are much faster -- but other editors do not do this, they don’t do anything like it! Even ignoring this sophisticated architecture for rule-based behavior adjustments and focusing on syntax highlighting alone, other editors are still not as sophisticated. Case in point: John Gruber (creator of Markdown) worked for one of the companies whose product you mentioned, and still that product is not capable of providing syntax highlighting for his Markdown notation.
As for switching to a more API-based system: no, I am not doing that (at least not anytime soon), because it would have major drawbacks for flexibility and maintenance.
The current system has been incrementally improved with new features, and this has been possible exactly because the system is declarative. For language grammars, there is just one parser which uses the declarations, instead of a dozen plug-ins each with their own code. Having separate plug-ins would make mixing grammars (which is how TM supports embedded languages) rather problematic. And take something like the planned grammar injection based on scope selectors: that would certainly not have been possible in an API-based system without rewriting all existing plug-ins -- and even coming up with a proper API for this is quite a challenge.
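As an illustration of "one parser, many declarations", the following Python sketch keeps grammars as plain data and lets a single generic function interpret them. The dictionary layout, the `include` key on a begin/end rule, and the two toy grammars are simplifying assumptions, not TextMate's actual grammar format or parser.

```python
import re

# Grammars are data only. A rule is either a single `match`, or a
# `begin`/`end` pair whose body is parsed with another grammar (`include`).
GRAMMARS = {
    "source.ruby": {"patterns": [
        {"name": "keyword.control.ruby", "match": r"\b(?:if|end|puts)\b"},
        {"name": "constant.numeric.ruby", "match": r"\d+"},
    ]},
    "text.html": {"patterns": [
        {"name": "source.ruby.embedded", "begin": r"<%=?", "end": r"%>",
         "include": "source.ruby"},
        {"name": "meta.tag.html", "match": r"</?\w+>"},
    ]},
}


def tokenize(text, grammar, stack, pos=0, end=None):
    """Generic parser: return (tokens, new_pos) for `text` from `pos`.

    Each token is (matched_text, scope_stack). If `end` is given, parsing
    stops after the first position where that pattern matches.
    """
    tokens = []
    end_re = re.compile(end) if end else None
    while pos < len(text):
        if end_re:
            m = end_re.match(text, pos)
            if m:
                return tokens, m.end()
        for rule in GRAMMARS[grammar]["patterns"]:
            m = re.compile(rule.get("match") or rule["begin"]).match(text, pos)
            if not m or m.end() == pos:
                continue  # no match here (or a zero-width one): try next rule
            if "match" in rule:
                tokens.append((m.group(), stack + (rule["name"],)))
                pos = m.end()
            else:
                # Embedded language: recurse with the included grammar.
                inner, pos = tokenize(text, rule["include"],
                                      stack + (rule["name"],),
                                      m.end(), rule["end"])
                tokens.extend(inner)
            break
        else:
            pos += 1  # no rule applies at this position; skip one character
    return tokens, pos


tokens, _ = tokenize("<p><%= if 42 %></p>", "text.html", ("text.html",))
for text, scopes in tokens:
    print(text, " ".join(scopes))
```

Supporting Ruby inside HTML took one extra rule in the HTML declaration and no new parser code. In a plug-in design, the HTML and Ruby plug-ins would have to agree on an API for handing control back and forth.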
So while I will gladly share my plans of improving this system in the future, it requires that you actually understand the system.
All that said, the problems described by the OP are primarily caused by things other than the language parser. I do not view the language parser as something which badly affects TextMate’s performance -- there are half a dozen other things which do affect it badly (i.e. they scale badly, whereas the language parser scales just fine [1]), and I will gradually improve these things. The language parser, however, will see improvements mainly in functionality.
[1] An exception is that editing a line requires the line to have been parsed before the key can be resolved (because key equivalents depend on scope), and editing a line causes the entire line to be re-parsed. This makes editing a line linear in time with its length (where the factor is the cost of parsing a line). This is generally not a problem, because I also redraw the entire line, and that is a much larger factor. The exception is when soft wrap is enabled: then only the logical line is redrawn, but the physical line is re-parsed.
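The cost model in this footnote can be sketched as follows. This is a toy Python model under stated assumptions: the `Document` class, its trivial tokenizer and the `chars_parsed` counter are hypothetical, and it ignores parser state carried from one line to the next.

```python
import re

TOKEN = re.compile(r"\w+|\S")  # stand-in for a real grammar-driven parser


class Document:
    """Keeps per-line parse results and re-parses only the edited line."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.chars_parsed = 0  # counts the work done by the parser
        self.tokens = [self._parse(line) for line in self.lines]

    def _parse(self, line):
        self.chars_parsed += len(line)  # parsing cost is linear in line length
        return TOKEN.findall(line)

    def insert(self, line_no, column, text):
        """Insert `text` and re-parse the whole affected line, nothing else."""
        line = self.lines[line_no]
        self.lines[line_no] = line[:column] + text + line[column:]
        self.tokens[line_no] = self._parse(self.lines[line_no])


doc = Document(["short", "x" * 1000])

doc.chars_parsed = 0
doc.insert(0, 5, "!")      # one keystroke on a 5-character line
print(doc.chars_parsed)

doc.chars_parsed = 0
doc.insert(1, 0, "y")      # one keystroke on a 1000-character line
print(doc.chars_parsed)
```

Each keystroke costs work proportional to the length of the line it lands on, regardless of document size. That only becomes noticeable on very long lines, and only when redrawing is no longer the dominant cost.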