On Oct 22, 2012, at 9:11 PM, Corey Johnson cj@github.com wrote:
I'm trying to understand how the single line comment rule works […]
begin = '(^[ \t]+)?(?=//)';
end = '(?!\G)';
[…] Since TextMate grammars are line based, I'm not sure how it's possible for the '\n' pattern and the '(?!\G)' pattern to work together.
The regexp is matched against a single line, true, but that line will include its trailing newline (unless it’s the last line in the document).
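As a rough illustration of that point (in Python rather than TextMate's own regex engine, and with a made-up two-line document), splitting while keeping line endings shows which lines a '\n' pattern can match:

```python
import re

# Hypothetical two-line document; the last line has no trailing newline.
document = "var x; // note\nvar y;"

# keepends=True keeps each line's terminator, which is how the lines
# are described above: every line but the last ends in "\n".
lines = document.splitlines(keepends=True)

newline_in_first = re.search(r"\n", lines[0]) is not None
newline_in_last = re.search(r"\n", lines[-1]) is not None
```

Here `newline_in_first` is true and `newline_in_last` is false, so a rule ending on '\n' has something to match on every line except possibly the final one.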
[…] the end pattern '(?!\G)' will match the 'a' in 'var'.
Stuff wrapped in (?=…), (?!…), (?<=…), and (?<!…) are “look around” assertions. They will peek at the text after or before the current matching position, but they will not consume it.
So a rule like: { match = 'ba(?=r)'; } will match ‘ba’ when followed by ‘r’, but leave the ‘r’ to be potentially matched by a new rule.
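The same behavior can be checked outside TextMate. This is a small sketch using Python's `re` module (not the engine TextMate uses, but look-ahead works the same way for this case):

```python
import re

# 'ba(?=r)' requires an 'r' after 'ba' but does not consume it.
m = re.search(r"ba(?=r)", "bar")
matched = m.group(0)   # the text actually consumed
next_pos = m.end()     # where the next rule would start matching

# The 'r' is still available to a following pattern at that position.
follow = re.compile(r"r").match("bar", next_pos)
```

`matched` is 'ba' and `next_pos` is 2, so the follow-up pattern matches the 'r' that the look-ahead only inspected.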
[…] This fails as I described in TextMate 1 but works in TextMate 2. Is there something fundamentally different about using the \G anchor in TextMate 2?
Yes, in 1.x \G was undefined, in 2.0 it has a defined behavior, which brings us to Nathan Sobo’s question:
How is \G defined in TextMate grammars?
It should be the end of the previous match (as you suggest).
For a begin/end rule, the end rule’s ‘\G’ will match where the begin rule ended. In the line comment rule we use a negative look-ahead assertion (?!\G) requiring that the end rule is *not* where the begin rule stopped.
This is because the begin pattern can match zero characters (it matches optional leading whitespace and does a look-ahead on the two forward slashes, but does not match them), so in the case of no leading whitespace, it will not match any characters. In this case, we do not want the end rule to match immediately, as that would also match zero characters, and we would thus end up with an infinite loop.
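To make the zero-width case concrete, here is a sketch in Python. Python's `re` has no \G, so the helper `first_end_position` is a hypothetical stand-in that treats the end of the begin match as the \G position; it only models the idea and is not how TextMate's parser is implemented.

```python
import re

# Begin pattern copied from the rule; Python's `re` handles it as-is.
BEGIN = re.compile(r"(^[ \t]+)?(?=//)")

def first_end_position(line, anchor):
    # Hypothetical model of the end pattern (?!\G): `anchor` plays the
    # role of \G (where the begin match ended). The end pattern is
    # zero-width and succeeds at the first position that is NOT the anchor.
    for pos in range(anchor, len(line) + 1):
        if pos != anchor:
            return pos
    return None  # anchor was already at the end of the line

results = {}
for line in ["// comment\n", "  // comment\n"]:
    begin = BEGIN.search(line)
    anchor = begin.end()
    results[line] = (begin.span(), first_end_position(line, anchor))
```

For the line without leading whitespace the begin match spans (0, 0), i.e. zero characters, and the modelled end rule first succeeds at position 1 rather than 0. With two leading spaces the begin match spans (0, 2) and the end rule first succeeds at 3. In both cases the end rule is kept off the spot where the begin rule stopped, which is what lets the parser make progress.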