I'm trying to understand how the single line comment rule works (found on this line https://github.com/textmate/javascript.tmbundle/blob/master/Syntaxes/JavaScr...)
begin = '(^[ \t]+)?(?=//)';
end = '(?!\G)';
beginCaptures = {
    1 = { name = 'punctuation.whitespace.comment.leading.js'; };
};
patterns = (
    {   name = 'comment.line.double-slash.js';
        begin = '//';
        end = '\n';
        beginCaptures = {
            0 = { name = 'punctuation.definition.comment.js'; };
        };
    },
);
Since TextMate grammars are line based, I'm not sure how it's possible for the '\n' pattern and the '(?!\G)' pattern to work together. For example, take these two lines:
// a comment
var foo
The 'single line comment' rule is entered by matching its begin pattern '(^[ \t]+)?(?=//)'. The 'comment.line.double-slash.js' rule is then entered upon matching '//', and it ends when '\n' is matched. Because we have matched the end of the first line, we continue to the second line. The second line will end the 'single line comment' rule because the end pattern '(?!\G)' will match the 'a' in 'var'. As a result, 'var' cannot be scoped correctly, because the 'v' is considered to be part of the 'single line comment' rule.
This fails as I described in TextMate 1 but works in TextMate 2. Is there something fundamentally different about using the \G anchor in TextMate 2?
Thanks, Corey
I have a followup question:
How is \G defined in TextMate grammars?
I notice from the source code that \G anchors are replaced with a null string if the start of the match isn't equal to `anchor`, but I'm having a hard time following where and why `anchor` is updated during the progress of the parse. How would you describe the meaning of `anchor`? Is it the end of the last match, or something else?
Thanks!
Have a read at this: http://www.regular-expressions.info/continue.html (and then this too: http://manual.macromates.com/en/regular_expressions#regular_expressions).
On 23 October 2012 08:03, Nathan Sobo nathan@github.com wrote:
How is \G defined in TextMate grammars? […]
textmate mailing list textmate@lists.macromates.com http://lists.macromates.com/listinfo/textmate
On Oct 22, 2012, at 9:11 PM, Corey Johnson cj@github.com wrote:
I'm trying to understand how the single line comment rule works […]
begin = '(^[ \t]+)?(?=//)';
end = '(?!\G)';
[…] Since TextMate grammars are line based, I'm not sure how it's possible for the '\n' pattern and the '(?!\G)' pattern to work together.
The regexp is matched against a single line, true, but that line will include its trailing newline (unless it’s the last line in the document).
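To make the point about trailing newlines concrete, here is a small Python sketch. It is only an illustration using Python's `re` module, not TextMate's parser, and the helper name `lines_with_newlines` is made up for this example. It splits the two-line sample into lines that keep their newline and shows that '\n' can then match inside the first line.

```python
import re


def lines_with_newlines(text):
    """Split text into lines, keeping each line's trailing newline.

    This mirrors the description above: the pattern sees one line at a
    time, and that line still ends in its newline character.
    """
    return text.splitlines(keepends=True)


sample = "// a comment\nvar foo"
lines = lines_with_newlines(sample)

# The inner rule's end pattern '\n' can match within the first line,
# because the newline belongs to that line.
newline_match = re.search(r'\n', lines[0])

# The last line of the document has no trailing newline to match.
last_line_match = re.search(r'\n', lines[-1])
```

With this sample, `newline_match` is found at the very end of the first line, while `last_line_match` is `None`.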
[…] the end pattern '(?!\G)' will match the 'a' in 'var'.
Stuff wrapped in (?=…), (?!…), (?<=…), and (?<!…) are “look-around” assertions. They will peek at the characters after or before the current matching position, but they will not consume them.
So a rule like: { match = 'ba(?=r)'; } will match ‘ba’ when followed by ‘r’, but leave the ‘r’ to be potentially matched by a new rule.
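The same behavior can be checked outside TextMate. The sketch below uses Python's `re` module, which supports the same look-ahead syntax. The function name `match_then_continue` is hypothetical, and the second pattern 'r' simply stands in for "a new rule" that picks up where the first match stopped.

```python
import re

FIRST_RULE = re.compile(r'ba(?=r)')  # the pattern from the rule above
NEXT_RULE = re.compile(r'r')         # stand-in for a later rule


def match_then_continue(text):
    """Match 'ba(?=r)', then try the next rule where that match ended."""
    first = FIRST_RULE.search(text)
    if first is None:
        return None
    # The look-ahead did not consume 'r', so matching resumes right at it.
    second = NEXT_RULE.match(text, first.end())
    return first.group(), first.end(), second.group() if second else None


result = match_then_continue("bar")
no_match = match_then_continue("baz")
```

For "bar" the first match covers only 'ba' and the 'r' is still available to the next pattern. For "baz" the look-ahead fails and nothing matches.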
[…] This fails as I described in TextMate 1 but works in TextMate 2. Is there something fundamentally different about using the \G anchor in TextMate 2?
Yes, in 1.x \G was undefined, in 2.0 it has a defined behavior, which brings us to Nathan Sobo’s question:
How is \G defined in TextMate grammars?
It should be the end of the previous match (as you suggest).
For a begin/end rule, the end rule’s ‘\G’ will match where the begin rule ended. In the line comment rule we use a negative look-ahead assertion (?!\G) requiring that the end rule is *not* where the begin rule stopped.
This is because the begin pattern can match zero characters (it matches optional leading whitespace and does a look-ahead on the two forward slashes, but does not match them), so in the case of no leading whitespace, it will not match any characters. In this case, we do not want the end rule to match immediately, as that would also match zero characters, and we would thus end up with an infinite loop.
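The reasoning above can be traced with a rough Python simulation. This is only a sketch under stated assumptions, not how TextMate's parser is implemented. Python's `re` has no \G, so the anchor is emulated by remembering where the begin match ended and comparing positions. The helper names `outer_end_matches` and `trace_line_comment` are invented for this example.

```python
import re

# Patterns copied from the grammar rule quoted in this thread.
OUTER_BEGIN = re.compile(r'(^[ \t]+)?(?=//)')
INNER_BEGIN = re.compile(r'//')
INNER_END = re.compile(r'\n')


def outer_end_matches(pos, anchor):
    """Emulate '(?!\\G)': a zero-width match anywhere except at the anchor.

    Assumption: the anchor is the position where the begin match ended.
    """
    return pos != anchor


def trace_line_comment(line):
    """Hypothetical helper: walk one line through the outer and inner rules."""
    begin = OUTER_BEGIN.search(line)
    if begin is None:
        return None  # no '//' on this line, so the rule is never entered
    anchor = begin.end()
    trace = {
        "anchor": anchor,
        # Right after a (possibly zero-width) begin match, the end rule
        # must not match, otherwise the rule would end without progress.
        "end_at_anchor": outer_end_matches(anchor, anchor),
    }
    # The look-ahead guarantees '//' sits at the anchor position.
    inner = INNER_BEGIN.match(line, anchor)
    inner_end = INNER_END.search(line, inner.end())
    pos = inner_end.end() if inner_end else len(line)
    trace["pos"] = pos
    # After the inner rule consumed the comment, the position has moved
    # away from the anchor, so the outer end rule can now match.
    trace["end_after_inner"] = outer_end_matches(pos, anchor)
    return trace


trace = trace_line_comment("// a comment\n")
```

In this trace the outer end rule is rejected at the anchor and accepted once the inner rule has consumed the comment up to and including the newline, which is the behavior described above for TextMate 2.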