Defining a regular expression looking like a grammar - TextMate

17 Oct 2018


I recently read this post [1], which shows how to define a regular expression (using the PCRE syntax) that looks very much like a proper grammar. A reduced example from the post looks like this:
/
    (?(DEFINE)
        (?<addr_spec> (?&local_part) @ (?&domain) )
        (?<local_part> (?&dot_atom) | (?&quoted_string) | (?&obs_local_part) )
        (?<domain> (?&dot_atom) | (?&domain_literal) | (?&obs_domain) )
    )
    ^(?&addr_spec)$
/x
The three named groups “addr_spec”, “local_part” and “domain” act as the grammar rules. The (?&name) syntax is used to call another named group. TextMate does not support that syntax, but it does support \g<name>, which the documentation refers to as a subexp call [2] and which seems to have the same semantics.

(?(DEFINE)...) seems to be PCRE specific: the patterns inside it are never matched directly, so it just gives you a place to define subpatterns. I didn’t find anything corresponding in the TextMate regular expression syntax, but wrapping the definitions in an optional group can be used as a workaround.
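As a minimal sketch, the reduced example above can be ported to the \g<name> syntax and tried outside TextMate. This assumes Ruby, whose regex engine (Onigmo, a fork of Oniguruma) accepts the same subexp call syntax. The dot_atom rule here is a simplified, hypothetical stand-in for the rules the reduced example leaves out, not the real RFC definition.

```ruby
# Port of the reduced PCRE example to \g<name> subexp calls.
# ASSUMPTION: dot_atom is a simplified placeholder rule invented for this
# sketch; the original post defines it (and the other rules) differently.
# The leading optional group stands in for PCRE's DEFINE block.
ADDR_SPEC = /
  (?:
    (?<addr_spec>  \g<local_part> @ \g<domain> )
    (?<local_part> \g<dot_atom> )
    (?<domain>     \g<dot_atom> )
    (?<dot_atom>   [A-Za-z0-9-]+ (?: \. [A-Za-z0-9-]+ )* )
  )?
  \A \g<addr_spec> \z
/x

puts ADDR_SPEC.match?("jacob@example.org")   # true
puts ADDR_SPEC.match?("jacob@@example.org")  # false
```

The definitions only name the rules; the actual match is driven by the single \g<addr_spec> call after the optional group.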
Here’s an example where I tried this technique to match a module declaration in the D language:
(?:
  (?<module_declaration>(?<module>module)\s+\g<module_fully_qualified_name>\s*;)
  (?<module_fully_qualified_name>\g<module_name>|\g<packages>\.\g<module_name>)
  (?<module_name>\g<identifier>)
  (?<packages>\g<package_name>|\g<package_name>\.\g<packages>)
  (?<package_name>\g<identifier>)
  (?<identifier>\w+)
)?
\g<module_declaration>
This follows the specified grammar [3] exactly, and it seems to be working as expected. I’m not sure whether the optional group workaround has any performance implications.
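One way to sanity-check a pattern like this outside TextMate is to run it in Ruby, assuming Onigmo behaves like TextMate's Oniguruma for these constructs. In the sketch below the separators are written as \. so that they match only a literal dot, and the x flag is added so the layout whitespace is ignored. The constant name MODULE_DECL is made up for the example.

```ruby
# Module declaration pattern, with the grammar rules defined inside an
# optional group and the real match done by the final subexp call.
# ASSUMPTION: Ruby's Onigmo is used here as a stand-in for TextMate's engine.
MODULE_DECL = /
  (?:
    (?<module_declaration>(?<module>module)\s+\g<module_fully_qualified_name>\s*;)
    (?<module_fully_qualified_name>\g<module_name>|\g<packages>\.\g<module_name>)
    (?<module_name>\g<identifier>)
    (?<packages>\g<package_name>|\g<package_name>\.\g<packages>)
    (?<package_name>\g<identifier>)
    (?<identifier>\w+)
  )?
  \g<module_declaration>
/x

puts MODULE_DECL.match?("module foo;")          # true
puts MODULE_DECL.match?("module foo.bar.baz;")  # true
puts MODULE_DECL.match?("module foo-bar;")      # false, "-" is not a separator
```

A greedy optional group is attempted before it is skipped, so the engine does try the definitions at each start position before falling through to the final call. Whether that cost is noticeable in TextMate is not measured here.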
This technique seems like it could be a viable alternative to supporting variables in the TextMate grammar, as has been discussed before. What’s missing to make it really useful is something like (DEFINE) in PCRE, plus a place in the TextMate grammar to put generic patterns used in multiple rules, such as a pattern for identifiers.
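To illustrate what such a shared place for generic patterns could buy, here is a rough sketch in Ruby that prepends one block of definitions to several rules. Everything in it (SHARED_DEFS, the rule helper, the two example rules) is hypothetical and only emulates the idea; TextMate grammars have no such mechanism today.

```ruby
# Hypothetical shared definitions, reused by every rule built below.
# The optional group again stands in for PCRE's DEFINE block.
SHARED_DEFS = <<~'RE'
  (?:
    (?<identifier>[A-Za-z_]\w*)
    (?<qualified_name>\g<identifier>(?:\.\g<identifier>)*)
  )?
RE

# Builds a rule by prepending the shared definitions to the rule body.
# Extended mode lets the definitions keep their layout whitespace.
def rule(body)
  Regexp.new(SHARED_DEFS + body, Regexp::EXTENDED)
end

IMPORT_RULE = rule('\bimport\s+\g<qualified_name>\s*;')
MODULE_RULE = rule('\bmodule\s+\g<qualified_name>\s*;')

puts IMPORT_RULE.match?("import std.stdio;")  # true
puts MODULE_RULE.match?("module foo.bar;")    # true
puts MODULE_RULE.match?("module 1foo;")       # false
```

Each rule still compiles to a single regular expression, so the definitions are duplicated per rule at compile time; the gain is only that they are written once.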
[1] https://nikic.github.io/2012/06/15/The-true-power-of-regular-expressions.html
[2] https://macromates.com/manual/en/regular_expressions
[3] https://dlang.org/spec/grammar.html#ModuleDeclaration
-- 
/Jacob Carlborg