[SVN] Regular Expression Language Grammar

Gerd Knops gerti-textmate at bitart.com
Wed Jun 14 21:08:23 UTC 2006


Allan et al.,

To make it easier to 'parse' complex regular expressions, I am in the
process of designing a Regular Expression Language Grammar. It seems
to have a lot of potential, but I have a few questions before I
release a first experimental version:

# Included language missing in scope #

The Regular Expression Language would be included with something like:

	...
	include = 'source.regexp';
	...

That works, but when I look at the scope inside a regular expression
(Ctrl-Shift-P), 'source.regexp' does not appear. That seems to be the
case for all included languages: the language's own scope name does
not appear, only the names defined by its rules. Is that an oversight?


# Conditional pattern matches #

Since most programming languages use very similar regular
expressions, this language would be a candidate for inclusion in a
number of languages. However, most languages add their own quirks to
regular expressions (e.g. variables). These would have to be listed at
strategic locations inside the regular expression grammar; otherwise
we end up copying the entire RegExp grammar for each of these
languages and adding the exceptions.

So it would be great if there were conditional pattern matches,
something along the lines of:

	scope_contains = ( 'source.perl', 'source.ruby' );

or the inverse:

	scope_contains_not = ('source.perl');
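
To illustrate where such a key might sit, here is a hypothetical rule
inside the regexp grammar. scope_contains is only the proposed key and
is not something the parser understands today; the rule name and the
match are made up for the example.

	{	name = 'variable.other.regexp';
		match = '\$\w+';
		scope_contains = ( 'source.perl', 'source.ruby' );
	}

The intent is that this rule would only be tried when the regexp is
embedded in Perl or Ruby, and be skipped everywhere else.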


# Names and coloring #

Currently I have defined a 'private' namespace for the regular
expressions, with names like:

	string.regexp.escaped_char.newline
	string.regexp.posix_bracket.alnum
	string.regexp.quantifier.greedy.0_up_to_n

The downside is that lots of new colors would have to be defined in
the themes to make use of this. So I wonder if I should be using
things like:

	string.newline
	string.octal
	keyword.operator

On the other hand, while going through a more complex regex it is
great to press Ctrl-Shift-P and see
'string.regexp.quantifier.reluctant.1_or_more' or some such to explain
what is happening at that point in the regex.

Any suggestions?


# Include and Match/Captures #

Sometimes there are constructs where a match would be much better  
suited than begin and end, but I want to include something. A (not  
quite correct but you get the idea) example:

	(red|green)

I wish I could write a pattern as follows:

	match = '\((.+)\|(.+)\)';
	name = 'string.regexp.alternation';
	captures =
	{	1 = { include = '$self'; };
		2 = { include = '$self'; };
	};

Is there any workaround for patterns like these?
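
One possible approximation, as a rough sketch, is to fall back to
begin and end and list the include under patterns. The
'keyword.operator.regexp' name is an assumption made up for this
example.

	{	name = 'string.regexp.alternation';
		begin = '\(';
		end = '\)';
		patterns = (
			{	name = 'keyword.operator.regexp';
				match = '\|';
			},
			{	include = '$self'; }
		);
	}

This scopes the whole group and lets the alternatives be parsed
recursively, but unlike the captures version it also matches groups
that contain no '|' at all.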


Thanks

Gerd



