[SVN] Improving the Ruby Syntax

Allan Odgaard allan at macromates.com
Mon Mar 7 20:37:34 UTC 2005


On Mar 7, 2005, at 20:55, Chris Thomas wrote:

> I say go for it. It's always better to classify things specifically 
> where possible. Whether or not the default style sheet colors all of 
> the keyword elements the same is a different question, and it probably 
> doesn't matter, because the stylesheets will allow full per-user 
> customization.

Just so you guys know, to name captures one would do e.g.:

    name = "keywords.functions.method-with-arguments.ruby";
    match = "^\\s*(def\\>)\\s*([.a-zA-Z_?!]+)\\s*\\((.*)\\)";
    captures = {
       2 = { name = "function-name.ruby"; };
       3 = { name = "function-arguments.ruby"; };
    };

For begin/end rules there are beginCaptures and endCaptures, which name 
captures only in the begin or the end match. In the scope path, a 
capture's name comes after the name of the entire match. E.g. the 
function name will have the full path:

    source.ruby keywords.functions.method-with-arguments.ruby function-name.ruby

If captures are nested, like:

    name = "test";
    match = "(foo(bar))";
    captures = {
       1 = { name = "foobar"; };
       2 = { name = "bar"; };
    };

Then the bar part will have this path:

    test foobar bar
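
As a rough illustration of how such a path falls out of the capture 
spans, here is a minimal Python sketch. It is not TextMate's code: the 
scope_path helper and the use of Python's re module are assumptions 
made only for this example.

```python
import re

def scope_path(rule_name, pattern, capture_names, text, offset):
    """Return the names covering `offset`, outermost first.

    `capture_names` maps capture numbers to names, like the captures
    dictionary in the rule above (hypothetical helper).
    """
    m = re.search(pattern, text)
    if m is None or not (m.start() <= offset < m.end()):
        return []
    path = [rule_name]
    # Groups are numbered by opening parenthesis, so an enclosing
    # group always has a lower number than the groups inside it.
    for group in sorted(capture_names):
        start, end = m.span(group)
        if start <= offset < end:
            path.append(capture_names[group])
    return path

names = {1: "foobar", 2: "bar"}
print(" ".join(scope_path("test", r"(foo(bar))", names, "foobar", 4)))
```

Offset 4 lies inside both captures, so both names follow the rule name; 
an offset inside "foo" would only pick up the outer capture.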

I made each value in the captures dictionary itself a dictionary with a 
name key, mainly to make it easier for me to handle (so I don't need 
special code for captures). It also leaves room to add more info to 
captures in the future, should it ever be needed, but I guess it is a 
little redundant...

And Eric, I actually have heredocs working in my current version :) 
Though the rule I had to make to match heredocs is a little special:

    name = "comments.heredoc.shell";
    begin = "(?=<<(\\w+))"; end = "^\\1";
    patterns = (
       {  begin = "^<<\\w+"; end = "$";
          patterns = ( { include = "source.shell"; } );
       }
    );

The begin pattern is only a look-ahead assertion on the delimiter, so 
the delimiter is not eaten when arriving at the sub-patterns. I then 
made one sub-pattern that also matches the delimiter, with end set to 
end-of-line ($), and gave this rule the entire shell syntax as 
sub-patterns. So basically, after the actual <<DELIMITER there will be 
normal shell highlighting till end-of-line.

In practice this isn't perfect, and it still doesn't handle nested 
heredocs (actually it does, but in the reverse order), but I think this 
will cover 99% of the situations arising in code.
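
The look-ahead trick can be tried outside the parser. The following is 
a minimal Python sketch, assuming Python's re module as a stand-in for 
the real regex engine; first_line_regions is a made-up name, and the 
leading ^ of the sub-pattern is left out because Python would only 
match it at a real line start.

```python
import re

BEGIN = re.compile(r"(?=<<(\w+))")        # outer rule: look-ahead only
SUB_BEGIN = re.compile(r"<<\w+")          # sub-pattern: the real delimiter
SUB_END = re.compile(r"$", re.MULTILINE)  # sub-pattern ends at end-of-line

def first_line_regions(source):
    """Split the heredoc's opening line into (delimiter, rest of line)."""
    outer = BEGIN.search(source)
    if outer is None:
        return None
    # Zero-width match: the parser position is still in front of "<<",
    # so the sub-pattern gets to match the delimiter itself.
    position = outer.end()
    inner = SUB_BEGIN.match(source, position)
    line_end = SUB_END.search(source, inner.end())
    return inner.group(0), source[inner.end():line_end.start()]

print(first_line_regions("cat <<EOF | grep foo\nhello\nEOF\n"))
```

The second element of the result is the part of the line that would get 
normal shell highlighting.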

I made the end pattern able to refer to captures in the begin pattern, 
but this means that the end pattern cannot refer to its own captures. 
The reason for this choice is both technical and practical. E.g. naming 
captures in the end pattern would need to take the number of captures 
in the begin pattern into account. Also, one currently cannot use 
captures in conditions, which might actually be useful, e.g. to 
conditionally match the dash in front of the delimiter in the begin 
pattern and allow leading tabs in the end pattern only if the dash was 
matched (so for now we'd need two rules, one with and one without the 
dash).
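
One way to picture the begin-to-end reference is as a textual 
substitution done before the end pattern is compiled. This is only a 
sketch under that assumption (resolve_end is a hypothetical helper, and 
the real implementation may work differently), but it shows why \1 in 
the end pattern can no longer mean the end pattern's own first capture.

```python
import re

def resolve_end(end_pattern, begin_match):
    """Replace \\1, \\2, ... in `end_pattern` with text captured by the begin match."""
    def substitute(ref):
        captured = begin_match.group(int(ref.group(1))) or ""
        return re.escape(captured)   # treat the captured text literally
    return re.sub(r"\\(\d)", substitute, end_pattern)

begin = re.search(r"(?=<<(\w+))", "cat <<EOF\nhello\nEOF\n")
print(resolve_end(r"^\1", begin))
```

After the substitution the end pattern contains no back-reference at 
all, only the literal delimiter.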

Also, my example doesn't allow the delimiter to be quoted (it's just an 
example).

Oh, and Chris, as you can probably see from the patterns above, there 
are no longer problems with zero-width matches, so it's not a problem 
to match multi-line preprocessor instructions in C.

I actually also currently run the patterns on the entire source, rather 
than one line at a time, but I'm not 100% sure I'll continue to do 
this. The problem mostly has to do with resuming parsing in the middle 
of a source: if multi-line matches are allowed, I can't be sure that 
any given line is a safe starting point, since a change in line n may 
affect a match starting at line n-i (for n, i >= 0).
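
For comparison, here is what the line-at-a-time alternative buys. This 
is a toy Python sketch, not the current parser: it assumes each line is 
parsed using only the state left by the previous line, that the state 
is just the open heredoc delimiter, and that an edit changes one line in 
place. All function names are made up for the example.

```python
import re

def next_state(state, line):
    """State after `line`: the open heredoc delimiter, or None outside one."""
    if state is None:
        opened = re.search(r"<<(\w+)", line)
        return opened.group(1) if opened else None
    return None if line.startswith(state) else state

def parse(lines):
    """Record the state at the end of every line."""
    states, state = [], None
    for line in lines:
        state = next_state(state, line)
        states.append(state)
    return states

def reparse_from(lines, states, n):
    """Re-parse after an edit to line n, reusing the state saved for line n-1."""
    state = states[n - 1] if n > 0 else None
    for i in range(n, len(lines)):
        state = next_state(state, lines[i])
        if states[i] == state:
            break            # same state as before: later lines cannot change
        states[i] = state
    return states

lines = ["cat <<EOF", "hello", "EOF", "echo done"]
states = parse(lines)
lines[2] = "EOX"             # break the closing delimiter
print(reparse_from(lines, states, 2))
```

With a saved state per line, every line is a safe starting point; a 
match that spans several lines in one go has no such per-line state to 
resume from.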

> And the Rails stuff in particular definitely needs to be in a separate 
> Rails syntax.

Amen! :)



