[TxMt] RegExp n00b

Allan Odgaard allan at macromates.com
Tue Sep 13 12:52:42 UTC 2005


On 13/09/2005, at 14.27, Andreas Wahlin wrote:

> I've been trying some time now with the javascript bundle, I get  
> almost everything after your little help there Allan :)

Good to hear!

>     foldingStartMarker = "^\\s*([A-Za-z0-9.]+s*=\\s*)?(function)\\b";
>
>    What does the = sign mean?

That's a literal match, so it has no special meaning: the regexp simply
expects an = character at that point.

> Does \\b mean ending bracket?

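No, it's a word boundary: the position between a word character and a
non-word character (or the start/end of the line). Here the previous
character is a word character, so the next character must be a
non-word character (or there must be no next character at all).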
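A quick way to see this in action is with Python's re module. It is not
the engine TextMate uses, but \b behaves the same way for a pattern like
this. The pattern is written as a Python raw string, so the grammar's
\\b becomes \b; the sample lines are made up for illustration:

```python
import re

# \b is a zero-width word-boundary assertion: after the word character "n"
# of "function", the next character must be a non-word character (or the
# end of the string) for the match to succeed.
pattern = re.compile(r"function\b")

samples = ["function foo()", "function(", "function", "functional()", "functionFoo"]
for text in samples:
    print(f"{text!r:18} -> {bool(pattern.match(text))}")
```

The first three samples match; "functional()" and "functionFoo" do not,
because the character after "function" is another word character.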

The \\b is needed because if, e.g., we want to match the start of a
bold tag, we'd do:
  <b
but that would also match <body or anything else starting with b, so  
instead we do:
  <b\\b
The \\b doesn't consume any characters; it is a zero-width assertion.
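Here is a small sketch of the bold-tag example using Python's re module
as a stand-in for TextMate's regexp engine (the tag strings are just
examples). Note that the matched text for <b> is only "<b": the \b
itself adds nothing to the match.

```python
import re

tags = ["<b>", "<b class='x'>", "<body>", "<br/>", "<blockquote>"]

naive = re.compile(r"<b")        # also matches <body, <br, <blockquote, ...
bounded = re.compile(r"<b\b")    # only <b followed by a non-word character

for tag in tags:
    print(f"{tag:15} naive={bool(naive.match(tag))} bounded={bool(bounded.match(tag))}")

# \b is zero-width: the matched text stops right after the "b".
print(bounded.match("<b>").group())
```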

> And why isn't the s there escaped, or should it match the letter s  
> how many times you want (considering the * after it)?

It's definitely a bug :) It should have been escaped as \\s*; as
written, s* matches zero or more literal s characters.
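To see what the missing backslash does in practice, here is a sketch
comparing the foldingStartMarker as posted with the corrected version.
It uses Python's re module rather than TextMate's engine, and the sample
lines are hypothetical JavaScript; the = in both patterns is matched
literally:

```python
import re

# The pattern as posted, with the unescaped "s*" (zero or more literal 's').
buggy = re.compile(r"^\s*([A-Za-z0-9.]+s*=\s*)?(function)\b")
# The intended pattern: "\s*" allows optional whitespace before the "=".
fixed = re.compile(r"^\s*([A-Za-z0-9.]+\s*=\s*)?(function)\b")

lines = [
    "function foo() {",
    "foo.bar = function() {",
    "foo.bar=function() {",
]

for line in lines:
    print(f"{line!r:26} buggy={bool(buggy.match(line))} fixed={bool(fixed.match(line))}")
```

With the bug, "foo.bar = function() {" is not treated as a folding
start, because nothing in the buggy pattern allows the space before the =.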

>     match = "^\\s*(function)\\s*([a-zA-Z_]\\w*)\\s*\\(([^)]*)\\)";
>
> This one I get almost completely, except the ([^)]*) part. My only  
> guess is that it means something like how many )'s you want at the  
> end of the string or something, but that hardly seems necessary.

Square brackets form a character class, which can contain single
characters as well as ranges.

[)] will match ), so [^)] will match anything but ). I.e. [^)]*
matches everything up to (but not including) the first ). Since we
then match the actual ), we could also have used a non-greedy .*?:

     match = "^\\s*(function)\\s*([a-zA-Z_]\\w*)\\s*\\((.*?)\\)";

So given “function foo (...)”, it captures the ... part, and the ...
part is not allowed to contain any ).
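A small sketch comparing the two forms, using Python's re module as a
stand-in for TextMate's engine and a made-up line of JavaScript. A
greedy .* is included purely for contrast; it is not part of the
grammar:

```python
import re

negated = re.compile(r"^\s*(function)\s*([a-zA-Z_]\w*)\s*\(([^)]*)\)")
lazy = re.compile(r"^\s*(function)\s*([a-zA-Z_]\w*)\s*\((.*?)\)")
greedy = re.compile(r"^\s*(function)\s*([a-zA-Z_]\w*)\s*\((.*)\)")  # for contrast only

line = "function foo (a, b) { return bar(a); }"

for label, pattern in [("[^)]*", negated), (".*?", lazy), (".* (greedy)", greedy)]:
    print(f"{label:12} -> {pattern.match(line).group(3)!r}")
```

Both [^)]* and .*? capture 'a, b'. The greedy .* runs on to the last )
on the line and captures 'a, b) { return bar(a', which is why the
non-greedy form is needed if you don't use the negated class.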

> Also, is it the matching of meta.function.js that dictates matches  
> in command+shift+t (go to symbol)?

Partially, yes.

If you look at the rule, you'll notice it has:
     captures = {
         1 = { name = "storage.type.function.js"; };
         2 = { name = "entity.name.function.js"; };
         3 = { name = "variable.parameter.function.js"; };
     };

These assign scope names to the 3 captures in the regexp (i.e. the
function keyword, the actual name of the function, and the ... part
in parentheses).
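As a rough illustration of what the captures dictionary does, here is a
sketch that pairs each capture group with its scope name. It is only an
approximation using Python's re module, not how TextMate is actually
implemented; scoped_captures is a hypothetical helper:

```python
import re

# The "match" regexp from the JavaScript grammar (as a Python raw string).
FUNCTION_RE = re.compile(r"^\s*(function)\s*([a-zA-Z_]\w*)\s*\(([^)]*)\)")

# Mirrors the grammar's "captures" dictionary: capture number -> scope name.
CAPTURES = {
    1: "storage.type.function.js",
    2: "entity.name.function.js",
    3: "variable.parameter.function.js",
}

def scoped_captures(line):
    """Hypothetical helper: (scope name, matched text) pairs, or [] if no match."""
    m = FUNCTION_RE.match(line)
    if m is None:
        return []
    return [(scope, m.group(n)) for n, scope in sorted(CAPTURES.items())]

for scope, text in scoped_captures("function greet(name, greeting)"):
    print(f"{scope:32} {text!r}")
```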

If you place the caret on each of these parts (in a JavaScript source
file) and press ctrl-shift-P, you'll be able to verify this.

Now if you go to the Source bundle (in the Bundle Editor) and look at  
the Symbol List preferences item, the actual preference is:
     { showInSymbolList = "1"; }

And the scope selector of that item is:
     entity.name.function, meta.toc-list

This means that showInSymbolList is enabled for every scope matched
by that scope selector. This is what causes text marked up as
entity.name.function in JavaScript to appear in the popup list.
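A deliberately simplified sketch of why entity.name.function.js is
picked up by that selector: a selector entry matches a scope when its
dot-separated parts are a prefix of the scope's parts, and the comma
separates alternatives. Real TextMate selectors are matched against the
whole scope stack and support more syntax; none of that is modelled
here. selector_matches is hypothetical, and meta.toc-list.example is an
illustrative scope name:

```python
def selector_matches(selector, scope):
    """Hypothetical helper: True if any comma-separated entry of the
    selector is a dot-component prefix of the given scope name."""
    scope_parts = scope.split(".")
    for entry in selector.split(","):
        entry_parts = entry.strip().split(".")
        if scope_parts[:len(entry_parts)] == entry_parts:
            return True
    return False

SYMBOL_LIST_SELECTOR = "entity.name.function, meta.toc-list"

scopes = [
    "entity.name.function.js",
    "variable.parameter.function.js",
    "meta.toc-list.example",     # illustrative scope name
    "entity.name.functional",    # shares a text prefix, but not a dot-component prefix
]
for scope in scopes:
    print(f"{scope:32} showInSymbolList={selector_matches(SYMBOL_LIST_SELECTOR, scope)}")
```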

If you look in the CSS bundle or HTML bundle, you'll see that there
are additional preference items that place CSS selectors and HTML id
attributes in the symbol list as well (since these are not matched by
the scope selector above). So the entity.name.function name is only a
convention; anything can go in the popup :)




