possible format strings bug

Matt Neuburg

4 Apr 2014

4:45 p.m.

I think I've found a bug in TextMate's regular expression format string replacement feature. Try this.

Target document:

=== testing
== testing
= testing

Find expression (regex):

^(=+)

Replace expression:

${1/=(=)?(=)?/${2:?2:${1:?1:0}}/}

Do a replace all. What I expect:

2 testing
1 testing
0 testing

What I get:

2 testing
1 testing
1 testing

In the last line, neither group 2 nor group 1 should be matched, since the initial equal-sign is supposed to scarf up the entire match. Therefore I expect the logic to be:

* group 2 failed, so use its "else" alternative, which is the test for group 1

* group 1 failed, so use its "else" alternative, which is the value "0"

But try as I may, I cannot make "0" appear in the document. That is the proposed bug. It is as if group 1 is thought to be _always_ satisfied, which should not be the case.
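The expected logic can be sketched in Python. This is only an illustration: Python's `re` module is not TextMate's regex engine, the function name `expected_level` is made up here, and the sketch deliberately looks only at the groups of the inner search `=(=)?(=)?`, which is the assumption behind the expectation above.

```python
import re

def expected_level(equals_run):
    """Evaluate ${2:?2:${1:?1:0}} using only the groups of =(=)?(=)?."""
    m = re.match(r"=(=)?(=)?", equals_run)
    if m is None:
        raise ValueError("input must start with '='")
    if m.group(2) is not None:   # ${2:?2:...} -> group 2 took part
        return "2"
    if m.group(1) is not None:   # ${1:?1:0}   -> group 1 took part
        return "1"
    return "0"                   # neither optional group took part

levels = [expected_level(run) for run in ("===", "==", "=")]
print(levels)  # ['2', '1', '0']
```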

Of course, feel free to prove me wrong by fixing my find/replace expressions, thus doing my homework for me. :) m.

--
matt neuburg, phd = http://www.apeth.net/matt/
pantes anthropoi tou eidenai oregontai phusei
Programming iOS 7! http://shop.oreilly.com/product/0636920031017.do
iOS 7 Fundamentals! http://shop.oreilly.com/product/0636920032465.do
RubyFrontier! http://www.apeth.com/RubyFrontierDocs/default.html

Allan Odgaard

4 Apr

4:58 p.m.

On 4 Apr 2014, at 21:45, Matt Neuburg wrote:

...

Find expression (regex):

^(=+)

Replace expression:

${1/=(=)?(=)?/${2:?2:${1:?1:0}}/}

Do a replace all […]

The problem is that your first search string ‘^(=+)’ populates capture register $1 and the nested replacements inherit variables from parent matches.

In your case, you can lose the parentheses. If you needed captures in the root search and conditionals in a nested replacement, a workaround would be to use named captures (to avoid clashes).
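A rough Python model of the inheritance described above may help. It is a sketch under assumptions, not TextMate's implementation: the `variables` dictionary stands in for the capture registers, the function name `replace_all_line` is hypothetical, and the rule "a nested group only overwrites a register when it actually took part in the match" is this sketch's reading of the explanation.

```python
import re

def replace_all_line(line):
    parent = re.match(r"^(=+)", line)           # the Find expression
    if parent is None:
        return line
    found = parent.group(1)
    variables = {1: found}                       # $1 populated by the parent match
    child = re.match(r"=(=)?(=)?", found)        # search part of ${1/.../.../}
    for index in (1, 2):
        # Assumed rule: only groups that took part overwrite a register,
        # so an unmatched (=)? leaves the parent's $1 in place.
        if child.group(index) is not None:
            variables[index] = child.group(index)
    if variables.get(2):                         # ${2:?2:...}
        digit = "2"
    elif variables.get(1):                       # ${1:?1:0}
        digit = "1"
    else:
        digit = "0"                              # unreachable here: $1 is always set
    return digit + found[child.end():] + line[parent.end():]

results = [replace_all_line(s) for s in ("=== testing", "== testing", "= testing")]
print(results)  # ['2 testing', '1 testing', '1 testing']
```

Under this model the last line can never yield "0", because the parent's $1 is always non-empty.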

Matt Neuburg

6:31 p.m.

On Apr 4, 2014, at 7:58 AM, Allan Odgaard mailinglist@textmate.org wrote:

...

On 4 Apr 2014, at 21:45, Matt Neuburg wrote:

...
Find expression (regex):

^(=+)

Replace expression:

${1/=(=)?(=)?/${2:?2:${1:?1:0}}/}

Do a replace all […]

The problem is that your first search string ‘^(=+)’ populates capture register $1 and the nested replacements inherit variables from parent matches.

Thank you! Brilliantly explained. I understand completely (that's a bit scary, but never mind).

...

In your case, you can lose the parentheses. If you needed captures in the root search and conditionals in a nested replacement, a workaround would be to use named captures (to avoid clashes).

As you have guessed, this is a simplification of a larger problem. In real life, I can't lose the parentheses because I need the group to assign it a scope (in the AsciiDoc language grammar); my "find expression" is actually considerably longer. Unfortunately one can't mix named groups with implicitly numbered groups, so it's going to be named groups all the way. Just for the record, this solves it:

Find expression:

^(?<first>=+)

Replace expression:

${first/^=(?<two>=)?(?<three>=)?$/${three:?2:${two:?1:0}}/}

Of course, in real life, it's all going to be quite a bit longer, so now I have to go see if I can write it without losing my mind. :)

PS I based my expression on the parallel expression in the Markdown bundle - see repository > block > repository > heading. So now we know that the expression in the Markdown bundle is suffering from the same bug (in the expression, not in TextMate), but it does not surface because the default "1" is acceptable. In AsciiDoc, though, one equal sign is called a "level 0 heading".

Matt Neuburg

6:44 p.m.

On Apr 4, 2014, at 9:31 AM, Matt Neuburg matt@tidbits.com wrote:

...

Of course, in real life, it's all going to be quite a bit longer, so now I have to go see if I can write it without losing my mind. :)

Nailed it!

match = '^(?<eq>={1,5}) (?<title>\w.*)$\n?';

name = 'markup.heading.level.${eq/=(?<two>=)?(?<three>=)?(?<four>=)?(?<five>=)?/${five:?4:${four:?3:${three:?2:${two:?1:0}}}}/}.asciidoc';

Thanks again, Allan. m.

PS I think I need to lie down now.
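For readers who want to check the level mapping outside TextMate, here is a small Python sketch of the same match and nested conditional. It is an approximation: Python spells named groups `(?P<name>...)`, the trailing `\n?` of the original match is dropped, and `heading_scope` is a name invented for this example.

```python
import re

HEADING = re.compile(r"^(?P<eq>={1,5}) (?P<title>\w.*)$")
LEVEL = re.compile(r"=(?P<two>=)?(?P<three>=)?(?P<four>=)?(?P<five>=)?")

def heading_scope(line):
    m = HEADING.match(line)
    if m is None:
        return None                      # not a heading under this rule
    groups = LEVEL.match(m.group("eq")).groupdict()
    # Same order as ${five:?4:${four:?3:${three:?2:${two:?1:0}}}}
    for name, level in (("five", 4), ("four", 3), ("three", 2), ("two", 1)):
        if groups[name] is not None:
            return f"markup.heading.level.{level}.asciidoc"
    return "markup.heading.level.0.asciidoc"

print(heading_scope("= Title"))      # markup.heading.level.0.asciidoc
print(heading_scope("=== Section"))  # markup.heading.level.2.asciidoc
```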

Allan Odgaard

5 Apr

9:20 a.m.

On 4 Apr 2014, at 23:31, Matt Neuburg wrote:

...

[…] Unfortunately one can't mix named groups with implicitly numbered groups, so it's going to be named groups all the way […]

Just to be sure I understand, the issue here is that if you use named captures in the parent match, the $1-n variables are still created and inherited.

So there is no way around using named captures for conditionals in the format string, but in the parent rule, using named captures makes no difference.

I made the variables inherited for maximum flexibility, but given this issue, I have changed it so that $1-$n will not be inherited, where n == number of captures in the last regexp: https://github.com/textmate/textmate/commit/6185cc17ab59c81993983dffc9eb8229...

This means you can simplify the grammar after next build.
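The effect of that change can be modelled in a few lines of Python. This is a sketch of the rule as stated in the message, not the code in the linked commit: `evaluate` and its `drop_child_numbers` flag are hypothetical, and `pattern.groups` is used as "the number of captures in the last regexp".

```python
import re

def evaluate(found, search, drop_child_numbers):
    """Return the digit chosen by ${2:?2:${1:?1:0}} for one run of '='."""
    variables = {1: found}                 # $1 inherited from the parent Find ^(=+)
    pattern = re.compile(search)
    child = pattern.match(found)
    if drop_child_numbers:
        # New rule (as described): $1..$n are not inherited,
        # where n is the number of captures in the nested regexp.
        for index in range(1, pattern.groups + 1):
            variables.pop(index, None)
    for index in range(1, pattern.groups + 1):
        if child.group(index) is not None:
            variables[index] = child.group(index)
    if variables.get(2):
        return "2"
    if variables.get(1):
        return "1"
    return "0"

runs = ("===", "==", "=")
before = [evaluate(run, r"=(=)?(=)?", False) for run in runs]
after = [evaluate(run, r"=(=)?(=)?", True) for run in runs]
print(before)  # ['2', '1', '1']
print(after)   # ['2', '1', '0']
```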

Matt Neuburg

8:10 p.m.

On Apr 5, 2014, at 12:20 AM, Allan Odgaard mailinglist@textmate.org wrote:

...

On 4 Apr 2014, at 23:31, Matt Neuburg wrote:

...
[…] Unfortunately one can't mix named groups with implicitly numbered groups, so it's going to be named groups all the way […]

Just to be sure I understand, the issue here is that if you use named captures in the parent match, the $1-n variables are still created and inherited.

I'm not saying you should change anything! The problem is solved, and one wouldn't want to risk breakage.

Just to summarize: my difficulty was in understanding how this line from the Markdown bundle grammar works:

name = 'markup.heading.${1/(#)(#)?(#)?(#)?(#)?(#)?/${6:?6:${5:?5:${4:?4:${3:?3:${2:?2:1}}}}}/}.markdown';
begin = '(?:^|\G)(#{1,6})\s*(?=[\S[^#]])';
end = '\s*(#{1,6})?$\n?';

Looking at the format string in the "name" entry, there is a clear implication that one can use group numbers in the second half to refer to groups from the regular expression in the first half. I mean, here's the structure of the thing:

${1/...search.../...replacement.../}

Now, I know what "1/" means at the start; it refers to the first group in the "begin" regex. (Doesn't it?) The question is: what do 6, 5, 4, 3, 2, and 1 in the replacement expression refer to (in the second half)? They seem to refer to groups in the search expression (the first half).
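Read that way, the nested conditional picks the highest-numbered group that took part in the search and falls back to 1. A minimal Python sketch of that reading follows; it assumes the numbers refer to the groups of the search expression, and `markdown_heading_scope` is an illustrative name, not something from the bundle.

```python
import re

def markdown_heading_scope(hashes):
    m = re.match(r"(#)(#)?(#)?(#)?(#)?(#)?", hashes)
    if m is None:
        raise ValueError("input must start with '#'")
    # ${6:?6:${5:?5:${4:?4:${3:?3:${2:?2:1}}}}}
    level = 1                                   # innermost default
    for index in (6, 5, 4, 3, 2):
        if m.group(index) is not None:          # highest group that took part wins
            level = index
            break
    return f"markup.heading.{level}.markdown"

print(markdown_heading_scope("#"))    # markup.heading.1.markdown
print(markdown_heading_scope("###"))  # markup.heading.3.markdown
```

Group 1 is mandatory in this search, so the default of 1 is the right answer whenever no optional group matched.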

Now, I find that surprising. It isn't documented on this page: http://blog.macromates.com/2011/format-strings/ However, it clearly does work somehow, so, at the suggestion of Michael Sheets, I set out to imitate it in the AsciiDoc bundle. For AsciiDoc, the rules are different: we use "=" instead of "#", and if there is one "=" that is a level 0 heading and so on. Well, I couldn't get it to work. That's when I started experimenting with simple find/replace expressions in a document (using the Find/Replace dialog), and I found the results incoherent.

Let's take this example. My document is:

==== hello ====

Find expression:

(hello)

Replace expression:

${1/(hello)?/${1:?howdy:scram}/}

Now do a Replace All. Result: the document says "howdy", which is what I expect. The challenge is to get the word "scram" to appear in the document!

Revert to the original document ("hello"). Let's suppose I change the replace expression to this:

${1/(hey)?/${1:?howdy:scram}/}

Now when I Replace All, I get "howdyhello". I found that mystifying, since "(hey)" is not being found. My replacement choice is between "howdy" and "scram", so I expect to see "scram".

That's when I posed my question and learned that "1" is ambiguous. It might refer to the search expression in the first half "(hey)", but it might also refer to the original Find expression "(hello)". So (since I needed the parentheses for other reasons) I tried to resolve that ambiguity by using names instead of numbers:

Find expression:

(?<orig>hello)

Replace expression:

${orig/(hey)?/${1:?howdy:scram}/}

That didn't solve it. So I used names _everywhere_:

Find expression:

(?<orig>hello)

Replace expression:

${orig/(?<one>hey)?/${one:?howdy:scram}/}

And now, at last, "scram" appears in my document (I get "scramhello"). So that's why I said it would have to be names throughout. I see nothing wrong with that. I like names!
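The three attempts can be reproduced with a small Python model. It rests on assumptions: numbered registers are created even when the parent uses a named capture, a nested group only overwrites a register if it took part in the match, and Python's `(?P<name>...)` stands in for `(?<name>...)`. The helper `run_replace` is hypothetical.

```python
import re

def run_replace(document, find, source, search, test):
    """Model Find=`find`, Replace=${source/search/${test:?howdy:scram}/}."""
    parent = re.search(find, document)
    # Parent captures: numbered registers plus any named ones.
    variables = {i: parent.group(i) for i in range(1, parent.re.groups + 1)}
    variables.update(parent.groupdict())
    text = variables[source]
    child = re.search(search, text)
    captured = {i: child.group(i) for i in range(1, child.re.groups + 1)}
    captured.update(child.groupdict())
    # Only groups that actually matched overwrite inherited values.
    variables.update({k: v for k, v in captured.items() if v is not None})
    word = "howdy" if variables.get(test) else "scram"
    replaced = text[:child.start()] + word + text[child.end():]
    return document[:parent.start()] + replaced + document[parent.end():]

numbered = run_replace("hello", r"(hello)", 1, r"(hey)?", 1)
named_parent = run_replace("hello", r"(?P<orig>hello)", "orig", r"(hey)?", 1)
named_both = run_replace("hello", r"(?P<orig>hello)", "orig", r"(?P<one>hey)?", "one")
print(numbered, named_parent, named_both)  # howdyhello howdyhello scramhello
```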

One last note: I am still mystified by what happens if we _remove_ the question mark after the parentheses in the first half of the replace expression:

${1/(hey)/${1:?howdy:scram}/}

In that case, I can in no circumstances get "scram" to appear. I don't understand why not. The search for "(hey)" has failed, but $1 still has meaning, so I expect to see either "howdy" or "scram". I see _neither_.

Allan Odgaard

6 Apr

4:06 a.m.

On 6 Apr 2014, at 1:10, Matt Neuburg wrote:

...

[…] I'm not saying you should change anything! The problem is solved, and one wouldn't want to risk breakage.

The change I did should not cause breakage.

Variables are still inherited, but if a deeper regexp captures/creates $1-$n then these variables are excluded from potentially being inherited. In most cases, the deeper regexp will overwrite the values of these variables, resulting in the same effect, but when using conditional captures (like your example) there is a possibility for the deeper regexp to not overwrite the variable, which is the case in which we want to disable inheriting the parent’s value.

...

[…] One last note: I am still mystified by what happens if we _remove_ the question mark after the parentheses in the first half of the replace expression:

${1/(hey)/${1:?howdy:scram}/}

In that case, I can in no circumstances get "scram" to appear. I don't understand why not. The search for "(hey)" has failed, but $1 still has meaning, so I expect to see either "howdy" or "scram". I see _neither_.

The regexp /(hey)/ requires matching “hey” in the text. If “hey” is not found in the text, it will not match, and no replacement will be done (so TextMate never looks at the format string you provided for this replacement).

Contrast that to /(hey)?/. By adding ‘?’ after ‘(hey)’ we make that part of the regexp optional. This means the regexp will try to match “hey” (and put that in $1), but if there is no “hey” in the text, the regexp will still match (as we made this part optional), though nothing is captured in $1.
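The same distinction shows up in Python's `re` module, which is used here only as a stand-in for TextMate's engine. The helper `pick` is made up for this example and plays the role of `${1:?howdy:scram}`; `count=1` mimics a single, non-global substitution.

```python
import re

def pick(m):
    # Stand-in for ${1:?howdy:scram}
    return "howdy" if m.group(1) else "scram"

required = re.search(r"(hey)", "hello")    # no match at all
optional = re.search(r"(hey)?", "hello")   # empty match at position 0
print(required)                            # None
print(optional.span(), optional.group(1))  # (0, 0) None

unchanged = re.sub(r"(hey)", pick, "hello", count=1)
prefixed = re.sub(r"(hey)?", pick, "hello", count=1)
print(unchanged)  # hello
print(prefixed)   # scramhello
```

With the required group, the replacement function is never called, which is why neither word appears.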

Matt Neuburg

7 Apr

8:34 p.m.

On Apr 5, 2014, at 7:06 PM, Allan Odgaard mailinglist@textmate.org wrote:

...

Contrast that to /(hey)?/. By adding ‘?’ after ‘(hey)’ we make that part of the regexp optional. This means the regexp will try to match “hey” (and put that in $1), but if there is no “hey” in the text, the regexp will still match (as we made this part optional), though nothing is captured in $1.

I have to fight my intuitions here. What I intuitively expect is that the Find/Replace dialog is a contract. If I say:

Find: (hello)

Replace: [...whatever...]

Then, since "hello" _is_ found, I expect that it will be replaced, in its entirety, by _something_: we DID find it so we WILL replace it. Thus, for example, I am surprised when this:

Find expression:

(?<orig>hello)

Replace expression:

${orig/(?<one>hey)?/${one:?howdy:scram}/}

...yields "scramhello" in the document. The calculus in the replace expression has settled on the word "scram", which is what I was trying to achieve, but since what was originally found is "hello", I expect _all_ of "hello" to be replaced by "scram". In other words, I expect the replace expression to be a calculation about what to put in place of the whole text found by the Find expression. m.

Allan Odgaard

8 Apr

4:59 a.m.

On 8 Apr 2014, at 1:34, Matt Neuburg wrote:

...

Find expression:

(?<orig>hello)

Replace expression:

${orig/(?<one>hey)?/${one:?howdy:scram}/}

[…] I expect _all_ of "hello" to be replaced by "scram". In other words, I expect the replace expression to be a calculation about what to put in place of the whole text found by the Find expression.

The second assertion here is correct. You ask TextMate to find “hello” and replace it.

But your replacement text is effectively “$orig”, i.e. the original string found.

What you do is a substitution on the “$orig” variable, but this substitution is “replace ‘hey’ with ‘howdy’, and if you do not find ‘hey’, insert ‘scram’”.

The problem is that expressing “not finding hey” as “(hey)?” means that a match of zero bytes is what counts as “not finding ‘hey’”.

What you probably want is (I switched back to numbered captures for readability):

${orig/(hey)|.+/${1:?howdy:scram}/}

What this regular expression says (‘(hey)|.+’) is “find ‘hey’ and put in capture $1 OR (‘|’) find anything (‘.+’)”.

So when it is time to evaluate what to insert for what we found (‘${1:?howdy:scram}’), we have either matched ‘hey’ and will replace that with ‘howdy’, or we have matched one or more bytes of something else and will replace that with ‘scram’.
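A quick way to check the alternation outside TextMate is a Python sketch like the one below. Python's `re` is only an approximation of TextMate's engine, `substitute` is a hypothetical helper, and `count=1` mimics a single, non-global substitution.

```python
import re

def substitute(text):
    # (hey)|.+ : either capture 'hey' in group 1, or match one or more
    # characters of anything else with group 1 left unset.
    return re.sub(
        r"(hey)|.+",
        lambda m: "howdy" if m.group(1) else "scram",
        text,
        count=1,
    )

print(substitute("hello"))  # scram
print(substitute("hey"))    # howdy
```

Because `.+` needs at least one character, the "else" branch now consumes the whole of "hello" instead of matching zero bytes in front of it.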

Matt Neuburg

6:22 a.m.

On Apr 7, 2014, at 7:59 PM, Allan Odgaard mailinglist@textmate.org wrote:

...

But your replacement text is effectively “$orig”, i.e. the original string found.

Yes, I see. It is *I* who am supplying "scramhello" as the replacement text. Thanks! m.

textmate@lists.macromates.com

participants (2)

Allan Odgaard
Matt Neuburg