On Sat, Jul 4, 2009 at 6:00 AM, textmate-request@lists.macromates.com wrote:
---------- Forwarded message ----------
From: Scott Haneda talklists@newgeo.com
To: TextMate users textmate@lists.macromates.com
Date: Fri, 3 Jul 2009 12:40:55 -0700
Subject: [TxMt] Re: The dreaded Regexp question

On Jul 3, 2009, at 10:00 AM, Michael Newton wrote:
Sorry, I know this isn't particularly on-topic (aside from the fact that I'm using TextMate!) but I'm not having luck with the search engines.
I have a bunch of HTML that needs to be converted to XHTML, notably <input type="text"> needs to be <input type="text"/> which is easy enough. Problem is, it's PHP so there are things like <input type="<?php echo $type?>"> which I'm having trouble with. So how can I create a regular expression that captures the guts of the HTML brackets, while ignoring any PHP brackets it might come across inside the HTML?
I used this web tool to help me: http://www.gskinner.com/RegExr/
I did my best to put in single ticks, quote marks, etc.:
<input type="<?php echo $type?>"> some type and then another input <input type="<?php echo $type?>" name='value' class="foo">
<input type="some_value"> <input type="$some_$value"> <hr> <br>
My regex pattern was: (</?\w+)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?(>)
My replace pattern was: $1$2$3/>
- You could use fewer capture groups; I added them as I was working through it.
Result was: <input type="<?php echo $type?>" type="<?php echo $type?>"/> some type and then another input <input type="<?php echo $type?>" name='value' class="foo" class="foo"/>
<input type="some_value" type="some_value"/> <input type="$some_$value" type="$some_$value"/> <hr/> <br/>
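A note on the doubled last attribute in that result (type="some_value" type="some_value"): it looks like a side effect of the replace pattern rather than of the search. Group 2 already holds the whole attribute list and group 3 is only its last repetition, so $1$2$3 writes the final attribute twice. Below is a minimal sketch using Python's re module, which is an assumption for illustration only; TextMate has its own regex engine and uses $1 where Python uses \1. The function names are hypothetical.

```python
import re

# Search pattern exactly as posted in the thread.
TAG = re.compile(
    r"""(</?\w+)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?(>)"""
)

def posted_replace(html: str) -> str:
    # Python equivalent of $1$2$3/> ; group 3 repeats the last attribute.
    return TAG.sub(r"\1\2\3/>", html)

def trimmed_replace(html: str) -> str:
    # Python equivalent of $1$2/> ; group 2 already contains every attribute.
    return TAG.sub(r"\1\2/>", html)

sample = '<input type="some_value"> <hr>'
print(posted_replace(sample))   # last attribute appears twice
print(trimmed_replace(sample))  # each attribute appears once
```

If the same holds in TextMate's engine, replacing with $1$2/> should avoid the duplicate, but that is untested here.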
The one issue is that it will alter plain closing tags: </a> will become </a/>, and I could not work that out. Either you can solve that in the regex by ignoring anything that already has a "/" in it, or, I may be inclined to cheat. With the recording ability of TextMate, I would try something like:
find "/>" replace "#tmp#"
find (</?\w+)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?(>) replace $1$2$3/>
find "#tmp#" replace "/>"
It should happen pretty quick.
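For the first option (handling closing tags in the regex itself), one possible sketch is to drop the optional "/" from the opening group, so that </a> never matches at all. This is written against Python's re module as an assumption; it has not been tried in TextMate, and close_opening_tags is a hypothetical name.

```python
import re

# Same pattern as above, except group 1 is (<\w+) instead of (</?\w+),
# so a tag that starts with "</" cannot match.
OPEN_TAG = re.compile(
    r"""(<\w+)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?(>)"""
)

def close_opening_tags(html: str) -> str:
    # Group 2 already carries all attributes, so \1\2/> is used here
    # rather than the posted $1$2$3/> replacement.
    return OPEN_TAG.sub(r"\1\2/>", html)

print(close_opening_tags('<br> </a>'))              # </a> is left alone
print(close_opening_tags('<a href="bar">baz</a>'))  # <a ...> is still self-closed
```

As the second call shows, the pattern still self-closes every opening tag, including <a href="bar">, so it only suits text where the matched tags are all empty elements such as input, hr, and br.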
Scott * If you contact me off list replace talklists@ with scott@ *
Thanks, I actually figured it out just now as I was composing a reply. A negative lookbehind assertion matches ">" only if it's not preceded by "?" or "/":
find: <((?:input|img|link|meta|hr|br|area).*?)(?<![?/])>
replace: <$1/>
turns this:
<input name="foo"<?php echo $bar?>>
<input name="foo" value="<?php echo $foo?>">
<input name="foo" <?php echo $bar?>/>
<input name="foo" value="<?php echo $foo?>"/>
<a href="bar">baz</a>

into this:
<input name="foo"<?php echo $bar?>/>
<input name="foo" value="<?php echo $foo?>"/>
<input name="foo" <?php echo $bar?>/>
<input name="foo" value="<?php echo $foo?>"/>
<a href="bar">baz</a>
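The lookbehind version can be checked outside TextMate too. Here is a rough sketch in Python's re module (an assumption; Python writes the backreference as \1 instead of $1, and self_close is a hypothetical name), fed the same five tags with one tag per line:

```python
import re

# ">" only closes the match when it is not preceded by "?" or "/",
# so "?>" (end of PHP) and "/>" (already self-closed) are skipped.
VOID_TAG = re.compile(r"<((?:input|img|link|meta|hr|br|area).*?)(?<![?/])>")

def self_close(html: str) -> str:
    return VOID_TAG.sub(r"<\1/>", html)

before = "\n".join([
    '<input name="foo"<?php echo $bar?>>',
    '<input name="foo" value="<?php echo $foo?>">',
    '<input name="foo" <?php echo $bar?>/>',
    '<input name="foo" value="<?php echo $foo?>"/>',
    '<a href="bar">baz</a>',
])

print(self_close(before))
```

One caveat: "." does not cross line breaks by default, which is what stops an already self-closed tag from matching. If several tags share one physical line, the lazy .*? can run past a "/>" to a later ">", so one tag per line is assumed here.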
Just need to see if it works in TM when I get back to my office (and my Mac!) Definitely bookmarking that site though, and will look more into this "recording ability."