[TxMt] Re: The dreaded Regexp question

Scott Haneda talklists at newgeo.com
Fri Jul 3 19:40:55 UTC 2009

On Jul 3, 2009, at 10:00 AM, Michael Newton wrote:

> Sorry, I know this isn't particularly on-topic (aside from the fact
> that I'm using Textmate!) but I'm not having luck with the search
> engines.
> I have a bunch of HTML that needs to be converted to XHTML, notably
> <input type="text"> needs to be <input type="text"/> which is easy
> enough. Problem is, it's PHP so there are things like <input
> type="<?php echo $type?>"> which I'm having troubles with. So how can
> I create a regular expression that captures the guts of the HTML
> brackets, while ignoring any PHP brackets it might come across inside
> the HTML?

I used this web tool to help me:

I did my best to include single ticks, double quotes, etc. in the test input:
<input type="<?php echo $type?>"> some type and then another input  
<input type="<?php echo $type?>" name='value' class="foo">
<input type="some_value">
<input type="$some_$value">

My regex pattern was:

My replace pattern was:
* You could use fewer capture groups; I added them as I was working
through it.

Result was:
<input type="<?php echo $type?>" type="<?php echo $type?>"/> some type  
and then another input <input type="<?php echo $type?>" name='value'  
class="foo" class="foo"/>
<input type="some_value" type="some_value"/>
<input type="$some_$value" type="$some_$value"/>

The one issue is that it will alter plain closing tags: </a> will
become </a/>, and I could not work that out. Either you can solve that
in the regex by ignoring anything with a "/" in it already, or I may
be inclined to cheat. With the recording ability of TextMate, I would
try something like:
find "/>"
replace "#tmp#"
find (</?\w+)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?(>)
replace $1$2/>
find "#tmp#"
replace "/>"

It should happen pretty quick.
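
For the first option (solving it in the regex), here is a minimal sketch of how it might look. It uses Python's re module rather than TextMate's regex engine, which is my substitution for the sake of a runnable example; the names TAG and self_close are made up for illustration. The pattern is adapted from the one above with two changes: a negative lookahead (?!/) so closing tags like </a> are skipped, and a replacement that emits only the tag name and the full attribute group. The result above repeats the last attribute of each tag, which is consistent with also emitting the inner per-attribute group, so the sketch leaves that group out.

```python
import re

# Adapted from the find pattern above. Assumptions in this sketch:
#  - (?!/) is an addition that skips closing tags such as </a>
#  - the inner groups are non-capturing, so only two groups remain
TAG = re.compile(
    r"(<(?!/)\w+)"                                                 # group 1: "<" plus tag name
    r"((?:\s+\w+(?:\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)*\s*)"    # group 2: all attributes
    r"/?>"                                                         # existing "/" (if any), then ">"
)


def self_close(html: str) -> str:
    """Rewrite every opening tag as a self-closed tag, leaving closing tags alone."""
    return TAG.sub(r"\1\2/>", html)


if __name__ == "__main__":
    sample = (
        '<input type="<?php echo $type?>"> some type and then another input '
        "<input type=\"<?php echo $type?>\" name='value' class=\"foo\">\n"
        '<input type="some_value"></a>'
    )
    print(self_close(sample))
```

The PHP block survives because a quoted attribute value is consumed as a whole before the pattern looks for the closing ">". Like the original pattern, this self-closes every opening tag it finds, so you would probably want to swap \w+ for the specific tag names you care about before running it over a whole file.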
Scott * If you contact me off list replace talklists@ with scott@ *
