On Jul 3, 2009, at 10:00 AM, Michael Newton wrote:
Sorry, I know this isn't particularly on-topic (aside from the fact that I'm using TextMate!) but I'm not having luck with the search engines.
I have a bunch of HTML that needs to be converted to XHTML, notably <input type="text"> needs to become <input type="text"/>, which is easy enough. The problem is that it's PHP, so there are things like <input type="<?php echo $type?>">, which I'm having trouble with. So how can I create a regular expression that captures the guts of the HTML brackets, while ignoring any PHP brackets it might come across inside the HTML?
I used this web tool to help me: http://www.gskinner.com/RegExr/
I did my best to cover single ticks, double quotes, etc. in my test input:
<input type="<?php echo $type?>"> some type and then another input <input type="<?php echo $type?>" name='value' class="foo"> <input type="some_value"> <input type="$some_$value"> <hr> <br>
My regex pattern was: (</?\w+)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?(>)
My replace pattern was: $1$2/>
* You could use fewer capture groups; I added them as I was working through it.
Result was:
<input type="<?php echo $type?>"/> some type and then another input <input type="<?php echo $type?>" name='value' class="foo"/> <input type="some_value"/> <input type="$some_$value"/> <hr/> <br/>
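For anyone who wants to try the pattern outside TextMate, here is a minimal Python sketch of the same search and replace. It assumes Python's re module treats this pattern the way TextMate's regex engine does, which seems to hold for the test input but is not guaranteed in general. The replacement uses only groups 1 and 2: group 2 already spans the whole attribute list, while group 3 holds just the last attribute, so adding it would repeat that attribute.

```python
import re

# Pattern copied from the message; group 2 spans the whole attribute list,
# group 3 only captures the last attribute matched by the repetition.
TAG = re.compile(
    r'''(</?\w+)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?(>)'''
)

# Test input from the message.
sample = (
    '<input type="<?php echo $type?>"> some type and then another input '
    '<input type="<?php echo $type?>" name=\'value\' class="foo"> '
    '<input type="some_value"> <input type="$some_$value"> <hr> <br>'
)

# Python writes backreferences as \1, \2 where TextMate uses $1, $2.
converted = TAG.sub(r'\1\2/>', sample)
print(converted)
```

The quoted-value alternatives ("..." and '...') are what let the embedded <?php ... ?> pass through untouched, as long as the PHP snippet itself contains no quote of the same kind.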
The one issue is that it will also alter plain closing tags: </a> becomes </a/>, and I could not work that out. Either you can solve that in the regex by ignoring anything that already has a "/" in it, or, I may be inclined to cheat. With the macro recording ability of TextMate, I would try something like:
  find "</" replace "#tmp#"
  find the regex pattern above (as a regular expression), replace with the replace pattern above
  find "#tmp#" replace "</"
It should happen pretty quickly.
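Both ways of sparing closing tags can be sketched in Python. This is only an illustration under a few assumptions: the placeholder "#tmp#" must not occur anywhere in the real file, the function names are made up for this example, and the regex-only variant simply drops the optional "/" after the opening "<" so that closing tags never match.

```python
import re

# Attribute block copied from the pattern in the message (group 2 there).
ATTRS = r'''((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)'''

# Original pattern: the optional "/" after "<" lets it match closing tags too.
ANY_TAG = re.compile(r'(</?\w+)' + ATTRS + r'/?(>)')

# Regex-only variant: without that optional "/", closing tags never match.
OPEN_TAG = re.compile(r'(<\w+)' + ATTRS + r'/?(>)')

PLACEHOLDER = "#tmp#"  # assumed not to appear in the real input


def self_close_with_placeholder(html):
    """Hide closing tags, run the original pattern, then restore them."""
    hidden = html.replace("</", PLACEHOLDER)
    converted = ANY_TAG.sub(r'\1\2/>', hidden)
    return converted.replace(PLACEHOLDER, "</")


def self_close_open_tags_only(html):
    """Same replacement, using the pattern that skips closing tags."""
    return OPEN_TAG.sub(r'\1\2/>', html)


# Hypothetical fragment mixing inputs, a <br>, and closing tags.
sample = '<input type="<?php echo $type?>"></td> <br> </form>'
print(self_close_with_placeholder(sample))
print(self_close_open_tags_only(sample))
```

Both functions leave </td> and </form> alone. Note that either variant still rewrites every opening tag it sees, so a non-void element such as <a> would also be self-closed; limiting the first group to specific tag names such as input, br, and hr is one way to narrow it.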