Sorry, I know this isn't particularly on-topic (aside from the fact that I'm using TextMate!) but I'm not having any luck with the search engines.
I have a bunch of HTML that needs to be converted to XHTML; notably, <input type="text"> needs to become <input type="text"/>, which is easy enough. The problem is that it's PHP, so there are things like <input type="<?php echo $type?>">, which I'm having trouble with. So how can I create a regular expression that captures the guts of the HTML brackets while ignoring any PHP brackets it might come across inside the HTML?
Thanks in advance.
On Jul 3, 2009, at 10:00 AM, Michael Newton wrote:
I have a bunch of HTML that needs to be converted to XHTML, notably <input type="text"> needs to be <input type="text"/> which is easy enough. Problem is, it's PHP so there are things like <input type="<?php echo $type?>"> which I'm having troubles with. So how can I create a regular expression that captures the guts of the HTML brackets, while ignoring any PHP brackets it might come across inside the HTML?
I used this web tool to help me: http://www.gskinner.com/RegExr/
I did my best to put single ticks, quote marks, etc. into the test input:
<input type="<?php echo $type?>"> some type and then another input <input type="<?php echo $type?>" name='value' class="foo"> <input type="some_value"> <input type="$some_$value"> <hr> <br>
My regex pattern was: (</?\w+)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?(>)
My replace pattern was: $1$2/>
(You could do less pattern grouping; I did so as I was working through it. Note that $2 already contains every attribute, so $3 has to stay out of the replacement or the last attribute gets repeated.)
Result was:
<input type="<?php echo $type?>"/> some type and then another input <input type="<?php echo $type?>" name='value' class="foo"/> <input type="some_value"/> <input type="$some_$value"/> <hr/> <br/>
The one issue is that it will alter plain closing tags: </a> will become </a/>, and I could not work that out. Either you can solve that in the regex by ignoring anything that already has a "/" in it, or you can cheat, which is what I would be inclined to do. With the macro recording ability of TextMate, I would hide the closing tags behind a placeholder, run the regex, and then restore them, something like:
find "</" replace "#tmp#"
find (</?\w+)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?(>) replace $1$2/>
find "#tmp#" replace "</"
It should happen pretty quickly.
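For anyone who would rather script this than record a macro, here is a rough Python sketch of the same idea. It is not from the thread: the names self_close_void_tags, VOID_TAGS and TAG_RE are made up for illustration, and the tag list is an assumption you would want to check against your own markup. It differs from the pattern above in two ways: group 1 no longer accepts "</", so closing tags never match, and only elements that never have a closing tag are rewritten, so <a>, <div> and friends are left alone.

```python
import re

# Elements with no closing tag in HTML, which XHTML wants self-closed.
# Assumption for this sketch: extend or trim the set to suit your markup.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

# Group 1: "<" plus the tag name (no "</", so closing tags are skipped).
# Group 2: the tag name on its own.
# Group 3: all attributes; quoted values may contain "<?php ... ?>".
TAG_RE = re.compile(
    r"""(<(\w+))((?:\s+\w+(?:\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)*)\s*/?>"""
)


def self_close_void_tags(html: str) -> str:
    """Rewrite <input ...>, <br>, <hr>, etc. as <input .../>, <br/>, <hr/>."""

    def fix(match: re.Match) -> str:
        if match.group(2).lower() not in VOID_TAGS:
            return match.group(0)  # not a void element: leave it untouched
        return f"{match.group(1)}{match.group(3)}/>"

    return TAG_RE.sub(fix, html)


if __name__ == "__main__":
    sample = '<a href="<?php echo $url?>">x</a> <input type="<?php echo $type?>"> <br>'
    print(self_close_void_tags(sample))
    # <a href="<?php echo $url?>">x</a> <input type="<?php echo $type?>"/> <br/>
```

Like the original pattern, this only copes with PHP that sits inside a quoted attribute value; PHP between attributes would need more work.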
On Jul 3, 2009, at 10:00 AM, Michael Newton wrote:
I have a bunch of HTML that needs to be converted to XHTML, notably <input type="text"> needs to be <input type="text"/> which is easy enough. Problem is, it's PHP so there are things like <input type="<?php echo $type?>"> which I'm having troubles with. So how can I create a regular expression that captures the guts of the HTML brackets, while ignoring any PHP brackets it might come across inside the HTML?
With that particular example, just passing the HTML through HTML Tidy's HTML -> XHTML conversion does the right thing.
m.