[TxMt] Re: textmate Digest, Vol 14, Issue 2

Michael Newton miken32 at gmail.com
Sun Jul 5 03:57:57 UTC 2009


On Sat, Jul 4, 2009 at 6:00 AM, <textmate-request at lists.macromates.com> wrote:
> ---------- Forwarded message ----------
> From: Scott Haneda <talklists at newgeo.com>
> To: TextMate users <textmate at lists.macromates.com>
> Date: Fri, 3 Jul 2009 12:40:55 -0700
> Subject: [TxMt] Re: The dreaded Regexp question
> On Jul 3, 2009, at 10:00 AM, Michael Newton wrote:
>
>> Sorry, I know this isn't particularly on-topic (aside from the fact
>> that I'm using Textmate!) but I'm not having luck with the search
>> engines.
>>
>> I have a bunch of HTML that needs to be converted to XHTML, notably
>> <input type="text"> needs to be <input type="text"/> which is easy
>> enough. Problem is, it's PHP so there are things like <input
>> type="<?php echo $type?>"> which I'm having trouble with. So how can
>> I create a regular expression that captures the guts of the HTML
>> brackets, while ignoring any PHP brackets it might come across inside
>> the HTML?
>
>
> I used this web tool to help me:
> http://www.gskinner.com/RegExr/
>
> I did my best to put in single ticks, quote marks, etc.:
> <input type="<?php echo $type?>"> some type and then another input <input type="<?php echo $type?>" name='value' class="foo">
> <input type="some_value">
> <input type="$some_$value">
> <hr>
> <br>
>
> My regex pattern was:
> (</?\w+)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?(>)
>
> My replace pattern was:
> $1$2$3/>
> * You could use fewer capture groups; I added them as I was working through it.
>
> Result was:
> <input type="<?php echo $type?>" type="<?php echo $type?>"/> some type and then another input <input type="<?php echo $type?>" name='value' class="foo" class="foo"/>
> <input type="some_value" type="some_value"/>
> <input type="$some_$value" type="$some_$value"/>
> <hr/>
> <br/>
>
> The one issue is that it will also alter plain closing tags: </a> becomes </a/>, and I could not work that out. You could either solve that in the regex by skipping anything that already has a "/" in it, or, as I might be inclined to do, cheat. With the recording ability of TextMate, I would try something like:
> find "/>"
> replace "#tmp#"
> find (</?\w+)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?(>)
> replace $1$2$3/>
> find "#tmp#"
> replace "/>"
>
> It should happen pretty quick.
> --
> Scott * If you contact me off list replace talklists@ with scott@ *
>
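A minimal sketch of Scott's pattern, using Python's `re` module as a stand-in for TextMate's regex engine (an assumption; Python writes backreferences as `\1` rather than `$1`). It shows where the doubled attribute in the quoted result comes from: group 3 is a repeated group, so it holds only the last attribute matched, and `$1$2$3/>` emits that attribute a second time. Dropping group 3 avoids the duplicate. The last call shows that the pattern accepts any tag name, so an ordinary `<a>` is self-closed too, not just `</a>` turned into `</a/>`.

```python
import re

# Scott's pattern, copied verbatim from the message. Capture groups:
#   1 = "<tag" or "</tag"
#   2 = every attribute plus any trailing whitespace
#   3 = the last attribute only (a repeated group keeps its final iteration)
#   4 = that attribute's "=value" part
#   5 = the closing ">"
SCOTT = r"""(</?\w+)((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?(>)"""

sample = '''<input type="<?php echo $type?>" name='value' class="foo">'''

# Equivalent of TextMate's $1$2$3/>: group 3 repeats the last attribute.
doubled = re.sub(SCOTT, r"\1\2\3/>", sample)

# Equivalent of $1$2/>: each attribute appears exactly once.
fixed = re.sub(SCOTT, r"\1\2/>", sample)

# Any tag name matches, so <a ...> is self-closed and </a> becomes </a/>.
anchor = re.sub(SCOTT, r"\1\2/>", "<a href='x'>y</a>")

print(doubled)
print(fixed)
print(anchor)
```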

Thanks, I actually figured it out just now as I was composing a reply.
A negative lookbehind assertion lets the pattern match ">" only when it
is not preceded by "?" or "/":
find: <((?:input|img|link|meta|hr|br|area).*?)(?<![?/])>
replace: <$1/>

turns this:
<input name="foo"<?php echo $bar?>>
<input name="foo" value="<?php echo $foo?>">
<input name="foo" <?php echo $bar?>/>
<input name="foo" value="<?php echo $foo?>"/>
<a href="bar">baz</a>


into this:
<input name="foo"<?php echo $bar?>/>
<input name="foo" value="<?php echo $foo?>"/>
<input name="foo" <?php echo $bar?>/>
<input name="foo" value="<?php echo $foo?>"/>
<a href="bar">baz</a>

Just need to see if it works in TM when I get back to my office (and
my Mac!) Definitely bookmarking that site though, and will look more
into this "recording ability."

-- 
Michael Newton
http://mike.eire.ca/


