[TxMt] [ANN] Select Balanced HTML Tag!!!1!
Thomas Aylott - subtleGradient
textmate at subtleGradient.com
Fri Nov 16 16:02:12 UTC 2007
On Nov 16, 2007, at 8:00 AM, Hans-Jörg Bibiko wrote:
> On 16.11.2007, at 13:29, Thomas Aylott - subtleGradient wrote:
>> This runs into the problem I'd been having for 3 years.
>> How do you get it to work when you have a tag nested inside the
>> same kind of tag?
>> That is, how do you keep it from matching the first close tag it
>> finds, or the very last one?
>> <div>
>>   <div>
>>     <div>
>>       TEXT
>>     </div>
>>   </div>
>> </div>
>
> Of course, you're right. That is THE problem! And I don't have a
> solution for it using regexps either.
>
> One approach I have in mind is to write a character-by-character
> parser. Once one knows which tag to close (e.g. 'p'), it should be
> possible to step to the right from the caret's position, looking
> for '</p>'. Each '<p...>' found along the way increments a counter
> and each '</p>' decrements it; as soon as counter < 0, I have found
> my closing tag (meaning its index). Next, do the same from the
> caret's position to the left. If this is written in Perl/Ruby/...
> and the entire text is stored as a character array, I can splice
> the array and finally have the desired string. With that string I
> can execute a normal findNext and findPrevious macro.
>
> I don't know whether it works but ...
> Maybe I'll find some time to try it out. The advantage would be
> that I don't have to parse the entire document.
> Or one could write it in Objective-C as a plug-in, or maybe Allan
> has a nice idea for it ;)
>
> On the other hand, I thought about using an external HTML parser.
> This works, but the parser is also very slow on a large HTML file.
> One could restrict the area to parse, say to 100 lines above and
> below the current line, but this is also tricky.
>
>
> Cheers,
>
> --Hans
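For what it's worth, the counter scan you describe could be sketched roughly like this. It is untested against real documents and only an illustration: the function name enclosing_element and its arguments are made up here, it is in Python rather than Perl/Ruby, and it assumes the caret is not inside a tag and that the markup has no self-closing or unclosed tags of the kind being matched.

```python
import re

def enclosing_element(text, caret, tag):
    """Return (start, end) of the innermost <tag>...</tag> around caret, or None.

    Hypothetical helper: a plain up/down counter, no real HTML parsing.
    """
    token = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)

    # Scan right from the caret: +1 for each open tag, -1 for each close tag.
    # The first close tag that drives the counter below zero is ours.
    depth, end = 0, None
    for m in token.finditer(text, caret):
        depth += -1 if m.group(1) else 1
        if depth < 0:
            end = m.end()
            break
    if end is None:
        return None

    # Scan left from the caret with the roles swapped: close tags we pass
    # must be cancelled by an open tag before we accept one as ours.
    depth = 0
    for m in reversed(list(token.finditer(text, 0, caret))):
        depth += 1 if m.group(1) else -1
        if depth < 0:
            return m.start(), end
    return None


html = "<div>\n<div>\n<div>\nTEXT\n</div>\n</div>\n</div>"
span = enclosing_element(html, html.index("TEXT"), "div")
if span:
    print(html[span[0]:span[1]])
```

Only the part of the text between the two matching tags is ever examined closely, but the left scan as written still tokenises everything before the caret, so limiting the scanned window remains an open question.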
One idea is to sidestep the problem of nested identical tags by
using one pass to make all tag names unique.
Something like what you said, with a counter that goes up at each
open tag and down at each close tag of the same name:
<div1>
  <div2>
    <div3>
      TEXT
    </div3>
  </div2>
</div1>
Then a much simpler regex could find each tag's matching close tag,
since no tag would contain another tag with the same name.
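A rough sketch of that numbering pass and the simpler regex, just to make the idea concrete. The names number_tags, TAG and BALANCED are invented for this example, Python is only a stand-in for whatever the command would really be written in, and the code assumes well-formed markup where every open tag has a close tag.

```python
import re

# Matches an open or close tag: optional slash, tag name, rest of the tag.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")

def number_tags(html):
    """Append the current nesting depth (per tag name) to every tag name."""
    depth = {}

    def repl(m):
        closing, name, rest = m.groups()
        key = name.lower()
        if not closing and rest.rstrip().endswith("/"):
            return m.group(0)            # self-closing tag: leave untouched
        if closing:
            n = depth.get(key, 0)
            depth[key] = max(n - 1, 0)   # counter goes down on a close tag
            return "</%s%d>" % (name, n)
        depth[key] = depth.get(key, 0) + 1   # counter goes up on an open tag
        return "<%s%d%s>" % (name, depth[key], rest)

    return TAG.sub(repl, html)

# After numbering, a tag never contains a tag with the same name, so a lazy
# match up to the first close tag with the same name is balanced.
BALANCED = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*>.*?</\1>", re.DOTALL)

numbered = number_tags("<div>\n<div>\n<div>\nTEXT\n</div>\n</div>\n</div>")
match = BALANCED.match(numbered, numbered.index("<div2"))
print(match.group(0))
```

One catch with plain numeric suffixes: names that already end in a digit (h1, h2) become ambiguous when the numbers are stripped off again, so a real version would probably want a separator that cannot occur in a tag name.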
Then it's just a matter of wrapping the selection with something
unique...
Restoring the original tag names in the document...
Finding your selection again via that wrapper...
And then removing the unique wrapper.
We'd have to come up with a nice way to limit the scope initially so
you don't have to parse the whole document every time.
I'm sure there's a simple way to do it that we're just not seeing.
—Thomas Aylott – subtleGradient—