[TxMt] [ANN] Select Balanced HTML Tag!!!1!

Thomas Aylott - subtleGradient textmate at subtleGradient.com
Fri Nov 16 16:02:12 UTC 2007


On Nov 16, 2007, at 8:00 AM, Hans-Jörg Bibiko wrote:

> On 16.11.2007, at 13:29, Thomas Aylott - subtleGradient wrote:
>> This runs into the problem I'd been having for 3 years.
>> How do you get it to work when you have a tag nested inside the  
>> same kind of tag?
>> How do you keep it from matching the first close tag it finds, or
>> the very last one?
>> <div>
>> 	<div>
>> 		<div>
>> 			TEXT
>> 		</div>
>> 	</div>
>> </div>
>
> Of course, you're right. That is THE problem! And I also have no
> regexp-based solution for it.
>
> One way I have in mind is to write a character-by-character
> parser. Once one knows which tag to balance (e.g. 'p'), it should be
> possible to step from the caret's position to the right, looking
> for '</p>'. Every '<p...>' found along the way increments a counter
> and every '</p>' decrements it; as soon as the counter drops below
> 0, that '</p>' is the matching closing tag (and its index is known).
> Next, do the same from the caret's position to the left, with the
> roles of '<p...>' and '</p>' swapped. If one writes this in
> perl/ruby/... and stores the entire text as a character array, one
> can splice the array and finally have the desired string. With that
> string one can run a normal findNext and findPrevious macro.
>
> I don't know whether it works but ...
> Maybe I'll find some time to try it out. The advantage would be that
> I don't have to parse the entire document.
> Or one could write it in Objective-C as a plug-in, or Allan has a
> nice idea for it ;)
>
> On the other hand, I thought about using an external HTML parser.
> This works, but the parser is very slow on large HTML files. One
> could restrict the area to be parsed (say, 100 lines above and below
> the current line), but this is also tricky.
>
>
> Cheers,
>
> --Hans
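
The counter-based scan Hans describes can be sketched in Python roughly as follows. This is a minimal sketch, not code from either post: the name find_enclosing_tag is hypothetical, and instead of literally stepping one character at a time it uses re.finditer to jump between occurrences of the tag, keeping the same up/down counting. It assumes well-formed markup and ignores comments, CDATA, and attribute values that contain '>'.

```python
import re
from typing import Optional, Tuple


def find_enclosing_tag(text: str, caret: int, name: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the innermost <name>...</name> element around caret.

    Hypothetical helper; assumes well-formed markup and a naive tag regex.
    """
    # Opening (<name ...>) or closing (</name>) tags; group(1) is "/" for closers.
    tag_re = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(name), re.IGNORECASE)
    # Skip self-closing tags such as <name/>, which open and close at once.
    tags = [m for m in tag_re.finditer(text) if not m.group(0).endswith("/>")]

    # Rightward scan: openers count +1, closers -1; below 0 means we found our closer.
    depth, close = 0, None
    for m in tags:
        if m.start() < caret:
            continue
        depth += -1 if m.group(1) else 1
        if depth < 0:
            close = m
            break

    # Leftward scan: the same counter with the roles of opener and closer swapped.
    depth, opener = 0, None
    for m in reversed(tags):
        if m.end() > caret:
            continue
        depth += 1 if m.group(1) else -1
        if depth < 0:
            opener = m
            break

    if opener is None or close is None:
        return None
    return opener.start(), close.end()


html = "<div>\n\t<div>\n\t\t<div>\n\t\t\tTEXT\n\t\t</div>\n\t</div>\n</div>"
start, end = find_enclosing_tag(html, html.index("TEXT"), "div")
print(repr(html[start:end]))  # the innermost div element only
```

Only the tag name has to be known up front; the first closer that takes the counter below zero on the right, and the first opener that does so on the left, are the pair that encloses the caret, however deeply identical tags are nested.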


One idea is to remove the problem of nested identical tags entirely by
using one pass to make all tag names unique.
Something like what you said, with a counter that goes up and down as
it hits each opening or closing tag of the same name:

<div1>
	<div2>
		<div3>
			TEXT
		</div3>
	</div2>
</div1>

Then a simpler regex could find the balanced pair of tags, since with
numbered names the next </div3> after a <div3> is always its partner.
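
A rough Python sketch of that numbering pass, under stated assumptions: number_tags and find_numbered are hypothetical names, the tag regex is deliberately naive (no comments, CDATA, or '>' inside attributes), and self-closing tags like <br/> are left alone. Each tag is numbered by its nesting depth among tags of the same name, so between a <divN> and its partner every div has a number greater than N, and a non-greedy regex is enough to pick out the pair.

```python
import re
from collections import defaultdict

# Naive tag regex: group(1) is "/" for closers, group(2) the name, group(3) the rest.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w:-]*)([^>]*)>")


def number_tags(html: str) -> str:
    """Rename every tag to <nameN>, where N is its nesting depth among same-named tags."""
    depth = defaultdict(int)  # one up/down counter per (lower-cased) tag name

    def rename(m):
        slash, name, rest = m.group(1), m.group(2), m.group(3)
        if rest.endswith("/"):  # self-closing, e.g. <br/>: leave untouched
            return m.group(0)
        key = name.lower()
        if slash:  # closing tag takes the current depth, then steps back out
            n = depth[key]
            depth[key] -= 1
        else:  # opening tag steps in, then takes the new depth
            depth[key] += 1
            n = depth[key]
        return "<%s%s%d%s>" % (slash, name, n, rest)

    return TAG_RE.sub(rename, html)


def find_numbered(numbered: str, name: str, n: int) -> list:
    """All <nameN>...</nameN> elements; non-greedy works once depths are in the names."""
    tag = "%s%d" % (re.escape(name), n)
    return re.findall(r"<%s\b[^>]*>.*?</%s>" % (tag, tag), numbered, re.DOTALL)


example = "<div>\n\t<div>\n\t\t<div>\n\t\t\tTEXT\n\t\t</div>\n\t</div>\n</div>"
numbered = number_tags(example)
print(numbered)  # the <div1>/<div2>/<div3> version listed above
print(find_numbered(numbered, "div", 3))
```

Siblings at the same depth share a number (<div2>a</div2><div2>b</div2>), which is harmless because they never nest. HTML void elements written without a slash (such as <br>) would also get numbered, but since the counters are kept per tag name this does not disturb the numbering of other tags.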

Then it's just a matter of wrapping the selection with something
unique...
Restoring the original tag names...
Finding your selection again...
And then removing that unique wrapper.

We'd have to come up with a nice way to limit the scope initially so  
you don't have to parse the whole document every time.

I'm sure there's a simple way to do it that we're just not seeing.

—Thomas Aylott – subtleGradient—



