Re: [TxMt] [ANN] Select Balanced HTML Tag!!!1!

16 Nov 2007


      On Nov 16, 2007, at 8:00 AM, Hans-Jörg Bibiko wrote:
...
On 16.11.2007, at 13:29, Thomas Aylott - subtleGradient wrote:
...
This runs into the problem I'd been having for 3 years.
How do you get it to work when you have a tag nested inside the  
same kind of tag?
How do you keep it from matching the first close tag it finds, or
the very last one?
<div>
   <div>
   	<div>
   		TEXT
   	</div>
   </div>
</div>
Of course, you're right. That is THE problem! And I also have no
solution for it using regexps.
One way I have in mind is to write a character-by-character
parser. Once one knows which tag to close (e.g. 'p'), it should be
possible to go step by step from the caret's position to the right
to look for '</p>'. If one finds '<p...>' while doing this, a
counter is incremented; if one finds '</p>', the counter is
decremented; as soon as counter < 0, I have found my closing tag
(meaning its index). Next, do the same from the caret's position to
the left. If one writes this in Perl/Ruby/... and the entire text
is stored as a character array, I can splice the array and finally
I have the desired string. With that string I can execute a normal
findNext and findPrevious macro.
I don't know whether it works but ...
Maybe I will find some time to try it out. The advantage would be
that I don't have to parse the entire document.
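
The counter idea above can be sketched roughly as follows. This is only
an illustration in Python, not code from this thread: the function name
select_balanced_tag and its arguments are made up, it jumps from '<' to
'<' instead of stepping over every character, and it assumes the caret
is not inside a tag's own markup. Comments, scripts and self-closing
tags are not handled.

```python
import re

def select_balanced_tag(text, caret, tag):
    """Return (start, end) of the innermost <tag>...</tag> around caret, or None.

    Hypothetical helper: a sketch of the counter approach, not a full HTML parser.
    """
    # Matches an opening or closing tag of the given name.
    # Group 1 is "/" for a closing tag and "" for an opening tag.
    pattern = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)

    # Scan right from the caret: +1 for every open tag, -1 for every close tag.
    # The first close tag that drives the counter below zero is ours.
    counter = 0
    end = None
    for m in pattern.finditer(text, caret):
        counter += -1 if m.group(1) else 1
        if counter < 0:
            end = m.end()
            break
    if end is None:
        return None

    # Scan left from the caret with the roles reversed:
    # +1 for every close tag, -1 for every open tag.
    counter = 0
    pos = caret
    while True:
        pos = text.rfind("<", 0, pos)
        if pos == -1:
            return None  # ran off the start without finding the open tag
        m = pattern.match(text, pos)
        if m is None:
            continue  # some other tag, or a stray "<"
        counter += 1 if m.group(1) else -1
        if counter < 0:
            return pos, end


html = "<div>\n  <div>\n    <div>\n      TEXT\n    </div>\n  </div>\n</div>"
span = select_balanced_tag(html, html.index("TEXT"), "div")
print(html[span[0]:span[1]])
```

Both scans stop as soon as the counter goes negative, so only the text
between the enclosing pair is examined, not the whole document.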
Or one could write it in Objective-C as a plug-in, or Allan has a nice
idea for it ;)
On the other hand, I thought about using an external HTML parser.
This works, but the parser is also very slow if one has a large HTML
file. One could think about restricting the area for parsing (100
lines above and below the current line), but this is also tricky.
Cheers,
--Hans
One idea is to remove the problem of all the nested identical tags by  
using 1 pass to make all tagnames unique.
Something like what you said with a counter that goes up and down as  
it hits a duplicate tagname:
<div1>
    <div2>
    	<div3>
    		TEXT
    	</div3>
    </div2>
</div1>
Then you could do a simpler regex to find the balance of the tags.
Then it's just a matter of wrapping the selection with something  
unique...
Fixing the document again...
And then finding your selection again...
And then removing that unique wrapper.
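
A rough sketch of the renaming pass and the "simpler regex", in Python.
Nothing here comes from an existing bundle: number_tags, element_at and
both patterns are hypothetical names chosen for illustration. It
assumes reasonably well-formed markup, ignores comments and scripts,
and leaves self-closing tags untouched. Void tags such as <br> simply
get ever higher numbers and never pair up.

```python
import re

# Any opening or closing tag: group 1 is "/" or "", group 2 the tag name,
# group 3 whatever follows the name (attributes), up to the ">".
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")

# After renaming, an element is an open tag plus the nearest close tag
# carrying the identical numbered name (backreference \1).
PAIR = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*>.*?</\1>", re.DOTALL)

def number_tags(html):
    """Append the current nesting depth to every tag name: div -> div1, div2, ..."""
    depth = {}  # per-tag-name counter that goes up on open, down on close

    def rename(m):
        closing, name, rest = m.groups()
        if rest.endswith("/"):
            return m.group(0)  # self-closing tag: leave it alone
        if closing:
            n = depth.get(name, 0)
            depth[name] = max(n - 1, 0)
        else:
            n = depth.get(name, 0) + 1
            depth[name] = n
        return "<%s%s%d%s>" % (closing, name, n, rest)

    return TAG.sub(rename, html)

def element_at(numbered, open_pos):
    """Return the whole element whose open tag starts at open_pos, or None."""
    m = PAIR.match(numbered, open_pos)
    return m.group(0) if m else None


numbered = number_tags("<div><div><div>TEXT</div></div></div>")
print(numbered)
print(element_at(numbered, numbered.index("<div2")))
```

Two sibling elements of the same kind get the same number, which is
still safe for the non-greedy match: a second <div1> cannot open before
the first </div1> has been seen.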
We'd have to come up with a nice way to limit the scope initially so  
you don't have to parse the whole document every time.
I'm sure there's a simple way to do it that we're just not seeing.
—Thomas Aylott – subtleGradient—
