Hi,
I'm just playing with Oniguruma's nested levels etc. In principal it works brilliant, BUT it could happen that the regexp engine runs in an eternal loop with no chance to cancel that loop.
Example: <html lang="en"> <body> <div> <div id="2"> <p>blah</p> </div> </div> </body> </html>
regexp: (one line!) (?-i)(?<element>\g<stag>\g<content>*\g<etag>){0}(?<stag><\g<name> \s*[^>]*>){0}(?<name>[a-zA-Z_:]+){0}(?<content>[^<&]+(\g<element>|[^<&] +)*){0}(?<etag></\k<name+1>>){0}\g<element>
Place the caret within the p tag and find previous match (SHIFT+APPLE +G) and repeat it. Fine this works. The same also works for find next match.
But the regexp CANNOT handle up to now tags like <img src="foo">.
So if you have e.g.:
<html lang="en"> <body> <div> <img src="foo"> <div id="2"> <p>blah</p> </div> </div> </body> </html>
[ATTENTION TextMate will freeze!!!!] place the caret inside of the p tag and press three time SHIFT+APPLE+G and TM freezes. You only can "force quit". Attached is a part of the error report.
Of course, this is a problem of the regexp but if one wants to develop such a regexp such errors could appear. The question now is how to interrupt such an eternal regexp loop?? To be honest I have no idea but maybe one could listen to the keyboard event (APPLE+.) inside of the regexp function.
Cheers,
--Hans
Error report: 223 __Z18view_find_previousPN4text4viewE + 1075 (in TextMate) [0x9c719] 223 __ZN9oniguruma4findEPtiiPKNS_6ptrn_tEii + 115 (in TextMate) [0xf6639] 11 _onig_search + 414 (in TextMate) [0x161bba] 1 _onig_is_in_code_range + 67 (in TextMate) [0x15c2c1] 1 _utf16le_mbc_enc_len + 22 (in TextMate) [0x16361a] 1 _utf16le_mbc_enc_len + 11 (in TextMate) [0x16360f] 1 _onig_is_in_code_range + 29 (in TextMate) [0x15c29b] 1 _onig_is_in_code_range + 31 (in TextMate) [0x15c29d] 1 _onig_is_in_code_range + 88 (in TextMate) [0x15c2d6] 1 _onig_is_in_code_range + 11 (in TextMate) [0x15c289] 1 _onig_is_in_code_range + 4 (in TextMate) [0x15c282] 1 _mem_is_in_memp + 4 (in TextMate) [0x15c249] 1 _mem_is_in_memp + 5 (in TextMate) [0x15c24a] 1 _utf16le_mbc_to_code + 17 (in TextMate) [0x163651] 8 _match_at + 3858 (in TextMate) [0x15d291] 7 _match_at + 10242 (in TextMate) [0x15eb81] 7 _match_at + 8800 (in TextMate) [0x15e5df] 7 _match_at + 3848 (in TextMate) [0x15d287] 7 _match_at + 3814 (in TextMate) [0x15d265] 6 _match_at + 8811 (in TextMate) [0x15e5ea] 6 _match_at + 339 (in TextMate) [0x15c4d2] 6 _match_at + 3842 (in TextMate) [0x15d281] 6 _match_at + 10259 (in TextMate) [0x15eb92] 5 _match_at + 8784 (in TextMate) [0x15e5cf] 5 _match_at + 6579 (in TextMate) [0x15dd32] 4 _match_at + 2121 (in TextMate) [0x15cbc8] 4 _match_at + 3834 (in TextMate) [0x15d279] 4 _match_at + 10216 (in TextMate) [0x15eb67] 4 _match_at + 6626 (in TextMate) [0x15dd61] 4 _match_at + 3825 (in TextMate) [0x15d270] 4 _match_at + 6599 (in TextMate) [0x15dd46] 3 _match_at + 341 (in TextMate) [0x15c4d4] 3 _match_at + 8795 (in TextMate) [0x15e5da] 3 _match_at + 11214 (in TextMate) [0x15ef4d] 3 _match_at + 3387 (in TextMate) [0x15d0ba] 3 _match_at + 8790 (in TextMate) [0x15e5d5] 3 _match_at + 8806 (in TextMate) [0x15e5e5] 3 _match_at + 3434 (in TextMate) [0x15d0e9] 3 _match_at + 2195 (in TextMate) [0x15cc12] 3 _match_at + 6614 (in TextMate) [0x15dd55] 2 _match_at + 11317 (in TextMate) [0x15efb4] 2 _match_at + 3816 (in TextMate) [0x15d267] 2 _match_at + 6593 (in TextMate) [0x15dd40] 2 _match_at + 299 (in TextMate) [0x15c4aa] 2 _match_at + 6590 (in TextMate) [0x15dd3d] 2 _match_at + 2108 (in TextMate) [0x15cbbb] 2 _match_at + 1446 (in TextMate) [0x15c925] 2 _match_at + 10232 (in TextMate) [0x15eb77] 2 _match_at + 3811 (in TextMate) [0x15d262] 2 _match_at + 6616 (in TextMate) [0x15dd57] 2 _match_at + 2198 (in TextMate) [0x15cc15] 2 _match_at + 10238 (in TextMate) [0x15eb7d] 2 _match_at + 10227 (in TextMate) [0x15eb72] 2 _match_at + 8798 (in TextMate) [0x15e5dd] 2 _match_at + 10143 (in TextMate) [0x15eb1e] 2 _match_at + 2084 (in TextMate) [0x15cba3] 1 _match_at + 16465 (in TextMate) [0x1603d0] 1 _match_at + 3408 (in TextMate) [0x15d0cf] 1 _match_at + 11320 (in TextMate) [0x15efb7] 1 _match_at + 6548 (in TextMate) [0x15dd13] 1 _match_at + 2146 (in TextMate) [0x15cbe1]
On 4 Sep 2008, at 10:44, Hans-Jörg Bibiko wrote:
[...] [ATTENTION TextMate will freeze!!!!]
[...] Of course, this is a problem of the regexp but if one wants to develop such a regexp such errors could appear. The question now is how to interrupt such an eternal regexp loop?? To be honest I have no idea but maybe one could listen to the keyboard event (APPLE+.) inside of the regexp function.
It’s easier said than done (allowing graceful aborting of exponential time regexps) but it is something which is on the radar.
On 07.09.2008, at 15:06, Allan Odgaard wrote:
On 4 Sep 2008, at 10:44, Hans-Jörg Bibiko wrote:
[...] [ATTENTION TextMate will freeze!!!!]
[...] Of course, this is a problem of the regexp but if one wants to develop such a regexp such errors could appear. The question now is how to interrupt such an eternal regexp loop?? To be honest I have no idea but maybe one could listen to the keyboard event (APPLE+.) inside of the regexp function.
It’s easier said than done (allowing graceful aborting of exponential time regexps)
Unfortunately I know ;)
but it is something which is on the radar.
Thanks!
--Hans
I know that Python can tell if a loop is infinite, although I'm not sure if that behavior applies to Regex (and I can't test it -- I couldn't write an infinite search if required...but if you provide one, I will). What I mean is that there may be something of interest in their source.
Thomas Allen
2008/9/8 Hans-Jörg Bibiko bibiko@eva.mpg.de
On 07.09.2008, at 15:06, Allan Odgaard wrote:
On 4 Sep 2008, at 10:44, Hans-Jörg Bibiko wrote:
[...] [ATTENTION TextMate will freeze!!!!]
[...] Of course, this is a problem of the regexp but if one wants to develop such a regexp such errors could appear. The question now is how to interrupt such an eternal regexp loop?? To be honest I have no idea but maybe one could listen to the keyboard event (APPLE+.) inside of the regexp function.
It's easier said than done (allowing graceful aborting of exponential time regexps)
Unfortunately I know ;)
but it is something which is on the radar.
Thanks!
--Hans
textmate mailing list textmate@lists.macromates.com http://lists.macromates.com/listinfo/textmate
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 8 Sep 2008, at 2:23 pm, Thomas Allen wrote:
I know that Python can tell if a loop is infinite, although I'm not sure if that behavior applies to Regex (and I can't test it -- I couldn't write an infinite search if required...but if you provide one, I will). What I mean is that there may be something of interest in their source.
It's not infinity that's necessarily the problem; the problem is that it's possible to write a regexp that takes millions of years to decide. Which isn't infinite, but is a long time to wait for dinner.
And you can't reliably tell if a loop is infinite. Any automated test, except in very restricted languages (non Turing-complete ones), can only classify loops (or any other bit of code into) these three categories:
- Will loop forever - Won't loop forever (but might loop for a billion years) - Might loop forever; who knows?
...and most samples will fall into the latter category, which tells you no more than you already knew ;-)
What makes you say that Python can tell if a loop is infinite, out of interest? I'm sure it must be a misinterpretation of something, unless the halting problem has been solved while I wasn't looking...
http://en.wikipedia.org/wiki/Halting_problem
"In computability theory, the halting problem is a decision problem which can be stated as follows: given a description of a program and a finite input, decide whether the program finishes running or will run forever, given that input.
Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist."
ABS
- -- Alaric Snell-Pym Work: http://www.snell-systems.co.uk/ Play: http://www.snell-pym.org.uk/alaric/ Blog: http://www.snell-pym.org.uk/?author=4
Wow, very informative. I read somewhere that Python prevented infinite loops, but performing this in IDLE on my office Windows box proved me wrong:
while 1: print "Hi"
Thomas Allen
2008/9/8 Alaric Snell-Pym alaric@snell-pym.org.uk
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 8 Sep 2008, at 2:23 pm, Thomas Allen wrote:
I know that Python can tell if a loop is infinite, although I'm not sure if that behavior applies to Regex (and I can't test it -- I couldn't write an infinite search if required...but if you provide one, I will). What I mean is that there may be something of interest in their source.
It's not infinity that's necessarily the problem; the problem is that it's possible to write a regexp that takes millions of years to decide. Which isn't infinite, but is a long time to wait for dinner.
And you can't reliably tell if a loop is infinite. Any automated test, except in very restricted languages (non Turing-complete ones), can only classify loops (or any other bit of code into) these three categories:
- Will loop forever
- Won't loop forever (but might loop for a billion years)
- Might loop forever; who knows?
...and most samples will fall into the latter category, which tells you no more than you already knew ;-)
What makes you say that Python can tell if a loop is infinite, out of interest? I'm sure it must be a misinterpretation of something, unless the halting problem has been solved while I wasn't looking...
http://en.wikipedia.org/wiki/Halting_problem
"In computability theory, the halting problem is a decision problem which can be stated as follows: given a description of a program and a finite input, decide whether the program finishes running or will run forever, given that input.
Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist."
ABS
Alaric Snell-Pym Work: http://www.snell-systems.co.uk/ Play: http://www.snell-pym.org.uk/alaric/ Blog: http://www.snell-pym.org.uk/?author=4
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin)
iEYEARECAAYFAkjFMq4ACgkQRgz/WHNxCGqy4wCZAdUt2dQXBJrvWZerZ4cdO520 XDIAnjWhygjTS5QnvNQ8M5aOuvoFq+Fm =hObQ -----END PGP SIGNATURE-----
textmate mailing list textmate@lists.macromates.com http://lists.macromates.com/listinfo/textmate
On 08.09.2008, at 16:11, Alaric Snell-Pym wrote:
It's not infinity that's necessarily the problem; the problem is that it's possible to write a regexp that takes millions of years to decide. Which isn't infinite, but is a long time to wait for dinner.
And you can't reliably tell if a loop is infinite. Any automated test, except in very restricted languages (non Turing-complete ones), can only classify loops (or any other bit of code into) these three categories:
- Will loop forever
- Won't loop forever (but might loop for a billion years)
- Might loop forever; who knows?
...and most samples will fall into the latter category, which tells you no more than you already knew ;-)
"In computability theory, the halting problem is a decision problem which can be stated as follows: given a description of a program and a finite input, decide whether the program finishes running or will run forever, given that input.
Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist."
Furthermore, due to the fact that regular expressions are a kind of a formal language (Chomsky Type 3) you may not forget "Gödel's incompleteness theorems" (esp. the second theoreme). http://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems
That's why one could think more practically ;) Maybe it is possible to run the matching process in two threads. One thread (a) is used to match/replace/... and the other one (b) listens to a given keyboard event to kill the thread (a) if that process takes more time as expected. The 'only' thing which could be tricky is to remember the status of the text etc. before the regexp engine started.
Cheers,
--Hans