Hi,
I'm not sure I remember correctly, but I believe someone mentioned a script that generates the regexp for a fixed set of names in a language grammar. For example:
I have this list of fixed classes:
NSArray NSMutableString NSMutableArray NSCell NSCellItem NSCoder
and the generator script will output something like this:
NS(Array|Mutable(String|Array)|C(ell(Item)?|oder))
If anyone knows this script, I'd appreciate a hint as to where I can find it.
Many thanks in advance!
--Hans
On Sep 21, 2008, at 3:57 PM, Hans-Jörg Bibiko wrote:
I'm not sure I remember correctly, but I believe someone mentioned a script that generates the regexp for a fixed set of names in a language grammar. For example:
Allan wrote it, I just converted it to a command:
http://temp.whitefalls.org/Optimize.tmCommand.zip
It's actually in the repository as a script (in the C bundle I believe).
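For readers who cannot reach the linked command, the idea can be sketched with a prefix trie. This is not Allan's script, only a minimal illustration under the assumption that the input is a flat list of literal words; the names build_trie, trie_to_regexp and optimize are made up for this example.

```ruby
# Hypothetical sketch, not Allan's script: build a prefix trie from the
# words and emit one alternation per branching point.
END_MARK = :end # marks "a word ends here" inside a trie node

def build_trie(words)
  trie = {}
  words.each do |word|
    node = trie
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[END_MARK] = true
  end
  trie
end

def trie_to_regexp(node)
  optional = node.key?(END_MARK)
  branches = node.reject { |key, _| key == END_MARK }
                 .map { |ch, child| Regexp.escape(ch) + trie_to_regexp(child) }
  return '' if branches.empty?

  if branches.size == 1 && !optional
    branches.first # a single continuation needs no group
  else
    body = "(#{branches.join('|')})"
    optional ? body + '?' : body
  end
end

def optimize(words)
  trie_to_regexp(build_trie(words))
end

puts optimize(%w[NSArray NSMutableString NSMutableArray NSCell NSCellItem NSCoder])
```

For the six class names above this prints the same pattern as in the original question. Branches come out in the order the words were given, so sorting the input first changes the order of the alternatives but not what they match.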
Quoting Michael Sheets mummer@whitefalls.org:
Allan wrote it, I just converted it to a command:
http://temp.whitefalls.org/Optimize.tmCommand.zip
It's actually in the repository as a script (in the C bundle I believe).
Thanks a lot for the prompt help ;)
--Hans
On 21.09.2008, at 23:15, Michael Sheets wrote:
Allan wrote it, I just converted it to a command:
Only a tiny question: does anyone have a function that does the reverse, i.e.
to get from NS(Mutable(String|Array)|C(oder|ell(Item)?)|Array)
this: NSArray NSMutableString NSMutableArray NSCell NSCellItem NSCoder
The point is that when someone has already provided such a regexp and I want to update it, I'd like to avoid forgetting any keywords.
Cheers,
--Hans
On 22.09.2008, at 10:19, Hans-Jörg Bibiko wrote:
Does someone have a function which does the reverse thing, i.e.
to get from NS(Mutable(String|Array)|C(oder|ell(Item)?)|Array)
this: NSArray NSMutableString NSMutableArray NSCell NSCellItem NSCoder
I believe I found a way to do the reverse.
Install the attached command, edit a tmLanguage in TM, select an optimized regexp and invoke the command. It will open a new document with a sorted list of (hopefully) all matched keywords. I tested it on some of these regexps, but please check whether it works for everything ;)
If someone has a better way to do this, let me know.
Here is the Ruby script:
def decompileRe(re)
  # handle foo(bar|boo|bou)? => foo|foobar|fooboo|foobou
  while m = re.match(/\b(\w+)\(([^(]+?)\)\?/) do
    re.sub!(/\b(\w+)\(([^(]+?)\)\?/, "#{m[1]}|#{m[2].split('|').map {|x| m[1] + x }.join('|')}")
  end
  # handle foo(bar|boo|bou) => foobar|fooboo|foobou recursively
  while m = re.match(/\b(\w+)\(([a-zA-Z|]+?)\)/) do
    re.sub!(/\b(\w+)\(([^(]+?)\)/, "#{m[2].split('|').map {|x| m[1] + x }.join('|')}")
  end
  # return sorted array
  return re.split('|').sort
end
decompileRe(STDIN.read.chomp).each {|r| puts r}
--Hans
On Sep 22, 2008, at 11:59 AM, Hans-Jörg Bibiko wrote:
I believe I found a way to do the reverse.
Darn it Hans! I spent an hour writing a recursive descent parser to do this… and you pop out this 6 line ruby regex! Oh, well, it was a good exercise.
—Alex
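Alex's parser is not shown in the thread. As an illustration only, a recursive descent expander for this restricted syntax (literal word characters, parenthesized groups, | and an optional ? after a group) might look like the sketch below. The class name AlternationExpander and its grammar are assumptions made for this example, not Alex's code.

```ruby
require 'strscan'

# Hypothetical sketch: expand a regexp built only from literals,
# (...) groups, | and an optional ? directly after a group.
class AlternationExpander
  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # Returns the sorted list of words the pattern stands for.
  def expand
    words = parse_alternation
    raise ArgumentError, "unexpected input at #{@scanner.pos}" unless @scanner.eos?
    words.uniq.sort
  end

  private

  # alternation := sequence ('|' sequence)*
  def parse_alternation
    words = parse_sequence
    words += parse_sequence while @scanner.scan(/\|/)
    words
  end

  # sequence := (literal | '(' alternation ')' '?'?)*
  def parse_sequence
    prefixes = ['']
    loop do
      if (literal = @scanner.scan(/\w+/))
        prefixes = prefixes.map { |prefix| prefix + literal }
      elsif @scanner.scan(/\(/)
        suffixes = parse_alternation
        raise ArgumentError, 'missing )' unless @scanner.scan(/\)/)
        suffixes |= [''] if @scanner.scan(/\?/) # optional group: also allow "nothing"
        prefixes = prefixes.product(suffixes).map(&:join)
      else
        return prefixes
      end
    end
  end
end

puts AlternationExpander.new('NS(Mutable(String|Array)|C(oder|ell(Item)?)|Array)').expand
```

Because groups are parsed by nesting rather than by repeated substitution, a group nested inside an optional group, as in foo(a(b|c))?, is also expanded. Anything outside the restricted syntax, such as \b or (?i), raises an ArgumentError instead of producing a wrong list.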
On 22.09.2008, at 12:11, Alex Ross wrote:
Oh, well, it was a good exercise.
Yeap ;)
Here comes a slightly improved version (attached as DeOptimize Regexp Alternations.tmCommand.zip). It compiles each regexp only once (rather than on every loop iteration and every sub call), so it's a bit faster; I also changed [a-zA-Z|]+? to [^(]+? (which I forgot in the first version).
Maybe both scripts could go into the Bundle Development bundle?
--Hans
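The attached command is not reproduced in the thread. Going only by the description above, the revised script might look roughly like this sketch; the constant names and the snake_case method name decompile_re are assumptions, not the contents of the attachment.

```ruby
# Assumed reconstruction; the attached command itself is not shown in the thread.
OPTIONAL_GROUP = /\b(\w+)\(([^(]+?)\)\?/ # foo(bar|boo)?
PLAIN_GROUP    = /\b(\w+)\(([^(]+?)\)/   # foo(bar|boo)

def decompile_re(source)
  re = source.dup # leave the caller's string untouched

  # foo(bar|boo)? => foo|foobar|fooboo
  while (m = OPTIONAL_GROUP.match(re))
    expanded = m[2].split('|').map { |suffix| m[1] + suffix }
    re[m.begin(0)...m.end(0)] = ([m[1]] + expanded).join('|')
  end

  # foo(bar|boo) => foobar|fooboo, innermost groups first
  while (m = PLAIN_GROUP.match(re))
    expanded = m[2].split('|').map { |suffix| m[1] + suffix }
    re[m.begin(0)...m.end(0)] = expanded.join('|')
  end

  re.split('|').sort
end

puts decompile_re('NS(Mutable(String|Array)|C(oder|ell(Item)?)|Array)')
```

Each pattern is written once and the replacement is spliced in at the position of the match that was just found, so the pattern used for matching and the one used for replacing cannot drift apart. The sketch keeps the limits of the original approach: it expects plain alternations of word characters and can go wrong when a group is nested inside an optional group.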
Great stuff, Hans!
It could use teaching about options such as (?i), which makes the search case-insensitive. For example,
\b(?i)(Boundary|CMatrix)\b
should probably expand to either
options: (?i)
\bBoundary\b
\bCMatrix\b
or
\b(?i)Boundary\b
\b(?i)CMatrix\b
Instead, the leading \b is stuck to the first line, a raw 'i' is prefixed to each word, and the trailing \b is lost:
\b?iBoundary
iCMatrix
tim
On 22.09.2008, at 15:47, Timothy Bates wrote:
could use teaching about some options like (?i) - which makes the search case-insensitive
Tim,
my decompileRe script ONLY works for an optimized regexp string for FIXED keywords (optimized by Allan's script) WITHOUT any (?i), \b, \s, etc. stuff. My only goal was to decompose such a regexp string for updating it.
If you have something like \b(?i)(Boundary|CMatrix)\b
you can try to select only: Boundary|CMatrix
and invoke that script, modify that list, invoke the optimize script, and replace the old stuff manually. In other words you can do it only portion by portion.
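The manual selection step could be partly automated for the simple case. The sketch below peels leading and trailing \b and leading inline option groups such as (?i) off a pattern and returns the inner alternation separately. The helper names split_wrapper and balanced? and the added keyword DMatrix are hypothetical, and options in the middle of a pattern are not handled.

```ruby
# Hypothetical helper: separate a pattern into the wrapper at the front
# (\b and inline option groups like (?i)), the core, and trailing \b.
WRAPPER = /\A((?:\\b|\(\?[a-z-]+\))*)(.*?)((?:\\b)*)\z/m

# True if every ')' in text closes a '(' that was opened earlier in text.
def balanced?(text)
  depth = 0
  text.each_char do |ch|
    depth += 1 if ch == '('
    depth -= 1 if ch == ')'
    return false if depth < 0
  end
  depth.zero?
end

def split_wrapper(pattern)
  prefix, core, suffix = WRAPPER.match(pattern).captures
  # Drop one pair of parentheses only if it encloses the whole core.
  if core.start_with?('(') && core.end_with?(')') && balanced?(core[1..-2])
    core = core[1..-2]
  end
  [prefix, core, suffix]
end

prefix, core, suffix = split_wrapper('\b(?i)(Boundary|CMatrix)\b')
words = core.split('|') + ['DMatrix'] # edit the flat list by hand
puts "#{prefix}(#{words.sort.join('|')})#{suffix}"
```

Splitting the core on | is only valid here because the core is already flat. A nested core would first have to go through the decompile script, and the edited list through the optimize script, before the wrapper is put back.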
Parsing things like (?i) might be possible, but it would go beyond that scope. How would one parse, e.g., NS(?i)(Boundary|CMatrix|(?-i)AMatrix)
And if one could do this, how would one optimize it?
--Hans
On 22/09/2008, at 6:57 AM, Hans-Jörg Bibiko wrote:
and the generator script will output something like this:
NS(Array|Mutable(String|Array)|C(ell(Item)?|oder))
For TM2, it would be nice to not have to do this pretty common step.
Perhaps we could just specify a text file in the bundle with a word per line for a certain scope, and TM automagically reads the file and optimizes a regex and uses that. This would certainly make maintaining the word list easier.
--
LD.
On 22.09.2008, at 01:33, Luke Daley wrote:
Perhaps we could just specify a text file in the bundle with a word per line for a certain scope, and TM automagically reads the file and optimizes a regex and uses that. This would certainly make maintaining the word list easier.
Yes, this would be nice. I had a similar idea. Why not have a subfolder called 'ScopeLists' in 'Syntaxes' (or wherever) holding such lists, one word per line, with the files named after their scopes and maybe also zipped, like 'support.function.cappuccino.txt.gz'?
Then there would be several ways to generate the regexp. If TM ran a tmLanguage file through the shell while loading the bundles, as it does for tmSnippets, and the 'Optimize' script lived in TM's Support folder, then one could write
{name = 'support.function.cappuccino'; match = '\b`"$TM_SUPPORT_PATH/optimize.rb" "$TM_BUNDLE_PATH/Syntaxes/ScopeLists/support.function.cappuccino.txt.gz"`\b'; },
Of course, there're dozens of other possibilities to do that ;)
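One such possibility, sketched in Ruby: read a word list (one word per line) and build the rule from it. The function name rule_from_word_list and the returned hash layout are made up for this illustration, and the word list comes from an in-memory string rather than a real bundle file. The alternation is left unoptimized to keep the sketch short.

```ruby
require 'stringio'

# Hypothetical loader: one word per line; blank lines and surrounding
# whitespace are ignored. The file name (minus .txt) would supply the scope.
def rule_from_word_list(scope, io)
  words = io.each_line.map(&:strip).reject(&:empty?).uniq.sort
  alternation = words.map { |word| Regexp.escape(word) }.join('|')
  { name: scope, match: "\\b(#{alternation})\\b" }
end

list = StringIO.new("NSArray\nNSCell\nNSCellItem\n\nNSCoder\n")
rule = rule_from_word_list('support.function.cappuccino', list)
puts rule[:name]
puts rule[:match]
```

Any IO object works as the second argument, so a gzipped list could be passed in through Zlib::GzipReader from Ruby's standard library instead of a StringIO.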
Furthermore, these lists can sometimes be generated automatically by grepping documentation, header, or similar files. Then, if the user updates a framework or library, the tmLanguage would be updated automatically as well (though this could also be a bit dangerous).
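As a rough illustration of the grepping idea, the sketch below pulls class names out of Objective-C header text. The pattern INTERFACE, the function class_names and the sample header are all made up for this example; real headers would need more care (forward declarations, protocols, macros).

```ruby
# Hypothetical sketch: collect the class names declared with @interface.
INTERFACE = /^\s*@interface\s+([A-Za-z_]\w*)/

def class_names(header_text)
  header_text.scan(INTERFACE).flatten.uniq.sort
end

# Made-up sample input, not taken from a real framework header.
sample = <<~HEADER
  @interface NSCell : NSObject
  @end
  @interface NSCellItem : NSCell
  @end
  @interface NSCell (Drawing)
  @end
HEADER

puts class_names(sample)
```

The category declaration at the end repeats a class name that was already seen, which is why the result is passed through uniq before sorting.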
On the other hand, this would also open up other ways of writing a language grammar.
Cheers,
--Hans