Regexp generator for language grammars wanted

Hans-Jörg Bibiko

21 Sep 2008

10:57 p.m.

Hi,

I am not sure I remember correctly, but I believe someone once mentioned a script that generates an optimized regexp for a language grammar from a set of fixed names. For example:

I have this list of fixed classes:

NSArray NSMutableString NSMutableArray NSCell NSCellItem NSCoder

and the generator script will output something like this:

NS(Array|Mutable(String|Array)|C(ell(Item)?|oder))

If someone knows this script, I would appreciate a hint about where to find it.

Many thanks in advance!

--Hans

Michael Sheets

21 Sep

11:15 p.m.

Allan wrote it, I just converted it to a command:

http://temp.whitefalls.org/Optimize.tmCommand.zip

It's actually in the repository as a script (in the C bundle I believe).
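
For readers who cannot reach the linked command, the idea behind such an optimizer can be sketched with a prefix tree (trie). This is only a minimal illustration and not Allan's script: the names build_trie, trie_to_regexp and optimize_keywords are made up for this example, and it factors common prefixes of literal keywords only (no suffix sharing, no character classes).

```ruby
# Hypothetical sketch, NOT Allan's script: factor common prefixes via a trie.
END_MARK = :end # marks "a keyword ends at this node"

# Build a nested hash with one level per character.
def build_trie(words)
  trie = {}
  words.each do |word|
    node = trie
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[END_MARK] = true
  end
  trie
end

# Turn a trie node into a regexp fragment for everything below it.
def trie_to_regexp(node)
  optional = node.key?(END_MARK)
  branches = node.reject { |key, _| key == END_MARK }
                 .map { |ch, child| Regexp.escape(ch) + trie_to_regexp(child) }
  return '' if branches.empty?

  if branches.size == 1 && !optional
    branches.first # a single mandatory continuation needs no group
  else
    group = "(#{branches.join('|')})"
    optional ? "#{group}?" : group
  end
end

def optimize_keywords(words)
  trie_to_regexp(build_trie(words.uniq))
end

puts optimize_keywords(%w[NSArray NSMutableString NSMutableArray NSCell NSCellItem NSCoder])
# prints NS(Array|Mutable(String|Array)|C(ell(Item)?|oder))
```

The alternatives come out in the order in which the keywords were first seen, so a differently ordered input list gives an equivalent but differently ordered regexp.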

Hans-Joerg Bibiko

11:35 p.m.

Thanks a lot for the prompt help ;)

--Hans

Hans-Jörg Bibiko

22 Sep

10:19 a.m.

Only a tiny question: does someone have a function that does the reverse, i.e. one that turns

NS(Mutable(String|Array)|C(oder|ell(Item)?)|Array)

back into this list?

NSArray NSMutableString NSMutableArray NSCell NSCellItem NSCoder

The point is that when someone has already provided such a regexp and I want to update it, I would like to see the full keyword list so that I do not forget any keywords.

Cheers,

--Hans

Hans-Jörg Bibiko

11:59 a.m.

I believe I found a way to do the reverse.

Install the attached command, edit a tmLanguage file in TM, select an optimized regexp, and invoke the command. It opens a new document with a sorted list of (hopefully) all matched keywords. I tested it on some of these regexps, BUT PLEASE check whether it works for everything ;)

If someone has a better way to do this, let me know.

Here is the Ruby script:

def decompileRe (re)
  # handle foo(bar|boo|bou)? => foo|foobar|fooboo|foobou
  while m = re.match(/\b(\w+)\(([^(]+?)\)\?/) do
    re.sub!(/\b(\w+)\(([^(]+?)\)\?/, "#{m[1]}|#{m[2].split('|').map { |x| m[1] + x }.join('|')}")
  end
  # handle foo(bar|boo|bou) => foobar|fooboo|foobou recursively
  while m = re.match(/\b(\w+)\(([a-zA-Z|]+?)\)/) do
    re.sub!(/\b(\w+)\(([^(]+?)\)/, "#{m[2].split('|').map { |x| m[1] + x }.join('|')}")
  end
  # return sorted array
  return re.split('|').sort
end

decompileRe(STDIN.read.chomp).each { |r| puts r }

--Hans

Alex Ross

12:11 p.m.

Darn it Hans! I spent an hour writing a recursive descent parser to do this… and you pop out this 6 line ruby regex! Oh, well, it was a good exercise.

—Alex
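
A recursive descent approach like the one Alex mentions could look roughly like this. It is a hypothetical sketch and not Alex's parser, which was not posted to the list: the method names are invented here, and it understands only literal characters, (...) groups, | and a ? directly after a group. Any other regexp syntax is treated as literal text.

```ruby
require 'strscan'

# Hypothetical sketch, NOT Alex's parser.
# Grammar handled:
#   alternation := sequence ('|' sequence)*
#   sequence    := (literal_char | '(' alternation ')' '?'?)*

# Returns every string the alternation at the scanner position can produce.
def expand_alternation(scanner)
  results = expand_sequence(scanner)
  results.concat(expand_sequence(scanner)) while scanner.skip(/\|/)
  results
end

# Returns every string one branch (up to the next | or closing paren) can produce.
def expand_sequence(scanner)
  prefixes = ['']
  until scanner.eos? || scanner.check(/[|)]/)
    if scanner.skip(/\(/)
      choices = expand_alternation(scanner)
      scanner.skip(/\)/) or raise ArgumentError, 'missing closing parenthesis'
      choices << '' if scanner.skip(/\?/) # optional group: may also be absent
    else
      choices = [scanner.getch]
    end
    # Append every choice to every prefix collected so far.
    prefixes = prefixes.product(choices).map(&:join)
  end
  prefixes
end

def expand_keywords(pattern)
  scanner = StringScanner.new(pattern)
  words = expand_alternation(scanner)
  raise ArgumentError, "unexpected #{scanner.peek(1).inspect}" unless scanner.eos?
  words.uniq.sort
end

puts expand_keywords('NS(Mutable(String|Array)|C(oder|ell(Item)?)|Array)')
```

Because the group structure is parsed rather than rewritten in place, an optional group that itself contains groups, such as foo(bar(x|y))?, is expanded to foo, foobarx and foobary.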

Hans-Jörg Bibiko

1:10 p.m.

On 22.09.2008, at 12:11, Alex Ross wrote:

Oh, well, it was a good exercise.

Yeap ;)

Here comes a slightly improved version. It compiles each regexp only once (instead of on every loop iteration and every sub call), so it is a bit faster, and I changed [a-zA-Z|]+? to [^(]+? (I forgot that in the first version).

Maybe one can put both scripts into the Bundle Development bundle?

--Hans

Timothy Bates

3:47 p.m.

Great stuff, Hans!

The script could use some teaching about options like (?i), which makes the search case-insensitive. For example,

\b(?i)(Boundary|CMatrix)\b

should probably expand to either

options: (?i)
\bBoundary\b
\bCMatrix\b

or

\b(?i)Boundary\b
\b(?i)CMatrix\b

but instead the leading \b is stuck to the first line, a raw 'i' is prefixed to each word, and the trailing \b is lost:

\b?iBoundary
iCMatrix

tim

Hans-Jörg Bibiko

4:15 p.m.

Tim,

my decompileRe script ONLY works on an optimized regexp string for FIXED keywords (as produced by Allan's script) WITHOUT any (?i), \b, \s, etc. My only goal was to decompose such a regexp string in order to update it.

If you have something like

\b(?i)(Boundary|CMatrix)\b

you can select only

Boundary|CMatrix

invoke the script, modify the resulting list, invoke the optimize script, and replace the old text manually. In other words, you can only do it portion by portion.

Parsing things like (?i) might be possible, BUT it would go beyond the scope of the script. How would one parse, e.g.,

NS(?i)(Boundary|CMatrix|(?-i)AMatrix)

AND, if one could parse it, how would one optimize it??

--Hans
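
The portion-by-portion approach can be partly automated for the simple case where the keywords sit in one outer group with only \b anchors and an inline option such as (?i) around it. The following is a hypothetical helper, not part of either script from this thread; the name split_wrapper and the set of wrappers it recognizes are assumptions made for this sketch, and it does not understand escaped parentheses.

```ruby
# Hypothetical helper: separate a simple wrapper from the keyword alternation.
# Recognizes only: optional leading \b and (?flags), ONE outer group, optional trailing \b.
WRAPPED = /\A((?:\\b|\(\?[a-z-]+\))*)\((.*)\)((?:\\b)*)\z/m

# Returns [prefix, inner, suffix], or nil if the pattern is not of that simple form.
def split_wrapper(pattern)
  m = WRAPPED.match(pattern) or return nil
  prefix, inner, suffix = m.captures
  depth = 0
  inner.each_char do |ch|
    depth += 1 if ch == '('
    depth -= 1 if ch == ')'
    return nil if depth < 0 # outer parentheses do not enclose the whole alternation
  end
  depth.zero? ? [prefix, inner, suffix] : nil
end

prefix, inner, suffix = split_wrapper('\b(?i)(Boundary|CMatrix)\b')
puts inner # prints Boundary|CMatrix
```

The inner part can then be expanded, edited and optimized again, and the result put back between prefix + "(" and ")" + suffix. A pattern with options inside the alternation, like the NS(?i)(...) example above, is deliberately rejected.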

Luke Daley

1:33 a.m.

On 22/09/2008, at 6:57 AM, Hans-Jörg Bibiko wrote:

and the generator script will output something like this:

NS(Array|Mutable(String|Array)|C(ell(Item)?|oder))

For TM2, it would be nice to not have to do this pretty common step.

Perhaps we could just specify a text file in the bundle with a word per line for a certain scope, and TM automagically reads the file and optimizes a regex and uses that. This would certainly make maintaining the word list easier.

LD.

Hans-Jörg Bibiko

9:45 a.m.

Yes, this would be nice. I had a similar idea. Why not have a subfolder called 'ScopeLists' in 'Syntax' (or wherever) where one can put such lists, one word per line, in files named after their scopes (maybe also gzipped), e.g. 'support.function.cappuccino.txt.gz'?

Then there would be several options for generating the regexp. If a tmLanguage file were run through the shell while the bundles are loaded, as is done for tmSnippets, and the 'Optimize' script were inside TM's Support folder, then one could write

{name = 'support.function.cappuccino'; match = '\b`"$TM_SUPPORT_PATH/optimize.rb" "$TM_BUNDLE_PATH/Syntaxes/ScopeLists/support.function.cappuccino.txt.gz"`\b'; },

Of course, there are dozens of other ways to do that ;)

Furthermore, such lists can sometimes be generated automatically by grepping documentation, header files, etc., so that when the user updates a framework or library, the tmLanguage would be updated automatically as well (but this could also be a bit dangerous).

On the other hand, this would also offer some new ways to write a language grammar.

Cheers,

--Hans
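
As a rough illustration of the word-list idea, the following sketch reads such a file and builds a grammar rule from it. Everything here is an assumption for the sake of the example: TextMate has no such feature, match_rule_from_word_list is an invented name, the scope is simply taken from the file name as proposed above, and a plain sorted alternation stands in for the output of the optimize script.

```ruby
require 'zlib'

# Hypothetical sketch of the proposed ScopeLists idea; not an existing TextMate feature.
# Reads one keyword per line from a .txt or .txt.gz file named after its scope.
def match_rule_from_word_list(path)
  scope = File.basename(path).sub(/\.txt(\.gz)?\z/, '')
  text = if path.end_with?('.gz')
           Zlib::GzipReader.open(path, &:read)
         else
           File.read(path)
         end
  words = text.split(/\r?\n/).map(&:strip).reject(&:empty?).uniq.sort
  # Plain alternation; an optimizer could be applied to `words` here instead.
  alternation = words.map { |w| Regexp.escape(w) }.join('|')
  { name: scope, match: "\\b(#{alternation})\\b" }
end
```

For a file support.function.cappuccino.txt.gz containing the (made-up) words CPArray and CPString, this returns the scope name support.function.cappuccino and the match string \b(CPArray|CPString)\b.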

participants (6)

Alex Ross
Hans-Joerg Bibiko
Hans-Jörg Bibiko
Luke Daley
Michael Sheets
Timothy Bates