Dear all,
I know it is a bit off-topic but I believe it could also be interesting for some TM users ;)
I'm currently writing a grep-like command line tool based on the Oniguruma library to work with UTF-8 data. It works perfectly, and in many, many cases it's faster than grep ;)
In order to be sure that this command line tool, written in pure C, works on other Macs as well, I'd appreciate it if someone with a bit of time and a bit of free hard disk space could check whether it runs for her/him too, especially on an Intel Mac.
To run onigrep it is necessary to install the Oniguruma dylib beforehand. To do this, simply
- download the source code from http://www.geocities.jp/kosako3/oniguruma/archive/onig-5.8.0.tar.gz
- untar it
- cd into that folder
- execute:
    ./configure
    make
    sudo make install
That's it. Normally the Oniguruma dylib is installed in /usr/local/lib.
[I believe using the external dylib is the best choice because Oniguruma will keep getting better, so you only have to upgrade the dylib and not onigrep.]
Now you can run onigrep. For help type 'onigrep --help'. Up to now it only reads UTF-8 data from stdin. [Please note: if you didn't copy onigrep into a folder listed in $PATH, you have to write the entire path to onigrep or, if you're in the folder where onigrep is located, just type ./onigrep]
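The note about $PATH can be sketched as a small shell check. This is only an illustration: resolve_cmd is a hypothetical helper name, not something shipped with onigrep, and it only looks at $PATH and the current folder.

```shell
# Hypothetical helper: print how a tool should be invoked.
# Prints the bare name if the tool is found via $PATH, "./name" if an
# executable with that name sits in the current folder, and fails otherwise.
resolve_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s\n' "$1"
    elif [ -x "./$1" ]; then
        printf '%s\n' "./$1"
    else
        return 1
    fi
}

# Usage: fall back to a hint when onigrep is neither in $PATH nor here
resolve_cmd onigrep || echo "onigrep not found: use its full path instead"
```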
Some features in short:
- UTF-8 support (that means a '.' is really one Unicode character)
- ignore case also works for all Unicode characters, not only for ASCII
- you can search across \n (multi-line mode)
- ignore combining diacritics (for that you have to decompose accented characters according to the Unicode canonical decomposition algorithm; I attached such a tool. It is called 'unorm'. For help run 'unorm --help'.)
  example:
    echo "Ag̀nes" | ./onigrep -id -i -o "a(.)n"
  will output 'g̀'
    echo "Ag̀nes" | ./onigrep -i -o "a(.)n"
  will output nothing because g̀ is written with two Unicode characters
- it is faster than grep in many cases; try:
    cat /usr/share/dict/web2 | ./onigrep "y$" -c
    cat /usr/share/dict/web2 | grep "y$" -c
- option -cl counts the matches per line
  example: onigrep "\w+" -cl -n   (How many words per line?)
- you can write the regexp without escaping '(', ')', etc., unlike with grep
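To see why the second diacritics example prints nothing, it may help to count what the input actually contains. The sketch below is not part of onigrep or unorm: count_codepoints is a hypothetical helper that counts UTF-8 code points by deleting continuation bytes and counting what is left, and it assumes the input is valid UTF-8.

```shell
# Hypothetical helper, not part of onigrep or unorm.
# Every UTF-8 code point has exactly one byte that is NOT a continuation
# byte (octal 200-277), so deleting continuation bytes and counting the
# remaining bytes gives the number of code points.
count_codepoints() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

# 'g' followed by U+0300 COMBINING GRAVE ACCENT (UTF-8 bytes CC 80)
decomposed=$(printf 'g\314\200')

count_codepoints "$decomposed"   # prints 2: a single '.' cannot cover both
count_codepoints "Agnes"         # prints 5
```

Since the decomposed letter is two code points, "a(.)n" only matches once the combining mark is ignored, which is what -id is described as doing.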
Please note, onigrep is still a work in progress.
Many thanks in advance, and any feedback (suggestions, bugs, wishes) is welcome!!
Hans
PS onigrep and unorm will be available for free.
PPS One possible meaning of the Japanese word "Oniguruma" is "Devil's wheel", like TextMate's icon ;)
Doesn't work on my MacBook Pro.
dyld: Library not loaded: /usr/local/lib/libonig.2.dylib
  Referenced from: /usr/local/bin/onigrep
  Reason: no suitable image found.  Did find:
        /usr/local/lib/libonig.2.dylib: mach-o, but wrong architecture
        /usr/local/lib/libonig.2.dylib: mach-o, but wrong architecture
Trace/BPT trap
:/
On 6/29/07, Hans-Joerg Bibiko bibiko@eva.mpg.de wrote:
For new threads USE THIS: textmate@lists.macromates.com (threading gets destroyed and the universe will collapse if you don't) http://lists.macromates.com/mailman/listinfo/textmate
On 29.06.2007, at 18:14, Dougal wrote:
Doesn't work on my MacBook Pro.
dyld: Library not loaded: /usr/local/lib/libonig.2.dylib
  Referenced from: /usr/local/bin/onigrep
  Reason: no suitable image found.  Did find:
        /usr/local/lib/libonig.2.dylib: mach-o, but wrong architecture
        /usr/local/lib/libonig.2.dylib: mach-o, but wrong architecture
Trace/BPT trap
Many thanks. According to this error message, I believe one has to tell Oniguruma's ./configure the desired architecture.
I will have a look at it.
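One possible way to pass the architecture to ./configure is sketched below. It is untested against Oniguruma's build scripts and rests on assumptions: that Apple's gcc with its -arch flag is the compiler, and that i386 and ppc are the wanted architectures. arch_flags is a hypothetical helper name. The configure line is left as a comment because it has to be run inside the unpacked onig-5.8.0 folder.

```shell
# Hypothetical helper: turn a list of architectures into "-arch X -arch Y",
# the form Apple's gcc accepts when building a universal binary.
arch_flags() {
    flags=""
    for a in "$@"; do
        flags="$flags -arch $a"
    done
    printf '%s\n' "${flags# }"   # drop the leading space
}

arch_flags i386 ppc   # prints: -arch i386 -arch ppc

# Inside the unpacked onig-5.8.0 folder (assumed invocation):
#   ./configure CFLAGS="$(arch_flags i386 ppc)" LDFLAGS="$(arch_flags i386 ppc)" --disable-dependency-tracking
#   make
#   sudo make install
#
# Afterwards, check which architectures the installed library contains:
#   file /usr/local/lib/libonig.2.dylib
#   lipo -info /usr/local/lib/libonig.2.dylib
```

If lipo reports only one architecture and it differs from the machine's, that would explain the "wrong architecture" message.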
Thanks!!
Hans