[TxMt] onigrep : Help wanted (a bit off-topic)
Hans-Joerg Bibiko
bibiko at eva.mpg.de
Fri Jun 29 08:36:40 UTC 2007
Dear all,
I know it is a bit off-topic but I believe it could also be
interesting for some TM users ;)
I'm currently writing a grep-like command-line tool based on the
Oniguruma library to work with UTF-8 data.
It works perfectly, and in many cases it's faster than grep ;)
To make sure that this command-line tool, written in pure C, works on
other Macs as well, I'd appreciate it if someone with a bit of time and
a bit of free hard disk space could check whether it runs for her/him
too, especially on an Intel Mac.
To run onigrep you first have to install the Oniguruma dylib.
To do this, simply
- download the source code from
  http://www.geocities.jp/kosako3/oniguruma/archive/onig-5.8.0.tar.gz
- untar it
- cd into that folder
- execute:
./configure
make
sudo make install
that's it.
Normally the Oniguruma dylib is installed in /usr/local/lib.
[I believe using the external dylib is the best choice because
Oniguruma will keep improving. That way you only have to upgrade the
dylib and not onigrep.]
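
One way to check that the dylib can actually be found after 'sudo make install' is sketched below in Python. This is not part of onigrep. It assumes the library is installed under the name "onig" (libonig) and that it exports onig_version(); the helper name oniguruma_version is made up for this illustration.

```python
import ctypes
import ctypes.util
from typing import Optional


def oniguruma_version() -> Optional[str]:
    """Return the installed Oniguruma version string, or None if not found."""
    # Assumption: the shared library is named "onig" (libonig.dylib / libonig.so).
    path = ctypes.util.find_library("onig")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        # Assumption: the library exports onig_version(), returning a C string.
        lib.onig_version.restype = ctypes.c_char_p
        lib.onig_version.argtypes = []
        return lib.onig_version().decode("ascii")
    except (OSError, AttributeError):
        # The library could not be loaded or lacks the expected symbol.
        return None


if __name__ == "__main__":
    version = oniguruma_version()
    print(version if version else "Oniguruma library not found")
```

If this prints "Oniguruma library not found", onigrep will most likely fail to start for the same reason.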
Now you can run onigrep. For help type 'onigrep --help'. Up to now it
only reads UTF-8 data from stdin.
[Please note: if you didn't copy onigrep into a folder listed in $PATH,
you have to type the full path to onigrep or, if you're in the
folder where onigrep is located, just type ./onigrep]
Some features in brief:
- utf-8 support (that means a '.' is really one Unicode character)
- ignore case also works for all Unicode characters, not only for ASCII
- you can search across \n; multi-line mode
- ignore combining diacritics (for that you have to decompose
accented characters according to the Unicode canonical decomposition
algorithm; I attached such a tool. It is called 'unorm'. For help run
'unorm --help'.)
example:
echo "Ag̀nes" | ./onigrep -id -i -o "a(.)n"
will output 'g̀'
echo "Ag̀nes" | ./onigrep -i -o "a(.)n"
will output nothing because g̀ is written with two Unicode
characters
- it is faster than grep in many cases:
try:
cat /usr/share/dict/web2 | ./onigrep "y$" -c
cat /usr/share/dict/web2 | grep "y$" -c
- option -cl counts the matches per line
example:
onigrep "\w+" -cl -n
How many words per line?
- you can write the regexp without escaping '(', ')', etc., which
grep requires
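
The diacritics and per-line counting features above can be illustrated with a small Python sketch. It uses Python's own re and unicodedata modules, not Oniguruma, and it is not how onigrep is implemented; the function names are made up for this illustration. Unlike 'onigrep -id', which prints the matched 'g̀' with its accent, this sketch simply drops the combining marks before matching.

```python
import re
import unicodedata
from typing import List, Optional


def strip_combining_marks(text: str) -> str:
    """Decompose text canonically (NFD) and drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def first_group(pattern: str, text: str) -> Optional[str]:
    """Return group 1 of the first case-insensitive match, or None."""
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group(1) if match else None


def matches_per_line(pattern: str, text: str) -> List[int]:
    """Count the non-overlapping matches of pattern on every line."""
    regex = re.compile(pattern)
    return [len(regex.findall(line)) for line in text.splitlines()]


# 'g' followed by U+0300 COMBINING GRAVE ACCENT: two code points, one glyph.
name = "Ag\u0300nes"

print(len(name))                                           # 6
print(first_group(r"a(.)n", name))                         # None
print(first_group(r"a(.)n", strip_combining_marks(name)))  # g
print(matches_per_line(r"\w+", "one two\nthree\n"))        # [2, 1]
```

The second call finds nothing because '.' consumes only the 'g', and the next code point is the combining accent rather than 'n'.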
Please note, onigrep is still a work in progress.
Many thanks in advance, and any feedback (suggestions, bugs, wishes)
is welcome!
Hans
-------------- next part --------------
A non-text attachment was scrubbed...
Name: onigrep
Type: application/octet-stream
Size: 23268 bytes
Desc: not available
URL: <http://lists.macromates.com/textmate/attachments/20070629/d7a8f852/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unorm
Type: application/octet-stream
Size: 27132 bytes
Desc: not available
URL: <http://lists.macromates.com/textmate/attachments/20070629/d7a8f852/attachment-0001.obj>
-------------- next part --------------
PS onigrep and unorm will be available for free.
PPS One possible meaning of the Japanese word "Oniguruma" is "Devil's
wheel", like TextMate's icon ;)