The PPI module [1] on CPAN is designed to parse Perl (no small task there). It was designed for projects /exactly/ like TextMate.
Why is PPI worth looking at? A few brief reasons:
1. It parses Perl without executing it, so stuff like 'BEGIN {system "rm -rf ~"}' won't do naughty things.
For the record, TextMate doesn't suffer from this problem.
2. It can parse 99% of the documents in CPAN [2]
Comparatively, TextMate isn't quite there yet. :-)
3. It's fast enough [3]. At OSCON 13 months ago I watched a demo of PPI parsing the contents of a generic wx-based editor [4]. It responded in real time.
TextMate is probably faster because it parses a smaller DOM than PPI. The module PPI::XS, a Perl-C tokenizer hybrid, is an attempt to write a faster tokenizer.
4. The Perl DOM class tree looks sane enough to use as the basis for colorization. There are 62 classes, which should map neatly into the TextMate model [5]
5. Adam Kennedy, the author, is a fantastic guy. He stayed at my place for that OSCON. Not to put words in his mouth, but I'm sure he'd be delighted to offer help integrating his baby into TextMate.
6. Once integrated, TextMate would benefit from future PPI improvements for free. PPI has been under continual development since 2001. Updates are released about every two months, the most recent being today, Sep 2.
Thanks for entertaining this idea, Joshua
[1] http://search.cpan.org/~adamk/PPI-1.117/lib/PPI.pm
[2] http://search.cpan.org/~adamk/PPI-1.117/lib/PPI.pm#How_good_is_Good_Enough(T...)
"The goal for success was originally to be able to successfully parse 99% of all Perl documents contained in CPAN. [...] At time of writing there are only 28 non-Acme Perl modules in CPAN that PPI is incapable of parsing. Most of these are so badly broken they do not compile as Perl code anyway."
[3] http://search.cpan.org/~adamk/PPI-1.117/lib/PPI.pm#The_Tokenizer
"The target parsing rate for PPI is about 5000 lines per gigacycle. It is currently believed to be at about 1500, and main avenue for making it to the target speed has now become PPI::XS, a drop-in XS accelerator for PPI."
(At the time of this writing, PPI::XS is in the proof-of-concept stage)
[4] http://search.cpan.org/~adamk/PPI-Tester-0.06/
A wxPerl-based interactive PPI debugger/tester
[5] http://search.cpan.org/~adamk/PPI-1.117/lib/PPI.pm#The_PDOM_Class_Tree
On 3/9/2006, at 1:19, Joshua Keroes wrote:
The PPI module [1] on CPAN is designed to parse Perl (no small task there). It was designed for projects /exactly/ like TextMate.
TextMate needs a name assigned to each parsed entity so that scope selectors can be used (used to abstract visual styles away from the parser, apply settings to subsets of the document, limit key bindings, etc.).
It also needs grammars to be able to include each other, e.g. Perl here-docs include the HTML grammar when the token is HTML, the HTML (Mason) grammar includes Perl inside <% … %>, etc.
If each language had its own custom parser, this would not be feasible, and it would significantly raise the barrier to language grammar hacking.
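To make the two requirements concrete, here is a rough Python sketch of named scopes plus one grammar including another. The rule tables, the scope names, and the scopes()/scopes_mason() helpers are all made up for illustration; this is not TextMate's actual grammar format or API.

```python
import re

# Hypothetical toy grammars: each rule is (scope_name, regex).
HTML_RULES = [
    ("entity.name.tag.html", re.compile(r"</?\w+>")),
]
PERL_RULES = [
    ("keyword.control.perl", re.compile(r"\b(?:if|else|while|my)\b")),
    ("variable.other.perl", re.compile(r"[$@%]\w+")),
]

def scopes(text, rules, base):
    """Return (matched_text, scope) pairs for every rule match, in order."""
    found = []
    for name, pattern in rules:
        for m in pattern.finditer(text):
            found.append((m.start(), m.group(), f"{base} {name}"))
    return [(tok, scope) for _, tok, scope in sorted(found)]

def scopes_mason(text):
    """Toy HTML grammar that hands <% ... %> regions to the Perl grammar."""
    out = []
    pos = 0
    for m in re.finditer(r"<%(.*?)%>", text, re.S):
        out += scopes(text[pos:m.start()], HTML_RULES, "text.html")
        out += scopes(m.group(1), PERL_RULES, "text.html source.perl.embedded")
        pos = m.end()
    out += scopes(text[pos:], HTML_RULES, "text.html")
    return out

for token, scope in scopes_mason("<b><% my $x %></b>"):
    print(f"{token!r:8} {scope}")
```

Because every token carries a scope path, a style or key binding can target "source.perl.embedded" without knowing which grammar produced it, and the Perl rules are reused unchanged inside the HTML grammar.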
Not to mention that TM needs to constantly re-parse the document while it is being edited, but only the part of the document that actually changed. E.g. if you add a keyword at line 10,325 then only that line should be re-parsed, though had you added ‘=begin’ then it needs to re-parse until it sees ‘=end’ -- stand-alone parsers like PPI are not made for these things.
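The re-parsing behaviour can be sketched in Python along these lines. It assumes a per-line parser that caches the state at the end of each line and an edit that changes one line in place; the function names and the two-state "code"/"pod" model are hypothetical simplifications, not how TextMate is actually implemented.

```python
def next_state(state, line):
    """End-of-line state after `line`, given the state at its start."""
    if state == "code" and line.startswith("=begin"):
        return "pod"
    if state == "pod" and line.startswith("=end"):
        return "code"
    return state

def parse_all(lines):
    """Full parse: return the cached end-of-line state for every line."""
    states, state = [], "code"
    for line in lines:
        state = next_state(state, line)
        states.append(state)
    return states

def reparse_from(lines, states, changed):
    """Re-parse after lines[changed] was edited in place.

    Updates `states` and returns how many lines had to be re-parsed.
    """
    state = states[changed - 1] if changed > 0 else "code"
    count = 0
    for i in range(changed, len(lines)):
        state = next_state(state, lines[i])
        count += 1
        if state == states[i]:
            break  # end state unchanged, so later lines are unaffected
        states[i] = state
    return count

lines = ["my $x;", "print $x;", "=end", "print 1;"]
states = parse_all(lines)

lines[1] = "print $x if 1;"            # ordinary edit
print(reparse_from(lines, states, 1))  # only that line is re-parsed

lines[0] = "=begin comment"            # opens a block
print(reparse_from(lines, states, 0))  # re-parses until the '=end' line
```

The stopping rule is the point: once a line ends in the same state it ended in before the edit, nothing after it can have changed, so the rest of the document is left alone.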