[TxMt] Use PPI for Perl parsing?

Joshua Keroes joshua at keroes.com
Sat Sep 2 23:19:16 UTC 2006


The PPI module [1] on CPAN is designed to parse Perl (no small task
there). It was designed for projects /exactly/ like TextMate.


Why is PPI worth looking at? A few brief reasons:


1. It parses Perl without executing it, so code like 'BEGIN {system
"rm -rf ~"}' won't do anything naughty.

For the record, TextMate doesn't suffer from this problem either.


2. It can parse 99% of the documents on CPAN [2].

By comparison, TextMate isn't quite there yet. :-)


3. It's fast enough [3]. At OSCON 13 months ago I watched a demo of
PPI parsing the contents of a generic wx-based editor [4]. It
responded in real time.

TextMate is probably faster, because it builds a smaller DOM than PPI
does. The PPI::XS module, a hybrid Perl/C tokenizer, is an attempt to
make tokenizing faster.


4. The Perl DOM (PDOM) class tree looks sane enough to use as the
basis for colorization. There are 62 classes, which should map neatly
onto the TextMate model [5].


5. Adam Kennedy, the author, is a fantastic guy. He stayed at my place
for that OSCON. Not to put words in his mouth, but I'm sure he'd be
delighted to offer help integrating his baby into TextMate.


6. Once integrated, TextMate would benefit from future PPI
improvements for free. PPI has been under continuous development
since 2001, with a new release about every two months; the most
recent came out today, Sep 2.
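To make points 1 and 4 a little more concrete, here is a minimal sketch of how the two might fit together, assuming PPI is installed from CPAN. It relies only on PPI::Document->new, tokens, class, content and isa from PPI. The %scope_for table, its scope names and the scope_for_class helper are hypothetical, chosen purely for illustration; they are not part of PPI or of TextMate's Perl grammar.

```perl
use strict;
use warnings;
use PPI;

# Hypothetical PDOM-class => TextMate-scope table, for illustration only.
# Neither PPI nor TextMate ships this mapping.
my %scope_for = (
    'PPI::Token::Comment'  => 'comment.line.number-sign.perl',
    'PPI::Token::Quote'    => 'string.quoted.perl',
    'PPI::Token::Symbol'   => 'variable.other.perl',
    'PPI::Token::Number'   => 'constant.numeric.perl',
    'PPI::Token::Operator' => 'keyword.operator.perl',
);

# Hypothetical helper: look up the exact class first, then strip trailing
# "::Name" parts, so that e.g. PPI::Token::Quote::Double falls back to
# the PPI::Token::Quote entry.
sub scope_for_class {
    my ($class) = @_;
    while ( $class =~ /::/ ) {
        return $scope_for{$class} if exists $scope_for{$class};
        $class =~ s/::[^:]+\z//;
    }
    return 'source.perl';    # default for anything unmapped
}

# The snippet from point 1. PPI only reads it as text; nothing is run.
my $code = q{BEGIN { system "rm -rf ~" } # never executed};
my $doc  = PPI::Document->new( \$code )
    or die "PPI could not parse the snippet\n";

# Print scope, PDOM class and source text for each non-whitespace token.
for my $token ( $doc->tokens ) {
    next if $token->isa('PPI::Token::Whitespace');
    printf "%-30s %-26s %s\n",
        scope_for_class( $token->class ), $token->class, $token->content;
}
```

PPI's class names mostly mirror its inheritance tree (PPI::Token::Quote::Double is a PPI::Token::Quote), so a handful of entries can cover many of the 62 classes. Walking each class's @ISA instead of trimming the name would be the more precise choice.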


Thanks for entertaining this idea,
Joshua


[1] http://search.cpan.org/~adamk/PPI-1.117/lib/PPI.pm

[2] http://search.cpan.org/~adamk/PPI-1.117/lib/PPI.pm#How_good_is_Good_Enough(TM)

 "The goal for success was originally to be able to successfully parse
99% of all Perl documents contained in CPAN. [...] At time of writing
there are only 28 non-Acme Perl modules in CPAN that PPI is incapable
of parsing. Most of these are so badly broken they do not compile as
Perl code anyway."

[3] http://search.cpan.org/~adamk/PPI-1.117/lib/PPI.pm#The_Tokenizer

 "The target parsing rate for PPI is about 5000 lines per gigacycle.
It is currently believed to be at about 1500, and main avenue for
making it to the target speed has now become PPI::XS, a drop-in XS
accelerator for PPI."

(At the time of writing, PPI::XS is at the proof-of-concept stage.)

[4] http://search.cpan.org/~adamk/PPI-Tester-0.06/

A wxPerl-based interactive PPI debugger/tester

[5] http://search.cpan.org/~adamk/PPI-1.117/lib/PPI.pm#The_PDOM_Class_Tree
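The rates quoted in [3] are easier to judge in wall-clock terms. Assuming, purely for illustration, a 2 GHz processor devoted entirely to parsing and a hypothetical 1000-line file:

```latex
\begin{aligned}
\text{current: } 1500\,\tfrac{\text{lines}}{\text{Gcycle}} \times 2\,\tfrac{\text{Gcycle}}{\text{s}} &= 3000\,\tfrac{\text{lines}}{\text{s}}
  &&\Rightarrow\ \tfrac{1000\ \text{lines}}{3000\ \text{lines/s}} \approx 0.33\ \text{s} \\
\text{target: } 5000\,\tfrac{\text{lines}}{\text{Gcycle}} \times 2\,\tfrac{\text{Gcycle}}{\text{s}} &= 10000\,\tfrac{\text{lines}}{\text{s}}
  &&\Rightarrow\ \tfrac{1000\ \text{lines}}{10000\ \text{lines/s}} = 0.1\ \text{s}
\end{aligned}
```

Actual times will vary with clock speed and code density; this only gives the order of magnitude.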