On Nov 16, 2006, at 5:55 AM, Jacob Rus wrote:
Mark Eli Kalderon wrote:
- Some commands for getting better statistics than the current document statistics count. It would be nice to have a word count which knows how to ignore stuff in html/latex/markdown tags, etc. (or maybe bundles can provide overrides to this command, and all call out to a single script), but it would also be nice to get some readability statistics, such as average word length, average sentence length, and maybe metrics like Flesch-Kincaid.
- It might even be nice to add some tools for checking grammar (flagging things like wordy sentences, etc.). There are some decent open-source programs for this, I believe.
Jacob, you might have a look at diction, a descendant of the Writer's Workbench: http://directory.fsf.org/GNU/diction.html
Yes indeed, I was looking at that page when I wrote the post :). When I actually have some time to make this thing, I think I will use style/diction.
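For what it's worth, GNU style already reports most of the statistics mentioned above, and diction handles the phrase checking. A rough sketch of what a bundle command might run (the file name is just a placeholder):

    # Readability report: Kincaid and Flesch grades, average word
    # length, average sentence length, and so on.
    style chapter.txt

    # Flag wordy or commonly misused phrases in the same file.
    diction chapter.txt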
`diction` checks for grammar (although it perpetuates some of the unfortunate aspects of Strunk and White's prescriptions, it is useful nonetheless).
I didn't look at exactly what it does, but I assume it can't be worse than MS Office's grammar check.
As for filtering LaTeX commands, I run the generated PDF through ps2ascii before running these tools. That won't work for HTML documents, but you could run them through a text browser like lynx or links with the dump option instead.
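Concretely, the two routes look something like this (the file names are only placeholders):

    # LaTeX route: convert the typeset PDF to plain text, then check it.
    ps2ascii paper.pdf | style

    # HTML route: dump the rendered page as plain text, then check it.
    lynx -dump paper.html | diction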
Well, I'm still thinking about the best way to get things to work for multiple document types. It has to do two things: a) strip out the extraneous junk, and b) figure out how to get back to the specific places in the document where the points of interest are.
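One rough idea for covering both, purely as a sketch: strip the markup line by line instead of reflowing the text, so the line numbers diction reports still point back into the original file. The sed expressions below are only illustrative, nowhere near a complete LaTeX filter:

    # Remove backslash commands and inline math, but keep every line
    # intact, so diction's reported line numbers match the source file.
    sed 's/\\[A-Za-z@]*//g; s/\$[^$]*\$/ /g' chapter.tex | diction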
As a LaTeX user, I would like to encourage this bundle development. Diction, as it is, gets tripped up by all the math output.
Jenny