- Some commands for getting better statistics than the current document statistics count. It would be nice to have a word count that knows how to ignore markup in HTML/LaTeX/Markdown documents (or maybe bundles can provide overrides to this command, and all call out to a single script). It would also be nice to get some readability statistics, such as average word length, average sentence length, and maybe metrics like Flesch-Kincaid.
- It might even be nice to add some tools for checking grammar (flagging things like wordy sentences, etc.). There are some decent open-source programs for this, I believe.
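
As a rough illustration of the readability statistics asked for above, here is a minimal Python sketch. It is not taken from any existing bundle or tool: the function names `count_syllables` and `readability` are hypothetical, the sentence splitter and the syllable counter are crude heuristics, and only the Flesch-Kincaid grade-level formula itself (0.39 × words per sentence + 11.8 × syllables per word − 15.59) is standard. It also assumes the input is already plain text with the markup stripped.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count groups of vowels and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return simple readability statistics for plain text."""
    # Naive sentence split on runs of ., ! or ?; abbreviations will fool it.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text contains no words or sentences")

    syllables = sum(count_syllables(w) for w in words)
    avg_sentence_length = len(words) / len(sentences)
    avg_word_length = sum(len(w) for w in words) / len(words)

    # Flesch-Kincaid grade level.
    fk_grade = (0.39 * avg_sentence_length
                + 11.8 * (syllables / len(words))
                - 15.59)

    return {
        "words": len(words),
        "sentences": len(sentences),
        "avg_word_length": avg_word_length,
        "avg_sentence_length": avg_sentence_length,
        "flesch_kincaid_grade": fk_grade,
    }


if __name__ == "__main__":
    import sys
    for key, value in readability(sys.stdin.read()).items():
        print(f"{key}: {value}")
```

Saved as a script, this reads text from standard input, so it could sit behind whatever markup filter a bundle supplies. Because of the syllable heuristic, the numbers will differ somewhat from those of a dedicated tool.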
Jacob, you might have a look at diction, a descendant of the Writer's Workbench:
http://directory.fsf.org/GNU/diction.html
The `style` command reports a number of metrics, including Flesch-Kincaid, and `diction` checks for grammar (although it perpetuates some of the unfortunate aspects of Strunk and White's prescriptions, it is useful nonetheless).
As for filtering LaTeX commands, I run the generated PDF through `ps2ascii` before running these. That won't work for HTML documents, but you could run those through a text browser like lynx or links with the `-dump` option instead.
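
For what it's worth, that pipeline could be wrapped in a small script along these lines. This is only a sketch under assumptions: `extraction_command` and `style_report` are hypothetical names, the choice of extractor by file extension is my own simplification, and it assumes `ps2ascii`, `lynx`, and `style` are installed and on the PATH. The `-nolist` flag is an addition of mine to keep lynx's link list out of the counts.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def extraction_command(path: str) -> list[str]:
    """Pick a command that dumps the document at `path` as plain text."""
    suffix = Path(path).suffix.lower()
    if suffix in {".pdf", ".ps"}:
        # ps2ascii writes the extracted text to standard output.
        return ["ps2ascii", path]
    if suffix in {".html", ".htm"}:
        # -dump prints the rendered page; -nolist omits the link list.
        return ["lynx", "-dump", "-nolist", path]
    raise ValueError(f"don't know how to extract text from {path!r}")


def style_report(path: str) -> str:
    """Extract plain text from `path` and feed it to `style` on stdin."""
    command = extraction_command(path)
    for tool in (command[0], "style"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} is not installed or not on PATH")

    text = subprocess.run(
        command, check=True, capture_output=True, text=True
    ).stdout
    # With no file argument, style reads the document from standard input.
    report = subprocess.run(
        ["style"], input=text, check=True, capture_output=True, text=True
    )
    return report.stdout


if __name__ == "__main__":
    import sys
    print(style_report(sys.argv[1]))
```

Swapping `"style"` for `"diction"` in the second call should give the phrase-level report instead, since both read from standard input in the same way.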
All the best, Mark