On Nov 30, 2006, at 1:23 AM, Paul McCann wrote:
Indeed: but all that work has already been done, as the command is
operating on the pdf file. So the real question becomes: why is
ps2ascii (aka ghostscript) so slow? (Just checked on my work
machine, a 2GHz Intel iMac with 2G of memory, and it's still about
30 seconds on a 250 page, 1.2MB pdf file.)
I guess the question is how are you going to get the words out of the
pdf or ps file? If you look at the pdf/ps source file, it is filled
with special commands and things. I suppose if you could export the
pdf file to a txt file, then you could count the words there with
ease. Otherwise, we are talking about parsing what seems to me to be
code even more complicated than LaTeX. You are better off with the
small error from counting words in the LaTeX source instead.
Unless I am much mistaken.
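
For what it's worth, here is a rough Python sketch of both routes: counting words in a plain-text export, and counting them approximately in the LaTeX source. The function names count_words and rough_latex_word_count are made up for illustration, and the regexes are only a crude approximation of LaTeX syntax, so expect exactly the kind of small error mentioned above (environment names such as "document" get counted as words, for instance).

```python
import re


def count_words(text):
    """Count whitespace-separated tokens containing a letter or digit."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


def rough_latex_word_count(source):
    """Approximate word count for LaTeX source (hypothetical helper).

    Assumption: stripping comments and command names is good enough.
    Arguments of commands are kept, so \\begin{document} adds one word.
    """
    # Drop comments: an unescaped % up to the end of the line.
    source = re.sub(r"(?<!\\)%.*", "", source)
    # Drop control words such as \section or \emph, keeping their arguments.
    source = re.sub(r"\\[A-Za-z]+\*?", " ", source)
    # Drop control symbols such as \% or \&.
    source = re.sub(r"\\.", " ", source)
    # Drop grouping, math, and alignment characters.
    source = re.sub(r"[{}$&~^_\[\]]", " ", source)
    return count_words(source)
```

If the pdf has been exported to a text file by whatever tool you prefer, reading that file and passing its contents to count_words should do; rough_latex_word_count takes the contents of the .tex file instead.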
I guess it's just an irreducibly difficult procedure... Moral of
this story? Don't count words very often!
Or ever, I would say. Why is it important how many words you have? I
guess some things have word limits, but surely this is not a check
you would have to do too often.