On Nov 30, 2006, at 1:23 AM, Paul McCann wrote:
Indeed: but all that work has already been done, as the command is operating on the PDF file. So the real question becomes: why is ps2ascii (aka Ghostscript) so slow? (Just checked on my work machine, a 2 GHz Intel iMac with 2 GB of memory, and it still takes about 30 seconds on a 250-page, 1.2 MB PDF file.)
I guess the question is: how are you going to get the words out of the PDF or PS file? If you look at the PDF/PS source, it is filled with special commands and operators. I suppose if you could export the PDF file to a plain text file, then you could count the words there with ease. Otherwise, we are talking about parsing what seems to me to be code even more complicated than LaTeX. You are better off accepting the small error from counting words in the LaTeX source instead.
Unless I am much mistaken.
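To make the two options concrete, here is a minimal Python sketch. It assumes the text has already been exported to a string (the export step itself is not shown), and the names `count_words` and `rough_latex_word_count` are made up for illustration. The LaTeX version only strips comments, command names, and a few special characters, so arguments such as the `document` in `\begin{document}` still get counted; that is the kind of small error mentioned above.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens containing at least one letter or digit."""
    return sum(1 for token in text.split() if any(ch.isalnum() for ch in token))


def rough_latex_word_count(source: str) -> int:
    """Approximate the word count of LaTeX source (hypothetical helper)."""
    # Drop comments: an unescaped % through the end of the line.
    source = re.sub(r"(?<!\\)%.*", "", source)
    # Drop command names such as \section or \emph, keeping their arguments.
    source = re.sub(r"\\[A-Za-z]+\*?", " ", source)
    # Turn escaped symbols, braces, brackets and math/alignment characters into spaces.
    source = re.sub(r"\\.|[{}\[\]$&~^_]", " ", source)
    return count_words(source)


if __name__ == "__main__":
    sample = r"\section{Intro} Some \emph{very} short text. % a comment"
    print(rough_latex_word_count(sample))
```

Counting the exported plain text is then just `count_words(text)`; the rough LaTeX count trades some accuracy for not having to parse the PDF or PS file at all.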
I guess it's just an irreducibly difficult procedure... Moral of this story? Don't count words very often!
Or ever, I would say. Why is it important how many words you have? I guess some things have word limits, but surely this is not a check you would have to do very often.
Cheers, Paul
Haris