Re: [TxMt] Word Count ?

30 Nov 2006


      On 30 Nov 2006, at 06:23, Paul McCann wrote:
...
Mark Eli Kalderon wrote:
...
There is a limit to how fast an accurate utility could be if it is  
to accommodate the kind of procedural aspects of LaTeX syntax that  
Haris was referring to.
Indeed: but all that work has already been done, as the command is  
operating on the pdf file. So the real question becomes: why is  
ps2ascii (aka ghostscript) so slow? (Just checked on my work  
machine, a 2GHz intel iMac with 2G of memory, and it's still about  
30 seconds on a 250 page, 1.2MB pdf file.)
I guess it's just an irreducibly difficult procedure... Moral of  
this story? Don't count words very often!
Cheers,
Sorry, Paul, I thought you were gesturing toward something that would  
work directly on the LaTeX source.
Perhaps there are faster ways to extract text from a PDF, but I
don't know what they would be. There are Java libraries for doing
this (I think it's called jPDFText or something close to it). But as I
know no Java, I can't write and test the script against ps2ascii. Now
if only the Ruby PDF::Reader module (which was meant to have text
extraction capabilities) would materialize...
Accurate word counts may not matter to all but they do matter to  
some. I am constantly working against editorially imposed word  
limits. Most journals in the Humanities have word limits that are  
strictly enforced---unless you're famous. ;) But I do take your point  
that for many use cases ps2ascii is way too slow.
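
For reference, the approach discussed in this thread (run ps2ascii on
the finished PDF, then count the words in its output) can be sketched
roughly as below. This is only a sketch under assumptions: it assumes
Ghostscript's ps2ascii is on the PATH and prints the extracted text to
standard output when given just an input file. The function names
count_words and pdf_word_count are made up for illustration and are
not part of any bundle.

```python
import subprocess


def count_words(text):
    """Count whitespace-separated tokens, roughly what `wc -w` reports."""
    return len(text.split())


def pdf_word_count(pdf_path):
    """Extract text from a PDF with ps2ascii and count its words.

    Assumption: ps2ascii (shipped with Ghostscript) is installed and
    writes the extracted text to stdout when no output file is given.
    """
    result = subprocess.run(
        ["ps2ascii", pdf_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ps2ascii fails
    )
    return count_words(result.stdout)
```

Calling pdf_word_count("thesis.pdf") (a hypothetical file name) would
return the count as an integer. Nearly all of the running time is
spent inside ps2ascii itself, so a wrapper like this does nothing
about the slowness complained of above.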
Best, Mark
