[TxMt] Compare text files based on words not lines

Christoph Prion prion67 at googlemail.com
Sat Jan 20 11:14:26 UTC 2007


Hi all

My apologies for not replying sooner. I have been trying to find out
what led to the strange problems, and each time I thought I knew what
it was, something else turned up (see 3a below). Maybe I need to tell
you what I am trying to accomplish and start over.

The problem I am trying to solve is that in my field of science
(molecular biology) MS Word is the de facto standard for exchanging
manuscripts among collaborators and for submitting manuscripts to
journals. I can't for the life of me tell why this is, because frankly
you couldn't agree on something less suitable for the job. <Argh>
What is particularly annoying is the track changes feature, because
it does only one thing well (pointing out small changes in an
otherwise constant environment). This encourages people to muse over
the choice of the right adjective when in reality the architecture of
the paper should be discussed and altered. On top of this, it makes
Word really slow and unstable, especially when EndNote is used in
conjunction with it.
I am sick of this because it is not what I need most: a working
environment that lets me concentrate on finding the best way to
present my research, not struggle with the formatting.
Finding the spots that have been altered by a collaborator, however,
is a very real necessity; it just should not get in the way.

Although this may sound as if I were trying to make the typical LaTeX
case, that is out of the question for a number of reasons. The
acceptance among students is too low; they all grew up without
knowing what a command line is, after all. Most journals accept
manuscripts only as Word documents, not RTF, PDF or LaTeX, so there
should be at least the option to export from whatever I choose to
write in as a Word document.

That leaves me trying to solve several problems:

1) Finding a writing environment that disregards formatting, is easy
to learn and easy to read while writing.
2) Some kind of support for bibliography tools should be incorporated.
No hidden field functions that only work (hmm, work...) in some
versions of Word and screw up everything every now and then. Plain
text, please.
3) A tool that would point out where changes have been made by others.
By this I mean both subtle changes, like the insertion of a word or a
paragraph somewhere, and more complex changes, such as a sentence
or a paragraph that has been moved elsewhere.
4) All this would need to keep working (for me) even when
collaborating with others who prefer other tools. I very rarely write
all on my own, so a solution that only works if everybody else
uses the same tools would not improve anything.

My plan was as follows:
1) MultiMarkdown seems to hold a lot of promise as it is easy to read
and not complicated to learn. The support for math is limited, but
this is not a major requirement for me anyway.

2) Bibliography tools such as EndNote, Bookends or BibTeX should work,
as they insert text placeholders that get scanned and turned into
citations later on. The only things I would lose are
Cite-while-you-write (the instant formatting) and the traveling
library. I switch these two features off right after installing
EndNote, so no loss here.

3) That's a tough one.
a) The simple changes (insertions, deletions) should be easy to keep
track of. I say "should" because I found it to work very well as long
as I only used a dummy file that I saved and compared after making
some changes. However, when I took two revisions of a real manuscript
that I had saved as text files from Word, the result was much less
enlightening. Both FileMerge and kdiff3, which worked well with the
dummy files, marked huge areas as changed, including whole pages that
had not been changed at all. Why this is I don't know.
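
A word-based comparison of the kind described in 3a can be sketched
in a few lines of Python. This is only a rough sketch built on the
standard difflib module: the function name word_diff and the two
sample sentences are made up for illustration, and it is not what
FileMerge or kdiff3 do internally.

```python
import difflib


def word_diff(old_text, new_text):
    """Return (tag, old_words, new_words) for every run of changed words."""
    # split() breaks on any whitespace, so line wrapping is ignored
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return changes


# Hypothetical sample text; the line break in `old` is deliberate
old = "The protein binds to the membrane\nand forms a stable complex."
new = "The protein binds tightly to the membrane and forms a complex."

for tag, removed, added in word_diff(old, new):
    print(tag, removed, added)
```

Because the text is compared as a sequence of words, a paragraph that
was merely re-wrapped is not reported as changed, which a purely
line-based comparison would do.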

b) The more complex changes are sections of the text that have been
moved elsewhere. I am still on the lookout for the right tool for the
job. If anyone knows a tool that would do that, let me know; otherwise
I'll write something myself.
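
One possible starting point for 3b, again only as a rough sketch in
Python with the standard difflib module: compare the two revisions
paragraph by paragraph and report every paragraph that disappears in
one place and reappears unchanged in another. The function names and
the sample text are made up for illustration, and a paragraph that
was moved and edited at the same time would not be caught.

```python
import difflib
import re


def split_paragraphs(text):
    """Split on blank lines and collapse whitespace inside each paragraph."""
    parts = re.split(r"\n\s*\n", text)
    return [" ".join(p.split()) for p in parts if p.strip()]


def find_moved_paragraphs(old_text, new_text):
    """Return (old_index, new_index, paragraph) for paragraphs that moved."""
    old_paras = split_paragraphs(old_text)
    new_paras = split_paragraphs(new_text)
    matcher = difflib.SequenceMatcher(a=old_paras, b=new_paras, autojunk=False)
    removed, added = {}, {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            for i in range(i1, i2):
                removed[old_paras[i]] = i
        if tag in ("insert", "replace"):
            for j in range(j1, j2):
                added[new_paras[j]] = j
    # A paragraph that was both removed and added verbatim has moved
    return [(removed[p], added[p], p) for p in removed if p in added]


# Hypothetical sample: the second and third paragraphs swap places
old = "Intro.\n\nMethods.\n\nResults.\n\nDiscussion."
new = "Intro.\n\nResults.\n\nMethods.\n\nDiscussion."

for old_pos, new_pos, para in find_moved_paragraphs(old, new):
    print("moved from", old_pos, "to", new_pos, ":", para)
```

Working on whole paragraphs keeps the sketch short; moved sentences
would need the same idea applied to a sentence-level split.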

4) MultiMarkdown allows exporting RTF and .doc. I haven't decided
whether TextMate or Scrivener, which support different aspects of MMD,
would be easier to use in practice for this purpose.

My apologies for the long post. I would appreciate any and all helpful comments.

Christoph


