[TxMt] Compare text files based on words not lines

Christoph Prion prion67 at googlemail.com
Sun Jan 21 15:17:28 UTC 2007


On 1/20/07, Holger Frauenrath <mail at frauenrath.com> wrote:
> Hi Christoph,
>
> don't worry about it. Your question may be off-topic but, I think,
> still interesting to others. At least, I have the same problem, which
> also means that I have no good solution for you.

Good to know I am not the only one. While I agree that my question may
be borderline off-topic to some, I only gave the background
information about what I am trying to achieve overall because I
thought it might help us stay focused on the problem.
In particular, I was (and still am) determined to find out how TextMate
can help me solve some or all of these problems.



> What I am currently doing is:
>
> - Write my research proposals and other long documents that are
> supposed to look good in LaTeX, using Textmate/PDFView/BibDesk
> - Write my paper manuscripts in MS Word, using EndNote (library
> imported from BibDesk)
> - Write my notes and summaries of scientific papers (all things
> needed for future use in articles, reviews, proposals, or as
> introductions for new students) in RTF with basic formatting so that
> I can later reuse these either way (e.g., references in curly
> brackets so that EndNote will recognize them when I copy it to Word,
> and I only have to add \cite when I use it in LaTeX)
>
> Pretty clumsy.

Fully agreed ;-) Not that my solution is any more streamlined, though,
which is precisely why I am trying to move away from the current
situation. It is interesting that you also have what I call the
"Schrottplatz" (junkyard), i.e. bits and pieces which were discarded
from previous versions of a grant proposal or paper, but which may
prove useful at a later stage or for a different project.

Having to use so many different tools and compare files manually is
bad, however. One thing that leaves me without a clue is why the tools
I know and have used to compare text files give very different results
depending on the context of the changes (or the size of the document).
Could you (or somebody else) please try the following experiment?

- Take two revisions of a large paper you are writing, with some major
changes between the two. Export them as text-only files (or copy and
paste them into a text-only program such as TextMate).

- Run a file comparison using either Word, kdiff3 (which I kind of
like) or FileMerge.

- Please report your findings here. In particular, can you reproduce
the problem I keep running into: paragraphs that contain minor changes
(sometimes only a single different word) are marked as changed as a
whole, leaving the job of spotting the difference entirely to you? It
is not uncommon to find that there are only ten changes, yet the
regions flagged as changed make up 80% or more of the entire paper.
Each region marked as different typically contains many changes at
different places. It is highly annoying that whole paragraphs within
these regions that did NOT change between the two revisions are not
reported as identical.
I have a gut feeling that text encoding (UTF-8, Roman, etc.) has
something to do with it, but I haven't done systematic testing so far.
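
For illustration, a word-based comparison can be sketched in a few
lines. This is only a minimal sketch, assuming Python and its standard
difflib module; the function name word_diff and the sample sentences
are made up for this example and are not part of any of the tools
mentioned above. It splits both texts on whitespace and compares the
resulting word sequences, so a one-word edit inside a long paragraph
should show up as a one-word change rather than as a changed paragraph.

```python
import difflib


def word_diff(old_text, new_text):
    """Return (tag, old_words, new_words) for each changed region.

    word_diff is a hypothetical helper for illustration only.
    """
    # Splitting on any whitespace ignores line breaks, so a paragraph
    # that was merely re-wrapped compares as unchanged.
    old_words = old_text.split()
    new_words = new_text.split()
    # autojunk=False turns off difflib's heuristic that treats very
    # frequent items as junk in long sequences.
    matcher = difflib.SequenceMatcher(None, old_words, new_words,
                                      autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return changes


old = "The quick brown fox jumps over the lazy dog."
new = "The quick red fox jumps over the lazy dog."
for tag, removed, added in word_diff(old, new):
    print(tag, removed, added)
```

Because the texts are split on whitespace, re-wrapped lines compare as
identical; the price is that changes in whitespace alone go unreported.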

> And the more complex part of your problem (compare
> files) has been done manually by myself. The use of Markdown was new
> to me, and I only checked it out after you mentioned it in one of
> your last emails. It sure looks interesting.
>
> I would also like to see some helpful comments on this issue.
>

Glad you like it, too. The thing I like about MMD (MultiMarkdown) is
that you have only a single parent document from which you can derive
as many differently formatted child documents as you need. And it is
much easier to learn and read than pure LaTeX. It is also nice for
exchanging documents with other people; there are even exporters for
RTF and .doc files.

Any comments?

Christoph



> Best regards
> Holger
>
>


