How to compare two Word documents?

I have the original Word document and an OCR of a printed copy. The two are supposedly the same, but I expect the OCRed version to have slightly different spacing, fonts, sizes, etc.

How can I compare the two documents word for word while ignoring spacing, fonts, etc.?

MS Word has a Compare command, but I can’t recall where it is. In OpenOffice, it’s under Edit/Compare: open one of the two documents, go to Edit/Compare, and in the usual “pick a document” window, pick the document you want to compare it to.

Since you’re interested in comparing “with no accounting for formatting or spaces”, I recommend making copies of both documents, stripping all formatting from the copies, cleaning up double spaces, double line breaks, etc., and then comparing the results. I do a similar cleanup whenever I have to compare data.

Well, try saving each document as a plain text file; that should eliminate all formatting. Then do a search-and-replace where the “Search” field is two spaces (i.e. "  ") and the “Replace” field is one space (" "). Run it repeatedly until you get no more hits. Do the same for repeated line feeds, boiling them all down to single ones.
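The repeat-until-no-hits search-and-replace can be sketched in Python as below. This is only an illustration, assuming both documents have already been saved as plain text and read in Python’s default text mode (which turns Windows CR LF line endings into plain line feeds); collapse_repeats is a hypothetical helper name.

```python
def collapse_repeats(text: str) -> str:
    """Replace two spaces with one, then two line feeds with one, repeating
    each replacement until it finds no more hits (the manual procedure)."""
    for double, single in (("  ", " "), ("\n\n", "\n")):
        # "Run this repeatedly until you get no more hits"
        while double in text:
            text = text.replace(double, single)
    return text


if __name__ == "__main__":
    sample = "The  quick   brown fox.\n\n\nJumps over\nthe dog."
    print(collapse_repeats(sample))
```

The loop is needed because one pass of replace() turns three spaces into two, not one; repeating until nothing matches is exactly what “run it until you get no more hits” does by hand.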

I figure two or three minutes of preparation like this will save you many minutes of wasted time when you get to the actual comparison.

In Office 2007, it’s under the View tab, in the Window group. It’s called “View Side by Side”, and the two documents scroll in sync so you can easily see where they get out of alignment…

That’s different, Compare actually marks the differences as notes.

Review > Compare in Word 2010, and presumably 2007.

I’m assuming you just want to make sure the text matches, since you want to ignore formatting and such.

If the document is very long, I’d use UltraCompare. They have a free trial download. Or there may be a similar shareware program out there.

That program will provide you with a side-by-side comparison, with any differences marked.

First, follow the suggestions for turning each document into a text file and removing extra spaces. Also either remove multiple tabs in the same way or change all tabs to spaces. And replace multiple line feeds while you’re at it.

I use either UltraEdit or Notepad++ (freeware) for this sort of thing because of their wildcard and pattern searches, e.g., replacing all “whitespace” with a single space.
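A pattern search does all of that cleanup in one pass. As a sketch, here it is with Python’s re module standing in for the editor’s own search dialog: the pattern \s+ matches any run of spaces, tabs, line feeds, and other whitespace (including non-breaking spaces, which OCR output may contain) and replaces it with a single space. normalize_whitespace is a hypothetical name.

```python
import re


def normalize_whitespace(text: str) -> str:
    # \s+ matches any run of whitespace (spaces, tabs, CR/LF, and Unicode
    # spaces such as U+00A0); each run becomes one space, and the ends are trimmed.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalize_whitespace("Hello\t\tworld \r\n\r\n  again "))  # prints: Hello world again
```

Note that this also flattens all paragraphs onto one line, so it suits a word-by-word comparison rather than a line-by-line one.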

Since you don’t care about formatting at all, you could just save the contents of both files as plain .txt files and use a tool like ExamDiff to compare them. It’s designed as a programmer’s tool, so unfortunately it will also mark differences in whitespace…
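One way around whitespace noise is to compare lists of words instead of lines. A minimal sketch using Python’s standard difflib module (not ExamDiff); word_diff is a hypothetical helper. Because str.split() with no arguments discards all whitespace, spacing and line-break differences cannot appear in the result, only changed, missing, or extra words.

```python
import difflib


def word_diff(original: str, ocr: str) -> list[str]:
    """Compare two texts word by word and describe each non-matching stretch."""
    a, b = original.split(), ocr.split()
    # autojunk=False: for long inputs, the default heuristic would treat very
    # frequent words (e.g. "the") as junk and produce confusing matches.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is "replace", "delete", or "insert"
            changes.append(f"{tag}: {' '.join(a[i1:i2])!r} -> {' '.join(b[j1:j2])!r}")
    return changes


if __name__ == "__main__":
    for change in word_diff("The quick brown fox", "The  quick\nbrovvn fox"):
        print(change)  # prints: replace: 'brown' -> 'brovvn'
```

A typical OCR misread such as “rn” for “m” or “vv” for “w” then shows up as a single replaced word.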

Professional proofreaders use two marvelous machines, conveniently located between the forehead and the nose.

My old boss used to print out both copies and hold the pages, one on top of the other, up to a window (or other light source). It worked pretty well, most of the time. In your case you might get spurious differences due to white space, I suppose.

If you save them as text, you can use something like WinMerge to highlight all of the differences quickly.

This great little shareware program does the trick - and automatically converts the docs to text for you beforehand. It can also compare other types of “binary” formats. I’ve used it extensively over the years, and love it.
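For .docx files (Word 2007 and later), the automatic conversion to text can be approximated with Python’s standard library, because a .docx file is a ZIP archive whose main body text lives in word/document.xml. This is a rough sketch, not what the shareware program actually does; docx_to_text is a hypothetical helper. It keeps only the w:t text runs, so headers, footers, and footnotes (stored in separate parts) are skipped, and tabs and manual line breaks are dropped. Old binary .doc files are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for elements inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path) -> str:
    """Return the body text of a .docx file (path or file object), one line per paragraph."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):  # w:p = paragraph (also found inside tables)
        # w:t = a run of text; concatenate the runs of this paragraph
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    import sys
    print(docx_to_text(sys.argv[1]))
```

The output can then go through the whitespace cleanup and word comparison discussed earlier in the thread.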

Which explains the rather large amount of time needed, or the large number of errors. With tedious tasks like this, the human brain just can’t compare to a computer.