Comparing multiple documents

naita · October 22, 2012, 12:24pm

I’m looking for a tool that seems to fall between two chairs. I don’t want to check a document against an online database to prevent plagiarism, and I don’t want to run a side-by-side comparison of two chosen documents on my hard drive.

I want to run a plagiarism search on a small set of documents on my hard drive. Let’s say I have files A-Z and alpha. I’m, for whatever reason, sure alpha is plagiarising one or more of the files A-Z, and I want to know which of the files A-Z alpha is a copy of, in part or in full.

Anyone know of a good tool for that purpose?

robert_columbia · October 23, 2012, 5:22am

A half dozen work/study students.

Reply · October 23, 2012, 5:37am

…

Reply · October 23, 2012, 5:41am

Seems like Anti-Twin will let you specify a match %.

naita · October 23, 2012, 6:55pm

I expect a byte by byte comparison won’t work well when looking for duplicate word-processor document content.

Saintly_Loser · October 23, 2012, 7:27pm

There are a number of blacklining programs on the market, and most of them (or at least the ones with which I’m familiar) can compare one document to a number of other documents.

They’re probably really expensive – the market for these programs is law firms and corporations, not individual users. Two that come to mind are DeltaView and Litera’s ChangePro.

MS Word has blacklining capability built in, but I don’t think it can handle multiple (i.e., more than two) documents at once.

Reply · October 23, 2012, 7:59pm

…

robert_columbia · October 23, 2012, 8:58pm

Were you planning on contributing something to the discussion?

Tim_T-Bonham.net · October 23, 2012, 9:26pm

You could probably set up a batch process to compare document alpha to each of the other suspected plagiarism sources one at a time. Since the OP just wants to find out if/which one was plagiarized, that should be enough.

Or do it manually: concatenate all the documents into one big document, then use MS Word’s comparison on that. Once you locate the plagiarism, you can find out which document it was in.
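For what it's worth, the batch idea can be sketched in a few lines of Python, assuming the documents have already been exported to plain text. The function names, the `.txt` extension and the folder layout are made up for illustration; `difflib.SequenceMatcher` is the standard-library matcher, used here on word lists rather than raw bytes.

```python
from difflib import SequenceMatcher
from pathlib import Path


def rank_sources(suspect_text, sources):
    """Return (name, ratio) pairs, most similar source first.

    `sources` maps a file name to that file's plain text (hypothetical layout).
    """
    suspect_words = suspect_text.lower().split()
    scores = []
    for name, text in sources.items():
        # Compare word sequences; autojunk=False keeps frequent words in play.
        matcher = SequenceMatcher(None, suspect_words, text.lower().split(),
                                  autojunk=False)
        scores.append((name, matcher.ratio()))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def load_folder(folder):
    """Read every .txt file in `folder` into a name -> text dictionary."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(Path(folder).glob("*.txt"))
    }
```

Something like `rank_sources(alpha_text, load_folder("sources"))` would then list the files A-Z from most to least similar to alpha. The ratio is only a rough signal: a short copied passage inside a long document will score low, so the top few candidates still need a closer look.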

allotrope · October 24, 2012, 5:55am

This sounds like what you want, but it’s for instructors and it’s web based so probably won’t help. However I’d send them an email. I’m sure they can help you.

Jragon · October 24, 2012, 6:19am

As long as you don’t expect anything too robust (i.e. able to determine that a hastily reworded phrase is similar to the original), this is a pretty simple script to run on plaintext. .docs and .pdfs are a bit more complicated, but only insofar as you have to extract the text as plaintext first. In fact, I wrote such a script in Haskell as a project for a programming class. More specifically, I wrote a script to compare two documents (it printed a local HTML document highlighting exactly where the similarities occurred), but there’s absolutely no reason it would be difficult to extend it to take a list of file names, compare a single input against each of them in sequence, and print out its similarity to each.

Sadly, I can’t offer more than to say it’s a rather simple school-level program, and that I’d be absolutely shocked if there isn’t one out there somewhere that does what you want for relatively cheap. (Obviously ignoring TurnItIn, since you don’t want a web DB.)
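To give a rough idea of how small such a program can be, here is a sketch in Python. It is not the Haskell script mentioned above, just one common way to do it: break each text into overlapping runs of n words and report the runs the suspect document shares with each source. The function names and the default of five words are assumptions chosen for illustration.

```python
import re


def shingles(text, n=5):
    """Return the set of n-word runs in `text`, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(suspect_text, sources, n=5):
    """Map each source name to the sorted n-word runs it shares with the suspect.

    `sources` maps a file name to plain text; sources with no overlap are omitted.
    """
    suspect = shingles(suspect_text, n)
    report = {}
    for name, text in sources.items():
        common = suspect & shingles(text, n)
        if common:
            report[name] = sorted(common)
    return report
```

Like any exact-match approach, this catches verbatim copying but not a reworded phrase, and a smaller `n` will start flagging ordinary turns of phrase as matches.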

Darth_Panda · October 24, 2012, 7:34am

I recommend hiring this guy: xkcd: Regular Expressions
