Is there a way to determine if part of a file has been cut and pasted?

Is there a way to tell from a print-out or a saved file on a disk if part of a Word document has been added by cut-and-paste?

If the modified area follows the same formatting as the original part of the document, for all intents and purposes it is not possible to know by reading the hard copy. However, it is often possible to tell by looking at the actual electronic source code of the document.

No, it isn’t. RTF includes no markup to identify cut-and-paste or any other editing operations, AFAIK. Once a Word file has been saved with the changes, it’s not possible to tell that any changes have been made, unless the author includes markings to so indicate or you have access to the original document for comparison. The embedded file information might include a Last Modified date, but this provides no information on the type of modification or what, if anything, was changed.

It is possible with a Microsoft Word document file. It actually happens now and then that a company releases something (press release, etc.) in DOC format and it is possible to essentially undo the edits to see how the document was modified or redacted prior to release. I believe you have to explicitly turn this feature on in Word, something like “track changes” and the feature is frequently used in collaborative environments so users can see the edits (not just additions but deletions too) of other users. If you’re smart, you purge this data before releasing the file or convert it to a more suitable format like PDF or RTF. Not everyone is that smart.

One might also deduce cut-and-paste by redundant tags. If, for instance, you have in a document a construct like (fictitious format, but you get the idea)


<font=Courier New>Blah, blah, blah</font><font=Times New Roman></font><font=Courier New>Yakkity smakkity</font><font=Courier New>More blah blah

one might deduce that the “Yakkity smakkity” section was C&Ped from some other document originally in Times New Roman. Or maybe there’s some other explanation.
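For what it's worth, here is a rough Python sketch of that kind of check. It only understands the made-up <font=NAME>...</font> notation from the example above, not any real Word or RTF markup, and the function name and the two "reasons" it reports are my own invention. It flags empty runs and back-to-back runs in the same font, which are hints of editing and nothing more.

```python
import re

# Matches one run in the fictitious <font=NAME>text</font> notation from the post.
# This is NOT a real Word or RTF format; it is only the illustration above.
RUN = re.compile(r"<font=([^>]+)>(.*?)</font>", re.DOTALL)


def suspicious_runs(markup):
    """Return (index, font, reason) for runs that look like editing residue."""
    runs = RUN.findall(markup)
    findings = []
    for i, (font, text) in enumerate(runs):
        if text.strip() == "":
            # A font was opened and closed around nothing at all.
            findings.append((i, font, "empty run"))
        elif i > 0 and runs[i - 1][0] == font:
            # Two adjacent runs share a font; a single run would have sufficed.
            findings.append((i, font, "same font as previous run"))
    return findings


sample = (
    "<font=Courier New>Blah, blah, blah</font>"
    "<font=Times New Roman></font>"
    "<font=Courier New>Yakkity smakkity</font>"
    "<font=Courier New>More blah blah</font>"
)

for finding in suspicious_runs(sample):
    print(finding)
```

A hit from something like this is a reason to look closer, not proof of a paste. Ordinary retyping or reformatting can leave the same residue.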

I can’t get this to work that way. Maybe I’m doing something wrong, but I think I’ve got the feature turned on now. Once I save a changed document, all traces of editing are gone when I close and reopen it.

Then you’re not setting it correctly. I don’t have Word on this workstation, but I have worked on collaborative documents where all edits were tracked in the DOC file. You could choose to hide edits, in which case you saw the final product of those edits, or you could show edits in which case each user’s changes would appear in color-coded text with deletions shown as a strikethrough (so if one author changed “there” to “they’re”, you would see “there” with a strikethrough to show the deletion and then “they’re” in plain text, both color-coded to the user that made the change).

If you simply hide edits and release the document, anyone can expose the entire chain of edits. Sometimes that’s desirable and it is a useful feature. Sometimes it’s a colossal fumble.

I’m going to take a different interpretation of the question than QED and the rest of them have.

Are you a teacher trying to catch a plagiarist? Try putting the suspect phrases into Google and see what comes out. Also, my HS claims to have a program that can detect plagiarism, but if where you work had this, you would probably know about it already.

While this is true (the option is under Tools/Track Changes in MS Word, and helpfully shows the user of Word that made the changes), Word does not identify whether a given change was entered via the keyboard or came in via the clipboard; both changes show only “inserted”, the date inserted, and the name of the user who made the changes.
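To make that concrete, here is a minimal Python sketch. It assumes the newer .docx (Office Open XML) format, which is a zip archive with the body in word/document.xml, and not the binary .DOC files discussed in this thread, which need different tooling. In .docx, tracked insertions and deletions are stored as w:ins and w:del elements carrying w:author and w:date attributes. The function names, the author "Pat" and the sample fragment are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml of a .docx file.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_changes(document_xml):
    """List (kind, author, date, text) for each tracked insertion or deletion."""
    root = ET.fromstring(document_xml)
    changes = []
    # Inserted text sits in w:t elements, deleted text in w:delText elements.
    for kind, tag, text_tag in (("inserted", "ins", "t"), ("deleted", "del", "delText")):
        for el in root.iter(f"{{{W}}}{tag}"):
            text = "".join(t.text or "" for t in el.iter(f"{{{W}}}{text_tag}"))
            changes.append((kind, el.get(f"{{{W}}}author"), el.get(f"{{{W}}}date"), text))
    return changes


def tracked_changes_in_file(path):
    """Assumes `path` points at a .docx file (a zip archive), not a legacy .doc."""
    with zipfile.ZipFile(path) as archive:
        return tracked_changes(archive.read("word/document.xml"))


# Hypothetical, hand-written fragment standing in for a real document body.
sample = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:r><w:t>Original text. </w:t></w:r>'
    '<w:ins w:id="1" w:author="Pat" w:date="2004-05-01T10:00:00Z">'
    '<w:r><w:t>Added later.</w:t></w:r></w:ins>'
    '</w:p></w:body></w:document>'
)

print(tracked_changes(sample))
```

Note what comes back: who, when, and what text. The change record extracted here has no field saying whether the text was typed or pasted, which matches the point above.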

I found it. There’s another place you have to click to actually track the changes, in the Tools menu. I can’t see anyone turning this on accidentally, so I don’t think it’s going to help the OP much, unfortunately.

You are correct. I ran a few tests with the .RTF format and it’s not possible.

However, .RTF is not the default setting in Word. The default setting is the .DOC format and it is possible to track document changes at the code level.

Right, but again, Word has to be specifically set to do so. It doesn’t, apparently, do so by default. I wonder, however, if the document originator has selected to Track Changes for a particular document, will changes still be tracked in that document on someone else’s computer with Word set to its default to not track them? I’m thinking not. Anyone know?

Track Changes is enabled on a per-file basis. So any document that has it toggled will always track changes.

So what all this boils down to is that forensically, you can’t tell with absolute certainty whether text has been cut and pasted or typed. You can make intelligent guesses based on available evidence, but still cannot say with certainty.

The clearest indication would be a change in style (i.e. font changes, spacing changes, etc…) for said paragraph as Chronos earlier stated.

Track changes will simply tell you that the paragraph was added. That’s if track changes has been turned on. Which it probably wouldn’t have been outside a business environment.

Is the issue here cut-and-paste versus hand-typed, or authorship? I have major parts of documents that are cut-and-pasted — from other places in the same document where I typed them from scratch. First I brainstorm, then I move the parts all around to a more logical order, then I flesh it into an outline, then I rearrange, then I fill in the meat, then I rearrange that final smidgen, then I publish. That involves a LOT of cut and paste.

If the goal is identifying strictly hand-typed vs cut-and-pasted, there’s no way, period. As others have pointed out, some programs optionally keep a revision history, but that doesn’t differentiate how the text entered the document.

If you’re trying to determine authorship, there are a number of programs that’ll help you do that. Each of us has a writing style, and one of my paragraphs dropped into one of your works would stand out as different.

The programs work by computing a metric of writing style, such as average word and sentence length, frequency of comma use, etc. A graph can be constructed showing the score of each paragraph or sentence. A borrowed chunk of text shows up on that graph like a spilled glass of wine on a white rug. One can also Google for particularly well-turned phrases.
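As a rough idea of what such a metric looks like, here is a minimal Python sketch that computes the three measures mentioned (average word length, average sentence length, comma frequency) for each paragraph. It is a toy, not how any particular commercial program works; the function name, the crude sentence splitting and the sample paragraphs are my own assumptions.

```python
import re
from statistics import mean


def style_metrics(paragraph):
    """Compute a few crude writing-style measures for one paragraph."""
    words = re.findall(r"[A-Za-z']+", paragraph)
    # Naive sentence split on ., ! and ?; abbreviations will confuse it.
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    return {
        "avg_word_len": mean(len(w) for w in words) if words else 0.0,
        "avg_sentence_len": len(words) / len(sentences) if sentences else 0.0,
        "commas_per_100_words": 100 * paragraph.count(",") / len(words) if words else 0.0,
    }


# Hypothetical paragraphs in two deliberately different styles.
paragraphs = [
    "I came. I saw. I left. It was fine.",
    "Notwithstanding the aforementioned considerations, the committee, "
    "having deliberated extensively, resolved to postpone its determination.",
]

for number, paragraph in enumerate(paragraphs, start=1):
    print(number, style_metrics(paragraph))
```

Plot those numbers paragraph by paragraph and a chunk written by someone else tends to sit apart from its neighbors. It is evidence of a style change only; it says nothing about how the text got into the file.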

But if you’re looking for a pure technological bright-line foolproof test to prove or disprove either authorship or hand-typing, the answer is “no, there’s no such thing.”

What they said.

Summary: you can sometimes find strong reason to believe that a section of a Word document was cut and pasted from somewhere else. (That somewhere else may be a work-in-progress / notes document that the same author used to hold passages that were excised from one place with the notion of repurposing them elsewhere, though. Or it may even be the very same document you’re in, if the author decides these eighteen paragraphs about Smith’s career work better after the six paragraphs about the incident in Cincinnati.) But you can’t ever rule out the possibility that some section of the work was copied and pasted just because you don’t see any “footprints” of the process. The author could have composed the entire thing in another application altogether (e.g., a text editor) and then converted it to Word or pasted the whole shebang into Word to do final things like table of contents and footnotes and glossary. Then you’d have no record of the various edits and cuts and pastings.