I’m thinking of uploading scans of documents that contain personal information about me to a publicly accessible web site. Before I do, I want to black out the passages I don’t want the rest of the world to see. Is it safe to do this by putting a black rectangle over each passage in ordinary graphics software (I’m thinking of the version of Paint that comes with Windows XP)? Or is it possible to somehow reverse engineer the resulting jpeg to reveal the information beneath the black rectangle?
It’s safe if you do as you suggest: cover the area in Paint.
What would not be safe is blocking the area with a separate layer, or distorting the area without overwriting it. (A criminal got busted after obscuring his face with a swirl effect. The cops simply unswirled it, because the distortion didn’t remove any of the original data; it just distorted it.)
Paint doesn’t support layers, so anything you overwrite with black rectangles is irretrievably hidden.
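To make the distinction concrete, here is a minimal Python sketch. It assumes a toy model: the image is a flat list of grayscale values, and the swirl is stood in for by a fixed pixel shuffle (a real swirl filter is a geometric warp rather than a shuffle, but the point is the same: values get moved, not discarded). The names `make_order`, `scramble`, `unscramble` and `black_out` are made up for illustration.

```python
import random

def make_order(n, seed=1234):
    """Return a fixed shuffle of the indices 0..n-1 (hypothetical stand-in for a swirl)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order

def scramble(pixels, order):
    """Move pixels around without discarding any values."""
    return [pixels[i] for i in order]

def unscramble(scrambled, order):
    """Undo scramble() by putting every value back where it came from."""
    restored = [0] * len(scrambled)
    for new_pos, old_pos in enumerate(order):
        restored[old_pos] = scrambled[new_pos]
    return restored

def black_out(pixels, start, stop):
    """Overwrite pixels[start:stop] with 0 (black); the old values are not kept anywhere."""
    return pixels[:start] + [0] * (stop - start) + pixels[stop:]

original = [200, 190, 35, 40, 30, 210, 220, 205]
order = make_order(len(original))

# Distortion: every original value is still present, so it can be undone.
distorted = scramble(original, order)
recovered = unscramble(distorted, order)

# Overwriting: two different originals give the same redacted result.
redacted = black_out(original, 2, 5)
other = black_out([200, 190, 99, 1, 77, 210, 220, 205], 2, 5)
```

Because `redacted` and `other` come out identical even though the covered values differed, nothing in the overwritten image can say which original it started from.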
You could ask this guy. I believe he has some first hand experience in how effective such things are.
I imagine you will be quite safe doing it the way you propose.
If you have access to Adobe Acrobat (not Reader), or a friend does, that product has a “redaction tool” that is designed to do precisely what you are talking about, to PDF docs. If I were doing what you are doing, I would scan into Acrobat and use the redaction tool and then save a PDF.
Of course, the software is frightfully expensive and not worth buying just for that. (It came bundled with my sheet-fed scanner.)
Thanks much for citing the example I mentioned.
Again, using black rectangles in Paint is secure. As a test, obscure the entire image with black, and save it as a new jpeg. Then create another new jpeg from scratch at the same dimensions, all black. Save that one too.
Both images will be the same size on disk, demonstrating that they both contain the same amount of information. Since the second one never had any information in it to begin with, it is easy to see that the first one had its information successfully removed.
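The same test can be sketched in a few lines of Python. This is an illustration under loud assumptions: the standard library has no JPEG encoder, so `zlib` stands in for the compressing file format, and the “scan” is just made-up noise in a byte buffer. Only the logic of the comparison carries over to real files.

```python
import random
import zlib

WIDTH, HEIGHT = 64, 64
N = WIDTH * HEIGHT

# Made-up stand-in for a scanned document: one grayscale byte per pixel.
rng = random.Random(0)
scan = bytes(rng.randrange(256) for _ in range(N))

# Image A: the scan with every pixel overwritten by black (0).
blacked_out = bytearray(scan)
for i in range(N):
    blacked_out[i] = 0
blacked_out = bytes(blacked_out)

# Image B: an all-black image of the same dimensions, created from scratch.
fresh_black = bytes(N)

# zlib is only a stand-in for a compressed image format such as JPEG.
size_scan = len(zlib.compress(scan))
size_blacked = len(zlib.compress(blacked_out))
size_fresh = len(zlib.compress(fresh_black))
```

Here the two black buffers are byte-for-byte identical before compression, so any deterministic encoder has to produce the same output for both, and that output is much smaller than the compressed scan.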
I would be careful about doing this in Photoshop or anything fancy unless you really know what you are doing. You would certainly have to discard all layers. Even then, if you didn’t use the right tool, the original information might only be disguised rather than removed. Limiting the document to internet colors would probably be safer, since it would then be unlikely that original information could hide in near-black colors.
Only if you saved it in a format that supports layers, like .psd. Standard JPEG images have to be flattened before they can be saved, so any obscured information is lost. The other risk is data stored separately from the pixels as metadata, as in DICOM and TIFF images and in the EXIF data on digital photographs.
I use a combination of black rectangles and metadata editing when producing graphics for presentations and publication from medical imaging (creating JPEGs from DICOM and TIFF files) at work where I have to remove confidential information before distribution.
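For the metadata half of that, here is a rough Python sketch of what stripping metadata from a JPEG can look like at the byte level. It relies on the JPEG layout of marker segments before the start-of-scan marker, with EXIF and XMP normally carried in APP1, Photoshop/IPTC data in APP13, and free-text comments in COM. The function name `strip_metadata_segments` and the helper `seg` are hypothetical, and the sketch ignores edge cases (fill bytes, standalone markers before the scan), so treat it as an illustration rather than a tool to rely on for real redaction.

```python
import struct

SOI = b"\xff\xd8"           # start of image
SOS = 0xDA                  # start of scan: compressed pixel data follows
DROP = {0xE1, 0xED, 0xFE}   # APP1 (EXIF/XMP), APP13 (Photoshop/IPTC), COM

def strip_metadata_segments(jpeg: bytes) -> bytes:
    """Copy a JPEG stream, leaving out the header segments listed in DROP."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG stream")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from here on is image data; copy it untouched.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker not in DROP:
            out += jpeg[pos:pos + 2 + length]
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")

def seg(marker: int, payload: bytes) -> bytes:
    """Build one marker segment (hypothetical helper for the synthetic example)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload

# Synthetic stream, not a decodable picture: just enough structure to walk.
app0 = seg(0xE0, b"JFIF\x00")
exif = seg(0xE1, b"Exif\x00\x00name-and-date-here")
comment = seg(0xFE, b"patient 12345")
scan = seg(SOS, b"\x00") + b"\x12\x34" + b"\xff\xd9"
sample = SOI + app0 + exif + comment + scan

cleaned = strip_metadata_segments(sample)
```

Only APP1, APP13 and COM are dropped here; other application segments are left in place because some of them affect how the image is decoded or displayed.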
Any time I obscure something in a photo, I open it in a photo editing program, black it out, then take a screenshot of the new image and upload the screenshot instead of the edited original.
Never mind the technical side: even with the bits blocked, could someone identify you from the information left?
Simulpost. Your response wasn’t there when I went looking for the link.
Fubaya, that’s a great technique to absolutely ensure the effect no matter what program you’re using.
Just to clarify, the verification test I detailed in post 5 only works with jpegs, not bitmaps, and it only proves successful if (when) both “all black” images end up being the same size as each other AND significantly smaller than the original.
Thanks everybody. I think I’ll do it in Paint as described; the unblacked rest of the document is not sensitive and will not allow identification.
Fubaya, I had the same technique in mind, but it appeared a bit pedestrian and amateurish to me (which by itself wouldn’t hurt, since I am an amateur). It’s good to hear others come up with the same cumbersome ways.
I agree with the solutions mentioned. I will just add some info for the OP.
It seems that this concern is born out of what happens when people try to erase data on a hard drive, only to find out that vestiges of the data are still there to be recovered by some clever means. Sometimes this data is merely marked as “deleted” rather than actually deleted. Sometimes it is really deleted, but sophisticated tools can look at the disk at a very low physical level and sometimes recover “lost” data.
There are also legitimate concerns about releasing Word or PDF documents that may contain change histories, allowing people to see changes that have been made in prior versions of the documents (the White House learned this the hard way).
However, in the case of a flat JPEG or BMP image, once you change a pixel there is nothing to indicate what any previous value of that pixel ever was.
BTW I have found that using Paint to save an image as a JPEG gives rather poor results. Picasa is a popular free tool that might also do what you need.
Fubaya: Why? That sounds like superstition more than technology to me.
CookingWithGas is correct, and to do these things right you really need to know about the various file formats involved. But I have to comment on this part:
I must emphasize that this has nothing to do with the thread. If you think someone might get physical access to your computers, you have to implement a much more complex security scheme than merely knowing how to securely redact documents. Keep in mind that a sledgehammer or a .357 hollow point will securely erase a hard drive, and you get really strong magnets to play with after you’re done.
I believe that the responses to this thread have overstated the “ironclad” nature of obscuring information in .jpgs with black boxes.
Looking past metadata, etc., a .jpg is a lossy compression format which means that pixels are not mapped completely independently; rather each pixel has a slight dependence upon neighboring pixels. I’ll admit to not knowing much about the compression schemes that underlie .jpg compression, but some slight amount of information about neighboring pixels is retained in a .jpg, and simply obscuring the relevant parts won’t completely purge the .jpg of relevant information, especially as the image gets more compressed.
To illustrate the point, I’ve uploaded four images, in order from Example 1 to Example 4. The first is a lossless .png file with some “sensitive information” in it, the second is a low-quality .jpg of that .png, the third is a “censored” version of that .jpg, and the last is a .png with a blue fill added to show some of the retained data more clearly. Incidentally, this retained information would also survive a lossless screen print.
Could someone really go back and piece together any real information from the palimpsest of the “censored” .jpg? Maybe. It isn’t likely for a casual observer who isn’t highly motivated to discover something about the original information, but I wouldn’t trust national secrets to it.
Without understanding the entire image-stream, it’s overconfident to say that the documents are utterly, absolutely clean of any sensitive information.
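The neighbor-dependence described in this post can be sketched in Python. This is a heavily simplified model, not JPEG itself: JPEG transforms 8×8 pixel blocks with a 2-D discrete cosine transform and then quantizes the coefficients, whereas this sketch uses a single 8-pixel row and simply drops the higher coefficients. The function names and the `keep` parameter are made up for illustration.

```python
import math

def dct(block):
    """Orthonormal DCT-II of a 1-D block of pixel values."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(block)))
    return out

def idct(coeffs):
    """Inverse of dct(): rebuild pixel values from the coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(
            math.sqrt((1 if k == 0 else 2) / n) * c
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs)))
    return out

def lossy_roundtrip(block, keep):
    """Crude stand-in for lossy saving: keep only the first `keep` coefficients."""
    coeffs = dct(block)
    truncated = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(truncated)

# A white row with a single dark "ink" pixel at index 6.
row = [255, 255, 255, 255, 255, 255, 0, 255]

# After a lossy save, the dark pixel's influence is spread across the block.
smeared = lossy_roundtrip(row, keep=3)

# Blacking out only the pixel where the ink originally was leaves the
# altered neighbors in place.
censored = list(smeared)
censored[6] = 0.0
```

In the extreme case of `keep=1` every pixel in the row comes back as the block average, so the one dark pixel has changed all of its neighbors. A rectangle drawn only over the original ink position would leave those changed neighbors outside it. Whether anything readable could be rebuilt from such leftovers is a separate question that this sketch does not answer.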
Several reasons really. Some are just convenience.
a) It’s foolproof. I don’t really know how jpgs work. More importantly, I don’t know how the programs I use work. I use Linux (so don’t bother explaining Paint or PSP) and use some scientific image analysis programs which I am confident in, but I know some others that delete and add extra data. I wrote my own screenshot script and I know it only saves what it sees.
b) Taking a screenshot gets rid of EXIF data.
c) I don’t have to save the edited original or save it with a different name or in a different place. The screenshot always saves to the current directory in sequential numbers. Ever take a digital photo named dsc00652.jpg, edit it, save it, close it, and spend 10 minutes trying to find it again?
d) I can select the area I want in the screenshot or click the titlebar to take a shot of the whole image, and it’s much quicker than cropping in any image editor.
I use a free PDF creator and print my blacked-out JPEGs to a PDF file. Then there is nothing that can be “recovered.”
Remember, even though the file is saved using the neighbor-dependent algorithm, the black box is drawn without any such effect. When you go into Paint and draw a black box, it is 100% independent of neighboring pixels. Then when you go to save it, sure, the neighbor algorithm kicks in, but it’s too late to have retained any information beneath the black boxes, which were drawn in perfect bitmap style. (Since that’s what Paint is; a bitmap editor.)
It’s precisely because of all the uncertainty folks keep mentioning about reconstructing from vestiges of bits that I mentioned Acrobat, and use it, for this.
The Acrobat redaction feature was designed specifically for this purpose: you can tell it to remove all instances of “Acme Widgets” from your document prior to sending it to a competitor of “Acme Widgets” and you can manually draw black (or any other colored) blocks over any bit of the document you wish. Once you commit your changes to the document, the original information is gone forever.
Of course, Acrobat is expensive, and I don’t know what guarantee they give as to the robustness of their system against limitless NSA-like resources.
Is the .pdf format open to inspection by third parties, or is it closed-source?