Currently, my company takes batches of TIFFs (documents that were scanned and imaged), and we ZIP them up. We only get about 7% compression from WinZip.
We scan/image over 100,000 pieces per day, so we need to compress everything so we can FTP it (because our work is offshored).
Anyway, I guess what I am asking is this: Are TIFFs the issue? That is, are TIFFs just not going to get compressed much more…
…or are there compression options we just don’t understand?
Our images only need to be black and white, and they are virtually all text. To speed up FTP, we want them compressed as much as possible. Seeing 5-7% compression leads us to believe we are missing something obvious.
Your TIFF files might already use LZW compression. If they don't, you should be able to find a conversion program to help you on the following page.
You’re in luck. There’s a compression algorithm specifically designed for files of that type, which will compress them very nearly as much as is logically possible.
A TIFF image is usually in compressed form, but not necessarily. If it is compressed, many different codecs are available, including embedded JPEG for example. Since your images are purely black and white, and are images of scanned documents, they should have been compressed with one of the “fax” codecs. But this is not the inevitable choice. (Do you control the creation of these images?)
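If you want to check which codec your existing files actually use, the Compression tag (tag 259) in the TIFF header records it. Here's a minimal sketch in Python, standard library only, that reads that tag from the first image directory. It assumes classic (non-BigTIFF) files, looks only at the first page, and knows only the common codec numbers; anything else is reported as unknown.

```python
import struct

# Values of TIFF tag 259 (Compression): the TIFF 6.0 codes plus the two
# common Deflate codes. Anything else is reported as unknown.
CODECS = {
    1: "none",
    2: "CCITT modified Huffman RLE",
    3: "CCITT Group 3 fax",
    4: "CCITT Group 4 fax",
    5: "LZW",
    6: "JPEG (old-style)",
    7: "JPEG",
    8: "Deflate (Adobe)",
    32773: "PackBits",
    32946: "Deflate (old PKZIP code)",
}


def tiff_compression(data: bytes) -> str:
    """Name the codec recorded in the first image directory of a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little- or big-endian
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    (entries,) = struct.unpack(order + "H", data[ifd:ifd + 2])
    for i in range(entries):
        start = ifd + 2 + 12 * i  # each directory entry is 12 bytes
        (tag,) = struct.unpack(order + "H", data[start:start + 2])
        if tag == 259:
            # Compression is a SHORT, stored left-justified in the 4-byte value field.
            (code,) = struct.unpack(order + "H", data[start + 8:start + 10])
            return CODECS.get(code, f"unknown ({code})")
    return "none"  # tag absent: the TIFF default is 1, no compression
```

On a real file you'd do something like `with open("scan0001.tif", "rb") as f: print(tiff_compression(f.read()))` (the filename is just a placeholder). If it reports "none" or "LZW", re-saving as Group 4 is worth a try.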
I’m certainly not surprised that WinZip can’t compress them well. You shouldn’t expect it to. Compressing files that are already compressed rarely brings any significant savings, and will just as often result in larger files.
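To see why, here's a small Python experiment (standard library only) using zlib, which implements Deflate, the same method ZIP uses by default. The "page" is synthetic: mostly white bytes with a sprinkling of black, standing in for uncompressed scan data. The first pass shrinks it a lot; compressing that output a second time saves next to nothing, because the redundancy is already gone.

```python
import random
import zlib


def saved(original: bytes, packed: bytes) -> float:
    """Fraction of space saved; 0.07 means the output is 7% smaller."""
    return 1 - len(packed) / len(original)


# Synthetic stand-in for an uncompressed page: about 98% white (255), 2% black (0).
rng = random.Random(0)
page = bytes(0 if rng.random() < 0.02 else 255 for _ in range(200_000))

once = zlib.compress(page, 9)   # redundant input: large savings
twice = zlib.compress(once, 9)  # already-compressed input: little or none

print(f"first pass saves {saved(page, once):.1%}")
print(f"second pass saves {saved(once, twice):.1%}")
```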
It might depend upon the program that creates the TIFFs. Try running one through a conversion program, saving it as a new TIFF file, and comparing the file sizes. I’ve seen TIFFs created by some scanners that were far larger than they needed to be.
A couple of things you could do would be to force the color gamut from RGB to greyscale, reduce the resolution of the images, or reduce the number of distinct colors to something like 8 or 16.
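Here's what the greyscale and fewer-colors steps do to each pixel, sketched in Python. The ITU-R BT.601 luma weights are a common convention (Pillow uses the same ones for its "L" mode), not necessarily what your scanner software applies, and `quantize` is a hypothetical helper that spreads the allowed shades evenly over 0-255. Pure black-and-white is just the `levels=2` case.

```python
def to_grey(r: int, g: int, b: int) -> int:
    """ITU-R BT.601 luma: a weighted sum of R, G, B giving one 0-255 grey value."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def quantize(value: int, levels: int) -> int:
    """Snap an 8-bit grey value (0-255) to the nearest of `levels` evenly spaced shades."""
    if not 2 <= levels <= 256:
        raise ValueError("levels must be between 2 and 256")
    step = 255 / (levels - 1)
    return round(round(value / step) * step)


pixels = [(250, 250, 245), (30, 28, 35), (128, 140, 120)]
grey = [to_grey(*p) for p in pixels]
print("grey:", grey)
print("16 shades:", [quantize(v, 16) for v in grey])
print("black/white:", [quantize(v, 2) for v in grey])
```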
Also, consider changing the format from TIFF to PNG. Doing a little bit of testing on 1280x854 greyscale images limited to 16 different shades gave me 150k TIFF files and 70k PNGs.
Punoqllads has good suggestions (especially limiting the color gamut). I’d also add the following:
If you only need decent black-and-white images, forget using TIFFs and WinZip – use a photo-editing program to make them highly compressed JPEGs.
It’s possible in Adobe Photoshop (and probably other programs) to batch-process images. You’ll want your photo-editing program to do the following in sequence:
Open TIFF.
Convert gamut to black-and-white if possible – if not, then to grayscale.
Save the resulting image as a JPEG with the maximum available compression (IIRC Photoshop allows 8 levels of compression).
Close JPEG.
You’ll get very good results doing it that way – and no need to zip anything afterwards (except maybe to bundle a bunch of JPEGs together in one e-mailable package).
For photo-style images, yes, JPEG is the way to go. But if it’s mostly writing, you get better compression by quantizing the image into a smaller set of colors and then saving as a PNG.
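Here's a rough illustration of why quantizing helps, using only Python's standard library. PNG compresses its pixel data with Deflate (zlib), so compressing raw pixel bytes with zlib gives a feel for the effect, though a real PNG also applies per-row filters and, at 1 bit per pixel, packs eight pixels per byte. The "scan" is synthetic noisy greyscale; thresholding it to two levels throws away the sensor noise that Deflate can't predict.

```python
import random
import zlib

# Synthetic greyscale "scan": paper pixels near 240 and text pixels near 20,
# each with a little sensor noise. Seeded so the run is repeatable.
rng = random.Random(1)
scan = bytes(
    max(0, min(255, (20 if rng.random() < 0.05 else 240) + rng.randint(-12, 12)))
    for _ in range(100_000)
)

# Quantize to two levels (pure black-and-white): the noise disappears.
bw = bytes(0 if v < 128 else 255 for v in scan)

# Deflate is the compressor inside PNG.
print("greyscale:  ", len(zlib.compress(scan, 9)), "bytes")
print("black/white:", len(zlib.compress(bw, 9)), "bytes")
```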
Of course, this is all in general. When it comes right down to it, you probably need to run tests on the sorts of images that you will work with and see which format conversions – or even which programs you use to convert the images – compress best for you. In fact, if processing time is not important, you could always convert the image into PNGs, JPEGs, etc., and see which format is smallest, and then transfer that file and delete the others.
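If you go the try-everything route, the keep-the-smallest step is easy to script. A minimal Python sketch, assuming the candidate files for one page have already been written by whatever converter you use; `keep_smallest` is a hypothetical helper name.

```python
from pathlib import Path


def keep_smallest(candidates: list[Path]) -> Path:
    """Keep whichever existing candidate file is smallest and delete the others."""
    existing = [p for p in candidates if p.exists()]
    if not existing:
        raise FileNotFoundError("none of the candidate files exist")
    best = min(existing, key=lambda p: p.stat().st_size)
    for p in existing:
        if p != best:
            p.unlink()  # discard the larger conversions
    return best
```

You'd call it once per page with something like `[page.with_suffix(".png"), page.with_suffix(".jpg"), page.with_suffix(".tif")]` and FTP whatever it returns.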
I have heard of PNG, but they were kind of new when I was in the graphics biz. Thanks for the info about the relative compression abilities of PNG and JPEG.
Question to the OP – are a significant portion of your documents just black text on white paper, and nothing else? I read that some docs have checks stapled to them, so I mean your other types of typical docs that cross your desk.
Note also that once you find the best image format, if you still need them contained in a zip file, you can zip with zero compression, which will be much faster.
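In Python's standard library that's just a matter of passing `zipfile.ZIP_STORED` to `zipfile.ZipFile`. A short sketch; `bundle_uncompressed` is a hypothetical helper, and it names entries by file name only, so two images with the same name in different folders would collide.

```python
import zipfile
from pathlib import Path


def bundle_uncompressed(files: list[Path], archive: Path) -> None:
    """Package already-compressed images into one ZIP without re-compressing them."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)  # bytes are stored as-is, no Deflate pass
```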
Wanted to add: Yes, we control the creation of these images.
We use four high speed Kodak scanners, which create the docs and put them into Kofax. We ZIP them up and FTP them over to India.
I already had a hunch about why TIFF was chosen back when we implemented the process.
So, create them as PNG or JPG and then zip them up with WinZip…or some other better program?
We’re talking over 100,000 images, so how do we ‘run them through’ something like Photoshop?
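One way to batch this without Photoshop is a short script. Here's a sketch using Pillow, a third-party Python imaging library (pip install Pillow). It walks a folder tree, converts each TIFF to pure black-and-white with a fixed threshold, and writes either a 1-bit PNG or, if downstream software needs TIFF, a CCITT Group 4 TIFF. The function name, the threshold of 128, and the folder layout are assumptions for illustration; it also converts only the first page of a multi-page TIFF, so check how your batches are written before relying on it.

```python
from pathlib import Path

from PIL import Image  # Pillow, a third-party package

SUFFIXES = {"png": ".png", "g4": ".tif"}


def convert_tree(src: Path, dst: Path, fmt: str = "png", threshold: int = 128) -> int:
    """Convert every .tif/.tiff under `src` to a 1-bit image under `dst`.

    Returns the number of files written. Only the first page of each TIFF is used.
    """
    if fmt not in SUFFIXES:
        raise ValueError(f"unknown format {fmt!r}")
    count = 0
    for path in sorted(src.rglob("*")):  # list is built before any output is written
        if path.suffix.lower() not in (".tif", ".tiff"):
            continue
        out = (dst / path.relative_to(src)).with_suffix(SUFFIXES[fmt])
        out.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(path) as im:
            # Greyscale first, then a hard threshold to pure black/white.
            # No dithering: dither speckle would hurt compression.
            bw = im.convert("L").point(lambda p: 255 if p >= threshold else 0, mode="1")
            if fmt == "png":
                bw.save(out, optimize=True)
            else:
                # Group 4 needs a Pillow build with libtiff (the usual wheels have it).
                bw.save(out, compression="group4")
        count += 1
    return count
```

Something like `convert_tree(Path("incoming"), Path("outgoing"), fmt="g4")` would then cover a day's scans in one run; at 100,000 files you would probably want to split the work across several processes, which this sketch doesn't do.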
This is a good suggestion in general, but may not be an option in a lot of cases. There are a lot of document management systems such as the evidence indexing systems used by attorneys and the archive systems used by some banks which operate only on TIFF images of scanned documents. Until all this legacy software supports PNGs, we’re stuck with TIFFs. That may not be an issue for the OP, it just depends on what they’re doing with the images once they’re transferred.
Absolutely. There is a benefit to zipping the files together, though, and that is to package the images conveniently so that they are easier for the recipient to handle. This is a case where zipping need not involve compression.
OK, the results are in – for monochrome black-and-white, PNG format kicks ass.
Unfortunately, I haven’t got Photoshop here, so I did this experimentation in humble MS Paint. Paint does not allow one to control degree of JPEG compression as Photoshop does, so I might be able to do better JPEGs with Photoshop. But the JPEG issues didn’t really matter – PNG outperformed the other formats considerably.
I started with a 415x437-pixel white canvas. Then I wrote my name across it in 23-pixel-high black letters.
I saved this image in three ways:
Windows Bitmap (monochrome): looked good, ended up at 23 K
JPEG: to my surprise, developed light gray artifacts around the text. Still very readable, but the artifacts could be problematic on more complex images. Ended up at 6 K
PNG: looked spotless, perfect image. Final size – 663 bytes.
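The numbers line up with simple arithmetic. A monochrome BMP stores 1 bit per pixel uncompressed, with each row padded to a multiple of 4 bytes, so for 415x437 that's 52 bytes per row times 437 rows, plus 62 bytes of headers and two-color palette: 22,786 bytes, in line with the 23 K reported. PNG stores the same 1-bit rows but then runs them through Deflate, and a mostly white page is extremely repetitive. Below is a minimal standard-library Python sketch of both; the PNG writer is illustrative only and won't reproduce Paint's exact 663 bytes, since the test image is a made-up band of stripes rather than anyone's name.

```python
import struct
import zlib


def bmp_1bit_size(width: int, height: int) -> int:
    """Size of an uncompressed monochrome BMP: file header + info header + palette + rows."""
    row_bytes = (width + 31) // 32 * 4  # 1 bit per pixel, rows padded to 4 bytes
    return 14 + 40 + 2 * 4 + row_bytes * height


def png_1bit(rows: list[list[int]]) -> bytes:
    """Encode equal-length rows of 0/1 pixels (1 = white) as a 1-bit greyscale PNG."""
    height, width = len(rows), len(rows[0])

    def chunk(kind: bytes, payload: bytes) -> bytes:
        crc = zlib.crc32(kind + payload)  # CRC covers chunk type and data
        return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)

    raw = bytearray()
    for row in rows:
        raw.append(0)  # filter type 0 (None) at the start of every scanline
        for x in range(0, width, 8):
            bits = row[x:x + 8]
            byte = 0
            for bit in bits:  # leftmost pixel ends up in the high-order bit
                byte = (byte << 1) | bit
            raw.append(byte << (8 - len(bits)))  # pad the last byte of a row
    ihdr = struct.pack(">IIBBBBB", width, height, 1, 0, 0, 0, 0)  # bit depth 1, greyscale
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(bytes(raw), 9)) + chunk(b"IEND", b""))


W, H = 415, 437
page = [[1] * W for _ in range(H)]  # all white
for y in range(200, 223):           # hypothetical 23-pixel-high band standing in for text
    for x in range(50, 350):
        if (x // 6) % 2:
            page[y][x] = 0
print("BMP:", bmp_1bit_size(W, H), "bytes; PNG:", len(png_1bit(page)), "bytes")
```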
The best thing about using PNG format is that you don’t need expensive software to do the conversion – Paint can handle it.
… Gazpacho – you were absolutely correct about the artifacts creeping in when using JPEG compression. I knew about the artifacts when compressing photos and “fuzzy” text … but I expected sharp monochrome text to be artifact-free.