WINZIP and file integrity

I was wondering: if i use winzip to create a zip file (whether word documents, or images, or video clips), and then frequently unzip and rezip these files in order to use them, can the zipping and unzipping process itself corrupt or destroy any of these files by losing pieces of data etc.?

What actually happens in the zipping process?

Thanks.

In zipping, or using any lossless compression method, the bitstrings that represent your file go through a translation into a bitstring that (usually) does not take up as much space. There are a number of different lossless compression codes, and I do not know offhand which one ZIP uses.

As long as the integrity of the ZIP file is not compromised, you should have no worries about losing the ZIPped files.
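If you want to check that integrity yourself, here is a minimal Python sketch using the standard zipfile module, whose testzip() method re-reads every member and compares it against the CRC stored in the archive. The archive is built in memory, the file name and contents are made up for illustration, and the "damage" is just one flipped byte. It uses stored (uncompressed) members so that the damage shows up as a plain CRC mismatch.

```python
import io
import zipfile


def make_archive(name, payload):
    """Build a one-member ZIP archive in memory (stored, not compressed)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def first_bad_member(archive_bytes):
    """Return the name of the first member that fails its CRC check, or None."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.testzip()


# Hypothetical file contents, only for the demonstration.
payload = b"important report text " * 50
good = make_archive("report.txt", payload)

# Simulate damage: flip one byte inside the stored payload.
damaged = bytearray(good)
damaged[good.find(payload) + 10] ^= 0xFF
damaged = bytes(damaged)

print(first_bad_member(good))     # intact archive
print(first_bad_member(damaged))  # archive with one flipped byte
```

The intact archive should report no bad member, while the damaged one should name report.txt. A flipped byte inside a compressed member may instead surface as a decompression error, so treat this as a sketch rather than a full repair check.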

No, no data is ever lost when you zip a file using WinZip. The zip algorithm is by definition “lossless”, which means you don’t have to worry about your Word files getting messed up because a few bits get lost in the translation. The algorithm exploits regular patterns in your file to reduce the number of bits it takes to represent your file. A file that is a string of truly random characters will not compress at all. Most image and video formats are already compressed and typically don’t zip down very much at all.
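A quick way to see both points (patterns compress, random data does not) is the sketch below. It uses Python's zlib module, which implements Deflate, the method most commonly used inside ZIP archives. The sample strings and the function name deflate_ratio are my own, chosen only for illustration.

```python
import os
import zlib


def deflate_ratio(data):
    """Compress with Deflate, verify the round trip, return packed/original size."""
    packed = zlib.compress(data, 9)
    if zlib.decompress(packed) != data:
        # Lossless compression must give back every byte unchanged.
        raise ValueError("round trip changed the data")
    return len(packed) / len(data)


# Highly repetitive text versus the same number of random bytes.
patterned = b"zip the file, unzip the file, " * 500
random_bytes = os.urandom(len(patterned))

print(f"patterned text: {deflate_ratio(patterned):.3f}")
print(f"random bytes:   {deflate_ratio(random_bytes):.3f}")
```

The repetitive text should shrink to a small fraction of its size, while the random bytes should come out at about the same size or slightly larger. In both cases the decompressed data is byte-for-byte identical to the input.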

Here’s basically how it works, though Winzip’s algorithms are probably far more complex than the example they give.
http://express.howstuffworks.com/neatstuff-mp31.htm

A zipped file example:

I was wonder3: if i use win0 to create a 0 1 (whe2r word documents, or images, or video clips), and 2n frequently un0 and re0 2se 1s in order to use 2m, can 2 0p3 and un0p3 process itself corrupt or destroy any of 2se 1s by los3 pieces of data etc.?

What actually happens in 2 0p3 process?

Thanks.
Lookup table
0 = zip
1 = file
2 = the
3 = ing

Similar data is searched for and replaced in the existing document. Here, I searched for a few common terms (zip, file, the and ing) and replaced each with a token, in this case 0, 1, 2 and 3. The token takes up less space in the file. Then at the end (or beginning, depending on the compression format) I put a table that describes what each token stands for.

Therefore, I can recreate the original document at any time, and lose no data, as long as all the tokens are replaced with their original values.
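The toy scheme above can be written out in a few lines of Python. This is only a sketch of the substitution idea from this post, not what WinZip or any real compressor does, and the names TABLE, encode and decode are made up. It also assumes the text contains none of the token digits to begin with, otherwise the decoder could not tell a token from a real digit.

```python
# Lookup table from the example: token -> the text it stands for.
TABLE = {"0": "zip", "1": "file", "2": "the", "3": "ing"}


def encode(text):
    """Replace each common fragment with its one-character token."""
    if any(token in text for token in TABLE):
        raise ValueError("toy scheme cannot encode text that already contains a token digit")
    for token, word in TABLE.items():
        text = text.replace(word, token)
    return text


def decode(text):
    """Put the original fragments back in place of the tokens."""
    for token, word in TABLE.items():
        text = text.replace(token, word)
    return text


original = "What actually happens in the zipping process?"
packed = encode(original)

print(packed)
print(decode(packed) == original)
```

Decoding always restores the input exactly, which is the whole point of a lossless scheme. Real compressors build their tables automatically and handle arbitrary bytes, so they do not need the "no digits" restriction.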

Thanks everyone. I feel better now that i know my data will be secure.

mhendo.

On the other hand, if you’re zipping these files onto floppies, particularly multiple floppies (i.e., even the zipped file is too large for one floppy, so you’re creating a spanned set), any problem with any one of those floppy disks can leave you with nothing, nada, zilch!

Well, that’s not entirely true. PKZIP has always come with PKZIPFIX. Since the lookup table is at the start of the file, and (especially in modem-transferred files) the end is usually the missing bit, you can reconstruct whichever files came first in the archive by running that on it.
Though I do admit I can’t recall how exactly to do that with spanned archives…

The lookup table is at the end, actually. Each file has header information, however, which zipfix utilities use to reconstruct as much of the table as possible, should the file be damaged or truncated.
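To illustrate the idea (not how PKZIPFIX or any particular repair tool actually works), here is a rough Python sketch that ignores the table at the end and walks the per-file headers instead. It relies on the ZIP local file header layout: the signature PK\x03\x04 followed by a 30-byte fixed part that includes the compression method, CRC, sizes and name length. The function names salvage and build_archive and the sample members are hypothetical. The sketch handles only stored and Deflate members and skips entries whose sizes are kept in a trailing data descriptor.

```python
import io
import struct
import zipfile
import zlib

LOCAL_SIG = b"PK\x03\x04"
# signature, version, flags, method, time, date, crc, comp size, size, name len, extra len
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # 30 bytes


def salvage(data):
    """Return {name: bytes} for every member recoverable from local headers alone."""
    recovered = {}
    pos = data.find(LOCAL_SIG)
    while pos != -1 and pos + LOCAL_HEADER.size <= len(data):
        (_, _, flags, method, _, _, crc, csize, _, nlen, xlen) = \
            LOCAL_HEADER.unpack_from(data, pos)
        name_start = pos + LOCAL_HEADER.size
        body_start = name_start + nlen + xlen
        name = data[name_start:name_start + nlen].decode("cp437", "replace")
        body = data[body_start:body_start + csize]
        # Skip members that are cut short, use a data descriptor (flag bit 3),
        # or use a method other than stored (0) or Deflate (8).
        if not flags & 0x08 and len(body) == csize and method in (0, 8):
            try:
                raw = body if method == 0 else zlib.decompress(body, -15)
            except zlib.error:
                raw = None
            # Keep the member only if its contents match the stored CRC.
            if raw is not None and zlib.crc32(raw) == crc:
                recovered[name] = raw
        pos = data.find(LOCAL_SIG, pos + 4)
    return recovered


def build_archive(members):
    """Build a small Deflate-compressed archive in memory for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()


members = {"a.txt": b"alpha " * 200, "b.txt": b"beta " * 200}
data = build_archive(members)

# Simulate a transfer that lost the table at the end of the archive.
truncated = data[:data.find(b"PK\x01\x02")]
print(sorted(salvage(truncated)))
```

With the end table cut off, the standard zipfile module refuses to open the archive, but the members whose headers and data arrived intact can still be pulled out. A plain signature search can in principle hit bytes inside compressed data, which is why the sketch only keeps members whose CRC matches.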

I wrote some zip-directory and automatic unzipping utilities for a BBS (Major BBS, anyone?) back in the day. FILE_ID.DIZ extraction. No doors or shelling out.

-AmbushBug

Well shut mah mouth. You’re right.
I’ll salvage my answer by saying that there is lookup info at the start of each compressed file section in the zipfile, (cue nifty graphics), whilst the zipfile itself contains mostly redundant lookup info at the end.

This makes sense to me, as it allows for easy appending to zip archives.

Keep in mind that not all compression methods are lossless. JPEG files are lossy, so if you load and save the file repeatedly it degrades. If you edit a JPEG image, save it as an uncompressed file (e.g. TIFF). Even if you need the edited image as a JPEG file, do the conversion after all the editing is done.

However, lossy compression is only used on sound, image and other specific types of data. All general-purpose file compression utilities (Info-ZIP, WinZip, gzip, compress, arc, lzh, etc.) use lossless compression.

I knew that JPEGs were lossy, and i actually tend to save my uncompressed digital photos as Photoshop documents to avoid this problem.

But are you suggesting that if i frequently zip and unzip JPEGs, they will degrade each time it’s done because they are a lossy format? That’s not the impression that i got from the other posts on this thread.

No, zipping and unzipping won’t change the file, but if you uncompress and recompress a JPEG (by using some kind of image editing program) you will lose image quality.
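If you would rather verify this than take it on faith, the sketch below zips and unzips the same bytes 25 times and compares a SHA-256 hash before and after. The "JPEG" here is just stand-in bytes invented for the test, and the function names are hypothetical. With a real photo you would read the file's bytes instead.

```python
import hashlib
import io
import zipfile


def zip_then_unzip(name, data):
    """One full cycle: write data into an in-memory ZIP, then read it back out."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
        return zf.read(name)


def digest_after_cycles(data, cycles):
    """Hash the data after it has been zipped and unzipped `cycles` times."""
    for _ in range(cycles):
        data = zip_then_unzip("photo.jpg", data)
    return hashlib.sha256(data).hexdigest()


# Stand-in bytes, not a real image: the zip step treats any file as opaque bytes.
fake_jpeg = b"\xff\xd8" + bytes(range(256)) * 40 + b"\xff\xd9"

before = hashlib.sha256(fake_jpeg).hexdigest()
after = digest_after_cycles(fake_jpeg, 25)
print(before == after)
```

The hashes match because the zip step never decodes the image. Quality is only lost when an image program decodes the JPEG and encodes it again.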

MP3 and MPEG are also lossy compression formats: if you edit an MP3 song (or MPEG video), save it, open it again, edit it, save it, etc… pretty soon you’ll notice the difference in quality, because each time you re-encode the file, you lose some information.

To expand a bit on cool ZIP programming, the directory routines I had a hand in would open the file, go 1K from the end and read forward looking for the start index; if it wasn’t found, back up another 1K and repeat.
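A rough Python sketch of that kind of backward search is below. It is not the BBS code, just one way to do it: step back from the end in 1K blocks looking for the end-of-central-directory record (signature PK\x05\x06), then read the entry count and the index offset from it. The function names are made up, and the 22-byte record layout is taken from the ZIP format. Each block is read with a 3-byte overlap so a signature straddling a block boundary is not missed.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
# signature, disk no., index disk, entries on disk, total entries,
# index size, index offset, comment length
EOCD = struct.Struct("<4sHHHHIIH")  # 22 bytes
CHUNK = 1024


def find_eocd_offset(fp):
    """Scan backward in 1K steps; return the offset of the last EOCD signature or -1."""
    fp.seek(0, io.SEEK_END)
    start = fp.tell()
    while start > 0:
        start = max(0, start - CHUNK)
        fp.seek(start)
        # Overlap into the block already searched so a split signature is seen whole.
        block = fp.read(CHUNK + len(EOCD_SIG) - 1)
        pos = block.rfind(EOCD_SIG)
        if pos != -1:
            return start + pos
    return -1


def read_index_info(fp):
    """Return (total entries, index offset, comment length) from the EOCD record."""
    offset = find_eocd_offset(fp)
    if offset == -1:
        raise ValueError("no end-of-central-directory record found")
    fp.seek(offset)
    fields = EOCD.unpack(fp.read(EOCD.size))
    return fields[4], fields[6], fields[7]


# Demo archive with a long comment, so the record sits about 3K from the end.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for n in range(3):
        zf.writestr(f"file{n}.txt", b"hello " * 100)
    zf.comment = b"x" * 3000

print(read_index_info(buf))
```

Because rfind keeps the last match in each block and the scan runs from the end of the file, the record nearest the end wins. A signature that happens to appear inside the archive comment would still fool it, so real tools add extra consistency checks.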

Had the MOST fun when dealing with a .ZIP file stored at the end of a .ZIP file; its index was the one found and confused my poor routines all to heck. Had to rewrite everything to bulletproof against cases like this.

-AmbushBug