I created a zip of some video files (H.264-encoded MPEG-4s) using OS X's built-in Archive Utility. The total size was 5.3 GB. I moved the zip file to a computer running Windows 8 and tried to extract it with the built-in Windows 8 decompression tools. No dice. I tried 7-Zip, which printed a bunch of errors, correctly extracted 2 of the files, and created 0-byte files for the rest. I tried WinZip, and it wouldn't do anything at all. Each tool gave a different error, but they all amounted to "this is not a proper zip archive."
The files extract just fine on my Mac.
I computed an MD5 hash of the original file on my Mac and of the copy on the Windows machine, and they're identical, so I know the file didn't get corrupted in the copy.
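For anyone who wants to repeat that comparison, here's one portable way to do it. This is just a sketch assuming Python 3 is available on both machines; OS X's md5 command and Windows' certutil -hashfile work just as well.

import hashlib
import sys

def md5_of(path, chunk_size=1 << 20):
    # Hash in 1 MiB chunks so a multi-gigabyte zip never has to fit in RAM.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    print(md5_of(sys.argv[1] if len(sys.argv) > 1 else "file.zip"))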
I am baffled. What is going on here? I thought zip was a very well-established and well-supported format. Is there a >4 GB file-size limit I'm running into?
I bet it's the file size, particularly if the Windows disk is FAT32 rather than NTFS. I do this sort of Mac-to-Windows ZIP transfer all the time (with smaller files), and I haven't had any problems.
FAT32 doesn't permit files bigger than 4 GB, so a 5.3 GB file couldn't have been copied onto a FAT32 disk intact; since the MD5s matched, the Windows machine is presumably using NTFS.
The original Zip format doesn't permit the archive, or the files within it, to be bigger than 4 GB either, but many programs that handle Zip also handle Zip64, which removes that limit. Wikipedia notes that XP's native zip support doesn't handle Zip64, but Vista's does. 7-Zip says it has supported Zip64 since a release in 2004.
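As a side note, you can make a crude check of whether an archive was written with Zip64 structures at all by looking for the Zip64 end-of-central-directory signature near the end of the file. A rough Python sketch (file.zip is a placeholder for the real archive name):

EOCD = b"PK\x05\x06"        # classic end-of-central-directory signature
ZIP64_EOCD = b"PK\x06\x06"  # Zip64 end-of-central-directory signature

with open("file.zip", "rb") as f:
    f.seek(0, 2)                        # jump to the end of the file
    size = f.tell()
    f.seek(max(0, size - (1 << 20)))    # both records live near the end
    tail = f.read()

print("classic EOCD present:", EOCD in tail)
print("Zip64 EOCD present:  ", ZIP64_EOCD in tail)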
See what happens if you copy the file from Windows back to Mac. Can you still open it?
Try using a command-line tool on the Mac to test the twice-copied zip file.
/usr/bin/unzip -t file.zip
This will likely run Info-ZIP's implementation, which only added Zip64 support in version 6.0. Run the program without any command-line arguments to check which version your Mac has.
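If the unzip on your Mac turns out to be older than 6.0, Python's zipfile module (which reads Zip64 archives) can serve as a second opinion. A small sketch, again with file.zip as a placeholder:

import zipfile

# Like "unzip -t", this decompresses each entry and checks its CRC.
# ZipFile() raises zipfile.BadZipFile if it can't parse the central directory.
with zipfile.ZipFile("file.zip") as zf:
    bad = zf.testzip()          # None means every entry checked out
    if bad:
        print("first bad entry:", bad)
    else:
        print("all entries OK")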
Archive: file.zip
warning [file.zip]: 4294967296 extra bytes at beginning or within zipfile
(attempting to process anyway)
file #1: bad zipfile offset (local header sig): 4294967296
(attempting to re-compensate)
<then it lists most of the files in the archive like this
testing: Directory/video1.m4v OK>
file #14: bad zipfile offset (local header sig): 145773799
(attempting to re-compensate)
<then it lists the last few files in the archive>
At least one error was detected in file.zip.
4294967296 looks like there’s an over/underflowed uint32 somewhere, so that may well be the problem. Unzip did show all files as OK, though.
I no longer have access to the Windows computer the errors were on (in a few days I'll have a chance to try another Windows computer and see if it still happens), so this is basically just speculation at this point unless I can reproduce it. I managed to transfer the files another way, so I don't need a workaround.
I don’t see how copying the zip back would change anything if the md5 checksum is the same.
Here's a post from 2009 saying that Mac OS X's Archive Utility doesn't work properly with Zip64 and produces bad archives whenever a file in the archive is over 4 GB. It also mentions that Apple had been told about the bugs long before then but hadn't fixed them. It seems they still haven't.
Archive Utility apparently only messes up the central directory of the zip file, the index at the end that collects information about every entry in one place. A program can ignore the central directory and instead read straight through the file, parsing each entry's local header as it goes. Apparently Archive Utility itself ignores the central directory when reading zips, which is why it can open the corrupt archives it produces while other zip programs, which rely on the central directory, can't.
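If you're curious what that sequential read looks like in practice, here's a rough sketch in Python. It just hunts for the local-file-header signature and prints each entry's name, ignoring the central directory entirely; it can be fooled by the signature bytes appearing by chance inside compressed data, so treat it as an illustration rather than a recovery tool.

import mmap
import struct
import sys

SIG = b"PK\x03\x04"  # local file header signature

def list_local_headers(path):
    names = []
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(SIG)
        while pos != -1:
            # The fixed part of a local header is 30 bytes; the file name
            # length sits at offset 26-27, and the name follows the header.
            name_len, = struct.unpack("<H", mm[pos + 26:pos + 28])
            names.append(mm[pos + 30:pos + 30 + name_len].decode("utf-8", "replace"))
            pos = mm.find(SIG, pos + 4)
    return names

if __name__ == "__main__":
    for name in list_local_headers(sys.argv[1] if len(sys.argv) > 1 else "file.zip"):
        print(name)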
The magic number in the error output, 4294967296, is telling: that's 2^32, i.e. exactly 4 GiB. When an error involves a number that round (in binary terms), it's a good bet that some program is storing a size or offset in a 32-bit field and can't handle files that size or larger.
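To see why the number is exactly 4294967296, here's a small illustration (Python, with a made-up offset) of what happens when a writer tries to squeeze a past-4-GiB offset into the classic 32-bit field:

import struct

offset = 4_400_000_000           # a hypothetical entry that starts past the 4 GiB mark

# The classic zip central directory stores offsets in a 4-byte field,
# so a value >= 2**32 simply doesn't fit:
try:
    struct.pack("<I", offset)
except struct.error as err:
    print("doesn't fit in 32 bits:", err)

# A buggy writer that silently truncates would record this instead:
truncated = offset & 0xFFFFFFFF
print(truncated)                 # 105032704, i.e. off by exactly 2**32
print(offset - truncated)        # 4294967296, the number in unzip's warning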