What encoding to use to FTP a Windows zip file?

I have a 400MB zip file to FTP to a webserver. If memory serves, if my ftp client is set to use the wrong encoding (ascii vs. unicode) I’ll end up with a useless file up there.

Sorry, I would have thought this would be easily googleable - but I am getting nowhere and don’t want to waste an hour using trial & error.

Thanks!

That terminology is a little confusing. Usually an FTP program can be set for ASCII or binary.

ASCII usually means anything that is not a binary file. A ZIP file is a binary file, but not really “unicode.” ASCII could also just mean 8-bit (1 byte) characters. Unicode would/should mean double-byte characters.

What FTP client are you using that’s asking this? FileZilla works well and I don’t remember any problems like that.

You could also try uploading a small ZIP file first if you have a way to check that it works on the other end.
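One way to do that check, assuming you can download the test file again (or otherwise get a copy back from the server), is to compare a checksum of the original against the copy. A minimal Python sketch; the helper names are made up for illustration:

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks so a large zip fits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:  # "rb" reads raw bytes with no newline translation
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(original_path, copy_path):
    """True if the two files have byte-for-byte identical contents."""
    return sha256_of(original_path) == sha256_of(copy_path)
```

Something like same_file("test.zip", "test_downloaded.zip") (placeholder file names) should come back True if the transfer left the file intact.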

That’s where my memory fails.

I’m using CuteFTP, which has a prominent option for encoding (Unicode/ANSI), and has binary/ASCII hidden under “extensions” (i.e., you specify file types that ought to be uploaded as ASCII).

Uploading a test file is a good suggestion, thanks.

eta: For some reason the transfer has failed twice at ~117MB.

Command line ftp programs require you to set bin or ascii, but more recent ones seem to be able to figure it out by themselves.
If the application requires it, definitely use bin for zip files or graphics files. I’ve sometimes transferred ascii files using binary, and it works okay.

I think the ASCII or Unicode might apply to the filenames, not the file contents. If you have a file named with non-English characters you might need to use Unicode.

Use FileZilla and set the encoding to auto.

Well, ASCII strongly implies that the transmission will not be eight-bit clean. This is not what you want for a compressed file, unless you’ve encoded it with something like base64 before transmitting.

The claim that Unicode means double-byte characters is simply wrong. First, Unicode code points no longer fit in 16 bits (the fixed-width encoding, UTF-32, needs 32), and, second, there are ways to encode Unicode that aren’t based around 16-bit values at all (UTF-8, one of the most common ones, is an example).

I guess I’m way behind on Unicode. Either way Unicode shouldn’t have anything to do with a file being FTP’d as binary, agreed?

Use the BIN command to transfer in binary mode. Always.
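If you'd rather script it than fight a GUI client, here is a rough sketch using Python's standard ftplib, whose storbinary call puts the connection into binary mode before sending. The host, login, and file names in the comments are placeholders, and upload_binary is just an illustrative helper name:

```python
from ftplib import FTP


def upload_binary(ftp, local_path, remote_name):
    """Upload one file in binary mode over an already logged-in FTP connection."""
    with open(local_path, "rb") as fh:  # open locally in binary mode too
        # storbinary issues TYPE I (binary/image) before the STOR command
        return ftp.storbinary(f"STOR {remote_name}", fh)


# Example use (placeholder host and credentials, not run here):
#
#     with FTP("ftp.example.com") as ftp:
#         ftp.login("username", "password")
#         print(upload_binary(ftp, "backup.zip", "backup.zip"))
```

The server's reply string is returned, so a "226"-style response is a quick sign the transfer completed.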

The original ASCII code is 7-bit, and IIRC the 8th bit is used at the system's discretion for parity checking or what-not. I think the FTP option to use ASCII allows ASCII text to be transmitted properly between two systems that use that 8th bit differently. The binary option preserves the bits exactly. I don’t know if there are any systems that still use 7-bit ASCII or whether this whole thing is now an obsolete issue, but it is vaguely possible that transmitting ASCII data using the binary option could cause problems.

I don’t know if I fully understand either you or FTP: If ASCII is eight-bit unclean (that is, the high bit is always cleared) then most Unicode encodings will be fatally mangled by being sent that way, with the exceptions of UTF-7 and the subset of UTF-8 that is ASCII-compatible. If ASCII means CR will be translated to CRLF or vice-versa, then UTF-8 will always make it through fine but many others may be fatally mangled.

I don’t know which interpretation is common in FTP-land; I always FTP in binary mode, myself.
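To make the two interpretations concrete, here is a small Python simulation. It does not talk to any FTP server; strip_high_bit and lf_to_crlf are simplified stand-ins I made up for "7-bit channel" and "newline translation", and real servers may behave differently:

```python
def strip_high_bit(data: bytes) -> bytes:
    """Interpretation 1: a 7-bit channel that clears the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


def lf_to_crlf(data: bytes) -> bytes:
    """Interpretation 2: every LF byte is rewritten as CR LF (simplified)."""
    return data.replace(b"\n", b"\r\n")


# UTF-8 with a non-ASCII character does not survive a 7-bit channel.
utf8 = "caf\u00e9\n".encode("utf-8")   # b'caf\xc3\xa9\n'
print(strip_high_bit(utf8))            # b'cafC)\n'

# UTF-8 does survive newline translation: byte 0x0A only ever means a real LF.
print(lf_to_crlf(utf8).decode("utf-8") == "caf\u00e9\r\n")   # True

# UTF-16 does not: U+010A encodes as bytes 0A 01 in UTF-16-LE,
# so a CR gets inserted in the middle of a character.
utf16 = "\u010a".encode("utf-16-le")   # b'\n\x01'
print(lf_to_crlf(utf16))               # b'\r\n\x01'
```

The last result has an odd number of bytes, so it is no longer even a whole number of 16-bit units.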

“Binary” mode FTP is for anything that is not ASCII. ASCII has 7-bit characters, typically arranged in 8-bit bytes with the MSB set to zero.

The reason for this mode is that FTP is a fantastically shitty protocol. Somebody decided that it would be a good idea for FTP to look inside what it was transferring and tweak newlines when going between systems that use different newline conventions (CRLF vs. just LF, for example). Obviously, you don’t want to do this if you’ve got some non-text data where the byte that happens to correspond to CR should not be changed.
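A toy illustration of the damage, assuming the simplest possible translation rules (real implementations vary) and a made-up nine-byte payload standing in for binary data:

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Simplified Unix-to-Windows text translation: LF becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


def crlf_to_lf(data: bytes) -> bytes:
    """Simplified Windows-to-Unix text translation: CR LF becomes LF."""
    return data.replace(b"\r\n", b"\n")


# Made-up binary payload: the 0x0D and 0x0A bytes here are data, not line endings.
payload = bytes([0x50, 0x4B, 0x03, 0x04, 0x0A, 0x00, 0x0D, 0x0A, 0xFF])

to_windows = lf_to_crlf(payload)   # bytes get added
to_unix = crlf_to_lf(payload)      # bytes get removed

print(len(payload), len(to_windows), len(to_unix))   # 9 11 8

# Translating back does not help: the original mix of LF and CR LF is lost.
print(lf_to_crlf(to_unix) == payload)                # False
```

Either direction changes the length of the file, which is why a zip sent in ASCII mode tends to arrive unreadable.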

Unicode is not an encoding. Unicode is just a bigass list of numbers and characters associated with those numbers. A Unicode file might be encoded in UTF-8 or UTF-16 or other less common encodings. In all cases, they should be treated as binary files.

This, along with dumb ideas like separate TCP streams for data and control, is why FTP is lousy. SCP or even HTTP is much nicer if it’s available.