Computer question: What is a binary vs ASCII file?

Some older computer programs/protocols like the DOS “copy” command or the FTP protocol differentiate between ASCII and “binary” files. What does that mean? If you look at both in a hex editor, it all comes down to bits anyway… so what does that designation actually determine?

ASCII files have carriage returns and line feeds at the end of each record. The characters in the file are from the ASCII character set (upper & lower case letters, numbers, etc.). This post has line feeds that cause the text to wrap and make it readable.

Binary is often just a stream of bytes, and the information in it could be anything. Binary files also often have header records that identify things like the character set (ASCII or EBCDIC), how many bytes are in a block, and other information.

Not sure what else, but ASCII files may use different line termination characters depending on the operating system, e.g. newline (Unix) vs. carriage return + newline (DOS). Binary mode sends the file bit for bit.

ETA http://www.serv-u.com/newsletter/NewsL2008-03-18.asp

While it is true that files are just bits, ASCII text files are stored differently on different operating systems. Where one system might have each line ended by a carriage return and a line feed, another operating system might only have the carriage return and might assume the line feed. Transferring files via FTP using ASCII mode would automatically sort that out for you so that the file would look correct on your operating system.

This would obviously cause problems with binary files, since you wouldn’t want to muck with the data in the middle of something like a bitmap image just because the pixel value happened to match the byte code of an ASCII carriage return. So FTP would transfer binary files just byte for byte without altering anything, but would modify ASCII files to format the text correctly for your operating system.
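To make that concrete, here is a rough Python sketch of what the two transfer modes amount to. The function names and the choice of target line ending are mine, just for illustration, not anything out of the FTP spec:

```python
# Rough sketch of the difference between FTP's two transfer modes.
# Function names and the target line-ending choice are made up for illustration.

def transfer_binary(data: bytes) -> bytes:
    # Binary mode: every byte goes through untouched.
    return data

def transfer_ascii(data: bytes, target_eol: bytes = b"\r\n") -> bytes:
    # ASCII mode: normalize whatever line endings the sender used
    # (CR+LF, bare LF, or bare CR) to the receiving system's convention.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", target_eol)

text = b"line one\nline two\n"                    # Unix-style text file
print(transfer_ascii(text))                       # b'line one\r\nline two\r\n'

pixels = bytes([0x42, 0x4D, 0x0A, 0x0D, 0x00])    # image data; 0x0A/0x0D here are just pixel values
print(transfer_binary(pixels) == pixels)          # True: binary mode never alters them
print(transfer_ascii(pixels) == pixels)           # False: ASCII mode would corrupt the image
```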

An ASCII or EBCDIC table shows the values for each character. A line feed is decimal 10, hex 0A. You'll see these values in any hex editor.

This is a line.0D0A

That's the CR LF you'd see in the hex editor at the end of a DOS text line. The letters would be in hex too. :wink: I didn't want to look them up in the table.

http://publib.boulder.ibm.com/infocenter/lnxpcomp/v8v101/index.jsp?topic=%2Fcom.ibm.xlf101l.doc%2Fxlflr%2Fasciit.htm
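If you don't have a hex editor handy, a quick Python sketch shows the same bytes:

```python
line = b"This is a line.\r\n"   # a DOS-style text line

# Print each byte as two hex digits, the way a hex editor displays a file.
print(" ".join(f"{b:02X}" for b in line))
# 54 68 69 73 20 69 73 20 61 20 6C 69 6E 65 2E 0D 0A
```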

I misread the OP. I thought he wanted an explanation of text and binary files. Tough to do in just a few sentences.

engineer_comp_geek has the answer. Setting binary mode in FTP just tells it to transfer the file without changing anything.

It has to do with the “End-Of-File” character (EOF).

Text transmissions in ASCII reserved the control character Ctrl-Z (code 26) to indicate EOF. DOS and FTP would copy/transmit until they reached an EOF. See http://en.wikipedia.org/wiki/End-of-file

Why a DOS copy respected EOF when it had directory info showing the true size of the file… no idea.

So text files can be rewritten efficiently, e.g. when redirecting a command's output into a file.
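In code terms, here is a rough sketch of what "copy until EOF" amounts to in ASCII mode (the function name is made up for illustration):

```python
CTRL_Z = b"\x1a"   # decimal 26, the CP/M-style end-of-file marker

def ascii_mode_read(data: bytes) -> bytes:
    # Treat everything from the first Ctrl-Z onward as padding and drop it,
    # regardless of what the directory says the file's size is.
    eof = data.find(CTRL_Z)
    return data if eof == -1 else data[:eof]

raw = b"HELLO WORLD\r\n\x1a\x1a\x1a"    # text padded out with Ctrl-Z bytes
print(ascii_mode_read(raw))             # b'HELLO WORLD\r\n'
```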

I may be wrong on this, but I think there is another difference between ASCII and binary: ASCII files are interpreted in groups of 8 bits, but binary files are not necessarily. For example, I assume an executable file from a 32- or 64-bit operating system would be a binary file with the 'meaning' interpreted on 32- or 64-bit boundaries.

(engineer_… appears to refer to this but I’m not sure it was explicitly stated.)

Otherwise, yes, we can fall back on “they’re both just bits”.

It’s a rather old-fashioned distinction to make, I think. In the olden days, when mainframes ruled, it was common for filesystems to be built on the assumption that files had some kind of structure beyond being just a sequence of bits or bytes or words. For example, there might be table/record-like structure built right into the files themselves. Then along came minicomputers, Unix and so on, which did away with all that and just treated files as a bunch of bits, interpretation of the contents of said files being left up to applications, such as database software that ran on top of the OS rather than being tightly integrated into it.
So anyway, in that context it made sense to determine, at a low level, what kind of file a file was before the system decided how to handle it, and a text file was one kind of file that had some intrinsic structure, such as the EOF control character that **K364** mentions.

If the OP wants to move beyond hex editors and learn what those binary bits mean…

Install a free decompiler like Boomerang. Here's an example of what the magic binary bits represent.
http://boomerang.sourceforge.net/cando.php?hidemenu

You don't have to know assembly to appreciate the language. It's fun to just explore an executable and see what it does. Look for jump statements; jumps transfer execution to new code.
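If installing a decompiler sounds like too much work, you can get a small taste of the same idea with nothing but Python's own dis module. It disassembles Python bytecode rather than machine code, but the jump instructions in the output are the same concept you'd hunt for in a real executable:

```python
import dis

def check(x):
    if x > 0:
        return "positive"
    return "non-positive"

# Disassemble the function; the jump opcodes in the output are where
# execution branches to new code, just like jumps in machine code.
dis.dis(check)
```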

I think this comes closest to being the answer (well, one of the answers) that OP is looking for. There are some vestiges of ancient computer history here.

The file directory did NOT show the actual file size in some of the earliest systems. It only showed a block count. When a file was written, a Control-Z was added at the end, and then the final entire block was written to the disk. So the Control-Z was actually meaningful and non-redundant to indicate where the file ended. This was the case in CP/M systems, the forerunner of DOS. It may have been so in the earliest DOS systems too. (I don’t know when directories began having the actual byte count.) By then, there were already a lot of programs out there that wrote that (now needless) Control-Z at the end of every file, so the convention persisted for a long long time.
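Here is a sketch of that CP/M-era behavior, assuming the classic 128-byte record size (the function name is made up for illustration):

```python
RECORD = 128        # CP/M tracked file length in 128-byte records, not bytes
CTRL_Z = b"\x1a"

def cpm_style_write(text: bytes) -> bytes:
    # Pad the text with Ctrl-Z out to a whole record. The directory only
    # says how many records were written, so the Ctrl-Z is the only way a
    # reader can tell where the real text stops.
    padding = (-len(text)) % RECORD
    return text + CTRL_Z * padding

data = cpm_style_write(b"A short text file.\r\n")
print(len(data))              # 128 -- one full record on disk
print(data.index(CTRL_Z))     # 20 -- where the actual text ends
```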

I never understood why the DOS COPY command needed that binary/ascii switch though. As long as it’s copying a file from the local machine to the local machine (and not across a network or something), it could just do a blind copy of all the data, without any translation of end-of-line markers or anything. Yet, the COPY command had that switch, like, forever – long before there was networking of PC computers.

It’s possible (though I doubt it) that the Microsoft programmers in the early DOS days were looking ahead to the time that computers would be networked. Mainframe computers were networked long before PC’s were invented.

Even DOS 1.0 recorded exact file sizes. The use of Ctrl-Z was mostly so DOS programs could exchange text files with CP/M systems. (I think later CP/M versions did record an exact byte count for every file, but the earliest ones didn’t.)

One big reason was that COPY can be used to concatenate files (“COPY F1+F2 F3”). Without the /a flag, the Ctrl-Z at the end of file F1 would end up in the middle of F3, and your CP/M program wouldn’t notice the contents of F2 that followed it.
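Here is a rough Python approximation of the two concatenation behaviors (my sketch of the effect, not the actual COPY implementation):

```python
CTRL_Z = b"\x1a"

f1 = b"first file\r\n" + CTRL_Z      # old-style text file, Ctrl-Z terminated
f2 = b"second file\r\n" + CTRL_Z

# Roughly what "COPY /b F1+F2 F3" produces: raw bytes, Ctrl-Z included,
# so a CP/M-era reader stops at the first Ctrl-Z and never sees F2.
binary_concat = f1 + f2

# Roughly what "COPY /a F1+F2 F3" produces: each file is cut at its
# Ctrl-Z, then a single Ctrl-Z is appended to the result.
ascii_concat = f1.split(CTRL_Z)[0] + f2.split(CTRL_Z)[0] + CTRL_Z

print(binary_concat)   # b'first file\r\n\x1asecond file\r\n\x1a'
print(ascii_concat)    # b'first file\r\nsecond file\r\n\x1a'
```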

In binary mode there is no interpretation, just transferring of bytes. It doesn’t matter what the underlying architecture of the source or destination machine is, it’s just transferring bytes.

Let me make it very simple. Binary means you are copying a complete file[sup]*[/sup], even if some characters at the end might not be significant to the intended application.

ASCII means that the actual file data is analyzed to determine if some characters are significant.

So if your application is not text-oriented (ASCII), and you don’t copy a file in binary mode, you might be missing something at the end.

Conversely, if you copy a file in binary mode, but the last few bytes are meaningless to the application, you haven’t lost anything, but you have copied more than you need. A few extra, meaningless bytes may not seem like a big deal now, but at one time, every byte was precious.


[sup]*[/sup] A file in disk storage terms is not saved in exact bytes, but blocks or groups of blocks, which are multiples of some basic value, like 512 bytes or 512K bytes. But applications may not use an entire block for their purposes. This means that disk storage typically contains a little more data than the application uses, since the storage is rounded up to the next available block.

Example: you cannot write a single byte to a disk. You will have to write the smallest block size, which might be 64K or more. But the application needs to know that only the first byte is the one we are interested in, not the entire block – the rest of the block is random junk.

Uh, no. Previous posters have this right. Yes, disk space is allocated/read/written in blocks, but that has nothing to do with the binary vs. ASCII distinction, and nothing to do with “some characters at the end.”

Nope. If you are referring to DOS’s ASCII/binary distinction, I stand by my statement and 35 years of computer experience. Previous posters substantiate me; I was merely trying to provide the same info from a different angle.

Fine then. Cite?

Jeez. :rolleyes: You want me to look up a DOS manual from the 1980s? Did you ever use a PIP or COPY command in CP/M or DOS? There was a /b switch that modified the copy function. Is that what we're talking about?

Would you accept a screen shot of the DOS COPY command and parameters?

What? No. If file copying programs ever copy data that isn’t in the file, they’re deeply broken. If an OS ever gives you data which isn’t actually part of the file you’re accessing, it’s broken.

Some OSes are broken by design. No modern OS is.

This is true but irrelevant. The OS knows how big the file actually is down to the byte, and a non-broken OS won’t give you ‘extra’ data.
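A quick way to see that on a Unix-like system (a minimal sketch; st_blocks is counted in 512-byte units per POSIX):

```python
import os

# Write a single byte, then compare the file's exact logical size with
# the space actually allocated for it on disk.
with open("one_byte.bin", "wb") as f:
    f.write(b"\x01")

st = os.stat("one_byte.bin")
print(st.st_size)            # 1 -- the OS knows the size down to the byte
print(st.st_blocks * 512)    # typically 4096: a whole allocation block
```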