How does Binary Code work?

I was trying to illustrate the point that the on and off state is a difference in current flow into a semiconductor of some sort, be it a BJT or an FET or whatever. I wasn’t trying to imply that digital logic uses TTL these days. I figured that most people who tinkered at all with electronics would be more familiar with a BJT than an FET.

I don’t design CPUs for a living but I was under the impression that the internal registers for the various pipeline stages were still capacitors.

Isn’t Michael Crichton the author who referred to ASCII codes as “Askey” codes all through the book Sphere? I could understand if the character in the book was supposed to be a layman who had overheard this term and didn’t know it was an abbreviation, but I think the character in question was supposed to be a signals expert.

engineer_comp_geek

Logic state in CMOS processes is held by voltages. Intentional current flow only happens during a change of state. Even if you bring dynamic logic into the picture, the state is still held by voltage; the lower bound on operating frequency comes from undesired leakage of charge off the capacitors. In short, in modern digital chips logic state is determined by voltage, not current flow.

Yes, and it is determined that 13 represents “M” because “M” is the thirteenth letter. And 1011 is just 13 in binary (backwards, but still).
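A quick Python check, for anyone who wants to see it concretely (the values below are just the standard ASCII codes):

```python
# "M" is ASCII 77 and "A" is ASCII 65, so M is the 13th letter.
letter = "M"
position = ord(letter) - ord("A") + 1  # 1-based position in the alphabet
print(position)               # 13
print(format(position, "b"))  # 1101, which is 13 in binary
```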

Yes, it is binary.

MOSFETs are a lot like capacitors, in that once the gate is charged, it retains the charge and keeps the transistor on without requiring further current flow.

But it’s really a mistake to regard even TTL as a current-level device. To determine the state of, say, a bus, you measure its voltage. You need have no knowledge whatsoever of whether the bus is being driven by TTL or CMOS or relays or anything else.

There are other logic types, like the 4/20 mA current loop, which actually does use current to determine its logic level.

I think there’s one thing asked in the OP that hasn’t been answered: data movement and processing happen in multiples of 8 bits. Trying to make a computer work with a variable-length alphabet at a low level would introduce far more hassle in that area than it would help in saving storage.

To be really simple, Huffman encoding is what you want, because in your alphabet, the sequence “COG” comes out as 111111111. This is also the sequence for “GOC”, “OCG”, and “GAAAAAA”, along with many others. Huffman encoding produces a prefix code - no codeword is the beginning of another - so it avoids this ambiguity while still saving bits. The popular PKZIP compression uses this, as do most compression algorithms.
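To see why the prefix property matters, here’s a toy Python sketch. The codewords are made up for illustration, not a real Huffman tree for English:

```python
# A prefix code: no codeword is the start of another, so any bit
# string decodes in exactly one way.
code = {"E": "0", "T": "10", "A": "110", "O": "111"}
decode_table = {bits: ch for ch, bits in code.items()}

def encode(text):
    return "".join(code[c] for c in text)

def decode_bits(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode_table:  # a complete codeword has been read
            out.append(decode_table[buf])
            buf = ""
    return "".join(out)

encoded = encode("TEA")
print(encoded)               # 100110
print(decode_bits(encoded))  # TEA - unambiguous round trip
```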

Attempting a full explanation - computers currently work using binary logic, where “signals” (or “voltages” or “currents” - you pick the terminology, as it detracts from the overall explanation) are either on or off. (It’s entirely possible to use trinary logic; I read an article within the past two years advocating a switch to a three-valued logic system because it’s a more space efficient encoding, but I can’t find the reference. It was an analysis using Shannon’s information theory, but I digress.)

Now, one bit doesn’t tell you all that much. Either on or off. To represent a greater range of values, we can group bits together. So, using 2 bits as a unit, we can represent 4 values - 00, 01, 10, and 11. Further increasing the number of bits in a “group” allows the expression of an even larger range; for instance, an 8-bit byte (the unit ASCII text is usually stored in) yields 256 possible “values”. As far as representations go, dealing with strings of 1s and 0s is not very human-friendly, so we tend to use more compact notations such as octal (groups of 3 bits, e.g. 000 = 0, 001 = 1, 010 = 2, 011 = 3, etc. up to 111 = 7) or hexadecimal (groups of 4 bits, e.g. 0000 = 0, …, 1001 = 9, 1010 = A, etc., up to 1111 = F).
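A quick Python sketch of the same idea, using 90 as an arbitrary example value:

```python
value = 90
print(format(value, "08b"))  # 01011010 - binary, padded to 8 bits
print(format(value, "o"))    # 132      - octal reads off groups of 3 bits
print(format(value, "x"))    # 5a       - hex reads off groups of 4 bits
```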

Octal and hexadecimal are nice because their bases are powers of 2, which allows them to be easily converted back to the basic binary representation. Now, in a computer, the group of wires that carries this information between components is called the “bus”. At the risk of oversimplifying or misstating the rationale, I’ll just say that for a general-purpose computing device, it’s easier, more flexible, and more efficient to process information in groups of wires whose width is a power of 2 (e.g., the Intel 4004 used 4 bits, the 8088 used 8 bits, the current crop of Intel-compatible processors use 32 bits, and some current processors use 64 bits; 64-bit personal computer processors are on the horizon).

The point of the above is that for “general purposes”, setting the number of bits is a tradeoff. Custom hardware would use the minimum number of bits necessary to fully represent the information required. So, a circuit for representing a decimal digit would have to use 4 wires. Of course, this would “waste” some of the possible representations; the fourth bit is only needed for 8 and 9, and the codes for 10 through 15 go unused. But if we’re not sure what data we’re going to have to deal with beforehand, we just supply a standard number of bits and use them as we see fit. There’s a tradeoff between space efficiency and the flexibility of the system.
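Here’s a small Python sketch of that decimal-digit-in-4-bits idea (the scheme is known as binary-coded decimal, or BCD; the helper name is mine):

```python
def to_bcd(n):
    """Encode each decimal digit of n in its own 4-bit group (BCD)."""
    return " ".join(format(int(digit), "04b") for digit in str(n))

print(to_bcd(93))  # 1001 0011 - one 4-bit group per digit
```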

Now, your question was about efficiency of representation. As others have pointed out, Huffman encoding can be used to determine an efficient representation of an arbitrary range of values such that any group of individual values is unique - as Shalmanese points out, it is important to have a string of values be unambiguous.

However, the most efficient encoding of information depends on what is being represented. In the representation you present, the codes are assigned in alphabetical order. This isn’t the most efficient representation for English, since “e” is the most probable letter in a given text, and you want to use the shortest representation for the letter that occurs most often. It’s best (that is, most efficient) to encode English according to its letter-frequency distribution. (Of course, any particular text might show a different distribution - so its most efficient representation would use a different encoding.)
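A quick Python sketch of counting letter frequencies - the sample sentence is made up, but the point holds for English prose generally:

```python
from collections import Counter

# Tally letters in a sample text; in typical English, "e" dominates,
# so a frequency-based code would give it the shortest codeword.
text = "the quick brown fox jumps over the lazy dog the end"
freq = Counter(c for c in text if c.isalpha())
print(freq.most_common(3))  # "e" comes out on top in this sample
```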

LZW compression (explanation at http://dogma.net/markn/articles/lzw/lzw.htm) makes use of the above fact by creating an encoding dynamically based on the particular data to be encoded. A truly nifty algorithm, if you don’t mind me saying so.
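For the curious, here’s a minimal Python sketch of the LZW compression step described at that link (dictionary growth only; decompression is omitted):

```python
def lzw_compress(data):
    """Minimal LZW: grow the dictionary from the data as it is read."""
    table = {chr(i): i for i in range(256)}  # start with single bytes
    next_code = 256
    w, out = "", []
    for c in data:
        wc = w + c
        if wc in table:
            w = wc                 # keep extending the current match
        else:
            out.append(table[w])   # emit code for the longest match
            table[wc] = next_code  # add the new sequence to the table
            next_code += 1
            w = c
    if w:
        out.append(table[w])
    return out

print(lzw_compress("ABABABA"))  # [65, 66, 256, 258]
```

Notice how the repeated “AB” pattern gets captured by the new codes 256 and 258, so 7 characters compress to 4 output codes.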

Hope that helps with understanding binary and some issues involved in discussing “efficiency”…

Kramer