How does Binary Code work?

After I read Michael Crichton’s Andromeda Strain, I thought I understood what binary code was, and how it was translated from letters to 1’s and 0’s. Now I know that this is not the true method, but here’s how it was done in the story:

  1. start with the “key”: 1 2 4 8 16

  2. Each letter of the alphabet can be represented by a number: A is 1, B is 2…Z is 26. Each letter can be represented by 1’s and 0’s by using the “key”, placing a “0” for numbers not used and a “1” for numbers used. Here are some examples:

G is the seventh letter in the alphabet, so G = 111, because in order to get the value of “7”, you need 1, 2, and 4. 8 and 16 are unnecessary.

L is the twelvth (sp - how the hell do you spell that?) letter in the alphabet, so L = 0011, since 8 + 4 = 12 (the 1 and the 2 are not used, so they are 0’s).

Each letter has a unique code, as listed below:
A = 1
B = 01
C = 11
D = 001
E = 101
F = 011
G = 111
H = 0001
I = 1001
J = 0101
K = 1101
L = 0011
M = 1011
N = 0111
O = 1111
P = 00001
Q = 10001
R = 01001
S = 11001
T = 00101
U = 10101
V = 01101
W = 11101
X = 00011
Y = 10011
Z = 01011
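
In case it helps, here’s a rough Python sketch of the scheme (letter_code is just a name I made up for illustration):

[code]
# Rough sketch of the Andromeda Strain-style scheme described above.
# The bits are written against the "key" 1 2 4 8 16, smallest value first,
# and the unused high values are simply dropped, as in the list.
def letter_code(letter):
    n = ord(letter.upper()) - ord("A") + 1   # A=1 ... Z=26
    bits = ""
    for value in (1, 2, 4, 8, 16):
        bits += "1" if n & value else "0"    # "1" if this key value is used
    return bits.rstrip("0")                  # drop the trailing unused values

for ch in "GLM":
    print(ch, "=", letter_code(ch))          # G = 111, L = 0011, M = 1011
[/code]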

Question 1: Can someone explain to me why this coding system is not considered more efficient? Besides needing some way to represent the break between values, such as a space or a period, wouldn’t this code take up less storage?

Question 2: Is there a good cite that teaches binary code?

– sorry for the sloppy post, it’s one of those muddy-minded days.

Your listing had me confused for a minute until I realized you were writing the binary representations in the reverse order from the way they’re normally written, with the ones digit on the right.
(So 110 should mean one 4 + one 2 + no 1s to be consistent with, say, 15 meaning one 10 + 5 ones in decimal notation.)

Binary, or base two, is just an alternative way of writing numbers, using the digits 0-1 and place value to represent powers of two, as opposed to the decimal system which uses the digits 0-9 and place value to represent powers of ten.

Computers do store information in binary, internally (though not with A=1, Z=26, etc.), because each bit (= binary digit) of memory only has to store one of two values, which it can do by having a teensy electronic switch be either on or off. Rather than trying to go into any more detail, I’ll wait until somebody comes along with a really good cite.

  1. Less storage than what? More efficient than what?

  2. Binary “code”? Well, binary numbers are just like you explained them, only backwards (the largest place value is on the left, just as in decimal). All base number systems work just like the one you are familiar with.

For example, in base ten (our decimal system) any number is represented by multiples of powers of ten. So 13, for example, is

1 x 10[sup]1[/sup] + 3 x 10[sup]0[/sup]

In base two, 13 would be represented using no more than one of each power of two, which in our case would be 8 + 4 + 1.
8 = 2[sup]3[/sup], 4 = 2[sup]2[/sup], and 1 = 2[sup]0[/sup] so 13 in binary is

1101

which would be the letter “M” in your list but note the order is reversed.

You can only add up multiples of powers of ten in base ten. You can only add up multiples of powers of two in base two.

The number of characters available is dependent upon the base. Base 2 has two characters: 0 and 1. Base three would have three characters: 0, 1, and 2. Base ten has ten characters, 0-9. And so on, with any base you like.
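
If you want to see the mechanics, here’s a rough Python sketch of turning a decimal number into standard binary (largest power on the left) by repeated division; to_binary is just a name I made up:

[code]
# Convert a non-negative whole number to standard binary notation
# (most significant digit on the left) by repeatedly dividing by 2.
def to_binary(n):
    if n == 0:
        return "0"
    digits = ""
    while n > 0:
        digits = str(n % 2) + digits   # the remainder is the next binary digit
        n //= 2
    return digits

print(to_binary(13))   # 1101
print(to_binary(26))   # 11010
[/code]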

This binary code seems to just map the letter’s place in the alphabet to a number, and then writes down the number.

Well, you might prefer that the length of the code for any letter be the same as for any other letter, rather than having A be of length 1 while N is of length 4, etc. That way you know when one letter’s code stops and the next letter’s code begins, without having to insert another code for “break between letters” after every letter.

And while your computer might find coding in binary to be pretty transparent, you might discover that it’s mind-numbingly awkward to write out long sequences of 0s and 1s to represent each letter, and if someone (your colleague down the hall, let’s say) came up with an in-between coding formula like base-16 hexadecimal which could be read easily by you and interpreted easily by your code compiler, well, you might embrace that little tool with lots of enthusiasm.
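
For instance (just a quick Python illustration, not anyone’s actual compiler), each hex digit stands in for exactly four binary digits:

[code]
# Each hexadecimal digit corresponds to exactly four binary digits,
# which is why hex is a handy shorthand for long strings of bits.
n = 0b01001110            # 78, written out in binary
print(hex(n))             # 0x4e   (0100 -> 4, 1110 -> e)
print(format(n, "08b"))   # 01001110
[/code]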

And realistically you’d want more than the 26 letters of the alphabet. You’d want UPPER versus lower case plus space and punctuation marks and accented characters like é, ñ, and ü and whatnot, so you and your programmer friends would probably come up with a character table akin to the one known as ASCII or the more modern Unicode. (ASCII defines 128 characters; the extended “upper ASCII” adds an additional 128 for a total of 256, but those are less standard & therefore less consistent cross-platform. Unicode is 16-bit and therefore has room for some 65,000 characters.)

FWIW, here is a table listing the ASCII standard, which is how letters (and other characters) are represented on most computer systems. (Almost all of them, these days.) The site gives the corresponding numbers in decimal, base 16 (hex) and base 8. You can easily figure out the binary equivalents yourself.
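
For instance, a quick Python sketch that prints a character’s ASCII value in decimal, hex, octal, and binary:

[code]
# ASCII value of a few characters, shown in decimal, hex, octal and binary.
for ch in "Az!":
    n = ord(ch)
    print(ch, n, hex(n), oct(n), format(n, "08b"))
# A 65 0x41 0o101 01000001
# z 122 0x7a 0o172 01111010
# ! 33 0x21 0o41 00100001
[/code]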

Say you have the following binary value representing a text string using your encoding scheme:

1111

Is it an O? Or is it two Cs? Or is it GA? You don’t know. There’s not enough information there to tell you.

There are a few different ways to get usable text encoding. Extended ASCII makes every character take up a specific amount of room, in this case 8 bits. That way your software can quickly grab each 8 bits of information, figure out which character that segment represents, and then move on to the next. It’s not space efficient, since most characters won’t need all 8 bits, but it’s much less computationally intensive to figure out where characters start and end and to decode them.
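
Something like this quick Python sketch (assuming plain 8-bit ASCII) shows why a fixed width makes decoding trivial:

[code]
# Decode a bit string where every character occupies exactly 8 bits.
# No separators are needed because the fixed width tells you where to cut.
bits = "010000010100001001000011"   # "ABC" in 8-bit ASCII
chars = [chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8)]
print("".join(chars))               # ABC
[/code]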

If you want your text to take up less room, then you need to use some form of encoding that tries to optimize the amount of memory required to represent a string. There are many, many different ways of going about that. Do a Google search on “Huffman coding”; that’s probably a good place to start for that sort of thing. Most of these will have the opposite tradeoffs that a defined character length will have, namely they’re more computationally intensive but take up less memory.

Oh, and not only is binary usually represented the other way, zero is also a number.

Huffman encoding is sort of like your idea, except that a requirement is that no code is the prefix of another code.

Simply put, if you set A as 10011, then you can’t have B as 100110, since 10011 is a prefix of 100110.

This resolves the 1111 ambiguity because you can always tell when the breaks are. It leads to more efficient representation since you can have common letters with shorter codes and less common letters with longer ones, thus, e might be 01 while z might be 110101101.
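
To make that concrete, here’s a little Python sketch of decoding with a made-up prefix-free code (the particular code words are invented for illustration):

[code]
# Decode a bit string with a prefix-free code: since no code word is a
# prefix of another, you can always tell where one letter ends.
code = {"e": "0", "t": "10", "a": "110", "z": "111"}   # invented example code
decode = {bits: ch for ch, bits in code.items()}

message = "".join(code[ch] for ch in "taze")   # -> "101101110"
buffer, text = "", ""
for bit in message:
    buffer += bit
    if buffer in decode:        # a complete code word has been read
        text += decode[buffer]
        buffer = ""
print(message, "->", text)      # 101101110 -> taze
[/code]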

It’s not more efficient for us mere hu-mons because we are used to symbols representing letters and numbers. A computer isn’t. At its most basic level, a computer can only recognize “off” and “on”. An electron is either there or it isn’t. A segment of magnetic tape is charged or it is not. A circuit is closed or open. It’s more an issue of technological limitation than “efficiency”.
Now don’t even get me started on Hexadecimal (base-16: 0123456789ABCDEF)

Less storage space than the 8 binary digits used per character, since some letters need only one or two digits to be represented. But then there’s the issue of spacing, and other characters, as stated in another post.

Forgive my blankness, but I don’t understand how “1101” = 13. I understand 8 = 2[sup]3[/sup], 4 = 2[sup]2[/sup], and 1 = 2[sup]0[/sup], but how that represents 1101…

I think you’re saying that if M is the 13th letter, then M = 13. That’s not what I meant. M = 1011 because in order to add up to the number 13, you must use 1, 4, and 8, but exclude 2, making 2 a “0”.

In the codes I wrote, “1” means the number was used. “0” means it wasn’t.

The alphabet in binary code. Can someone explain to me how these 8-digit codes are figured, or post a link to something that can? I may be mathematically challenged, but I want to know.

Binary is just the base 2 numbering system.

0 = 0
1 = 1
10 = 2
11 = 3
100 = 4
101 = 5
110 = 6
111 = 7
1000 = 8
etc

Computers use binary since it’s very easy to make circuits where electricity is either flowing or not flowing to correspond to the 0 and 1 state. Of course what the number 1 means when compared to text is completely arbitrary.

You have to come up with some way of telling where the end of each character is. Computers typically group bits into chunks of 8, so there’s no need to have any sort of special character saying where one character starts and one character stops. Using uncompressed raw data is more efficient for the processor since it doesn’t have to figure out where things start and stop. Processors have instructions that are specially designed to work on data that is 8, 16, and 32 bits in size.
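
If you want to play with it, here’s a quick Python sketch (the particular numbers are just examples):

[code]
# Counting in binary, padded out to the 8-bit width computers usually use.
for n in (0, 1, 2, 3, 4, 8, 65):
    print(n, "=", format(n, "b"), "=", format(n, "08b"))
# e.g. 65 = 1000001 = 01000001  (which happens to be ASCII "A")
[/code]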

As you’ve noticed, there are more efficient ways of storing data. What you posted is very close to an algorithm that Shalmanese briefly mentioned called Huffman encoding. Rather than just go down a list like you did, what you want to do is put the more common characters like E, R, S, T, L, and N up near the top of the list. That way the most frequent characters get the shorter codes and you get a higher compression ratio. Huffman encoding isn’t used much since there are more efficient compression algorithms out there, but when I went to school it was still being taught just as a good example of data compression. I believe that the JPEG standard uses Huffman encoding, at least in part of it.

Here’s a site with more details:
http://www.la.unm.edu/~mbrettin/algorithms/huffman.html

If you do a Google search on Huffman encoding and Lempel-Ziv encoding (another popular algorithm) you should end up with more than enough reading material to keep you busy for a while.
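
If you’re curious, here’s a bare-bones Python sketch of the idea; the letter frequencies are invented just for illustration, and real implementations are more involved:

[code]
import heapq

# Bare-bones Huffman code construction: repeatedly merge the two least
# frequent nodes. Frequent symbols end up closer to the root of the tree,
# so they get shorter codes. The frequencies below are made up.
freqs = {"e": 40, "t": 25, "a": 15, "s": 10, "z": 5, "q": 5}

heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
heapq.heapify(heap)
while len(heap) > 1:
    f1, _, left = heapq.heappop(heap)
    f2, i, right = heapq.heappop(heap)
    merged = {sym: "0" + bits for sym, bits in left.items()}
    merged.update({sym: "1" + bits for sym, bits in right.items()})
    heapq.heappush(heap, (f1 + f2, i, merged))

codes = heap[0][2]
for sym in sorted(codes, key=lambda s: -freqs[s]):
    print(sym, freqs[sym], codes[sym])
# "e" gets the shortest code, "z" and "q" get the longest
[/code]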

Think of it like this. In our base 10 number system, each column is worth a specific power of ten. So our ones column is conceptually equivalent to a 10^0 column, as anything to the power of 0 is 1. The tens column is obviously 10^1 and the hundreds column is 10^2, ad infinitum. The number that you place into each column tells you how many of that specific power you need, so 123 is equal to (1 * 10^2) + (2 * 10^1) + (3 * 10^0).

Now, we change the base of the number system we are using, from ten to two, but the rules are still the same. The first column is worth 2^0, the second 2^1 and the third 2^2, etc. To make sure we cannot represent the same number in more than one way, we must limit the digits allowed in each column: in base ten we can have 0-9 in each column, in base two we can have only 0 or 1.

So 1101 is conceptually equivalent to:
(1 * 2^3) + (1 * 2^2) + (0 * 2^1) + (1 * 2^0) which is 13.
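
If you want to check that in Python:

[code]
# 1101 in binary, written out column by column.
value = (1 * 2**3) + (1 * 2**2) + (0 * 2**1) + (1 * 2**0)
print(value)             # 13
print(int("1101", 2))    # 13 again, letting int() do the base-2 conversion
[/code]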

As for the efficiency aspect, if you are talking about simply writing down a number, then the higher the base you are working in, the fewer numerals you generally need to represent a number. Hexadecimal and octal are favoured over binary in programming because they represent numbers concisely and can be converted to and from binary easily.

Well, let me see here. So… I’ll try…78, by my formula, then reverse the digits.

I get 1001110. Is that right?

That page just shows what letters map to which numbers in the ASCII character map.
There’s not really a mathematical method for determining ASCII values. It’s an agreed upon standard for mapping number values (which computers can handle) to letters and punctuation (which people can read). At some point someone arbitrarily decided which number represented which letter.

ASCII stands for “American Standard Code for Information Interchange”. It evolved from earlier standards and can trace its history back to the days of the telegraph, though ASCII as we know it didn’t get defined until the early 1960s.

There are many other ways of mapping letters to numbers. ASCII just happens to be the standard that won out over the others in the computer world. For example, older IBM mainframes used a different standard called EBCDIC (pronounced “ebb-sid-ick”). Here’s a table that compares the two:

http://www.natural-innovations.com/computing/asciiebcdic.html

Since I won’t be here to find out if I’m right, I’ll try a big number the same way, and check back here on Monday (I’m at work).

24,987 in binary code = 110000110011011

Right?

'til monday then…

Right, CC.

BTW, the Windows calculator, in scientific view, allows you to use decimal, binary, octal and hexadecimal numbers. You can enter a number and see what it looks like in the other systems by clicking the appropriate radio buttons.

Now, to completely confuse the issue, let’s call in the Endians.

http://www.cs.umass.edu/~verts/cs32/endian.html

[nitpick]Actually, in modern computers, the values “0” and “1” are represented by voltage, not the flow of current. Otherwise, computers would consume a lot more power than they do. The small amount of current that flows in a computer chip generally occurs only when some bit changes from “0” to “1” or vice versa. [/nitpick]

Since we’re already nitpicking…

While generally we tend to think of signals in terms of voltage, the truth of the matter is that a logic gate down at the silicon level is a current-operated device. Bipolar junction transistors turn “on” when current flows between the base and emitter. This causes current to flow from the collector to the emitter. Someone who thinks about the currents involved ensures that the voltage on the collector is close enough to 0 and 5 volts (for TTL, for example) to be useful for a digital circuit, which generally means ensuring that the transistor alternates between saturation and cutoff so that you avoid the “linear” region of the transistor. Field effect transistors work a bit differently, but it’s the same sort of thing.

Internal registers on a computer are typically small capacitors, which is why your CPU has a minimum speed as well as a maximum. Run it too slow and the capacitors leak their charge off before the data value is needed for the next stage. The capacitors are charged by current flowing into them, then when they are “read” a current flows out of them into the next semiconductor stage.

Digital designers like to think purely in terms of ones and zeros, but underneath the hood it’s all about current flow.

I don’t know much about binary, so I’ll leave that to the experts. But I can contribute one thing at least…TWELFTH!

ECG

What are you talking about? TTL hasn’t been used in years. Dynamic logic aside from dense memories is not used very often either. Almost all digital logic these days is CMOS logic. Not bipolar logic.