I don’t know where to look for the true processes for converting text into binary code, but I’ve learned a usable form of it from a Crichton novel. Here’s what I do:
[list=A]
[li]make a quick list like this: 1,2,4,8,16[/li]
[li]list the alphabet, and list numbers, up to 26, under the letters, giving each letter its respective number[/li]
[li]to write (example) the letter “E”, which is number 5, look to the original list, and add digits from left to right until you have “5” or more, and make a “1” for each number used and a “0” for each one unnecessary. So the number “5” would read “101”, or (1),(omit 2), (4). The “1” and the “4” together equal “5”. 5 is E. E is “101”.[/li][/list=A]
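The steps above can be sketched in a few lines of Python. (Note that standard binary notation puts the largest place value on the left, so 5 comes out as “101” meaning 4 + 0 + 1; the function name here is my own.)

```python
# Number the letters A=1, B=2, ... Z=26, then write each
# number in ordinary binary (largest place value first).
def letter_to_bits(letter):
    number = ord(letter.upper()) - ord('A') + 1   # A=1, B=2, ... Z=26
    return bin(number)[2:]                        # strip the "0b" prefix

print(letter_to_bits('E'))  # 5  -> "101"
print(letter_to_bits('Z'))  # 26 -> "11010"
```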
With this knowledge, I can make any letter of the alphabet into a unique pattern of ones and zeroes. But the part I don’t know is how to create a word or a sentence without using a “.” or a space to separate letters. Since the binary I see is always 0101010100110111010110110101, and I know that’s not one letter, how does it work?
okay, explain how E is “00101” or “10100”, and not “101”. By my system, “00101” would be “t”. I understand the period and space idea now, but is that how it’s supposed to be done, make space 28?
You need to know what text encoding system the data you’re looking at uses. Take the common case of ASCII. This specifies that the binary sequence representing the number 65 is the letter ‘A’, etc. It also specifies that each character is one byte, meaning it’s 8 bits long. So if you see a string like this:
01010101001101110101101101011110
you can break it up like this:
01010101 00110111 01011011 01011110
which can also be written as
85 55 91 94
which, if interpreted as ASCII, says:
U7[^
Which is very interesting indeed.
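The chop-into-bytes-and-decode procedure described above is mechanical enough to show in a short Python sketch (using the same example string):

```python
# Split a bit string into 8-bit bytes, read each byte as a
# number, then look each number up as an ASCII character.
bits = "01010101001101110101101101011110"

chunks = [bits[i:i+8] for i in range(0, len(bits), 8)]
numbers = [int(chunk, 2) for chunk in chunks]   # int(..., 2) parses binary
text = "".join(chr(n) for n in numbers)         # chr() is the ASCII lookup

print(chunks)   # ['01010101', '00110111', '01011011', '01011110']
print(numbers)  # [85, 55, 91, 94]
print(text)     # U7[^
```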
Also, your initial description of binary encoding makes it look like you have the order reversed. The number 5 is represented as “101” because it’s a 4, plus no 2, plus a 1, not because it’s a 1, plus no 2, plus a 4. For numbers which aren’t symmetrical, it makes a difference. So 6 would be 110, not 011.

(There is the issue of big-endian versus little-endian, which refers to how some computers store the numbers internally, but it’s more complicated than just reversing the bit order, and is better left unexplored if you just want to get the concept. Personally, I’d rather leave it unexplored in my job, too, but sometimes I have to work on architectures where someone thought putting the bits in the wrong order was a great idea.)

As to whether or not you include the leading zeros, it’s up to you. You don’t generally see 35 written as 0000035, so when people deal with binary numbers in 8/16/32/whatever bit chunks, they’ll generally write “101” instead of “00000101”, even though the latter is what’s meant.
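Both points above are easy to check in Python, which reads bit strings most-significant-digit-first:

```python
# Place values run 4s, 2s, 1s from left to right.
assert int("101", 2) == 4 + 0 + 1 == 5
assert int("110", 2) == 6          # 6 is 110, not 011

# Leading zeros don't change the value...
assert int("00000101", 2) == 5

# ...and format() can pad a number back out to a fixed 8-bit width.
print(format(5, '08b'))  # 00000101
```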
There are many codes used by computers, ASCII (in its various forms) being the most common in PCs. You can find ASCII codes here and here. Other codes are EBCDIC, Baudot, Unicode, etc. You can easily find them if you do a search.
“E” = “00101” is not ASCII. For starters, ASCII uses 7 bits per letter, not five. Second, ASCII represents “E” as 69 decimal / 45 hex / 1000101 binary. See http://www.jimprice.com/jim-asc.htm
Most modern computers use 8 bits per letter(/character).
I think the reason douglips says that “E” = “00101” instead of “101” is that in a long string of bits, it might be hard to know if some randomly encountered “101” was the beginning of “101XX” or the end of “00101” (= “E”) or the middle of something else. That is, if you drop leading zeros, it gets hard to know where one letter starts and another begins.
I think the lesson here, if there is one, is that if you’re using this kind of code, in order to decode a long string of bits, one needs to know A) how many bits there are per character and B) which bit-pattern corresponds to which character. Then you just chop the string into N-bit blocks, and decode each block. After the message is decoded, you can probably guess where the word boundaries go just by reading the letters. Otherwise do what douglips said and allocate a special character for “space.” 5 bits can represent as many as 32 distinct things, and the alphabet is only 26 characters long. So you’ve got several spares to play with.
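Here’s a sketch of such a homemade fixed-width code in Python: 5 bits per character, A=1 through Z=26, with one of the spare codes reserved for space. (The choice of 27 for space, and the function names, are my own for illustration; any spare value from 27 to 31 would work just as well.)

```python
SPACE = 27   # arbitrary choice of spare code for the space character

def encode(text):
    out = []
    for ch in text.upper():
        n = SPACE if ch == ' ' else ord(ch) - ord('A') + 1
        out.append(format(n, '05b'))   # always 5 bits; leading zeros kept
    return ''.join(out)

def decode(bits):
    out = []
    for i in range(0, len(bits), 5):   # chop into 5-bit blocks
        n = int(bits[i:i+5], 2)
        out.append(' ' if n == SPACE else chr(n - 1 + ord('A')))
    return ''.join(out)

print(encode('HI'))                 # H=8 -> 01000, I=9 -> 01001
print(decode(encode('HI THERE')))   # round-trips back to HI THERE
```

Because every character is exactly 5 bits (leading zeros included), the decoder never has to guess where one letter ends and the next begins, which is exactly the ambiguity raised earlier in the thread.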
-Ben
Here’s an ASCII table: http://www.control.co.kr/dic/ascii.htm
The Hex representation is much more popular than the Binary. Each binary number in an ASCII sequence is 8 bits long.
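The reason hex is popular is that each hex digit covers exactly 4 bits, so one 8-bit byte is always two hex digits. A quick Python check:

```python
# The same byte written three ways: decimal, binary, and hex.
n = ord('E')                 # 69 in ASCII
print(format(n, '08b'))      # 01000101  (8 binary digits)
print(format(n, '02X'))      # 45        (2 hex digits)
assert int('45', 16) == int('01000101', 2) == 69
```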