I don’t remember where I read this (probably Asimov), but it’s conceivable that any given amount of information can be stored as a single mark on a rod.
For this example, we’ll use the Encyclopedia Britannica.
Here’s how:
Translate every letter in the books to a two-digit number between 01 and 26, using higher numbers for punctuation.
Now that the entire set of books is one phenomenal string of digits, put a decimal point before the first digit, turning the big number into a fraction less than 1.
Take your Perfect Rod, assume its length is 1, and make a notch exactly where that fraction falls on the rod, on a scale from 0 to 1.
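To make the scheme concrete, here is a rough Python sketch of the substitute-and-concatenate step (my own illustration, assuming a plain A=01 through Z=26 mapping and simply dropping punctuation):
[code]
def text_to_notch(text):
    """Map A..Z to 01..26, string the digits together, and read the
    result as a fraction between 0 and 1 -- the notch position."""
    digits = "".join("%02d" % (ord(c) - ord("A") + 1)
                     for c in text.upper() if c.isalpha())
    return "0." + digits      # kept as a string; a float would lose digits

print(text_to_notch("CAT"))   # 0.030120 -> the notch goes 3.012 cm along a 1 m rod
[/code]
Note that the result has to be kept as a string; an ordinary float would throw away almost all of the digits.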
Sure, making the mark would require technology we don’t have (unless you use a really long rod), and the same goes for reading it. But a virtual rod could do it. It would still have to store the fraction somewhere, taking up as much space as said Encyclopedia or more, but the idea is attractive for some futuristic time when using it becomes practical.
I’ll bite and try to make this into a discussion. Suppose you could actually pull this off. It would be an incredible waste of space. You’d obviously need technology sufficiently advanced to make a notch precisely at any of a bazillion points along that stick. This means that you could encode immensely more data by putting multiple notches along the stick, using a binary code.
For example, let’s say your technology only allows you to make a notch at 100 different spots along the stick. Using your system, you can represent the integers 0 through 99 (or .00 through .99, with no greater precision than .01). But if you use a binary code, you have 100 bits, which means you can store many more than 100 unique values. You can store 2 to the 100th power unique values! So tying into your other thread about binary, if you made a mark at the last spot on the stick, plus the third-to-last spot on the stick, and nowhere else, you’d have encoded the value “5”. And if you placed a mark at every one of those 100 spots, you’d have encoded the number 1267650600228229401496703205375 (which is 2 to the 100th power minus 1). And you could store every number in between, just with different combinations of up to 100 scratches.
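You can sanity-check that arithmetic with a few lines of Python (just an illustration, treating the stick as a 100-bit binary number with the last spot as the least-significant bit):
[code]
def notches_to_int(marked_spots, total_spots=100):
    """Spots are numbered 1..total_spots from the left; the last spot
    is the least-significant bit."""
    value = 0
    for spot in marked_spots:
        value |= 1 << (total_spots - spot)
    return value

print(notches_to_int([100, 98]))        # last + third-to-last spots marked -> 5
print(notches_to_int(range(1, 101)))    # every spot marked -> 2**100 - 1
[/code]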
You also have a problem decoding the information for use. Why, you’d need a key the size of the Encyclopædia Britannica. Sounds like a zero-sum game to me.
Sorry, but the Universe pre-empted you on this one. Using the 01 to 26 substitution code you mentioned, everything that’s ever been written can be found, in chronological order, somewhere in pi. It can all be found in reverse chronological order, too. Or indexed in alphabetical order according to the third letter of the author’s middle name. Or under a different substitution code. And so on.
If you’re thinking about storing the position of the notch on your virtual rod as a numeric value in a computer, think again: an ordinary double-precision floating point number is only accurate to about 15–17 significant digits. Sure, you could devise an ‘extended floating point’ system, but the more significant digits you want to store (and it’s essential to store them exactly), the more bytes you need to work with.
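A rough back-of-the-envelope on that (my own arithmetic, not anything from the earlier posts): at two decimal digits per character, storing the notch position exactly costs nearly a byte per character of the original text, so there is no real saving.
[code]
import math

def notch_storage_bytes(n_chars):
    """Bytes needed to store the notch position exactly,
    at two decimal digits per character of original text."""
    decimal_digits = 2 * n_chars
    return decimal_digits * math.log2(10) / 8   # bits per decimal digit -> bytes

print(notch_storage_bytes(1))   # ~0.83 bytes per character of text
[/code]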
Now that might be useful: simply search for the sequence of digits that you want in the decimal portion of pi, then store just the starting position and length. Should work, but the encoding phase is going to take a lonnnnng time.
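Something like this sketch, I suppose (it leans on mpmath for the digits; the function name and the 100,000-digit cutoff are just for illustration, and for any message longer than a handful of characters you would need astronomically more digits than anyone can compute):
[code]
from mpmath import mp

mp.dps = 100_000                          # 100,000 decimal digits of pi
pi_digits = mp.nstr(mp.pi, mp.dps)[2:]    # drop the leading "3."

def encode_in_pi(message_digits):
    """Return (start, length) of the first occurrence of the digit string,
    or None if it never shows up in the digits we computed."""
    pos = pi_digits.find(message_digits)
    return (pos + 1, len(message_digits)) if pos >= 0 else None

print(encode_in_pi("14159"))         # (1, 5) -- trivially near the start
print(encode_in_pi("0301200513"))    # a 10-digit message typically needs ~10^10 digits of pi
[/code]
Notice that the starting position itself tends to have about as many digits as the message it points to, which is the objection raised a few posts down.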
Given that materials science is ever-improving, let’s say (for the sake of our thought experiment) that we are capable of producing a rod one atom thick, consisting of atoms A and B, with the position of the single B atom as our mark. As a rough estimate, how many A atoms would we need (and therefore how long is our rod)? Ignore for now how we’d construct it, read it, etc.
My question is: how much information can you encode before the precision needed to make that notch in an exact location runs into quantum problems? Each additional letter means you have to be (roughly) 100 times as precise. I don’t know the answer, but I’ll bet it means you can’t encode as much as you think. Even on a very long rod.
I do not believe that this method could work. The level of precision needed would be far beyond any theoretical limits. If your perfect rod were 1 meter long, your level of precision couldn’t go beyond 10 decimal places or so, since 1x10[sup]-10[/sup] meters is approximately the size of an atom. Your level of precision will never get much tighter than that, assuming that we’re using regular matter.
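The arithmetic behind that estimate, for anyone who wants to fiddle with the numbers (a one-metre rod and a one-ångström atom, as above):
[code]
import math

rod_length = 1.0      # metres
atom_size  = 1e-10    # metres, roughly one atom
positions  = rod_length / atom_size

print(math.log10(positions))                    # ~10 usable decimal digits of position
print(math.log10(positions) / 2, "letters")     # at 2 digits per letter: about 5 letters
[/code]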
Of course, if we wanted to get really clever, we could encode the information as a binary series. Perhaps we’d decide that a rod wasn’t the most useful shape / size, and that we’d coil the datastream up instead. For ease of use, we’d burn our datastream as a series of pits or no-pits on some substrate, perhaps encapsulating it in some sort of transparent plastic. We could call it a CD, maybe.
I once attended a lecture on this subject.
The idea was to use an error-correcting code, so that every byte of data included enough checksum to reproduce it losslessly. This was back when CDs were new, and their error correction was also mentioned.
The entire lecture was heavily shrouded in complex mathematics, and at one point it was proved that the distance between oranges stacked in an n-dimensional box approaches zero as n->infinity. (True, by the way…)
After about 30 minutes of this, most of the audience felt incredibly stupid, and the rest realised that he was pulling our legs. (It was the first lecture at university, and the idea was to make us think about what we were taught.)
But, back to the OP.
Of course it cannot be done!
I would guess that the best spatial resolution one can measure is on the order of 10 nm (~100 atoms), using an SEM. Assume that you can position something with this accuracy, in both X and Y.
Now make a nice surface, and position your speck somewhere on 1 m[sup]2[/sup]. Total possible positions: 10[sup]8[/sup] * 10[sup]8[/sup] = 10[sup]16[/sup] => ~53 bits (enough for ten or so characters at 5 bits each — “hello world”, more or less — but nowhere near the EB…)
Of course, you could put several specks on your surface! If you could position (and measure) them all with the same accuracy, you’d immediately have 10[sup]16[/sup] bits, or well over a petabyte, which would be quite a feat. (It’d be a chore to read or write, though.)
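For the record, the arithmetic (the single-speck figure is just log2 of the number of distinguishable positions):
[code]
import math

resolution = 10e-9                      # 10 nm positioning accuracy
side       = 1.0                        # 1 m x 1 m surface
spots      = (side / resolution) ** 2   # 10^8 * 10^8 = 10^16 distinguishable positions

print(math.log2(spots))                 # one speck: ~53 bits
print(spots / 8 / 1e12, "TB")           # one bit per position: ~1250 TB
[/code]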
Won’t work. The number of digits required to specify the starting position will (on average) be greater than the size of the input. This is true of all non-lossy compression algorithms – they all work by making some data sets shorter at the expense of making others longer. A good compression algorithm is one whose “get-shorter” set consists of commonly occurring real-world data sets, and whose “get-longer” set consists of data sets that don’t occur often in the real world.
(For example, run-length encoding compresses real-world black & white bitmaps really well since any recognizable image contains lots of areas of solid white or solid black. But if you run the algorithm on a random collection of black and white pixels it will expand it, since there aren’t many runs to encode. The algorithm exploits the fact that people are more interested in encoding recognizable pictures than they are in encoding random bits.)
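A toy run-length encoder makes the trade-off obvious (purely an illustration, not any particular image format):
[code]
from itertools import groupby
import random

def rle(bits):
    """Run-length encode a string of '0'/'1' as (bit, run_length) pairs."""
    return [(b, len(list(g))) for b, g in groupby(bits)]

picture_row = "0" * 500 + "1" * 20 + "0" * 480             # long runs, like a real image
noise_row   = "".join(random.choice("01") for _ in range(1000))

print(len(rle(picture_row)))   # 3 runs -- big saving
print(len(rle(noise_row)))     # ~500 runs -- each pair costs more than the bit it replaces
[/code]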
Going back to your original proposal – since the digits of pi aren’t biased toward the compression of any particular set of real world data you’re probably not going to get a useful algorithm out of it. Some uninteresting data sets will be compressed really well. Some interesting data sets will be expanded immensely. Very similar data sets (differing only by a few bits) may have hugely different outputs. Not really desirable behavior.
Plus, as you point out, it would take a really, really long time to encode the data.
I first read about this in Martin Gardner’s book aha! Gotcha: Paradoxes to Puzzle and Delight. Quite a fun book. He claims that the mark would have to be enormously smaller than an electron to encode an entire encyclopedia.
But what about a smaller message, like a launch code or a password? Could you encode something like that using today’s existing technologies for mark-making and rod-measuring?
I’ll bet that someone could use information theory to show that the length of the number describing the position of any particular piece of information embedded in pi is, on average, about as long as the encoded information itself. That’d mean that pi would be a lousy way to compress data, unless you got incredibly lucky.
Take the first 30 decimal digits of pi and let each number represent a letter (splitting the choice between a one-digit and a two-digit reading wherever a value in the teens or twenties could be formed), and we get the following string of characters:
ADOIZECEHIGICWHDFZDCCHCBGI
Only a few words jump out at me: “ado”, “do”, and “hi”.
To encode these, you’d need two numbers: the starting place and the length of the string. So those three words become:
“1,3”, “2,2”, “9,2”
I don’t see you saving any space through this method.
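If anyone wants to check those pairs (or hunt for longer words), a trivial lookup against the letter string above will do it:
[code]
def encode_word(letters, word):
    """Encode a word as (start, length) within the decoded letter string.
    Positions are 1-based, matching the pairs above."""
    pos = letters.find(word)
    return (pos + 1, len(word)) if pos >= 0 else None

letters = "ADOIZECEHIGICWHDFZDCCHCBGI"
for word in ("ADO", "DO", "HI"):
    print(word, encode_word(letters, word))   # (1, 3), (2, 2), (9, 2)
[/code]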
I see the concept here. Pi is a real number with an infinite number of digits. But does that really mean that we can find any sequence we want in that number? I have a hard time believing that “The Catcher in the Rye” is hiding somewhere in pi.
It seems more likely to me that, while there are an infinite number of digits in pi, certain sequences never come up.