The question is not could you do it, but should you do it. The program would also produce a copy of everything ever written, and do the people of the world deserve all that bad copy brought back into print? Much of what men say and do is best left in obscurity.
Fighting would break out the world over, because lost documents would make it clear that the powers that be are wrong.
Yeah, but it would also produce a copy of a lot of things that weren’t written and never happened.
Let’s say you could use only letters, no numbers or punctuation, and we don’t care about capitalization, so that’s 26 symbols. For the sake of simplicity, we’ll only calculate the number of 200-character documents. Obviously, there are going to be many more documents of other lengths, and obviously it gets worse if you add punctuation, numbers, symbols like $, et cetera.
There are 26 possibilities for each character, so the total number of possible documents is 26[sup]200[/sup]. Google Arithmetic says that’s 9.878*10[sup]282[/sup].
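If anyone wants to check that figure, here’s a quick Python sketch (Python standing in for Google Arithmetic):

[code]
from math import log10

n_docs = 26 ** 200          # 26 choices for each of 200 characters
print(f"{n_docs:.3e}")      # 9.878e+282, matching the figure above
print(200 * log10(26))      # ~282.99, i.e. about 10^283, same thing in log form
[/code]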
Who’s going to read all those documents and decide which ones are true and which ones aren’t?
If we draft a billion people to do the job, and each of them takes an average of one minute to determine if a document is true or false, it’s going to take them 1.878*10[sup]268[/sup] years to slog through all the documents. The universe, by comparison, is on the order of 10[sup]10[/sup] years old.
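Same deal for the reading-time estimate; a few lines of Python reproduce it:

[code]
docs = 26 ** 200
readers = 1_000_000_000               # a billion people, working in parallel
minutes_per_year = 60 * 24 * 365      # 525,600

years = docs / readers / minutes_per_year
print(f"{years:.3e}")                 # ~1.879e+268 years, vs ~1e10 for the universe
[/code]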
And 200 characters is pretty short. This site says the average two-minute speech is about 250 words long, and a nonfiction book can run to 200,000 words. Assuming each word is about 5 characters (number pulled out of my ass for purposes of calculation), that’s 1250 characters for a two-minute speech.
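Scaling up to a 1,250-character speech, the count is too big even for a floating-point number, so you have to work in logarithms (using the 5-characters-per-word guess from above):

[code]
from math import log10

chars = 250 * 5                                      # two-minute speech, ~5 chars/word
print(f"26^{chars} ≈ 10^{chars * log10(26):.0f}")    # ~10^1769 possible speeches
[/code]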
I had thought of the fact that you wouldn’t know which was the exact copy. That it is out there is enough for a lot of people to believe. I thought we might ponder the implications of all that being dumped on the world. In a way it’s similar to the internet, but much more encompassing in scope. The example I posted wasn’t the best. So if we could produce all this output, should we make everything public, or would it overload our society with too much crap?
I vote for “too much crap”. There are too many people out there who will believe anything they read, even if they really should know that the chance of any one randomly generated document being true is very small.
Arg. The thing is, you’d produce “lost” documents billions of times faster just by sitting at your computer and writing whatever you felt like.
The odds of getting even a few pages of intelligible text before the sun burns out are minuscule. So of course you can’t do it.
OK, let’s suppose we’ve got super-duper-reverse-quantum-anti-polarity computers that can generate random text really really really fast. OK, you’ve generated a vast database of random output, as much as if the entire universe were made of supercomputers and those supercomputers had generated random text until the end of the universe.
This output is guaranteed to have intelligible text. In fact, you’ve recreated Borges’s Library of Babel. Every work that has ever been written and will ever be written is in there somewhere. Every work of Shakespeare, every work of Shakespeare except the main characters are rabbits, ones where everyone talks backward, ones where everyone talks backward except for one character, ones where everyone talks backward except two characters, etc. etc.
Now, how are you going to FIND all that? And if you do manage to find an intelligible text…never mind that you’d have to search past the end of the universe to find it…what exactly have you proved? OK, you found a copy of Julius Caesar, except in this one Julius is a woman, and there’s a typo on page 34. Does that prove Julius Caesar really was a woman, and the government was just covering it up?
Except this won’t happen because no one is going to search for millions of years through random gibberish in the hopes of finding intelligible text. You’re better off looking for messages in the patterns of clouds.
Anne Neville
“Yeah, but it would also produce a copy of a lot of things that weren’t written and never happened.”
By doing this project, it would then exist. There would be some very good original material and a ton of garbage. Would all the good material be worth all the totally worthless stuff inundating us?
No, of course it wouldn’t be worth it, since you’d never be able to find the worthwhile things amongst the hundreds of thousands of millions of billions of trillions of quadrillions of quintillions of sextillions of septillions of octillions of nonillions of decillions of pages of gibberish.
Imagine you had a tool like Google to search for text strings in the gibberish. Even if you had a hyper-quantum-reverse-warp-tetrion-particle Google that could search that infinity of strings instantly, you still couldn’t find worthwhile stuff.
Say you typed in the first page of Julius Caesar, so your Uber-Google will search the Library for all texts that contain the first page of Julius Caesar. OK, you’ve found those works. Except every other page of those books is gibberish! In order to narrow it down to those works where the second page isn’t gibberish, you’d have to specify what Uber-Google should search for…in other words, you’d have to input the second page of Julius Caesar and tell Uber-Google to search for (1st page) AND (second page). And then the third page.
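To put a rough number on how hopeless that is, assume a page runs about 1,500 characters (my own guess, not a figure from the thread). Of all the library texts that match page 1 exactly, the fraction that also match on page 2 is 26 to the minus 1,500:

[code]
from math import log10

chars_per_page = 1500                 # assumed page length, my number
digits = chars_per_page * log10(26)
print(f"1 in 10^{digits:.0f}")        # only 1 in ~10^2122 continuations match page 2
[/code]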
There’s no way to find intelligible works in the Library, even if you can search it instantly! Yes, somewhere out there are heartbreaking works of staggering genius, but you can’t find them unless you already know what they are, even with a search tool that breaks the laws of physics! And even if you could find a heartbreaking work of staggering genius, you could never find the definitive version…if there’s one copy of the work where the protagonist is named “Steve”, there’s also a copy exactly like it except the protagonist is named Ssteve, one where he’s named Sssteve, one where he’s named Tseve, one where he’s named Stseve, in fact, there are copies of the text with every possible combination of letters substituted for the name of the protagonist. And also one with every combination of letters substituted for the name of the hero’s love interest. And millions with one typo on each page, millions of billions with two typos on each page, etc etc.
You’re hundreds of thousands of millions of billions of etc more likely to recreate “lost books” or “future books” by sitting down at a typewriter and writing it yourself than trying to find it among an infinity of random gibberish, even with a platonic ideal of a search tool to find it for you.
Suppose you hooked up your Uber-Google search engine to a filter that would only return works with no typos, works that use only words found in the Oxford English Dictionary.
What you’ve really done is exchange the random-letter gibberish for a radically smaller (hundreds of thousands of millions of billions of etc. smaller) set of gibberish constructed only of official English words.
Imagine how much easier it would be to find works in this astronomically smaller restricted set. Astronomically? I mean astronomically^astronomically^astronomically.
Now how are you going to find those new books? Your Uber-Google still fails for the same reason. OK, what if you simply program your computer to recognize English grammar and syntax, and only return works made of English words that follow the rules of English? Nope, still not good enough; we’re still talking hundreds of millions of billions of searches that take as long as the age of the universe without finding anything significant.
So what if you program your computer to…well, READ the strings, not only for spelling and grammar, but also to understand them. The computer only returns works that make sense; it can search the entire library and find only those works that would make sense to a human being. Now, how exactly is that computer program different from a computer program that COMPOSES books that make sense to a human being? Of course, there would be no difference! In order to computerize searches to only return meaningful results you’d have to generate an algorithm that would recognize meaningful results, which means an algorithm for generating meaningful results, which means a computer program that can write books that people find meaningful. A computer program that finds the works for you is identical to a computer program that writes the works for you.
The program doesn’t create that information, it just visualizes it. All that information already exists (in the sense that any number “exists”).
Without some way for it to assign a value to each piece of information, it’s useless. Sure, it’ll produce a copy of every great lost book ever written, but it’ll also turn out many great books that were never written, and a nearly-perfect copy of every great book ever written with the word “not” changed to “totally”, as well as a ton of useless gibberish. If you want to achieve anything, you have to separate the information you care about from the information you don’t - and simply listing every possible piece of information won’t help, unless you send the results off to a team of humans (or sufficiently advanced AIs) who can do that.
No, I don’t think so. It would be a mistake to trust anything “produced” by that program.
You could just as easily do the same thing with video: if you program the computer to produce every possible 200 MB file, then throw out the ones that don’t fit the format of a DivX AVI file, you’ll still be left with an insane number of perfectly valid video clips of assorted length and quality. Some of them will show a bar mitzvah on the moon being rudely interrupted by the Apollo landing. Some of them will show JFK being assassinated, then reverse time and trace the bullet back to its source, which will be Zombie Abe Lincoln holding an AK-47 and wearing a crown made of human teeth. Some of them will show you and your wife having a threesome with Rosie O’Donnell.
None of the events they describe actually happened, but they’re all out there. Even if you never run the computer program, they still exist: even today, there are many, many large numbers that will show Zombie Lincoln shooting Kennedy if you feed them into a video player, but we just don’t know what they are yet.
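For anyone curious how generous such a filter could possibly be, here’s a minimal Python sketch that checks only the AVI container’s magic bytes; real players demand vastly more structure than this:

[code]
import os

def looks_like_avi(data: bytes) -> bool:
    """The loosest possible check: 'RIFF' magic at offset 0, 'AVI ' at offset 8."""
    return data[:4] == b"RIFF" and data[8:12] == b"AVI "

# Even this 8-byte check passes only 1 in 256^8 (about 1.8e19) random files,
# and it says nothing about whether anything behind the header will decode.
hits = sum(looks_like_avi(os.urandom(16)) for _ in range(1_000_000))
print(hits)   # almost certainly 0
[/code]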
Reply to Anne Neville
I missed the link you had inserted. I’ve read that before and it’s great. I would suggest people visiting this post give it a read…
Reply to Squink
I checked out your link yesterday. Nice cite.
For Everybody:
I wonder if the results of a human search engine would be mostly porn and ads.
Do you have my results for “silverware”?
Yes, we went through the first day’s pile and here’s all the relevant text. We know what you asked for, but we threw in a bunch of porn. We knew what you were really looking for. We also found a lot of stuff with titles like “Monkeys That Like to Wear Silverware,” “The Monkey and the Spoon of Fling,” “Jack the Knife,” and lastly “The Spork: Is It Silverware?” Oh, and we found “Cats in Flying Saucers over Wisconsin: Give Us Your Milk.”
Well, next time I’ll use “Dogpile”.
This would actually give you the text of any document, even the ones more than 200 characters long, simply by concatenating the right set of 200-character documents.
This being the case, you could actually do it with shorter documents; there are only 26[sup]10[/sup], or 141,167,095,653,376, possible ten-character sequences - quite manageable - and every possible work of literature is contained within them as legible fragments.
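A couple of lines of Python confirm the count and illustrate the chunking (the example text is my own, and the short stub at the end would need padding to exactly ten characters):

[code]
assert 26 ** 10 == 141_167_095_653_376   # the figure above checks out

def chunks(text: str, size: int = 10):
    """Split a text into the ten-character 'documents' it concatenates."""
    return [text[i:i + size] for i in range(0, len(text), size)]

print(chunks("friendsromanscountrymen"))  # ['friendsrom', 'anscountry', 'men']
[/code]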
or we could take it to the extreme and generate every possible one-character sequence; all 26[sup]1[/sup] of them; and here they are:
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
every single work of literature is contained in the above library, in legible form; all that is required is that you read them in the right sequence.
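and in code, “reading them in the right sequence” is of course nothing more than the text itself; each character is just an index into the 26-volume library:

[code]
library = "abcdefghijklmnopqrstuvwxyz"        # the 26 volumes above

sequence = [library.index(c) for c in "tobeornottobe"]
print(sequence)                               # which volume to read at each step
print("".join(library[i] for i in sequence))  # 'tobeornottobe' recovered
[/code]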
I’ve often thought of starting the Shakespeare Monkey Publishing Company. Everyone concentrates on the odds of producing perfect copies of everything, but doesn’t reflect upon the fact that there are simply enormous numbers of “almost perfect” copies, in which there’s only one typo, or a handful of them, or in which a page of Y’s is interposed between two otherwise perfect pages of text. What I would do is to market all these “factory second”-type texts, the necessary byproduct of the search for the perfect book, and use the funds to pay my investors, make a profit, and fund more monkeys.
I think one big seller would be the Encyclopedia of Incoorect Facts, which is a copy of the Encyclopedia Britannica with everything wrong. Another would be the Unintentionally Obscene Books series, in which all of the characters’ names are replaced by sexual puns.
Even if, in the future, we’ve got monkeys at typewriters creating all kinds of text imaginable, I think I would still write, because my brain has a tendency to latch onto certain thoughts at the cost of others. Putting my ideas on paper helps me make space for thoughts more productive than: “I wonder what would happen if there really was a Simian Publishing Company, LTD.”
I think the point is that, even if we had the infinite library, the overwhelming majority of it would be meaningless noise; we would be far less likely to stumble across interesting or significant portions of it than we would be to just make them up from scratch in our own heads.
And by “far”, he means hundreds of thousands of millions of billions of trillions of quadrillions of times less likely. We’re talking searching until the universe grows cold and starting over again from the beginning a couple trillion times.
You can’t find anything in random text because you can’t search it. You’ll never find any lost or future or existing text unless you already know exactly what to look for, and if you know exactly what to look for you’ve already got the text, you don’t need to search for it.
Mangetout illustrated this brilliantly by posting a complete library of all human information, past, present and future. You just have to figure out a way to search his library for information.