Has anyone looked for Shakespeare in random digits?

spankthecrumpet · March 1, 2012, 3:40pm

Has anyone looked for Shakespeare in the equivalents of monkeys on typewriters? I am led to understand that “to be or not to be…” will very likely appear in the digits of pi or e and so on.

It seems the kind of nerdy thing someone would do. Any luck?

sco3tt · March 1, 2012, 4:15pm

I checked the first few hundred digits of pi and I couldn’t even find the letter ‘t’ in there. It was just a bunch of numbers.

Seriously, though, it will depend on what character encoding you use, i.e. how you arbitrarily assign a digit or group of digits to a given letter. Even the standard ASCII character set is somewhat arbitrarily defined, so there’s nothing stopping you from finding a sufficiently distributed set of pairs of digits and declaring that ‘14’ translates to ‘t’, ‘15’ translates to ‘o’, ‘92’ is ‘b’, ‘65’ is ‘e’, etc. With that encoding, the first two words of that phrase are found right at the outset.
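A minimal Python sketch of that made-up encoding, for anyone who wants to try it. The codebook holds only the four pairs named above; `PI_DECIMALS`, `CODEBOOK` and `decode_pairs` are illustrative names, and any pair missing from the codebook is shown as `?`.

```python
# First ten decimals of pi (the digits after "3.").
PI_DECIMALS = "1415926535"

# Hypothetical codebook: only the four assignments mentioned in the post.
CODEBOOK = {"14": "t", "15": "o", "92": "b", "65": "e"}

def decode_pairs(digits, codebook):
    """Split digits into consecutive pairs and translate each via the codebook."""
    pairs = [digits[i:i + 2] for i in range(0, len(digits) - 1, 2)]
    return "".join(codebook.get(pair, "?") for pair in pairs)

if __name__ == "__main__":
    print(decode_pairs(PI_DECIMALS, CODEBOOK))  # tobe?
```

Since the codebook was chosen after looking at the digits, the match says nothing about pi; it only shows how much freedom the encoding gives you.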

standingwave · March 1, 2012, 5:06pm

You will be searching a long time. How long? Well, long enough to find “meaningful” stretches of anything in a series of random numbers. The monkeys typing Shakespeare is really a rumination on the nature of infinity, i.e. it’s really, really big. How big? As big as it needs to be and a little more.

Here’s a mathematical analysis of how long you can expect to search: Classroom Resources - National Council of Teachers of Mathematics
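As a rough back-of-the-envelope version of that kind of estimate (not the linked analysis itself): if the digits behave like independent, uniformly random digits, which is an assumption and has not been proven for pi, a specific k-digit string turns up about once every 10^k positions. The sketch below uses a Poisson approximation and ignores overlap effects; the function names are made up for illustration.

```python
import math

def expected_gap(k, base=10):
    """Average spacing between occurrences of one specific k-digit string,
    assuming independent, uniformly distributed digits."""
    return base ** k

def chance_of_hit(k, n_digits, base=10):
    """Approximate probability of at least one occurrence of a specific
    k-digit string somewhere in n_digits random digits (Poisson estimate)."""
    return -math.expm1(-n_digits / base ** k)

if __name__ == "__main__":
    # A 10-digit string searched for in 200 million digits.
    print(f"{chance_of_hit(10, 200_000_000):.3f}")
    # Typical spacing for a 26-digit string.
    print(f"{expected_gap(26):.1e}")
```

Under these assumptions a 10-digit string has only about a 2% chance of showing up in 200 million digits, and every extra digit makes the search roughly ten times longer.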

filmore · March 1, 2012, 5:11pm

Yes, the experiment is going on right now:

Digital monkeys with typewriters recreate Shakespeare

sco3tt · March 1, 2012, 5:11pm

Here’s a little more info that demonstrates how your preferred encoding affects things. If you’ve settled on using ASCII, these are two possible representations of that text:

TOBEORNOTTOBE: 84 79 66 69 79 82 78 79 84 84 79 66 69
tobeornottobe: 116 111 98 101 111 114 110 111 116 116 111 98 101

You could also use mixed case, include spaces and punctuation, etc. to come up with many more possible strings of digits. However, due to the nature of the ASCII character set, all of the letters are going to fall in the range of 65-90 (uppercase) or 97-122 (lowercase). Therefore, as these samples demonstrate, uppercase strings will contain a preponderance of sixes, sevens and eights, and lowercase strings will be cluttered with ones. The higher frequency of these digits reduces the likelihood of finding the strings in a pseudo-random sequence (but you can never be sure).
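A short Python sketch that reproduces the two digit strings above and tallies which digits dominate each one. It relies only on the built-in `ord`; `ascii_digits` and `digit_histogram` are illustrative names.

```python
from collections import Counter

def ascii_digits(text):
    """Decimal ASCII codes of each character, separated by spaces."""
    return " ".join(str(ord(ch)) for ch in text)

def digit_histogram(text):
    """Count how often each decimal digit occurs in the ASCII codes of text."""
    return Counter("".join(str(ord(ch)) for ch in text))

if __name__ == "__main__":
    for phrase in ("TOBEORNOTTOBE", "tobeornottobe"):
        print(phrase, ascii_digits(phrase))
        print(sorted(digit_histogram(phrase).items()))
```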

The examples above are expressed in base 10, but you could substitute another base to skew the distribution of digits to your liking.

Again, any numeric representation of text is arbitrary. There’s nothing special about ASCII, it just happened to fit the needs of the computer and teletype designers who created it more than half a century ago. There are plenty of other encoding schemes you could use as well. It would be every bit as valid for you to create your own encoding (as in my previous example) that fits the available digits. It worked for the Bible Code.

Once you’ve determined what strings of digits will satisfy your criteria, here’s a search engine that will attempt to locate them in the first 200 million digits of pi. For what it’s worth, I searched for my phone number (10 digits) and it wasn’t there.
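For a small-scale offline version of such a search, here is a sketch that generates digits of pi with Gibbons' unbounded spigot algorithm (chosen here for brevity, not necessarily what any search engine uses) and looks for a digit string among the first `limit` decimals. It is far too slow for 200 million digits; `pi_digits` and `find_in_pi` are illustrative names.

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi one at a time (3, 1, 4, 1, 5, ...),
    using Gibbons' unbounded spigot algorithm with exact integers."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

def find_in_pi(pattern, limit=1000):
    """Return the 1-based position after the decimal point where pattern
    first appears within the first `limit` decimals, or None if absent."""
    decimals = "".join(str(d) for d in islice(pi_digits(), limit + 1))[1:]
    index = decimals.find(pattern)
    return index + 1 if index != -1 else None

if __name__ == "__main__":
    print(find_in_pi("1415"))        # 1
    print(find_in_pi("8979"))        # 11
    print(find_in_pi("0000000000"))  # None
```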

sqweels · March 2, 2012, 3:42am

People get the wrong idea about this point. It’s not like, “get enough monkeys together and they’ll type out Shakespeare. It’s an amazing parlor trick. Try it.”

The point is to get people thinking about how big infinity is. “Imagine how many keystrokes it would take to type out Shakespeare by randomly punching digits. That’s how big infinity is”.

tellyworth · March 2, 2012, 3:45am

Infinity is quite a bit bigger than that.

gazpacho · March 2, 2012, 3:55am

Infinitely bigger perhaps?

Digital_is_the_new_Analog · March 2, 2012, 4:15am

Maybe Aleph[sub]1[/sub] is..
-D/a

Senegoid · March 2, 2012, 7:22am

Now that this thread has segued into a discussion of infinity, lemme throw in a little personal editorial comment:

I think math students at, say, the Algebra I level, should get a little more instruction on thinking about infinity. The usual instruction (at least when I was in school) is simply along the lines of:
“Infinity is NOT a number. Repeat a large (but finite) number of times: INFINITY IS NOT A NUMBER!!!”

So we basically got any thinking about infinity forcibly drummed out of our heads, except for the notion that “sets” can be of infinite “size”, but even that is vague for a lot of students because we got infinity drummed out of our heads.

I always felt there should be a more comprehensive discussion of “infinity” at the introductory Algebra level. Assuming, of course, that you have teachers who know enough to teach it right.

ETA: I looked for “2 be or not 2 be” in pi. I found some 2’s.

psychonaut · March 2, 2012, 7:47am

No it isn’t; it ended on 6 October 2011. And anyway, that “experiment” wasn’t really along the lines of the original proposal; instead of inspecting a purely random stream of data for one of Shakespeare’s plays in its entirety, it generates random data nine letters at a time, and if that sequence matches something from a play, it is kept. The process is repeated until all nine-letter sequences occurring in Shakespeare are accounted for. This is a much, much easier and faster way of recreating the works of Shakespeare.
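A toy Python sketch of the procedure as described here, scaled down so it finishes instantly: random n-letter guesses are drawn, and any guess that matches an n-letter sequence in the target text is ticked off until none remain. The short target text, the lowercase-only alphabet and the names `ngrams` and `monkey_collect` are assumptions for illustration, not details of the actual project.

```python
import random
import string

def ngrams(text, n):
    """All distinct n-letter sequences occurring in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def monkey_collect(text, n, seed=0):
    """Draw random n-letter strings until every n-gram of text has been
    produced at least once; return the number of draws needed."""
    rng = random.Random(seed)
    remaining = ngrams(text, n)
    draws = 0
    while remaining:
        guess = "".join(rng.choice(string.ascii_lowercase) for _ in range(n))
        draws += 1
        remaining.discard(guess)  # kept only if it matches something still missing
    return draws

if __name__ == "__main__":
    target = "tobeornottobe"
    print(len(ngrams(target, 2)), "distinct 2-grams")
    print(monkey_collect(target, 2), "random draws to cover them all")
```

The number of draws needed scales with the number of possible n-grams, not with the number of possible whole texts, which is why this is so much easier than waiting for an entire play to appear in one run.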

Entheogen · March 2, 2012, 7:55am

2 25 1417 131419 2 25 doesn’t fit, =( and 2251417131 doesn’t show up either…

Neither does 7131419225.

CalMeacham · March 2, 2012, 12:53pm

It’s not exactly what the OP is looking for, but have a look at W.R. Bennett’s interesting article “How Artificial is Intelligence?” in American Scientist, 65 (6) pp. 694-702 Nov-Dec 1977. He used a random number generator along with probability matrices for individual letters, then with pairs of letters, then sets of three letters, and so on. With “fourth order” virtual monkeys he was producing strings of words. If the probabilities were drawn from unfamiliar foreign languages, you could easily be fooled into thinking the gibberish was a real example of that language.
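The general idea can be sketched in Python as a letter-level Markov chain: record which letters follow each short window in a sample text, then generate new text by repeatedly picking a follower at random. This is only a sketch in the spirit of the article, not Bennett's actual procedure or data; the sample text, the `context` parameter (how it maps onto his "order" numbering is an assumption) and the function names are all illustrative.

```python
import random
from collections import defaultdict

SAMPLE = "to be or not to be that is the question whether tis nobler in the mind "

def build_model(text, context):
    """Map each `context`-letter window to the letters that follow it in text."""
    model = defaultdict(list)
    for i in range(len(text) - context):
        model[text[i:i + context]].append(text[i + context])
    return model

def babble(model, context, length, seed=0):
    """Generate `length` characters by sampling followers of the last window."""
    rng = random.Random(seed)
    starts = list(model)
    out = rng.choice(starts)
    while len(out) < length:
        followers = model.get(out[-context:])
        if followers:
            out += rng.choice(followers)
        else:
            out += rng.choice(starts)  # dead end: restart from a random window
    return out[:length]

if __name__ == "__main__":
    model = build_model(SAMPLE, context=3)
    print(babble(model, context=3, length=60))
```

Because followers are stored with repeats, common letter combinations are picked more often, which is what makes the output look word-like as the window grows.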

Musicat · March 2, 2012, 1:04pm

Monkeys on typewriters wouldn’t be typing ASCII numbers, but letters, so the assignments ASCII uses aren’t a factor, and there wouldn’t be any more 9s or 1s for that reason.

To store the monkey typing as computer values, you could make up any number scheme you want, but if you are testing the distribution for randomness, you don’t use the numbers, you use what they represent. To do otherwise is like taking a highly-compressed JPG image and counting how many blue dots there are. Many blue dots will be due to compression artifacts, not the original source.

Likewise, counting how many ASCII 9’s there are isn’t testing the distribution of A’s and B’s.

Mijin · March 2, 2012, 1:57pm

Out of curiosity, I decided to write a program to do an ASCII encoding of the digits of pi. Obviously I wasn’t expecting anything, just curious to see it.

First, converting every pair of digits to an ASCII character, and using only the characters after the decimal point:

♫☼\A#YO &.→+& O2∟T‼GE']K :¶a1,;↨Q@cV∟♥0↓"§◄♠OR♫PV3 R▲B/ &,<_2:▬▼H5 (Q∟0♂◄-T ←☺]U§

You have “YO” and “OR” right there in the first line!

Alternatively, including the “3” in the encoding, it begins with:

∟↨♠@F]T. 7♣R↨◄↓#♀TQ♂J2∟)F‼&4♂♣7`,>↔0_1▲&@*XFCb§A a8A]".♀TK@R!NCS►4G¶ ♫808E

Not so profound this time.

psychonaut · March 2, 2012, 2:11pm

There must be something wrong with your program (or its specification); the majority of those symbols are not ASCII characters.

santorum · March 2, 2012, 2:14pm

A decent rebuttal of his work was presented on Language Log: Language Log » A few million monkeys (yawn)

[QUOTE= Geoffrey K. Pullum]
The number of 9-letter sequences over the alphabetic characters a to z is 5,429,503,678,976 (and as that figure of 5.5 trillion is being mentioned in the press stories, it looks like he’s ignoring spaces, punctuation, case, fonts, paragraph breaks, etc., but what the heck, let’s pretend Shakespeare’s work is a bunch of strings over {a b c d e f g h i j k l m n o p q r s t u v w x y z}). There are a few scientific curlicues in the way Anderson does things, but basically he just takes random 9-grams and does a fixed-string search over the Shakespearean corpus to see if he has 9 more letters he can mark off as done.
[/QUOTE]
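The figure in the quote is easy to check: with 26 possible letters in each of 9 positions there are 26^9 sequences. A two-line Python check (the variable names are illustrative):

```python
ALPHABET_SIZE = 26   # a to z only, as in the quote
NGRAM_LENGTH = 9

total = ALPHABET_SIZE ** NGRAM_LENGTH
print(f"{total:,}")  # 5,429,503,678,976
```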

Mijin · March 2, 2012, 3:51pm

Every two digits are used to create an ASCII value. That means the possible ASCII values are 0-99.

However, ASCII codes 00 to 31 are non-printing. For these values I used the IBM PC extended ASCII set (which is the default on Windows PCs). I saw this as preferable to printing nothing, or spaces.
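A Python sketch of that conversion, not the original program: each pair of digits becomes a code from 0 to 99, codes 32 and up are printed as ASCII, and codes 0 to 31 are looked up in a hand-written table of the IBM PC (code page 437) glyphs, with a space standing in for code 0. The table constant and function name are illustrative.

```python
# Glyphs the IBM PC character set shows for codes 0-31 (code 0 shown as a space).
CP437_LOW = " ☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"

def pairs_to_text(digits):
    """Turn consecutive digit pairs (00-99) into characters."""
    out = []
    for i in range(0, len(digits) - 1, 2):
        code = int(digits[i:i + 2])
        out.append(CP437_LOW[code] if code < 32 else chr(code))
    return "".join(out)

if __name__ == "__main__":
    # First 26 decimals of pi, i.e. 13 pairs.
    print(pairs_to_text("14159265358979323846264338"))
```

On those first 13 pairs this gives ♫☼\A#YO &.→+&, the same opening as the output posted earlier.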

pulykamell · March 2, 2012, 4:40pm

Fixed your link.

Yeah, I briefly read the original article upthread, and what Jesse Anderson is doing can only very, very charitably be described as “a million monkeys at a million typewriters banging out the works of Shakespeare.” It’s not at all in the spirit of the thought experiment (or whatever you want to call it.)

iiandyiiii · March 2, 2012, 4:47pm

I’m not a coding expert, but how about every two digits represents a letter? 01-26 is A through Z, then 27-52 is A through Z, then 53-78 is A-Z, and 79-00 is A-V. That means you have a slightly lesser chance of seeing W-Z, but I would think it would make it easier to find words than using all of ASCII.

Can anyone try this, and see how soon we get “tobeornottobe”, or some other interesting phrase?
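A Python sketch of this scheme, assuming 00 is treated as 100 so that 79-00 lands on A-V as described. Only the first 50 decimals of pi are hardcoded; `pair_to_letter` and `decode` are illustrative names.

```python
# First 50 decimals of pi (the digits after "3.").
PI_DECIMALS = "14159265358979323846264338327950288419716939937510"

def pair_to_letter(pair):
    """Map a two-digit string to a letter: 01-26, 27-52, 53-78 give A-Z,
    and 79-99 plus 00 (counted as 100) give A-V."""
    value = int(pair) or 100
    return chr(ord("A") + (value - 1) % 26)

def decode(digits):
    """Translate consecutive digit pairs into letters."""
    return "".join(pair_to_letter(digits[i:i + 2])
                   for i in range(0, len(digits) - 1, 2))

if __name__ == "__main__":
    print(decode(PI_DECIMALS))
    print("TOBEORNOTTOBE" in decode(PI_DECIMALS))
```

The first 20 decimals come out as NONMIKAFLT, so no Shakespeare at the outset. Note that this only checks pairs aligned to the start; a fuller search would also try the sequence shifted by one digit.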
