Mathmen: Calculate the odds...(RE Bible Code)

This is a purely mathematical question, and there should be a factual answer in the form of a formula.

First, the background. I often hear that “the odds of finding [that phrase] in the bible, using ELS, are too great to be by chance.” Obviously that is an argument from personal incredulity, but I got to thinking, just what are the odds of finding something in the bible using ELS?

Let’s set the stage with these requirements:

1. Specify the language like Hebrew or English and the version, just so we are all on the same page(s)
2. Specify if all spaces and/or punctuations are to be removed or not
3. Allow a starting point for the first character of the search string to be anywhere in the text
4. Allow ELS (Equidistant Letter Spacing, or skip) to be any whole number, and positive or negative
5. The text/search string can be of any length, so that must factor into the equation
6. Ignore case everywhere
7. The search string will be specified in advance, so we aren’t fishing for matches of unspecified characters.

We will also ignore the actual characters we are searching for, which might bias the outcome. Example: the likelihood of finding “moses” in English text is probably greater than “XQW9”, but let’s ignore that for now. I suspect the formula might have to consider a limited number of possible characters in the search string (like A…Z, 0…9)

We must also assume, for this exercise, that the text block (the “bible”) is of random characters, but of fixed length, once the above provisions are considered. So I want to know what the odds are of finding a text string X in a large text block Y.

There must be a mathematical formula we can use to calculate the odds of that. What is it?

And for extra credit, what are the odds of finding TWO specified text strings within Z distance of each other? Example: If I find “TWIN” starting at character M using skip N, what are the odds of finding “TOWERS” within character range P-Q?

No answer, but I swear I’ve seen somebody do the Bible code ELS stuff with famous works of fiction, and found much the same sort of “amazing” coincidences. Like with “Moby Dick” or something similar. Also, isn’t Hebrew quite often (or usually) written without the vowels, making this sort of exercise that much easier?

edit: Well, here we go with Moby Dick.

edit2: Ah, and the same mention of vowelless-Hebrew is included there.

That’s the problem right there. For any particular string, the chances might be pretty low, but what you’re really interested in is the chance of that string or any other string of at least equal significance, and there are a heck of a lot of such strings.

“You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won’t believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!” -Richard Feynman

The only people who say this are illiterates who also believe idiotic things like creationism, homeopathy, the moon landing was a hoax, and aliens caused 9/11. No slightly mathematically literate person has ever said any such thing in the history of the universe.

You really, really need to stop listening to such people.

I don’t know from ELS, but this is a bad assumption, though I understand why you’re doing it. Random text and meaningful text - i.e. made up of real words in a real language - are very different animals with regards to things like letter frequency, and it’s going to skew the results.

I wouldn’t call them very different animals. More like an animal and a rock.

The people who believe in a Bible code are usually quite eager to share their math. Go check out Google.

The problem (as other posters have pointed out) is that the math may be accurate in explaining the odds of that particular instance happening, but do not reflect the odds that any interesting thing might happen. The odds of getting anything interesting are essentially 100% - the human brain is too good at finding patterns to not find one.

Wow! 7 replies already and most of them missed the point!

Maybe it’s my fault and I didn’t explain it adequately. (rereading the OP) Nope, that’s pretty clear.

I think the claims by Drosnin et al are ridiculous and I am well aware that similar “predictions” have been found in War and Peace, Moby Dick, whatever. Is there anything in my OP that suggests that I am falling for that crap?

I am just asking for a mathematical formula, which I think could be derived, that would tell us the probability of finding a specific word or phrase in a specific document.

No, not any string, but a specific string, given in advance, so we can say something like “the odds of finding the phrase ‘big and strong’ are X:Y”. I’m well aware that if you allow someone to match every word in a dictionary with a large text using ELS, you are bound to find many matches. That’s not the question.

I’m also well aware, indistinguishable, that the odds of a particular poker hand coming up have nothing to do with the fact that one was just dealt. That’s not the question, either.

Jeez. All I am asking is what are the odds of finding a particular word or phrase in a text block of fixed length and random characters, using all possible ELS spacings?

It doesn’t have to be the bible, which isn’t random text anyway; I was just using that as a possible text length.

Where am I going with this? I often hear people (like Gary Schwartz) say that “the odds of {something paranormal} happening are 4 billion to one, therefore it cannot be due to chance.” I would like to refute that with good mathematics. I suspect Gary’s math is often bogus. And, dracoi, I don’t much trust the bible coders’ math, either, hence the OP. And I’m not looking for any interesting match, but a particular match. Got that?

Or, to narrow it down, I would like to apply our yet-to-be-derived formula to various books and make a chart of the odds of finding a particular, pre-determined phrase in each, then compare them. I’m pretty sure that a longer work would have greater odds of finding anything, but how much greater?

Any math whiz care to tackle this one?

I would assume the probability would vary based on the string you select and the relative probabilities of the individual characters in the English language. So, say, a string like “QUIZZES” (with the relatively rare Q and Zs) may be more difficult to find than something like “STREETS” (which has only very common letters).

No answer, but a start for pondering the question.

Let’s start with these numbers. These are the frequency distributions, percentage wise, of letters in the English language.

So, if you pick a letter at random from an English text, there is a 0.095% chance you’ll happen upon a “Q”, and a 12.702% chance you’ll happen upon an “E.”

That occurred to me, but maybe we can treat that as merely a complication. For starters, let’s assume that no letter in the large block has any particular weight or frequency other than randomness, and the same for the search string.

If we can derive a formula for that, perhaps we can enhance it with respect to language. Or maybe that’s impossible, but the Drosnin crowd appears to have made calculations, so I’d like to try with a more unbiased starting point.

Calculation of “what are the odds of finding string X in text block Y” would be simple if ELS were a value of +1 only (like a substring:string comparison in most computer languages). It gets complicated when the skips are allowed in both directions, and more complicated when the starting point can be specified as something other than the first letter of the block. That’s where my meager math skills fall down.

See my previous post; that seems like an extra complication. Let’s start as easy as possible and work up to it.

Ignoring frequencies for now, let’s assume we are using the English alphabet A…Z and ignore numbers and spaces (the text has been compacted). Then the odds of letter N from the search string (which could have 1 of 26 values) matching letter M from the block (also could have 1 of 26 values) would be 1 / (26 * 26), right?

Mathmen???

Which version of the Bible do you use to test this theory?

We can make you an honorary man if you wish, but you have to contribute first. :smiley:

What theory? What test? Did you read the OP?

My math and stats are rusty, so please anyone correct my logic here:

So the chances of you finding “QUIZZES” exactly in that order by picking seven random letters are about 1 in 1.2 * 10^14. “STREETS” will be 1 in 31,533,924. You can divide that in half if you include words that are spelled backwards.
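For anyone who wants to check figures like these, here is a minimal sketch that multiplies per-letter frequencies together. The table is an assumption: it is the commonly cited English frequency table whose Q and E entries match the 0.095% and 12.702% quoted above, and only the letters needed for these two words are included. The name `one_in` is just a label made up for this example.

```python
# English letter frequencies in percent. ASSUMPTION: the commonly cited table
# whose Q and E values match the 0.095% and 12.702% quoted in this thread.
# Only the letters needed for the two example words are listed.
FREQ_PERCENT = {
    "E": 12.702, "I": 6.966, "Q": 0.095, "R": 5.987,
    "S": 6.327, "T": 9.056, "U": 2.758, "Z": 0.074,
}


def one_in(word):
    """Return N such that drawing the word letter by letter succeeds 1 time in N.

    Each letter is treated as an independent draw weighted by its frequency.
    """
    p = 1.0
    for ch in word.upper():
        p *= FREQ_PERCENT[ch] / 100.0  # convert percent to probability
    return 1.0 / p


if __name__ == "__main__":
    for word in ("QUIZZES", "STREETS"):
        print(f"{word}: about 1 in {one_in(word):.3g}")
```

This only covers one fixed set of seven positions; it says nothing yet about how many positions a long text offers.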

To simplify the odds, I don’t think there should be much difference between picking letters at random and picking 7 letters that are spaced 1903 letters apart. So I believe what we would need to do is divide the above number by the number of possible spacings of a 7 letter word in a text of length n.

n = text size
t = target word size
s = skip spacing (a value of 1 means consecutive letters)

possibilities = n - ts + s

So, if my math and logic are sound, this equation gives us the amount of possible combinations of letters, with spacing s, of size t, in a text of n characters.

Therefore, to figure out the divisor, we would have to sum up possibilities from a spacing of 1 up to the largest spacing value that doesn’t run our word over the edge of our text (and thus give a zero or negative value when we plug it into our equation).
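The sum described above can be sketched in a few lines. This is only an illustration of the n - ts + s counting, not anyone's official method: the function names are made up here, doubling for negative skips is an assumption taken from the OP's "positive or negative" rule, and `brute_force` exists purely as a cross-check on the formula.

```python
def placements(n, t, s):
    """Start positions for a t-letter word at skip s in n letters: n - t*s + s."""
    return max(0, n - t * s + s)


def total_placements(n, t, both_directions=True):
    """Sum placements over every skip s >= 1 that still fits inside the text."""
    if t < 2:
        return n  # a single letter has no skip; every position is a placement
    max_skip = (n - 1) // (t - 1)  # largest skip whose last letter stays in range
    total = sum(placements(n, t, s) for s in range(1, max_skip + 1))
    # ASSUMPTION: a negative skip is the same placement read backwards,
    # so allowing both directions doubles the count.
    return 2 * total if both_directions else total


def brute_force(n, t):
    """Count forward placements by direct enumeration (cross-check only)."""
    count = 0
    for start in range(n):
        s = 1
        while start + (t - 1) * s < n:  # last letter must land inside the text
            count += 1
            s += 1
    return count


if __name__ == "__main__":
    n, t = 100, 7
    print(total_placements(n, t, both_directions=False), brute_force(n, t))
```

Dividing a "1 in X" figure by this total gives a rough "1 in X/total", which is only a good approximation while the result stays much larger than 1.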

Given your parameters, the odds of it happening are pretty close to 1, given a sufficiently large block of text. The text just has to be massaged until the correct answer pops out.

There isn’t a formula, since the terms you specify allow for almost infinite wiggle room to find the answer.

No, it would be 26/(26 * 26) = 1/26 (in a probability distribution where the letters of the search string and of the textblock are uniformly and independently distributed). The probability that they’re both ‘A’, for example, is 1/(26 * 26), the probability that they’re both ‘B’ is 1/(26 * 26), etc., and you add these all up.
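Written out as a sum, under the same assumption of uniform, independent letters (X and Y are just labels introduced here for the search-string letter and the text letter):

```latex
P(X = Y) \;=\; \sum_{c \in \{A,\dots,Z\}} P(X = c)\,P(Y = c)
         \;=\; 26 \cdot \frac{1}{26} \cdot \frac{1}{26}
         \;=\; \frac{1}{26}
```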

So, since we’re starting with a given character, say, “A”, the odds of the first character of the large text block also being “A” are 1/26.

Using a skip of +1, the odds of the second character in the search string matching the second character of the block are also 1/26. Then the probability of both 1st & 2nd chars matching would be (1/26) * (1/26), right? And so forth…

That’s just for an ELS of +1. Next we tackle an ELS of +2, +3, -1, -2, etc. and as long as we don’t run out of letters in the block, the odds of each of those are identical, right?

And whatever odds we calculate for one skip will ADD to the odds of the others, right? (because we have increased our chances of finding a match by increasing the number of comparisons).

I suspect the grand total result may approach 1, which means that the odds of finding almost any (short) string in almost any (long) block are 1:1. If so, that pretty much shoots the theory that the bible is “special” and has hidden meanings, eh? (which is what Tapioca Dextrin said.)
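Here is a rough sketch of how that grand total behaves in the simplified model (26 equally likely letters, random text). Strictly, adding the per-placement odds gives the expected number of matches, which can exceed 1; the chance of at least one match is closer to 1 - (1 - p)^N for N placements. Treating the placements as independent is an assumption (overlapping placements are not quite independent), and the 1,000,000-letter text length is a made-up figure for illustration.

```python
import math


def match_probability(n, t, alphabet=26):
    """Return (expected matches, approx. chance of at least one ELS match).

    n: text length, t: search-string length (t >= 2), alphabet: symbol count.
    ASSUMPTIONS: letters are uniform and independent, and every placement is
    treated as an independent trial, which is only approximately true.
    """
    p = alphabet ** -t  # chance that one given placement spells the string
    max_skip = (n - 1) // (t - 1)  # largest skip that still fits in the text
    forward = sum(n - (t - 1) * s for s in range(1, max_skip + 1))
    attempts = 2 * forward  # positive and negative skips
    expected = attempts * p
    # 1 - (1 - p)**attempts, computed in a numerically stable way
    at_least_one = -math.expm1(attempts * math.log1p(-p))
    return expected, at_least_one


if __name__ == "__main__":
    n = 1_000_000  # hypothetical text length, not a real book
    for t in (4, 5, 7, 10):
        expected, prob = match_probability(n, t)
        print(f"t={t}: expected matches {expected:.3g}, P(at least one) {prob:.3g}")
```

In this model short strings are all but guaranteed to turn up, while the chance falls off quickly as the string gets longer, since each extra letter costs a factor of 26 and a longer text only gains roughly the square of its length.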

pulykamell, I’m going to have to mull over your post before I reply, but I think you’ve got something there, and my math is rustier than yours, fersure.