Randi: where are the "hits"?

I’m specifically putting this in GQ because there should be simple, linkable, factual responses that should not require reopening old wounds.

First, some background info.

I have recently become interested in dowsing: the topic, not the practice.

I have not been able to find anyone to demonstrate their abilities to me personally, so I have looked for dowsing videos, mainly on youtube. Many are related to the well-known Randi.

On a different note, I have recently had a conceptual conversation with some math/statistics types regarding the ideas of “better than chance” and “statistically significant.”

As I understand it, if I were to truly flip a true coin and you called heads and it was heads, this would not be better than chance. If I then truly flipped a true coin and you called tails and it was tails, this would be better than chance, but not statistically significant.

Switch the test to a 10-box test with one Easter egg. I would have a one-in-ten chance of finding the Easter egg on my first try. Repeat the test with ten, one hundred or one thousand people and someone, sooner or later, will hit on their first try. I’m aware that that doesn’t mean they have some hidden skill.

More to the point, wouldn’t it be statistically significant if one thousand people had a one-in-ten chance of finding the Easter egg on their first try and none did?
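To put a number on that question, here is a minimal sketch in Python; the 1-in-10 chance and the 1,000 independent testers are the figures from the example above:

```python
# Chance that NONE of 1,000 independent people, each with a
# 1-in-10 chance of finding the egg, succeed on their first try.
p_miss_one = 0.9                 # probability one person misses
n_people = 1000
p_all_miss = p_miss_one ** n_people
print(p_all_miss)                # on the order of 1e-46
```

So yes: if a thousand people each had a genuine 1-in-10 chance and not one of them hit, that would be wildly statistically significant (most likely of something wrong with the test).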

Here’s the main question. Where are the hits in the Randi videos?

I readily admit that I’m also not finding any dowsers making videos of their failures, but there must be some tests where — just by chance alone — the dowser finds the water (or zinc or gold or iron ore or whatever) on the first try… right? I mean, people win the lottery.

Are the hits just edited out?

Are there videos with “lucky” hits being retested?

You need a basic course in statistical significance.

For most matters, it simply doesn’t matter whether somebody makes a hit on the first try. The only test that is meaningful is whether, over the course of a sufficiently large series of tries, the number of hits is significantly higher than chance. It’s that number that never turns up in fairly run tests, no matter how much the woo-woos wish it to be so.

Your argument by negativity won’t wash either. It’s a standard riposte that’s been tried for decades. The unfortunate fact is that if dowsing doesn’t work, the proper expectation is that successes occur at no better than the chance rate. And that is what is seen.

That may be so, but:

In my Easter egg example I didn’t say anything about any woo-wooishness. If there are ten boxes and one egg and I choose one box, there is a one-in-ten chance that I will find the egg in one try. (On subsequent tries, my odds improve as I eliminate the possible boxes.)

I’m not claiming that people win the lottery through magical powers either, just that it happens.

Even a blind squirrel finds a nut once in a while, right?
Also, to clarify — I’m not talking about people successfully dowsing, just finding the right “answer” purely by chance.

Note the final question from the OP.

When you are determining the statistical significance of a sample event (such as the results from a dowsing experiment), you have to compute the probability that the experimental outcome would have occurred by chance. This is called the p-value. If the p-value is very small (in science, usually smaller than 1% or 5%), then you can conclude statistical significance. For example, in your 10-box test with the hidden Easter egg, it’s not when you find the egg, it’s how many times you find the egg over many trials of the test. The expected relative frequency is 10%. If you did the experiment 100 times and found the egg in half the trials (50%), that would be statistically significant, since the probability of that having occurred by chance would be extremely small. If you’re curious, it’s essentially zero.
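For the curious, the “essentially zero” figure can be computed exactly. A sketch in Python using only the standard library, with the numbers from the example above (100 trials, 50 hits, 10% chance per trial):

```python
from math import comb

def binom_tail(n, k, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability of finding the egg 50 or more times in 100 trials
# when the per-trial chance is only 10%:
p_value = binom_tail(100, 50, 0.1)
print(p_value)  # vanishingly small -- "essentially zero"
```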

I am curious and I accept that it is practically zero. (I will also read up on p-value.)

I also recognize that, in my coin-tossing example, doing better than chance in only two tests does not indicate anything special. However, it is a success rate that is better than chance, even if it is neither surprising nor statistically significant.

But 100% is better than 50%; it’s just that you haven’t done enough tests. (FWIW, I am also curious as to how one determines what number of tests is enough…)

My question is, where do we see those retests… the cases where someone, by chance, hits on the first try?

In this case you know that a) there is a positive chance of success and b) what that chance is. In the case of dowsing you a) do not know whether there is a positive chance of success and b) have no idea what the chances of success might be should any positive event occur. Therefore your two examples have nothing to do with one another.

As for your other question, if you enter dowsing rods into Google video you come across James Randi in Australia. Try viewing it.

I’m still puzzled by your question. A video of a hit on the first try has no more meaning than a video of a coin flip being correct on the first try. Only a series of events has any meaning.

This is basic statistics. You need this to understand anything else in the field.

A test of dowsing isn’t just “Here is a field with ten possible locations for water. Tell me which one it is.” It’s more like “Here are a dozen fields, each of which has ten possible locations for water. Tell me which one has the water in each of the fields.” Randi, of course, knows this, so he sets up his tests such that it’s essentially impossible to “pass” the test by pure luck.
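Using the dozen-fields example above, the arithmetic of why luck can’t pass such a test is short (a quick sketch in Python):

```python
# One field with ten candidate locations: a lucky guess succeeds
# one time in ten.
p_one_field = 0.1
# To "pass", the dowser must call all twelve fields correctly:
n_fields = 12
p_pass_by_luck = p_one_field ** n_fields
print(p_pass_by_luck)  # about 1e-12 -- one in a trillion
```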

It should be noted that ground water is everywhere; all you need to do is dig. The key to digging a well is finding a place where the water seeps through the rocks fast enough that you can take X gallons out of the ground and not run out. There are no underground lakes or rivers, only droplets of water found in cracks and pores in the rocks. If dowsing were possible, dowsers would get 100% positive readings.

I’m trying to understand your question. Are you asking whether, if 100 identical tests were done and about 10 people found the object on Try #1 as expected, such a result would be statistically significant?

If so, the answer is no. You cannot cherry-pick the results you want to prove a point. This is what advertisers do sometimes…“9 out of 10 tasters prefer Goochies!”

Does that mean they did 1000 tests and 900 preferred Goochies? Maybe not. All they have to do is 100 tests of 10 tries each, then discard all sets (most of them, probably) where the results were below what they wanted. No one can accuse them of fraud – their statement is the truth, just not the whole truth.
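The discard-the-bad-sets trick is easy to quantify. A sketch in Python, where the 10-taster panels and the “9 out of 10” target are the hypothetical figures from the Goochies example, and tasters are assumed to have no real preference (a 50/50 coin flip each):

```python
from math import comb

# Chance a single 10-taster panel comes out "9 out of 10 or better"
# when each taster is really a 50/50 coin flip:
p_panel = (comb(10, 9) + comb(10, 10)) / 2**10   # about 1.1%
# Chance that at least one of 100 such panels does:
p_at_least_one = 1 - (1 - p_panel) ** 100
print(p_at_least_one)  # about 0.66 -- more likely than not
```

So running 100 small panels and quietly discarding the unflattering ones will, more often than not, hand you a truthful-but-misleading “9 out of 10” headline.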

What I think he’s saying is that he watched a bunch of videos of Randi testing dowsers and did not see any where the dowser succeeded on the first try. He figures that it would have to happen occasionally just by chance, so he’s suspicious of the videos.

There’s no way to address this without at least knowing how many videos he’s seen and what percentage of the total number of tests they represent.

I think if we’re going to treat this as a real GQ topic, we’re going to need links to the videos in question so people can know what we’re talking about.

There’s also the question of selection bias for videos on YouTube. Failures are more entertaining and are thus more likely to be posted than the inevitable random success.

I don’t doubt that there are films of successful hits that are not released for the general amusement of Internet video viewers.

SiXSwordS, reading about statistics can be boring, full of dry discussions of P values and N values and lots of numbers, but there is one book I highly recommend that is fun to read, not long, and will teach you more than you need to know unless you plan on making it your life’s work.

It is by Darrell Huff, How To Lie With Statistics. It’s cheap and has been readily available since 1954.

In Huff’s chapter called The little figures that are not there, he writes

Even if I haven’t understood your question correctly, I think you will enjoy this book and find it educational.

On preview, I see WF Tomba said

Quite right. If you are accusing Randi or anyone else of cherry-picking the tests to present, you’d have to take all (or a random cross-section) of the videos of the tests and start to statisticulate. By pure chance, if all were 1/10 type tests, the first try should be successful 10% of the time, the second, 10% of the time, the third, 10%, and so forth.

The problem you may run into is of too few samples of identical tests, which relates to the first part of this post. Unless you can find a large number of identical tests, chance distribution (remember the 8 heads in 10 tosses?) won’t work well.

And it’s entirely possible that first-try-success tests have not been posted because they look too good. If so, while you might say it’s misleading, it doesn’t change the overall outcome, which shows a total failure to obtain consistent, non-random results for dowsing tests.

You might go to Vegas, put a roulette bet on lucky 29, and win big the first time. Does that prove the wheel is biased?

But, it is important to emphasize one point that is often lost in interpretation of the p-value. The p-value is the conditional probability that the data would turn out the way it was observed to, conditioning on the assumption that there was only chance involved (i.e., the p-value is P(results are [whatever] | this is a situation of pure chance)). It is NOT the probability that what went on was only chance, given the known fact that the data has turned out the way it was observed to (i.e., the p-value is NOT generally equal to P(this is a situation of pure chance | results are [whatever])). Of course, the latter is almost always what we actually want to know, but we end up using the former as a very imperfect proxy for it.
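To make that distinction concrete, here is a toy Bayesian calculation in Python. All the numbers are assumptions invented purely for illustration: a 99% prior that only chance is at work, a 1% p-value-like likelihood of the data under chance, and an assumed 30% likelihood of the data if a real effect exists.

```python
# Toy illustration that the p-value is NOT P(chance | data).
# Every number below is an assumption chosen for illustration only.
prior_chance = 0.99            # prior belief: pure chance
prior_effect = 0.01            # prior belief: real effect

p_data_given_chance = 0.01     # the p-value-like quantity
p_data_given_effect = 0.30     # assumed likelihood under a real effect

# Bayes' theorem gives the quantity we actually want:
posterior_chance = (prior_chance * p_data_given_chance) / (
    prior_chance * p_data_given_chance + prior_effect * p_data_given_effect
)
print(posterior_chance)  # about 0.77 -- nowhere near the 1% "p-value"
```

With these (made-up) priors, the probability that the result was mere chance is about 77%, even though the p-value-style number is 1%: the two quantities answer different questions.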

Okay, I’ll give a factual answer to the question asked. Question asked: Where are the hits.

Two examples follow describing factually what Randi does with the hits.

The very first time I saw Randi was on a TV show called James Randi Psychic Investigator. He tested various people with claimed paranormal powers, including a medium, a mind reader, and a dowser. The medium failed the test, and Randi sneered. The mind reader failed the test, and Randi sneered. The dowser succeeded. He was given a map, divided into a number of squares. He successfully found the correct square on his first attempt.

What Randi did with the hit was to start blustering, and explaining to the audience why the test doesn’t prove anything, yadda yadda yadda. Sure, he’s right that the test doesn’t prove anything. I’ll agree with that. But how come he didn’t say the same after the other tests he did? They didn’t mean anything either.

Second example is the Australian test that Exapno Mapcase mentioned above. The specific test was to find which one of 10 pipes has got water running through it. Results were as follows:

number of trials - 50
number of hits - 11
chance of hit, on each trial - 10%

That is a 22% success rate, vs. a 10% expectation, over 50 trials.

Perhaps someone with greater knowledge of statistics can calculate the p-value for those results. I have been told, IIRC, that the odds against this happening purely by chance are over 100 to 1. (Most likely it is due to errors in the design and conduct of the test, but design flaws are a topic for another discussion.)

What Randi did with the hits was this: Randi said that he had also carried out two other dowsing tests. A second group of people tried dowsing for gold. A third group of people tried dowsing for brass. Randi added the results of those tests to the water test, and the final score dropped to 13%.

The water dowsers and Randi had agreed the rules of the test in advance. Each had signed a contract agreeing that the rules are fair. The agreement that Randi made does not contain any mention of gold or brass dowsing. The tests were separate, having nothing to do with each other. They were different people, making a different claim, tested under a different protocol, and with a different prize on offer. By combining the three results he broke the agreement that he had written himself, which he had previously agreed was fair.

You can see Randi’s own write up of the event, including the agreement that he signed, here:
http://www.skeptics.com.au/articles/divining.htm
That’s what Randi does with the hits. Question answered factually. I won’t tell you what my opinions are.

The p-value is approximately 0.935% (in the sense that this is P(11 or more hits) in a distribution with 50 trials with independent success probability 10%). I note this without comment, except for my previous warnings about overreliance on and misinterpretation of p-values.
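For anyone who wants to check that figure, here is an exact computation in Python using just the standard library, with the numbers from the Australian test described above (50 trials, 10% chance each, 11 or more hits):

```python
from math import comb

# P(11 or more hits) in 50 independent trials with a 10% hit chance:
p_value = sum(comb(50, k) * 0.1**k * 0.9**(50 - k) for k in range(11, 51))
print(p_value)  # about 0.0093, i.e. the ~0.935% quoted
```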

It’s meaningful if the contestant said they would be successful 100% of the time. Only one “miss” is required to prove them wrong.

Likewise, one initial hit “doesn’t prove anything” if the contestant claimed 100% success in a series of trials because the second attempt might well have been a miss.

But Randi should have laid this all out before the testing began, so that he would not be seen as floundering for an excuse in the case of an initial first success.

Since squirrels find nuts largely by smell, I’m not sure how blindness is relevant.

I’m sure the sneering and blustering were completely factual and you didn’t inject any of your own interpretations.

[Moderator instructions]

Bolding mine.

By editorializing in this fashion, you are offering your opinions rather than posting factually. Given your history, I am instructing you not to hijack this thread into a Randi-bashing, and limit yourself to absolutely and strictly factual responses. Keep your opinions out of it.

While I agree with your assessment, if you have an issue with a post, please report it to the moderators.

Colibri
General Questions Moderator

First let me apologize. I took a step away from the computer in frustration after too few responses (how ironic) and, at this point, I haven’t read all of the responses.

Some look very helpful indeed.

I recognize several subtexts that seem to be distracting from the point, not the least of which is that we’re talking about dowsing on the SDMB.

Also, I’m potentially asking a question of people who could write a book on the subject while I’m having difficulty writing a single sentence. I feel a bit as though I’m reinventing the wheel.

I guess the quoted answer above is as good an answer as I can imagine.

Considering where this is going, it might be best to leave it at that.