Help me to design an experiment

I joined a forum recently full of, frankly, credulous individuals who believe in dowsing. I was describing an experiment to determine if dowsing works by concealing a jug of water placed under one of ten randomly selected cardboard boxes. How many times would the experiment have to be repeated to rule out luck?

Thanks,
Rob

That depends on how much luck you want to rule out, and how reliable the dowser claims their ability is. Even if they claim a 100% success rate, and you run the experiment n times, there’s still one chance in 10[sup]n[/sup] that they’ll succeed by pure guessing.

Well you can’t absolutely rule out luck. What you can do is come up with an experiment where the likelihood of a successful outcome by random chance is small enough that everyone agrees that it indicates that some other factor besides luck may be at play.

Let’s suppose you and the dowser agree on 1 in one million - meaning you’re going to do a set of trials such that the likelihood of the dowser getting them all right by guessing is 1 in 10^6. If they can do that then they win the bet, if they fail then they lose.

With one full bottle out of ten, there’s a 0.1 chance of guessing correctly in a given trial. Run six such trials and the likelihood of someone guessing all six correctly is 1 in 10^6.
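For anyone who wants to try other odds or a different number of boxes, here is a minimal Python sketch of that arithmetic. The function name and parameters are made up for illustration, and it assumes each trial is an independent one-in-n_boxes guess.

```python
from fractions import Fraction


def trials_for_perfect_run(n_boxes, target_odds):
    """Smallest number of trials for which guessing every trial correctly
    has probability no greater than 1 in target_odds."""
    if n_boxes < 2:
        raise ValueError("need at least two boxes for a guess to be able to fail")
    p_guess = Fraction(1, n_boxes)    # chance of one lucky hit
    limit = Fraction(1, target_odds)  # agreed "too unlikely to be luck" level
    trials = 0
    p_all_right = Fraction(1)
    # Keep adding trials until a perfect run by luck is rare enough.
    while p_all_right > limit:
        p_all_right *= p_guess
        trials += 1
    return trials


print(trials_for_perfect_run(10, 10**6))  # 6
print(trials_for_perfect_run(10, 10**9))  # 9
```

Exact fractions are used instead of floats so that the comparison at exactly 1 in 10^6 is not thrown off by rounding.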

The important part is that everyone has to agree IN ADVANCE exactly what level of success has to be demonstrated and exactly what series of hits/misses constitutes this. Using my previous example, you’d all have to agree “The dowser will do exactly six trials. In each trial there will be one full container and nine empty containers. If the dowser identifies the full container in a given trial that counts as one hit. The dowser must obtain exactly six hits in six trials to succeed. Any other result is not considered a success.”

The reason for this is to avoid the fallacy of naming the target after you shoot. A simple example:

I flip a (fair) coin 20 times and get this result:

HHHTTHTHTTHHTTTTTHTH

That exact sequence has a (0.5)^20 chance of happening, which is 1 in 1,048,576, or just about 1 in 1,000,000.

But I got it by banging on the keyboard randomly. There’s nothing amazing about it - there are 2^20 possible outcomes and that just happens to be one.
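The same point as a small Python sketch, assuming a fair coin and independent tosses (the function name is only illustrative): every specific 20-toss sequence, however ordinary or striking it looks, has exactly the same chance.

```python
from fractions import Fraction


def exact_sequence_probability(sequence):
    """Chance that fair, independent coin tosses reproduce `sequence`
    exactly, toss for toss."""
    return Fraction(1, 2 ** len(sequence))


observed = "HHHTTHTHTTHHTTTTTHTH"
p = exact_sequence_probability(observed)
print(len(observed), p)  # 20 1/1048576

# Twenty heads in a row is exactly as likely as the "random-looking" sequence.
print(exact_sequence_probability("H" * 20) == p)  # True
```

What makes a result interesting is not the probability of the sequence itself but whether it was named before the tosses.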

On the other hand, if I specified beforehand that I was going to use my telekinesis to make the coin come up in that EXACT pattern of heads and tails in twenty tosses and then we saw that pattern, that would be fascinating.

Is that any help? I don’t know that there’s any commonly accepted probability where experimenters agree “That’s unusual”; one in a million sounds right but if you want to be very stringent, run more tests. Do nine and it’s one in a billion - I’d be pretty impressed with someone who can do that under properly controlled conditions.

Making sure that the experiment is properly controlled is the big challenge, I think. You have to prevent actual cheating as well as anything that would give the person subtle clues. I’d also make it double-blind so that not only does the testee not know what’s under each container but the person running the experiment doesn’t either.

James Randi’s book “Flim Flam!” gives details of some dowsing challenges that he’s conducted. I’m sure there are skeptics organizations online that have some suggested protocols as well but some things that Randi did, in addition to the basics of having unmarked sources of water for the dowsers to find, include having them demonstrate their ability on a plainly visible source of water and having them check the test area before any containers are brought in to confirm to their satisfaction that there’s nothing in the area that would throw them off (for example so you don’t hear “I didn’t realize there was a bathtub downstairs, that confused me during the trials”).

Unfortunately, that assumes the dowser claims 100% success.

For an infinite number of chances, the dowser should select the correct box 10% of the time (I feel pretty confident in predicting that). If the dowser gets lucky two times in 10 trials, he can say, “I have a 20% accuracy rate which is better than chance.” How many trials would you have to do in order to get the accuracy rate closer to 10%? What variance is statistically negligible?

Thanks,
Rob

BTW, I brought up that this experiment has been done and one of the posters is asking for a cite (but not from skeptic.com or badscience.net, surprise, surprise). Do you know of any? I suggested that if this individual in question felt that their results were biased or flawed, he should run his own experiment, as a good scientist should.

The problem here is that people in general are greatly affected by ‘near misses’.

If you had an audience and agreed to the above…I guarantee you that the person would get 1 or 2 hits and most everyone would consider it a ‘success’ and you a douchebag for saying it isn’t.

Back in college many people thought they were psychic. I actually held a ‘test session’ for a woman who was sure she was psychic using a deck of cards. People were interested and so I did it for others (none ever passed btw).

However, a person would guess…say ‘queen of hearts’ and the card would be the queen of diamonds and people would say…“That’s close! That should be a hit!”. Heck, even a queen of clubs coming up would provoke some response.

There were even ‘delayed hits’ in which someone would guess the Ace of Spades and 3 cards later the Ace of Spades would appear and people would be impressed.

It was hopeless.

You also want to make sure to double blind your experiment.

Make sure that neither the dowser NOR the observer know where the water is until after the dowser has done his thing.

Otherwise you may be measuring the dowser’s ability to read your facial expressions and body language as you observe them getting close to a hit.

I had already specified double-blind conditions, but it is now a moot point. The individual I was talking to refused to conduct the experiment. He just knew it worked because he had “successfully” dowsed for water. I asked him if he were willing to bet his life on it and he stated that he would try other, more mainstream methods first. He refused to believe that dowsing might be worse than doing nothing. To be fair, it would only be worse if it causes him to lose more water than he might find, but as I understand it, it can be hard to dig and NOT find water. He also stated that it only works if you believe in it. At that point I decided to give up. I never really had any hope of convincing him, of course, but perhaps someone else will read the thread who will profit. Fighting ignorance is truly an uphill battle.

Thanks,
Rob

Do any of them actually claim such ability? Most dowsers I have encountered claim the ability to detect many tons of water flowing in underground channels. Most of them would say that it doesn’t work on a jug of still water.

This is common. A psychic demonstration is conducted under something resembling controlled conditions, and it fails to demonstrate anything unusual. The would-be psychic then claims it was the lack of belief or “negative energy” among the observers that was responsible for the failure.

Well the first thing I asked the guy is whether a dowser could find a jug of water.

Some claim to be able to use dowsing to find lost car keys. In De Re Metallica, Agricola advised its use for mining minerals. There are apparently some archaeologists who use it to discover ruins. I’d like to meet their tenure committee. Strike that. Their tenure committee would most likely want to make me tear out my eyeballs.

Rob

Yeah I think that’s the biggest problem there.

cite? People keep accusing dowsers of saying this. Can you find any actual instance of it happening?

eta: If a couple dowsers believe that it works for small amounts of water, then all you’d be able to prove to them is that it just doesn’t work for amounts you used and lower.

100% was just an example to make the math easy.

If you run 10 trials the chance of someone getting 2 or more successes by random guessing works out to be a little over 26% - 0.2639 if I did my math right. That’s not very exciting.

If you run 20 trials the chance of someone getting 4 or more successes (again, that’s a 20% hit rate) is 0.133. That’s still not a bad chance for a pure guesser.

Make it 50 trials and the chance of getting 20% (or more) successes by guessing drops to 0.0245.

If you go to 100 trials it’s 0.00198.
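If anyone wants to check those figures, this is a minimal Python sketch of the calculation, assuming each trial is an independent 1-in-10 guess (so the number of hits follows a binomial distribution). The function name is just illustrative.

```python
from math import comb


def chance_of_at_least(hits, trials, p_guess=0.1):
    """Probability of getting `hits` or more correct out of `trials`
    by guessing alone, with chance `p_guess` of a hit on each trial."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# A 20% hit rate at each of the trial counts discussed above.
for trials in (10, 20, 50, 100):
    hits = trials // 5
    print(trials, hits, round(chance_of_at_least(hits, trials), 5))
```

The printed values should land close to the 0.2639, 0.133, 0.0245 and 0.00198 quoted above.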

The lower they claim their success rate is - that is, the closer their “dowsing” ability is to just random guessing - the more trials you will need to do in order to feel confident that the observed outcome is not in fact random chance.
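To put rough numbers on that, here is a sketch of a power-style calculation. This is my own framing rather than anything agreed in the thread: it assumes a one-sided exact binomial test, a 5% limit on a guesser passing by luck, and an 80% chance that a dowser with the claimed hit rate passes. Those two thresholds and all the names below are illustrative assumptions.

```python
from math import comb


def tail_prob(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_hits(n, alpha, p_chance=0.1):
    """Smallest hit count a guesser reaches with probability <= alpha,
    or None if no such count exists for n trials."""
    for k in range(n + 1):
        if tail_prob(n, k, p_chance) <= alpha:
            return k
    return None


def trials_needed(p_claimed, alpha=0.05, power=0.8, p_chance=0.1, max_n=500):
    """Smallest (trials, required_hits) such that a guesser passes with
    probability <= alpha and a dowser with hit rate p_claimed passes with
    probability >= power. Returns None if nothing up to max_n works."""
    for n in range(1, max_n + 1):
        k = critical_hits(n, alpha, p_chance)
        if k is not None and tail_prob(n, k, p_claimed) >= power:
            return n, k
    return None


for claimed in (1.0, 0.5, 0.3, 0.2):
    print(claimed, trials_needed(claimed))
```

The required number of trials grows quickly as the claimed rate approaches the 10% that guessing gives, which is the point made above.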

And remember, everyone has to agree in advance how many trials will be performed and what number of hits constitutes a positive outcome. I think it’s also a good idea to have agreement on what constitutes a negative outcome - the simplest example being “Not meeting the ‘positive outcome’ criteria”. If everyone stipulates that there will be 100 trials and that the testee must get at least 20 of them right in order to “win”, it should also be explicitly agreed that getting 19 or fewer constitutes a “loss”.

No, this is wrong. It is irrelevant what they agree. What matters is statistical analysis of the result. If a properly qualified statistician does the analysis, and says that 5 hits is a success, with less than 0.1% chance of error, then that is what should be considered. What matters is whether the dowser can get those 5 hits, or better. It doesn’t matter at all if the dowser says he can get 10 hits, but only gets 6. It’s still a success.

Dowsing must work. Otherwise this article saying that Iraq’s security forces are dowsing for weapons at security checkpoints would be scary.

If I whack a golf ball down the green and it lands on a particular blade of grass out of the ten jillion blades of grass it is not amazing.

If I state up front that I will hit that particular blade of grass and then do so, it’s amazing.

If someone is claiming that they’ve got some kind of unusual ability and they want to participate in a scientific test of said ability it is critical that they state up front exactly what they are going to do and agree what constitutes success and what constitutes failure. Otherwise you’re allowing stuff to happen at random and then letting the testee name his target.

See my coin toss example. It’s only interesting if the person named the precise sequence in advance.

Note that nothing about this prohibits giving a range for success - in the dowsing example if you think that getting 10 hits out of 50 trials is “success” but you’ll allow 6 hits out of 50 trials as “success” also then just make that the condition up front - “Tester and testee agree that getting 6 or more hits out of 50 trials shall constitute success”.

I’ll note, btw, that 6+ hits in 50 trials has more than a 38% chance of happening at random. I would not consider that a statistically significant indication of a paranormal ability.
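Here is a minimal Python sketch of how that success line could be worked out before the test, again assuming independent 1-in-10 guesses. The function names and the 5% level in the example are illustrative choices, not an agreed standard.

```python
from math import comb


def chance_of_at_least(hits, trials, p_guess=0.1):
    """Probability of `hits` or more correct out of `trials` by guessing."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(hits, trials + 1)
    )


def minimum_hits(trials, max_chance, p_guess=0.1):
    """Smallest number of hits that a pure guesser would reach with
    probability no greater than `max_chance`; None if even a perfect
    score is more likely than that."""
    for hits in range(trials + 1):
        if chance_of_at_least(hits, trials, p_guess) <= max_chance:
            return hits
    return None


print(round(chance_of_at_least(6, 50), 4))  # chance of 6+ hits in 50 by luck
print(minimum_hits(50, 0.05))               # hits needed in 50 trials at the 5% level
```

Running this before the test gives both sides a fixed number to agree on, so nobody has to argue about the threshold afterwards.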

Utter rubbish. That’s not how it works. Maths is maths. You can’t overrule the laws of mathematics just by agreement.

This is why “positive controls” are important–in this case, using the device to be tested on a known positive without applying your experimental condition (in this case, double-blinding and a range of unknown targets).

This is useful in this kind of testing to avoid the risk being discussed–that the dowser says “it doesn’t work for small amounts of water” after the fact to justify a failed test.

The procedure ought to therefore have (some of this is duplicative):
(1) letting the dowser say ahead of time what he can detect (defined however he wants) and (2) on the day of the test, starting by having the dowser perform a non-blind test on a target exactly like the positive target on the real test (i.e. a full jug of water under a cardboard box), where the dowser gets to have everything set up as he wants.

What this demonstrates is either that (1) the dowser can’t “detect” a known positive (in which case his ability is pretty pitiful), or (2) he can detect a target exactly like the one you’re using in the real test, such that the dowser cannot argue that his ability wasn’t working that day, or that he cannot detect a target like the one you’re using. Then, when the only differences between the positive control and a failed experimental test are that (1) the dowser doesn’t know where the water is, and (2) controls are in place to rule out “natural” methods of achieving supernatural results, you can rule out any dowsing ability.

This is very true, and very important. What you imply, but is worth saying out loud, is that the “success” condition you pick should be developed with statisticians (so that the probability of success is appropriate).

Peter Morris’s point, that the “success” condition ought to be different from chance at a statistically significant level, is valid. But, as you point out, it is trivial to determine what that level is in advance of the experiment, and as you make clear, there are very good reasons for doing so.
ETA:

Yes, but the point is if “maths is maths,” it is the same before and after the test. If calculating and defining what a statistical “success” would be before the test avoids certain risks (even if some of them are merely driven by appearances of success), there is no good reason to avoid it.

It’s a trivial calculation. No good reason to save it till afterwards.