Water Witching

Uncertain · October 5, 2009, 9:03pm

That’s what he says he’ll be able to do. It tells us nothing about what results would be statistically significant.

And the statistical test that I propose would say that the result is not statistically significant. The calculated probability of doing that well or better would be 100%, which is the worst (least significant) p-value possible.

You seem to be saying that because the people are not equally likely to be alive and dead, no legitimate statistical test is possible. That’s just not so. Fisher’s exact test should do just fine.

Uncertain · October 5, 2009, 9:43pm

That wouldn’t make the test “unscientific”, just pointless. No outcome could provide evidence for the phenomenon. Announcing any composition other than “all alive” or “all dead” would allow for a meaningful test. “Exactly one of these fifty people is dead; tell me which one” is a meaningful test. Claiming that a wrong guess supports the phenomenon because it amounts to 48/50 right would be moronic. Claiming that a correct guess is evidence of the phenomenon (though not overwhelming evidence) is perfectly correct and scientific.

Informing the testee of the exact number of dead people does not destroy blindness. It simply changes the task. It is, in fact, what happens when a dowser is asked to find a piece of gold hidden in one of ten hiding places.

I don’t see what dividing them into sets of 10 would accomplish.

There’s nothing special about a 50/50 ratio. We could have somebody attempt, for example, to tell us which people were Scorpios. Is that somehow improper because the fraction of Scorpios is not 1/2?

Well, to know that it’s a higher bar than demanding statistical significance, you’d need to calculate a p-value. Since he’s made ten “dead” guesses, I’m fairly sure it is a higher bar than p < 0.05. But we’ll still want a p-value. If he does get 96% right (I’m not holding my breath), don’t you want to know how likely that would be by chance?

In any case, this high bar is not the end of the story. Suppose somebody claims that he can predict the outcome of a fair coin toss with 98% certainty. We test him and find that he doesn’t come close to this, but can reliably predict it with 70% certainty. I don’t mean that he gets seven out of ten, which could easily happen by chance. I mean that he gets, say, 7,000 out of 10,000. This is statistically significantly different from 50%. It is evidence that there is an effect. It is a weaker effect than what the testee claimed, but it is an effect nonetheless.
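To put rough numbers on the difference between 7 out of 10 and 7,000 out of 10,000, here is a minimal Python sketch of the exact binomial tail for a fair coin. The function name `log10_upper_tail` is made up for illustration; it returns the base-10 logarithm of the probability, because the second probability is too small to fit in an ordinary float.

```python
from math import comb, log10

def log10_upper_tail(n: int, k: int) -> float:
    """log10 of P(X >= k) for X ~ Binomial(n, 1/2), using exact integer sums."""
    term = comb(n, k)          # number of ways to get exactly k right
    total = 0
    for i in range(k, n + 1):
        total += term
        # exact recurrence: C(n, i+1) = C(n, i) * (n - i) / (i + 1)
        term = term * (n - i) // (i + 1)
    # divide by the 2**n equally likely outcomes, in log space
    return log10(total) - n * log10(2)

seven_of_ten = 10 ** log10_upper_tail(10, 7)
big_run_log10 = log10_upper_tail(10_000, 7_000)

print(f"P(at least 7 of 10 by chance)  = {seven_of_ten:.4f}")
print(f"P(at least 7,000 of 10,000)    = 10^{big_run_log10:.0f}")
```

Seven of ten happens by luck about one time in six, while 7,000 of 10,000 has a chance probability with hundreds of zeros after the decimal point. This assumes independent tosses of a fair coin and says nothing about why a predictor might beat 50%.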

Musicat · October 5, 2009, 10:17pm

That was the argument PEAR made. They closed up shop a little while ago.

“Evidence that there is an effect”: by effect, if you mean that correlation proves causation, I don’t think so. I will agree with you that a statistically reliable number would be useful here, but hardly needed in this case. We’ve been offered a 96% success rate for a 50% chance expectation. P value or no P value, that’s a big difference.

It would be needed if the diviner claimed 51% accuracy if chance were 50%. Also, we are ignoring repeatability in this recreational experiment. Scientific, it’s not. Fun for some, it may be.

Princhester · October 5, 2009, 10:47pm

Cite? I’ve never read this, and there is enough misinformation about the incident spread by woos not to be making any more up.

glee · October 5, 2009, 10:51pm

But that’s not the task here.
We’re effectively asking someone to say whether or not there’s gold in each of 50 hiding places, then informing him there’s more gold than not. This biases the test.

See now you’re getting hung up on maths as opposed to what people claim.
Suppose someone makes 3,000 correct guesses out of 10,000. Is that ‘significant’? How about 3001? 3002? 7,001? 6,999?
Most results can be interpreted as ‘significant’, but this psychic claimed 96% accuracy, so that’s what you test for.

Musicat · October 5, 2009, 11:13pm

Calling me a woo? Mind your language, Sir!

I am recalling from memory, although I don’t see that what I said would in any way do harm to Randi or the MDC, nor diminish its value.

From one of the Randi blogs:

If you feel that I misspoke, I will gladly retract my statement “before Lintgen applied for the MDC,” if you feel that is misleading. I have no information that he ever applied for it, but I suspect he would have been encouraged to do so by Randi if his analysis suggested the paranormal was at work.

More:

snopes
Wikipedia

Uncertain · October 5, 2009, 11:20pm

You’ve mistakenly attributed my words to pramanujan.

I don’t know what PEAR is or what arguments they’ve made. If you think there’s a flaw in what I wrote, feel free to point it out. We could test any supposed predictor or predictive variable against “random guess” expectations without any prior claim about how well he/it will do. That is, in fact, the more usual case in statistical testing. When we ask, for example, whether men are more likely than women to smoke, we don’t “set the bar” at somebody’s assertion about how much more likely they are to smoke. We just test against the null hypothesis that the probabilities are equal.

I don’t know where you’re getting that. I’m no more assuming that correlation proves causation than you are when you take 96% accuracy as a threshold. I’m using “effect” in the usual statistical way, as in effect size, which certainly doesn’t mean “causation size”.

But it’s not a 50% chance.

We seem to have collectively gone in a circle. People were complaining that the people in the pictures were not 50% dead (or dead with a 50% probability), and that the “diviner” knew that. I said that that didn’t matter, so long as we did an appropriate statistical test. You’re saying that we don’t need a statistical test because 96% success with 50 trials is obviously statistically significant given a 50% chance of being correct. But there is not a 50% chance of being correct.

Plus, there’s my point above about the possibility of a result that is worse than 96% but still statistically significant. You apparently have dismissed that point (though you’ve offered only some kind of argument by association), but I stand by it.

If anybody wants to offer a cogent argument as to why Fisher’s exact test is inappropriate or meaningless for this problem, I’m all ears. Failing that, I’ll conduct such a test and post the results here (unless, as is not unlikely, the “diviner” happens to do worse than the average for random guessing, in which case the test is moot). Anybody who insists that it is meaningless or useless is free to ignore it.

Musicat · October 5, 2009, 11:41pm

PEAR was the Princeton Engineering Anomalies Research Laboratory. They tried to influence electronic random number generators with wishful thinking. They thought that a 51% success rate, where 50% would be expected by chance, was highly significant, no doubt for mathematical reasons. They gave up after 30 years of trying.

Uncertain · October 6, 2009, 12:03am

Back in post #193, I said

and you seemed to scoff (I presume that roll-eyes was aimed at me). This is what we were discussing. As I said, if the number of dead people is one, it’s just like asking somebody to say which of 50 hiding places contains the gold. We might also hide gold in three of the hiding places and ask for three locations. Analyzed properly, it’s an informative, genuine test.

If one foolishly analyzes the results in either situation (announcing “more alive than dead” or “exactly eight dead”) under the false assumption of a 50% success rate by chance, the result will be bogus and will be biased toward the “diviner”. If one correctly analyzes the results, e.g., by the test I proposed, there will be no such bias.

I’ll take a stab at describing the bogus and correct tests.

The wrong thing to do–and you’re saying, correctly, that it would be wrong–is to compare the results to making each dead/alive guess by flipping a fair coin. Clearly our subject would almost always do better than this random guessing procedure.

But that’s not what anybody competent would do. Here’s one thing that we could do instead. The subject has, in this case, made 40 “alive” and 10 “dead” guesses. So we compare that to other ways of making 40 “alive” and 10 “dead” guesses. We imagine randomly picking which ten pictures get the “dead” guesses. We see how well the subject did compared to this random guessing procedure, which also has the advantage of making more “alive” than “dead” guesses. This is what Fisher’s exact test does.
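The random guessing procedure described above can be simulated directly. This is only a sketch: `chance_p_value` is a hypothetical helper, and the example values of 8 dead people and 4 correct "dead" guesses are illustrative placeholders, not results from the test. Only the 50 pictures and the 10 "dead" guesses come from the discussion.

```python
import random

def chance_p_value(n_pictures, n_dead, n_dead_guesses, observed_hits,
                   trials=20_000, seed=0):
    """Estimate P(random placement of the 'dead' guesses does at least as well).

    Each trial places the same number of 'dead' guesses on randomly chosen
    pictures and counts how many land on people who really are dead.
    """
    rng = random.Random(seed)
    # Under random guessing it does not matter which pictures are the dead ones.
    truly_dead = set(range(n_dead))
    at_least_as_good = 0
    for _ in range(trials):
        guessed_dead = rng.sample(range(n_pictures), n_dead_guesses)
        hits = sum(1 for pic in guessed_dead if pic in truly_dead)
        if hits >= observed_hits:
            at_least_as_good += 1
    return at_least_as_good / trials

# Illustrative values only: 8 dead among 50, 10 "dead" guesses, 4 of them right.
print(chance_p_value(50, 8, 10, observed_hits=4))
```

Because every simulated guesser also makes 40 "alive" and 10 "dead" guesses, knowing that most of the people are alive gives the subject no edge over the comparison. With enough trials the estimate approaches the exact value that Fisher's exact test computes.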

I’m talking about statistical significance. You seem to be talking, at least in part, about what effect size would be “significant” in another sense. In any case, worrying about statistical significance is hardly “getting hung up on maths”.

If somebody can really guess fair coin tosses with 60% accuracy, we want to know about it (if he can guess red/black on the roulette wheel with that accuracy, he’ll be rich). This is true even if he claimed 98% accuracy.

Oslo_Ostragoth · October 6, 2009, 12:10am

My methodology and results:

First of all, I would have preferred printed pix, but my printer is off, so I’m stuck with pointing at my monitor.

My rods are made of mild steel wire 18" long, bent at the 6" mark. My monitor faces north.

Crap - my wire appears to be mildly magnetized - off to find some copper wire. Hmm, I don’t like these as much - not very straight (made out of stripped house wiring).

OK for restart. First, pointing only at this reply form: no movement.

f01: slight attraction
f02: very slight attraction
f03: slight attraction
f04: very slight repulsion
f05: slight attraction
f06: very slight attraction
f07: very slight attraction
f08: slight attraction
f09: slight attraction
f10: slight attraction
f11: slight attraction
f12: slight repulsion (probably due to the porn star mustache)
f13: slight attraction
f14: slight attraction
f15: slight attraction
f16: slight attraction
f17: slight attraction
f18: slight attraction
f19: slight attraction
f20: slight attraction
f21: slight attraction
f22: slight attraction
f23: slight attraction
f24: very slight repulsion
f25: slight attraction
f26: slight attraction
f27: slight attraction

At this point, I tested the reply form again, and got a very slight attraction, so my grip is not very reliable. I don’t see much point in continuing unless I can come up with a method to stabilize my grip, and then test the pictures.

Musicat · October 6, 2009, 12:37am

Silly me, I thought we were trying to determine if each pix was of someone alive or dead. WTF does “slight attraction” mean? Alive or dead?

JimOfAllTrades · October 6, 2009, 1:14am

Ok, sorry this is late, I hate it when work gets in the way of stuff I want to do.

Ok first, the raw data. Analysis, such as I’m capable of, is below.

The first column is the picture number, second is the current status of the person, third is pramanujan’s guess based on his dowsing, and fourth is whether pramanujan’s answer was right or wrong. I’ve also put asterisks out to the side of the wrong answers to make it easier to count things up.

Pic - Status - Guess - C/W
f01 - Alive - Alive - correct
f02 - Alive - Alive - correct
f03 - Dead - Dead - correct
f04 - Alive - Alive - correct
f05 - Alive - Alive - correct
f06 - Alive - Alive - correct
f07 - Alive - Alive - correct
f08 - Dead - Dead - correct
f09 - Alive - Alive - correct
f10 - Alive - Alive - correct
f11 - Alive - Alive - correct
f12 - Dead - Dead - correct
f13 - Dead - Alive - wrong - *****
f14 - Alive - Alive - correct
f15 - Alive - Alive - correct
f16 - Alive - Alive - correct
f17 - Alive - Alive - correct
f18 - Alive - Alive - correct
f19 - Alive - Alive - correct
f20 - Alive - Dead - wrong - *****
f21 - Alive - Alive - correct
f22 - Alive - Alive - correct
f23 - Alive - Dead - wrong - *****
f24 - Alive - Alive - correct
f25 - Alive - Alive - correct
f26 - Alive - Alive - correct
f27 - Alive - Alive - correct
f28 - Alive - Alive - correct
f29 - Dead - Dead - correct
f30 - Dead - Dead - correct
f31 - Alive - Alive - correct
f32 - Alive - Alive - correct
f33 - Alive - Dead - wrong - *****
f34 - Alive - Alive - correct
f35 - Alive - Dead - wrong - *****
f36 - Alive - Alive - correct
f37 - Alive - Alive - correct
f38 - Alive - Alive - correct
f39 - Dead - Alive - wrong - *****
f40 - Alive - Alive - correct
f41 - Dead - Alive - wrong - ***** (same person as f08)
f42 - Alive - Alive - correct
f43 - Alive - Alive - correct (same person as f27)
f44 - Alive - Alive - correct
f45 - Alive - Alive - correct
f46 - Alive - Dead - wrong - *****
f47 - Alive - Alive - correct
f48 - Alive - Alive - correct
f49 - Alive - Alive - correct
f50 - Alive - Alive - correct

So the totals: 8 wrong out of 50, giving 84% correct overall. This is well short of the goal pramanujan set for himself.

Now since there were many more people in the list alive than dead, a guess of “Alive” is much more likely to be correct than a guess of “Dead” (my fault: among the people I know and have pictures of ready to hand, most of them are alive). I mean after all, if you just guessed everyone was alive, you’d get exactly the same score, 8 wrong out of 50.

So if we look at just how many of the dead people he got correct, the score is 5 correct of 8, 63% correct, still below the goal.
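For anyone who wants to re-check the tallies, here is a small Python sketch. The two sets are transcribed from the table above (picture numbers whose status is Dead, and picture numbers guessed Dead); the variable and function names are only illustrative.

```python
N_PICTURES = 50

# Picture numbers read off the raw-data table.
dead = {3, 8, 12, 13, 29, 30, 39, 41}                 # status "Dead"
guessed_dead = {3, 8, 12, 20, 23, 29, 30, 33, 35, 46}  # guess "Dead"

def wrong_count(guess_set):
    """Pictures where guess and status disagree (symmetric difference)."""
    return len(dead ^ guess_set)

dowser_wrong = wrong_count(guessed_dead)
all_alive_wrong = wrong_count(set())    # baseline: call everyone alive
dead_hits = len(dead & guessed_dead)    # dead people correctly called dead

print(f"dowser:    {dowser_wrong} wrong, {1 - dowser_wrong / N_PICTURES:.0%} correct")
print(f"all alive: {all_alive_wrong} wrong, {1 - all_alive_wrong / N_PICTURES:.0%} correct")
print(f"dead people identified: {dead_hits} of {len(dead)}")
```

The overall percentage cannot tell the dowsing apart from the "everyone is alive" baseline, which is why the number of correct "dead" calls is the more informative figure.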

Additionally, two of the people in the pictures were obviously older, and I think pretty much anyone would have guessed those as dead. Without those, the percentage of correct among the dead is 3 correct out of 6, or 50% correct.

Of additional interest is the fact that I accidentally included pictures of two people twice, pictures that were taken about 15 years apart. Pramanujan listed one of those people as dead based on one picture and alive based on the other.

On the plus side, he did get the first 12 pictures spot on, including 3 people who are now dead, one of whom was a baby in the picture. It seems to me the tendency to think of a baby as having died is somewhat less, so I thought that was interesting. However, he listed this same person as alive based on the later picture.

But all of that except the base percentage is basically data mining. All we can say for sure is that he correctly identified 84% of the list. And given the deficits of the protocol (I told him there were more alive than dead, the dead/alive percentage was far from even, etc.) that just doesn’t look to me as being very far off chance.

However I’m not a statistician. Anybody want to take a crack based on the raw data above?
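One way to take that crack is a one-sided Fisher's exact test, as proposed earlier in the thread. The sketch below is not anyone's official analysis: `fisher_one_sided` is a hypothetical helper, and the four counts are read off the table above (5 dead called dead, 3 dead called alive, 5 alive called dead, 37 alive called alive). It treats all 50 pictures as independent, which is not quite true because two people appear twice.

```python
from math import comb

def fisher_one_sided(hits, dead_missed, false_alarms, alive_correct):
    """P(at least `hits` of the 'dead' guesses land on dead people by chance).

    The row and column totals are held fixed, so this is the upper tail of a
    hypergeometric distribution.
    """
    n_dead = hits + dead_missed
    n_alive = false_alarms + alive_correct
    n_dead_guesses = hits + false_alarms
    all_placements = comb(n_dead + n_alive, n_dead_guesses)
    max_hits = min(n_dead, n_dead_guesses)
    favourable = sum(
        comb(n_dead, k) * comb(n_alive, n_dead_guesses - k)
        for k in range(hits, max_hits + 1)
    )
    return favourable / all_placements

p = fisher_one_sided(hits=5, dead_missed=3, false_alarms=5, alive_correct=37)
print(f"one-sided p = {p:.4f}")   # about 0.005 for these counts
```

A p-value this size only describes how often random placement of ten "dead" guesses would do as well. It does not account for the duplicate pictures or for visual cues such as apparent age.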

Also, I have a set of guesses from another poster that I think was generated by pure guessing. I’ll post the same analysis of that list a little later this evening.

Thanks again everyone, including pramanujan.

Oslo_Ostragoth · October 6, 2009, 1:17am

I don’t know. I was just trying to record my raw results, then see if there was any correlation with the facts. I also intended to test against some of my own pix, but since I would know the facts already…

Oslo_Ostragoth · October 6, 2009, 1:18am

Crud. I was going to ask that the full results not be posted until I got a chance to retest. I’ll ignore them as best I can.

Uncertain · October 6, 2009, 1:22am

OK, but what does that have to do with my point about somebody who claims to be a 98%-accurate coin-toss-predictor but is actually a 70%-accurate coin-toss-predictor?

If they really could do it with 51% success, that would be a big find. Of course, like you, I don’t believe for a minute that they could.

Perhaps this demonstrates that by making too much of marginal p-values, not accounting for multiple tests, etc., all aided by wishful thinking, one can see statistical significance where there is none. Or perhaps the lesson is that tiny differences, even if they are unquestionably statistically significant, might result from systematic errors and flawed experiments rather than the phenomena they are claimed to demonstrate.

But how does this apply to my point? Surely you don’t doubt that we can reliably distinguish a 70% coin-toss-predictor from a 50% coin-toss-predictor. And even if you do, this has nothing to do with whether he claimed to be a 98% predictor.

I can only come up with two other things you might be thinking: 1. You don’t think a 70%-accurate coin-toss-predictor would be interesting and important, or 2. You think a 70%-accurate coin-toss-predictor would be interesting and important if he claimed 70% accuracy, but not if he claimed 98% accuracy. I would find either of these positions baffling.

JimOfAllTrades · October 6, 2009, 1:24am

Oops! Sorry about that. I saw your results but didn’t see they were not complete. My apologies, I know you were trying to do a fair evaluation.

I guess the best I can say is… try not to look?

JimOfAllTrades · October 6, 2009, 1:44am

They were total unknowns. With maybe one exception who has appeared on the local news a few times and is known around town.

I just went through the pictures of people I know that I happen to have pictures of, cropped them all to close-ups of their heads and all exactly the same size.

Unfortunately (for science, fortunately for me) most of them are still alive.

Uncertain · October 6, 2009, 2:05am

Look, I predicted in advance how many of the photos would be of dead people! Proof of my psychic abilities. Or maybe it’s clairvoyance.

(Note: in case there’s any doubt, this was not a serious comment.)

JimOfAllTrades · October 6, 2009, 2:10am

I just finished catching up on the posts, and I did notice that. I thought it was funny you picked that exact example. I was going to post about your hidden talents.

Oslo_Ostragoth · October 6, 2009, 2:20am

I haven’t looked yet, but I’m reeeeeeaaaaaally curious to see how the challenger fared.

The heck with it. I’m going to look. I’ll satisfy my curiosity another time.

BTW, I had heard of dowsing for graves before, but not dowsing photos to determine if the subject is dead or not.
