You know, you can prove more than one thing. For example, here, you’re not pointing to any actual flaw in the test. You are pointing to the second analysis as evidence that there is a “flaw” at some unspecified point in the study–one might almost say the re-analysis is your “proof” of a flaw in the study. I thought that was eminently clear in my previous post–but if you feel you need clarification, there it is.
Back to the point. **Since you haven’t pointed to any actual issue with the experimental procedure,** your “flaw” stands or falls on the quality of the re-analysis–that re-analysis is, in fact, the ONLY proof you have of the “flaw”.
As I pointed out in the substance of my post (which is also relevant to how to design a test based on the ability claimed), a second analysis of the overall results is not particularly effective at detecting an effect that is weak, intermittent, or not held by most “dowsers.”
That is because, if only a few dowsers had any ability, you’d expect the chance-level results of the rest to swamp their hits–and the number of tests per person is simply not enough to detect a weak effect with any level of confidence.
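To put rough numbers on that (hypothetical ones, since the thread doesn’t give an exact breakdown of dowsers and trials), here’s a short Python sketch of the dilution effect:

```python
# A rough sketch of the dilution effect. All numbers are hypothetical:
# 25 dowsers, 10 trials each, a 1-in-10 chance rate, and a supposed
# weak ability in just 2 of them.
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_dowsers, trials_each, p_chance = 25, 10, 0.10
n_skilled, p_skilled = 2, 0.30

total_trials = n_dowsers * trials_each                        # 250 trials
expected_alt = ((n_dowsers - n_skilled) * trials_each * p_chance
                + n_skilled * trials_each * p_skilled)        # ~29 hits vs 25 by chance

# How surprising would that pooled tally be if everyone were guessing?
print(binom_tail(round(expected_alt), total_trials, p_chance))   # ~0.23
```

A genuine but rare ability only nudges the pooled total from 25 expected hits to about 29, and pure chance produces 29 or more hits roughly a quarter of the time. A re-analysis of the pooled results simply isn’t looking for that.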
Also, on a side note–we’re not using complicated statistics here–good high school or junior college level at best. The fact that you’re pointing to a famous person who did them is simply not relevant to their quality, since the level of skill required is not particularly great. (Also, it would be nice if you actually cited them–but I digress.)
My statistics is very rusty, but am I correct in assuming that:
If the test was done 50 times
The chance of randomly selecting water for a single try was 1/10
The random chance of hitting water on 11 out of the 50 tries is actually about 8%, or 1/13? Because that doesn’t really seem to be out of the ordinary at all.
Actually, my simulation program shows there to be something on the order of a 0.9% chance of getting 11 or more hits in 50 tries. Not expected, but not very convincing either, especially since it wasn’t reproduced.
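For what it’s worth, that figure checks out exactly, without a simulation. A minimal sketch, using only the Python standard library:

```python
# Exact tail probability for "11 or more hits in 50 tries at 1 in 10",
# computed directly rather than estimated by simulation.
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(binom_tail(11, 50, 0.10))   # ~0.0094, i.e. roughly 0.9%, not 8%
```

So about one chance in a hundred, not one in thirteen.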
I’ve said several times already that the randomization method was flawed and could possibly introduce a bias into the results.
Also, with each testee only having 5 or 10 trials each, one lucky guess would make a 20% swing in the results for that person.
But there’s plenty of data missing from Randi’s account, such as the target, and the guess for each trial. That’s a flaw in itself. And such details as he does give are biased in his own favour. There is no way of telling from his account what went wrong.
You’ll notice that when they were tested again by other people their scores dropped. This indicates to me that their initial high scores were due to some mistake by Randi.
This isn’t a math question (though I could frame it in those terms)–it’s a common sense question. If you say you can identify red cards out of a deck of cards, does it matter how I shuffle them? One pseudorandom method may be closer to “true” randomness than another–but neither has any effect on whether you can tell, without looking, if the card is red or black.
The only way it could have an effect is if the pattern was so lacking in randomness as to be predictable. There is no suggestion that is the case–and even if it was, it would only be possible if the dowser was told what the “right” answer was in each test before beginning the next one. That wasn’t done here–and it is in fact basic procedure that you don’t tell the subject whether he’s right or not until all tests are completed.
So even if there was a predictable pattern, the dowser couldn’t learn it during the test–as he didn’t find out which pipe the water ran in until the end of the study.
Which is completely irrelevant, as a properly performed analysis would take the number of tests into account when determining if an extra hit was significant. Everybody understood this–which is why they didn’t give any weight to a single “extra” hit by any individual dowser.
This isn’t a flaw, it’s a design choice. It would be a problem if the test were trying to find low-level individual results–but it would be dumb to try to do that with this test anyway. It’s just not designed to reliably detect modest abilities held by only a few dowsers: there aren’t enough trials for each individual.
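To make that concrete, here’s a small sketch assuming 10 trials per dowser and a 1-in-10 chance rate (the figures used above):

```python
# With only 10 trials per person, almost no individual score is informative.
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, p_chance = 10, 0.10
for hits in range(1, 6):
    print(f"P(>= {hits} hits by pure chance) = {binom_tail(hits, n, p_chance):.3f}")
# 3/10 hits still happens about 7% of the time by luck; only 4/10 drops below 5%.

# Power: a dowser who is genuinely right 20% of the time (a hypothetical weak
# ability) reaches that 4-hit threshold in only about 12% of such tests.
print(f"{binom_tail(4, n, 0.20):.3f}")
```

Hypothetical numbers, of course, but they show why individual scores in a test this size can neither convict nor acquit a weak ability.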
On the other hand, this test is fairly well designed to detect the large-scale effects that most of the dowsers claim to have–where the real concerns are cost, and the risk of one incapable dowser skewing the results of the others.
In other words, this point is like complaining that my Ferrari isn’t very useful at transporting a soccer team, or that my school bus keeps losing the F1 championship–the test is designed in light of what it’s looking for. If you are looking for something else, that is all well and good–but don’t criticize a Ferrari if you decide, after the fact, that what you really need is a school bus.
Water?
That’s not a flaw in the study, but one in reporting. I, myself, would assume that the results are recorded somewhere–and aren’t posted in a summary of results because the summary is all that is necessary to make the point in question.
And again, you only can say there’s a “flaw” in that you can’t figure out what went wrong from his account if, in fact, something went wrong. That we disagree on–and by your own account, you don’t have the skill at math to evaluate whether the statistics that “prove” the flaw were properly performed, or are relevant to the question asked.
To be clear–their results were higher than expected, but as I’m sure you saw in the material you cited, were not **significantly** higher than expected. That puts the results in a slightly different light, does it not?
Further, what you’d expect a good experimenter to do with such results is to re-test those individuals, to try and see if the results go away (in which case they are likely the results of chance), or persist (in which case they may be the result of an effect of smaller magnitude than the original test was designed to detect).
Or, as one might expect, the re-test showed that their higher, but not statistically significant, results were the product of chance. As you yourself point out, one lucky guess could produce a high but not statistically significant score–and we shouldn’t put any weight on such results for that very reason.
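You can watch that happen with a toy simulation (illustrative numbers only, not Randi’s actual protocol): score a group of pure guessers, pull out the high scorers, and retest them.

```python
# Regression to the mean in miniature: the best scorers from a pure-chance
# round look "suggestive", then tend to retest right back at chance level.
import random

random.seed(1)                      # fixed seed so the sketch is repeatable
n_dowsers, trials, p = 25, 10, 0.10

def run_round(people):
    """Each person makes `trials` guesses, each correct with probability p."""
    return {d: sum(random.random() < p for _ in range(trials)) for d in people}

first = run_round(range(n_dowsers))
top = sorted(first, key=first.get, reverse=True)[:3]    # the three best scorers

retest = run_round(top)
for d in top:
    print(f"dowser {d}: first round {first[d]}/10, retest {retest[d]}/10")
```

On average the retest scores fall back toward the expected one hit in ten, even though nobody in the simulation has any ability at all and nobody made any mistake.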
Instead of a “flaw,” this is an example of Randi following good experimental procedure: following up on even insignificant but suggestive results, to determine whether they really were simply chance results or suggested a smaller-scale effect.
No, I don’t know how to calculate the values. I did a statistics module at university many years ago. I understand some of the concepts, but can’t remember how to perform the calculations.
But Randi’s team have no knowledge of statistics at all. There was one applicant for the challenge who was a PhD candidate in statistics. While discussing the design for her test, she kept talking about “confidence.” This got an angry reaction from Randi’s challenge negotiator, who thought she was talking about self-confidence, and accused her of making excuses.
This is why Randi’s challenges go wrong: they are negotiated by a man who doesn’t know what confidence means, and conducted by a man who thinks that is a perfectly acceptable level of knowledge.
I at least know what confidence means. That makes me more knowledgeable than Randi and his assistants put together.
In the first place, Randi wouldn’t know what is significant.
In the second place, he’s been known to lie. A lot.
In the third place, why were they selected for re-testing if the results weren’t significant? I’d have thought they would only do so for significant ones.
In the fourth place, why doesn’t he tell us what the scores actually were? This information does not appear to be available anywhere.
By the way, can we go back to the questions raised in the OP, to avoid possible moderator wrath, please?
The OP asks for tests that would convince a dowser that dowsing doesn’t work. I think that citing Randi has a negative effect as far as that goes. In my experience, citing Randi always makes the dowsers seem more plausible. If you show Randi’s test to 20 random people, 19 of them will not change their mind at all, and 1 will start believing in dowsing. Even if Randi is right he should not be cited.
Arthur C Clarke and Adam Hart-Davis have both seen Randi’s test and have become a bit more sympathetic to dowsers as a result. Clarke’s writings on the subject have probably influenced a few dozen more people.
What evidence is there that this test has ever had a positive effect?