From the original link:
Is there any truth to this statement?
Right, and I agree completely with the rest of your post; when I posed the question you were responding to, I had something more like your above example in mind. And that was my point: specifically, that there is a point at which the results would make *almost all of us* much more interested, right? I'm simply wondering, given the data claimed by the PEAR folks, how far short of that point their results fall.
So take three people–Tom, Dick, and Jane–tossing a thousand coins per trial, the object being to maximize the total (not necessarily consecutive) number of heads. After one trial:
Tom: 475 heads
Dick: 555 heads
Jane: 505 heads
(This is the point where someone calls the newspapers to inform them of Dick’s Unearthly Powers of the Mind.) Meanwhile, the experiment continues. After two trials:
Tom: 535 / Dick: 488 / Jane: 515
After three: 513 / 490 / 510
…and so on, right? We record the results of each trial, taking a new average each time that divides, for each person, the sum of all his/her results by the total number of trials. If I understand correctly (looks around nervously), as the number of trials increases, each person's average should converge on ~ 500. Which is not to say they'll all have the same average after 10000 trials, but that they will all be within some expected range +/- x of 500 after n number of trials.
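In case it helps to make that concrete, here's a quick simulation sketch of what I mean (the checkpoints it prints at are arbitrary choices of mine):

```python
import random

# A minimal simulation sketch of the setup described above (trial checkpoints are arbitrary).
def run_trials(num_trials, tosses_per_trial=1000, people=("Tom", "Dick", "Jane")):
    totals = {p: 0 for p in people}
    for t in range(1, num_trials + 1):
        for p in people:
            # each person tosses 1000 fair coins; count the heads
            totals[p] += sum(random.random() < 0.5 for _ in range(tosses_per_trial))
        if t in (1, 10, 100, 1000):
            print(f"after {t} trials: " +
                  ", ".join(f"{p} averages {totals[p] / t:.1f} heads" for p in people))

run_trials(1000)
# Averages scatter widely after one trial but settle near 500 as the trials pile up.
```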
If Jane’s numbers stayed strangely high, would it be considered interesting enough to repeat the experiment with Jane and two different people? How high would they have to be? How many trials?
Is there any way you could make such an experiment statistically sound–such that we start with a hypothesis which will either be supported or not supported by the experiment’s results–and reach a conclusion within a reasonable number of trials? Or will there always be more than one way to view the data from such an experiment; will there always be a context in which a subject’s results, no matter how “impressive” given the experiment’s parameters, could just be chance?
Agreed, if that’s what they are in fact doing. As I’ve said, I’m not qualified to judge either way; I’m just not willing to infer that type of judgment based on the credentials of the scientific journals willing to publish their work.
And I appreciate the posts of all those who are more qualified to make such a judgment, such as Humanist and others. Math is anything but sterile–they just make it seem that way to torture school kids.
Well, there’s a lot riding on how you define “feeling.” I mean if the surgeon puts his hand on your head and says, “Oh yes, the spirits are telling me you have a brain cloud,” you definitely wouldn’t trust his feeling.
But let’s say he’d run some tests which came back inconclusive, and the HMO is pressuring him into not doing some kind of diagnostic procedure because conventional medical opinion is that the procedure isn’t warranted without clear data indicating its need. But the surgeon has a feeling that the big festering purple boil on your head and the searing headaches you’re getting are not, in fact, a coincidence, a feeling that the procedure is necessary to reveal the true nature of the condition. Do you trust his feeling? I do. I’d trust his feeling with a lot less evidence, even. And be glad he was listening to it. For the simple reason that education and experience shape the way one feels.
Just like I’m inclined to trust the feeling of someone who has studied a lot of math and teaches it for a living, who has examined this project in some detail (but clearly hasn’t had enough time to examine it completely–there’s too much material), when he expresses his opinion that there are basic flaws in the methodology. It’s an educated opinion, but you can’t tell me it’s all data. It’s at least part feeling, too, wouldn’t you say?
It would be hard for me to believe that the long history of science hasn’t seen one successful innovation arise as the result of–at least in part–a scientist’s feelings. Good scientists have instincts; I’ve seen it…they cultivate them. I suspect when said instincts are for accuracy, objectivity, replicability, or skepticism (which is one of the first tools in anybody’s box), then they seem less wishy-washy, less like those spoon-bending-type feelings. But I would submit they are feelings, however much they owe to the learning and training that precedes them.
Yes, that’s right. In the coin-flipping example you’d expect a certain amount of variation but as you do more and more trials you would expect the mean to get closer and closer to 50% heads. It’s data sampling and IIRC the error in the sample mean varies inversely with the square root of the number of trials (that is, you’d expect your calculated mean to have about half the error if you run four times as many trials).
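(For reference, and assuming a fair coin, the usual formula behind that square-root behaviour is: standard error of the observed fraction of heads = sqrt(p*(1-p)/N) = 0.5/sqrt(N). So with N = 1,000 tosses you'd expect the observed percentage of heads to wander roughly +/- 1.6% around 50%, and with N = 4,000 tosses only about +/- 0.8%.)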
At some point, yes. However you do have to decide what probability event is going to trigger your “investigate this” alarm and you have to calculate this based on your entire sample space. That’s where large numbers of trials are both a friend and an enemy - if you run a lot of trials then many sequences that are statistically unlikely to turn up on any given trial will be much more likely to appear over all of your trials taken together (ten heads is unusual if you flip one coin ten times. It’s not unusual if a thousand people flip a coin ten times each).
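(To put rough numbers on that parenthetical example: one person has a (1/2)^10 = 1/1024, or about 0.1%, chance of flipping ten heads in ten tosses. But the chance that at least one of a thousand people does it is 1 - (1023/1024)^1000, which works out to roughly 62%. The "miracle" becomes more likely than not once enough people are flipping.)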
So perhaps you set a threshold that if Jane achieves a result that has a random chance of 0.000001 and if she can hit that result repeatedly then you claim there’s something unusual going on.
I don’t see why not. The exact procedure would depend very much on what you are testing for though, and you’d want to make sure that your method and the analysis you want to do of your data afterwards are sound BEFORE you start; otherwise you run the risk of naming the target after shooting the arrow.
On the second part of your question, sure, any outcome COULD be due to random chance, all you can do is say that the observed outcome was extremely unlikely and so supports your hypothesis that something other than random chance is at play. There’s a chance (0.5^1000) that if I flip 1000 coins I’ll get 1000 heads…it’s not zero but it’s pretty damn small and I’d certainly pay more attention to somebody talking about their psychic coin-flipping powers if they could do it under observation.
Their results are, at best, a clearly defined methodology whose results still fall short of significance. I’ll try to show you why with my explanation of your example.
I may be misunderstanding you here, but your experiment set out to determine the total number of heads overall, not per person, correct? If this is the case, then all we can say about the experiment is either “they got more heads than expected”, or “the number of heads they got was within statistical expectations.” Now, if, after determining success or failure on the total heads, we see that someone’s data set looks odd, then yes, this may be worth a second experiment.
We don’t, however, want to redefine our parameters in an ongoing experiment, because once we start this, there’s no limit to how many ways we can redefine yet again to match our data. Suppose that, for instance, somewhere in our experiment, Jane starts getting abnormally low numbers of heads. Do we then redefine our hypothesis again to say that her standard deviation is higher? Suppose we find that she is always low when Tom is high. Suppose that she gets interesting-looking, statistically unlikely patterns of heads and tails. Once we have redefined our methodology and our success quota once, how many times are we allowed to redefine it before it becomes clear that we’re defining success as “whatever answer we get”? That’s why when we’re out to prove a new hypothesis, we gather a new data set.
Yes, it is possible to define such an experiment properly and conduct it with accuracy. Let’s say that it seems that Jane is getting 3% above expected numbers. But, well, just to make it easy on her mental powers of coin control, we’ll say that if she scores 2% above average, we’ll accept her results as genuinely strange. Let’s also say that we will accept these results as genuine if there is less than a 1/1000 chance (0.1%) of her achieving her score by chance. In reality, for an experiment with such narrow margins of success, we’d probably want to make more certain of this phenomenon, but I’m trying to keep the math somewhat simple here. How many trials would we need to do? Unfortunately, I’m busy helping students with math right now, so I can’t devote the time necessary for this calculation. I’ll try to post an answer tomorrow.
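In the meantime, here's a rough back-of-envelope using the normal approximation (a ballpark only, not the careful calculation):

```python
# Rough back-of-envelope using the normal approximation (not a careful power analysis):
# how many tosses N before a 2-percentage-point excess of heads is a 1-in-1000 event by chance?
z_crit = 3.09           # one-sided z-score corresponding to p = 0.001
excess = 0.02           # 52% heads instead of the expected 50%
sigma_per_toss = 0.5    # standard deviation of a single fair toss (heads = 1, tails = 0)

N = (z_crit * sigma_per_toss / excess) ** 2
print(round(N))         # roughly 6000 tosses
```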
If you were searching for “A” and the experiment did show some interesting results that were, however, not in any way related to A, then you start a new experiment for B.
Not exactly, I was going for a per-person average in the above example. But really, we can make it even simpler than that. Instead of three people, we’ll have just one person, who I’ll call the ‘operator.’ Instead of having the operator flip coins, we’ll have the system flip coins, and instead of coins, we’ll use a random number generator outputting binary bits. So instead of a coin toss, the RNG ‘toss’ will either result in a one or a zero.
Let’s define n as the number of tosses made by the RNG for one trial.
Let’s define k as the total number of trials conducted, since we’ll obviously want more than one (so after k trials, the RNG will have made k*n tosses total).
Let’s define z as the number of tosses per trial resulting in zero.
Let’s define y = (z/n)*100, or the percentage of tosses which resulted in zero.
Let’s define x as the average of y over all the trials k. So if k = 3:
x = (y1 + y2 + y3) / 3
x = (sum of y for all k) / k
Make sense so far?
The operator’s instructions before each trial are simple: attempt to ‘mentally’ bias the output of the RNG for each trial such that z is maximized.
Let’s say that n = 1000; in other words, one trial = one thousand tosses. This means the value of z after each trial would satisfy 0 <= z <= 1000.
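For concreteness, the chance-only baseline of this setup would look something like the following in code (k = 100 is just an arbitrary choice for illustration):

```python
import random

# A sketch of the setup as defined above, with no operator influence simulated.
n = 1000          # tosses per trial
k = 100           # number of trials (arbitrary choice for illustration)

ys = []
for trial in range(k):
    z = sum(random.getrandbits(1) == 0 for _ in range(n))   # zeros in this trial
    y = (z / n) * 100                                        # percentage of zeros
    ys.append(y)

x = sum(ys) / k    # average of y over all k trials
print(f"x = {x:.2f}% after k = {k} trials of n = {n} tosses")
```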
Now we need a hypothesis, and I think it would be helpful to frame it in a way that should be easy to support. In other words, instead of basing the hypothesis on the supposed powers of the operator, let’s make the hypothesis an affirmative statement of what we would expect the results to be based on statistics. I imagine it would be something like this:
Given a fixed n, as k increases, the value of x will converge on ~ 50%
We would expect a wide range of values for z, and thus a certain range of values for y, but as k gets larger and larger, we’d expect x to get closer and closer to 50%, right?
Assuming I’ve stated the above clearly/accurately, how much can x deviate from 50% before we can conclude that the experiment’s results are not supporting our hypothesis? I imagine this (the expected range of x’s deviation) would be some function of n and k.
Let’s define that deviation as v, where v = x - 50%
(Obviously x could turn up less than 50%, but let’s say for the sake of argument that x is always >= 50%).
In other words, let’s say n = 1000 and x = 51%, so v = 1%. The significance of that 1% deviation is largely based on the value of k. In other words, that result of 1% is a lot more interesting if k = 10000 as opposed to if k = 100.
If the above works, then we should be able to turn it all around and define a context where x - 50% would be significant. In other words, if v = 1%, how large do we need n and/or k to be for v to be significant? Will that small of a variation (1%) ever be significant? And if not, what is the minimum value v would need to be in order to be considered significant?
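My tentative guess at the math (someone correct me if I've got this wrong, and assuming independent, fair tosses): since y is a percentage based on n tosses, its chance standard deviation should be about 50/sqrt(n) percentage points, and averaging over k trials shrinks that to 50/sqrt(n*k). So v starts to look significant once it is several times larger than 50/sqrt(n*k). With n = 1000 and k = 100, that yardstick is 50/sqrt(100000), or about 0.16 percentage points, so v = 1% would be more than six standard deviations out, which would be wildly significant; with k = 10 the yardstick is about 0.5 percentage points, and v = 1% is only a two-sigma result, the kind of thing chance throws up fairly often.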
And you’re completely right; you shouldn’t infer anything just from where it’s published. I’m sorry if I gave the impression that I think their work must be faulty because I’m not impressed with the journals they publish in. It’s the other way around; I think their work is faulty, and that’s why I’m not impressed with some of the journals they publish in. The Journal of Psychical Research can print great science just as easily as Nature, and Nature can print bad science too. Just because the journal it comes in isn’t prestigious doesn’t mean it’s wrong.
Let me describe what I think are some of the errors they’ve made:
Conscious, rather than random or even haphazard, selection of start points for data to be analyzed.
Performing chi-square tests as Nelson et al. have done requires the fundamental assumption that the data used is a random sample of some larger population. The data is a value representing the difference in output between random event generators. Note the use of the word “random” here: the fact that the event generators are “random” does not mean that the sampling is. In fact, Roger Nelson fully admits that the sampling is not random:
From *Correlation of Global Events with REG Data: An Internet-Based, Nonlocal Anomalies Experiment*, 2001, The Journal of Parapsychology, v. 65, pp. 247-271.
The population of data points consists of all the numbers from an EGG. The choice about which numbers go into the chi-squared test is not random. I think that means the test is unjustified, and regardless of the probability calculated by the test, it doesn’t really tell us anything because the test was inappropriately used in the first place. Not only can they non-randomly select times to analyze data, but they can also non-randomly select the EGGs from which they analyze data, for example in their Princess Diana funeral incident.
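To illustrate the general problem with hand-picked start points (this is a toy example of my own, not a reconstruction of their analysis): if you scan pure noise for its most deviant stretch and then test only that stretch, you will always find something that looks impressive.

```python
import random

# Toy illustration: scanning pure noise for the most deviant window, then reporting
# only that window, manufactures "significance" out of nothing.
random.seed(1)
data = [random.getrandbits(1) for _ in range(100_000)]   # fair random bits
window = 1000

best_z = 0.0
for start in range(0, len(data) - window, window):
    heads = sum(data[start:start + window])
    z = (heads - window * 0.5) / (0.5 * window ** 0.5)    # z-score for this window alone
    best_z = max(best_z, abs(z))

print(f"most deviant window: |z| = {best_z:.2f}")
# A single window with |z| around 2.5 to 3 would look impressive in isolation,
# but picking the window after looking at the data guarantees such windows exist.
```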
Expecting trend lines fitted to small portions of their data set to have no slope.
The EGGs are great at generating data; they just keep going continuously, giving an extremely large amount of data. Nelson et al. are correct in expecting that the trend should approach zero over time if both types of generators are truly random. They have no convincing explanation for why the trend should approach zero within the specific segments of time they have chosen, anywhere from 1 minute to several hours in length. Let me give you an analogy. If you take a piece of graph paper and plot these points: (10, 17), (30, 50), (47, 73), (125, 82), then see if you can perceive a trend. You should see a rising trend, but the function that produced these points is a simple sine wave with an amplitude of 100, and sine waves have no linear trend: a best-fit line over a full period should have a slope of zero. The end effect of choosing a particular place to start and a particular place to cut off is to make it look very much like there is a linear trend.
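You can check that analogy yourself with a quick sketch; the four points are just y = 100*sin(x) with x in degrees, and a least-squares line through them comes out with a clearly positive slope:

```python
import math

# The four points above lie on y = 100 * sin(x degrees); a least-squares line
# through them slopes upward even though a sine wave has no overall linear trend.
xs = [10, 30, 47, 125]
ys = [100 * math.sin(math.radians(x)) for x in xs]   # ~17, 50, 73, 82

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
print(f"best-fit slope over this segment: {slope:.2f}")   # positive, despite no overall trend
```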
Incomplete use of controls.
From the same paper as above. I agree that they should not differ from chance, but the essential question that he doesn’t answer is: do they differ from chance?
For that matter, why is calibration data (used to ensure that your machine is working the way it’s supposed to) appropriate for use as control data?
Conducting a meta-analysis without accounting for the inevitable positive results.
If you conduct a statistical test with a level of significance of 5%, then if your results are due to random chance alone, there is only a 5% chance that you will get a significant result. That’s not bad: when chance alone is at work, only about 1 test in 20 will falsely come up significant.
Then let’s imagine you go and do that test on 200 different experiments. You find that 13 experiments have significant results. Was that due to random chance or not? It still could have been, because you would expect 10 significant results from random chance (type I errors), but that doesn’t mean you will get exactly 10.
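To put a number on that particular example (treating the 200 experiments as independent tests that are each pure noise with a 5% false-positive rate):

```python
from math import comb

# Chance of seeing 13 or more "significant" results among 200 independent tests
# when every one of them is pure noise (each has a 5% false-positive rate).
n, p = 200, 0.05
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(13, n + 1))
print(f"P(13 or more false positives) = {prob:.3f}")   # roughly 0.2
```

In other words, a tally of 13 out of 200 is, by itself, completely unremarkable.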
Hopefully this clarifies a bit why I don’t think their statistics carry a lot of weight.
Relax. This is an online discussion forum. It is not really for you to tell me what I can and cannot post here. The owners of the board have that prerogative.
I am also under no obligation to do a reading assignment and presentation for you. If you want to discuss the issues, then by all means please do.
If you want to discuss my overwhelming meaninglessness and ego-validation, I’m sure there’s another forum for that.
One point on which we do wholeheartedly agree is that it is not impossible for these “random event generators” to actually be non-random. I think it would be interesting if they showed that; the hypotheses that come after can be up for grabs later.
This thread needs a little more levity.
If these EGG random number generators recorded unusual numbers before the 9/11 attacks and the Indian Ocean tsunami, wouldn’t the more believable interpretation be that the EGGs caused those horrific events? Apologies to Bob Park for stealing his idea.