How well documented is the "experimenter effect"?

In case you are unfamiliar with the term, or need some memory jogging, the experimenter effect is a phenomenon where the outcome of an experiment depends on the beliefs of the person running it.

Now I am referring to an article, the details:

New Scientist, 13 March 2004. Special Report: The Power of The Paranormal. The power of belief; John McCrone.
To cut a long story short, the article discusses the re-emergence of the “experimenter effect” into the psychic “powers” debate.

Basically, a pair of scientists (Richard Wiseman - psychologist at University of Hertfordshire, UK; Marilyn Schlitz - parapsychologist at Institute of Noetic Sciences, Petaluma, California) are testing this “experimenter effect”. It is thought that by running two separate experiments (one done by the skeptic - Wiseman; the other carried out by the believer - Schlitz), some sort of statistical analysis will reveal whether or not there really is an experimenter effect.

The idea is that when the believer runs the experiment, she will obtain data that is statistically in favour of psychic phenomena. Whereas when the skeptic tries the experiment, he will obtain data that statistically does NOT support psychic phenomena.

It goes without saying that both of the scientists will conduct FAIR experiments, so we should assume that they have maintained their integrity (and objectivity).

So my question(s) is(are):

  1. How well documented is this “experimenter effect”?

  2. What could possibly (in the known laws of science) be causing an experimenter’s belief alone to change the RESULT(s) of the experiment?

  3. In any experiment, assuming that perfect physical conditions exist (which I know is a practical impossibility) so that outside interference is minimal or zero, can an experimenter’s belief alone change the nature of the experiment (and thus the data we obtain from it)?

  4. If we assume that the answer to number 3) is YES, then is BELIEF alone a sufficient tool/mechanism that will change an experiment? Could it not be something even deeper, like our own perceptions of the experiment itself (i.e. the observer causing a direct change upon the experiment though no DIRECT PHYSICAL CONTACT is made)?

  5. And if so, then what implications does this have as a whole for scientific experiments?

I don’t know about formal documentation, but the “experimenter effect” is real.

The experimenter’s beliefs affect his judgement. The believer is inclined to accept a supernatural explanation, while the skeptic is inclined to dismiss it. This affects how they will interpret their results. Even if the data is the same for both guys, they will probably interpret it differently.

But that is regarding the interpretation of the data. The article referred to suggests that the data obtained is different when a believer runs an experiment as opposed to when a skeptic runs the experiment.

So in your version, you are saying that both experimenters obtain identical data, but they interpret that data differently.

However, this article is saying that even when the experiment is run in identical conditions, a believer will obtain different numerical data than a skeptic.

Now that, to me, is just weird.

Ah, my bad.

Perhaps the experimenter’s beliefs could cause them to structure the experiment differently, i.e. the skeptic might set a higher standard for “success” than the believer would.

If the experiments are set up identically, however, then I can’t see how the experimenter’s beliefs could affect the data obtained.

(I haven’t read the article cited, but have read quite a few on the subject, from both sides.)

Differences in judgement certainly can have a big effect, but that can sometimes be bypassed by setting up a blind assessor of the results. (That depends a lot on the design of the experiment and how subjective the direct measurements are.) If the assessor doesn’t know which set of results he’s looking at, the preconceptions of the experimenters don’t matter.

Of course, the psychic’s usual explanation is that he’s psychic enough to be inhibited by the presence of a skeptic.

I would wager that many (though absolutely not all) experiments require some sort of interpretation/judgment during the data collection phase. That is, they require a human to look at instruments, etc., and make a call about what number to write down (how many seconds did that subject take to react–3 or 4?; was the temperature closer to 37 degrees or 38 degrees?; did the bacterial colony cover 60% or 65% of that petri dish?). Most of this, it seems to me, could fall under measurement error, but certainly cumulative effects might be slightly noticeable.
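To put a rough number on that, here is a minimal sketch of how a small nudge in borderline judgment calls adds up over many readings. Everything in it is an assumption for illustration: the `record` function, the `expectation_bias` parameter, the size of the nudge (0.1 of a unit), and the range of the simulated readings.

```python
import random

def record(true_value, expectation_bias=0.0):
    """Write down a reading rounded to the nearest whole unit.

    expectation_bias nudges the observer's judgment before rounding;
    0.0 models a neutral observer. The parameter and its size are
    assumptions made for this illustration.
    """
    return round(true_value + expectation_bias)

def mean(values):
    return sum(values) / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable
true_values = [rng.uniform(36.0, 39.0) for _ in range(10_000)]

# Same underlying values, two observers with different leanings.
neutral = mean([record(v) for v in true_values])
hopeful = mean([record(v, expectation_bias=0.1) for v in true_values])

print(f"neutral observer mean: {neutral:.3f}")
print(f"hopeful observer mean: {hopeful:.3f}")
print(f"shift from expectation alone: {hopeful - neutral:.3f}")
```

Only the readings that sit close to a rounding boundary change, yet the averages drift apart, which is the cumulative effect in question.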

How pronounced is the experimenter effect generally supposed to be?

What the experimenter expects to find can have a pretty big impact on what they will find. You can’t, for instance, see unexpected microwaves if your experiment is designed only to detect gamma rays. This kind of impact on results can be as big or as subtle as you like; it’s all up to the experimenter, really.

I cannot tell from the description of the Wiseman/Schlitz experiment if they are using the same experimental setup. If not, the experimenter effect may be built in, even if they try very hard to duplicate the setup of the other exactly; they may be quite likely to find the experimenter effect if using separate setups, and their results would prove nothing about the nature of the effect. Even if they use the same setup, it’s entirely possible something subtle about how each individual manipulated some apparatus or instrument could still add an experimenter effect; even then the result may prove nothing.

I’d say, if they really want to get to the bottom of this question, they should do this:

Design an experiment such that it will result in one of two outcomes. Only one can be explained by the psi experimenter effect. Let’s call the “no-effect” result “A”, and the “psi-effect” result “B”.

Approach a number of people, and ask them what they think of the hypothesis that B will occur if they carry out the experiment. Invite equal numbers (the more the better) of those who say “yes, it’s plausible” and those who say “no, it’s not plausible” to carry out the experiment. Don’t tell them the real reason they’re doing the experiment; all that matters is whether they think the hypothesis is sound or not.

Compare the results. If there’s a statistically significant difference in outcomes favoring the psi-effect hypothesis (which would be very easy to calculate with a simple t-test), have somebody else in another institution perform the exact same experiment. If again there is a significant result favoring the psi-hypothesis, start thinking of new experiments to test the effect more directly, because maybe then you’re on to something.
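For anyone who wants to see what that comparison looks like, here is a minimal sketch of a two-sample t-test using only the Python standard library. It uses Welch's version of the test (which does not assume equal variances), and that is my choice rather than anything specified above. The data are made up: each number is a hypothetical count of “B” outcomes out of 20 runs for one experimenter.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(sample_a) / len(sample_a)  # squared standard error, group A
    vb = variance(sample_b) / len(sample_b)  # squared standard error, group B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical data, invented for illustration only.
believers = [11, 9, 12, 10, 13, 10, 11, 12]
skeptics = [10, 9, 11, 10, 9, 11, 10, 10]

t, df = welch_t(believers, skeptics)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The statistic then has to be compared against a t distribution with those degrees of freedom to get a p-value; a library routine such as `scipy.stats.ttest_ind` with `equal_var=False` does both steps in one call. With around 11 degrees of freedom the two-tailed 5% cutoff is roughly 2.2, so these invented numbers would fall short of significance.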

As I see it, the experiment as concocted by Wiseman and Schlitz cannot attain statistical significance, even if they repeat it several times; or, at best, it could disprove the hypothesis. Since they know why they’re doing the experiment, I’m not sure they could definitively rule out mundane explanations for a psi-like effect, even on the same apparatus. They could only do so with large numbers of unwitting subjects, to any reasonable level of satisfaction. In short, their design seems to invalidate all but a null result, in which the believer cannot produce the psi effect. Even then, I doubt the result’s significance. A statistician could probably address that matter better than I can.

I’ve always felt the Stanford Prison Experiment was a joke because of abuse of the experimenter effect. The professor who ran the “experiment” entered it with a set of pre-conceived beliefs, altered the starting conditions to fit his beliefs, continuously interfered with the events that were occurring, and interpreted the results according to his opinions of what should have been the result. What’s surprising is that many people still accept the conclusions of this experiment as valid.

Until now, I’ve never heard anyone challenge the Stanford Prison Experiment and I’m fascinated. Can you provide some extra reading for me? (That is, pointers to documentation that the experiment was biased.)

It’s important to distinguish between the experimenter effect, which is well established, and the “experimenter psi effect”, which is silly. Both are intelligently explained at SkepDic.com.

From that site:

"In 1976, Kennedy and Taddonio introduced the expression “experimenter psi effect” to refer to “unintentional psi which affects an experimental outcome in ways that are directly related to the experimenter’s needs, wishes, expectancies, moods, etc.” (Smith: 79).

Alcock (2003: 35) notes that the appeal to an experimenter psi effect to explain irregularities in attempts at replicating psi experiments is simply begging the question."

Yeah. I actually spent about 45 minutes designing a nice experiment to test psi effects using simple coin tosses. I had all the stats worked out, etc.; then I read that article and realized there was no way to argue successfully that a null result could not be explained away by other psi phenomena, given the objections posed to the design of previous experiments. It appears the mere presence of skeptical minds in the universe could be invoked to explain away results that do not agree with the existence of psi phenomena. In other words, it’s completely pointless.

There’s an article in Wikipedia about the experiment. It gives two reports about the experiment including one written by Professor Zimbardo (here and here). It also has a critique of the experiment I hadn’t seen until now (here).

I originally wrote the following section in the article. It’s been extensively revised in the current article, so here are my original comments:

*It can be argued that the conclusions that Professor Zimbardo and others have drawn from the Stanford Prison Experiment are not valid. Professor Zimbardo acknowledges that he was not merely an observer in the experiment but an active participant and in some cases it is clear he was influencing the direction the experiment went.

For example, Professor Zimbardo cites the fact that all of the “guards” wore sunglasses as an example of the dehumanization. However, the sunglasses were not spontaneously chosen as apparel by the students; they were given to them by Professor Zimbardo. The student “guards” were also issued batons by Professor Zimbardo on their first day, which may have predisposed them to consider physical force as an acceptable means of running the “prison”.

Professor Zimbardo also acknowledges initiating several procedures that do not occur in actual prisons, such as blindfolding incoming “prisoners”, making them wear women’s clothing, not allowing them to wear underwear, not allowing them to look out windows, and not allowing them to use their names. Professor Zimbardo justifies this by stating that prison is a confusing and dehumanizing experience and it was necessary to enact these procedures to put the “prisoners” in the proper frame of mind. However, it opens the question of whether Professor Zimbardo’s simulation is an accurate reflection of the reality of incarceration or a reflection of Professor Zimbardo’s preconceived opinions of what actual incarceration is like.*

Now I will concede that Professor Zimbardo’s experiment did raise some interesting issues. But not the ones he thinks he did. He did find some disturbing evidence of how easily members of a group can be persuaded to act in manners they would find immoral as individuals and how quickly people can be forced into roles by their environment. But Professor Zimbardo has steadfastly maintained all along that the situations he found are specifically true of a prison environment which, as I have stated, I think is unsupported by the facts he himself has presented.

Professor Zimbardo created an environment that had little resemblance to an actual prison environment. He then assigned random people to roles who had little resemblance to the people who filled these roles in the real world. He could just as easily have assigned his students the roles of “management” and “labor”, “teachers” and “students”, or “black people” and “white people” as the roles of “guards” and “prisoners”. In that case he could have used the same experiment to produce the same results and said he had made discoveries about economics, academics, or race relations. Professor Zimbardo failed to realize his conclusions were only as valid as his flawed model.

There have been some good replies and cites so far. I can add a few extra grace notes.

In the 1970s, when ‘paranormal’ research took off in a big way, some proponents of the psychic powers hypothesis were curious about the way that successful demonstrations of psychic ability under scientifically controlled conditions seemed difficult to obtain. One suggestion was that if the person conducting the experiment was either overtly skeptical, or (in some variants) neutral and therefore not supportive, then this could inhibit or block the operation of psychic forces. This idea was mooted in all seriousness by some commentators on the psychic research scene, and it’s an idea which gained some currency among believers. Terms like ‘the experimenter effect’ were coined to refer to this thesis. To researchers and people with other points of view, this argument sounded like a lame excuse for the fact that psychic powers were moonshine.

As others (in this thread) have stated, there is no doubt that scientists and researchers can introduce bias into their findings, based on their own personal beliefs. Some have done this quite knowingly, and there are many books on the history of fraudulent science (some, but not all, of it pertaining to the paranormal). Many more have done this unwittingly. They are essentially good scientists, but nonetheless their personal beliefs lead them to design experiments in certain ways, or interpret the data in certain ways, which leads to a biased result. In other words, scientists are only human, and though they may be trained to try to design and conduct experiments in a wholly impartial way, they sometimes fail.

The term ‘experimenter effect’ is sometimes also used in a euphemistic way to imply that a given scientist is simply out of his depth, or conducting research into ‘psi’ powers when he or she isn’t remotely well-qualified to do so.

So far, so good. The Wiseman / Schlitz collaboration is not about the effects mentioned in my two previous paragraphs i.e. not about fraud, bias arising from flawed human nature, or incompetence - all of which are understandable. It’s about the kind of experimenter effect the pro-psi people were going on about in the 70s.

Wiseman and Schlitz can speak for themselves, but I know Richard Wiseman and I’ve been to his research place at the University of Hertfordshire. Basically, this collaboration between him and Schlitz extends back several years. By my informal reckoning, at least 5 or 6 years and maybe longer. They have both tried conducting an IDENTICAL experiment into a particular ‘paranormal’ claim. Forgive me, I can’t remember the precise details of the claim or the experiment, but I think it had to do with simple ESP or telepathy. Anyway, the point is that the two of them, acting in good faith and striving hard to create a bona fide impartial experiment that could be replicated, came up with different results. Schlitz got results that provided a degree of support for the psi hypothesis. Wiseman got none (results were strictly in accordance with chance). Intrigued, they tried to refine their experimental protocols to make sure they themselves, and whatever beliefs they happen to hold, could not influence the outcome in any way. Same result: Schlitz got mildly positive results, Wiseman got none. I believe from my conversation with Richard Wiseman that they have gone round this ‘loop’ at least twice, and it may be more. This can be a slow process, given that they are on different sides of the Atlantic.

Within our current understanding of the way the world works, there is no good working hypothesis for what could cause this, or how a good, competent and impartial experimenter could affect the outcome of the experiment. The experimental method, as used within conventional empirical science, is (a) supposed to render human bias irrelevant and (b) assumed to do so. It is the latter point that Wiseman and Schlitz are now investigating in a formal and controlled way.

It seems like this so-called “experimenter psi effect” could be tested easily. To be a valid test, it would have to be done double-blind. Have whatever you’re testing be set up so that the experimenter doesn’t know the correct result.

If it’s guessing shapes on cards, you can do the whole procedure with skeptics present, and with them gone, and see if the results are equivalent. But it’s key that the data be double-blind.
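As a rough sketch of what “see if the results are equivalent” could mean in practice, the hit count from each condition can be compared against chance with an exact binomial calculation. The tallies below are invented, and the five-shape deck (chance of 1 in 5) is an assumption, since the number of shapes was not specified above.

```python
from math import comb

def binomial_tail(hits, trials, chance):
    """Probability of at least `hits` successes in `trials` by chance alone."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# Assumed deck of five shapes, so a blind guess is right 1 time in 5.
CHANCE = 1 / 5

# Hypothetical tallies (hits, trials), scored without knowing the condition.
sessions = {
    "skeptics present": (22, 100),
    "skeptics absent": (24, 100),
}

for label, (hits, trials) in sessions.items():
    p = binomial_tail(hits, trials, CHANCE)
    print(f"{label}: {hits}/{trials} hits, P(at least this many by chance) = {p:.3f}")
```

If both conditions sit comfortably within what chance predicts, there is nothing for a skeptic-inhibition effect to explain. The double-blind part lives in how the tallies are collected, not in the arithmetic.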

The biased researcher simply discards undesirable trials as “experimental error” and retries until he gets trials that square with his expectations.

Problem solved. Next!
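For the curious, the discard-and-retry trick is easy to simulate. This is only a sketch under made-up assumptions: each trial is 20 guesses at pure 50/50 chance, and the researcher throws out any trial that comes in below chance. The names `run_trial`, `honest_mean` and `reported_mean` are mine.

```python
import random

def run_trial(rng, guesses=20, chance=0.5):
    """Return the hit rate of one run of pure-chance guesses."""
    hits = sum(rng.random() < chance for _ in range(guesses))
    return hits / guesses

rng = random.Random(1)  # fixed seed so the run is repeatable
all_trials = [run_trial(rng) for _ in range(2_000)]

# The biased researcher writes off below-chance runs as "experimental error".
kept = [rate for rate in all_trials if rate >= 0.5]

honest_mean = sum(all_trials) / len(all_trials)
reported_mean = sum(kept) / len(kept)

print(f"all trials:  {honest_mean:.3f}")
print(f"kept trials: {reported_mean:.3f}")
```

There is no effect anywhere in the simulated data, yet the kept trials average noticeably above chance, purely because of which trials were allowed to count.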