Psychologists: Is there any scholarly evidence showing two random groups of people are "the same"?

I’ve recently become a little uneasy about the assumption that two groups of people can be considered “the same” in psychological experiments. The experimental methodology assumes that the individual-level heterogeneity on any number of levels washes out when you have a large sample size. Has there been any systematic work done to show that this is true? Or is it just something we’ve assumed to be true out of convenience?

I’m not a psychologist, but I think it’s the same in any experimental field. If you have two groups of data, it’s easy to quantify the variation within each group, and compare it to the difference between the two groups. One can then calculate how likely it is that the difference between two groups is “real” and not just a statistical fluctuation.
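To make that concrete, here's a minimal sketch in Python (all the numbers here — group sizes, means, spreads, and the "effect" — are made up for illustration): it quantifies the spread within each group, compares it to the difference between the group means, and forms a two-sample t statistic.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical data: two groups of 400 scores drawn from the same
# population, except group B's mean is shifted by a "real" effect of 5.
n = 400
group_a = [random.gauss(50, 10) for _ in range(n)]
group_b = [random.gauss(55, 10) for _ in range(n)]

# Variation within each group...
pooled_sd = math.sqrt(
    (statistics.variance(group_a) + statistics.variance(group_b)) / 2
)
# ...compared to the difference between the two groups...
diff = statistics.mean(group_b) - statistics.mean(group_a)
# ...gives a t statistic: how many standard errors apart the means are.
t = diff / (pooled_sd * math.sqrt(2 / n))
```

A large t means the between-group difference dwarfs what within-group fluctuation alone would produce.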

Right, but my point is that the within-group variation is only assessed on the variable being measured, really. Experiments that aren’t specifically addressing such an issue won’t look at socioeconomic status, or number of siblings, or any other kind of individual-level difference that might vary radically from person to person. You can imagine that certain characteristics that aren’t examined might be radically different between the two groups just by coincidence.

Other sources of variation should be irrelevant if you’ve got a large, randomly-selected sample. There shouldn’t be big differences between the control and experimental group. Both were recruited from the same population, and should have similar (average) socioeconomic backgrounds along with every other non-experimental variable.

Granted, that’s an ideal that we strive for, and only achieve sometimes.
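That balancing-out can be checked with a quick simulation (a hypothetical population with made-up numbers; the unmeasured "background score" is purely illustrative): randomly split a large sample in two, and the group averages of a variable nobody ever recorded still land close together.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: everyone carries an unmeasured background
# score (socioeconomic status, number of siblings, whatever) that the
# experimenter never records.
population = [random.gauss(100, 15) for _ in range(10_000)]

# Random assignment: recruit 2,000 people and split them by chance alone.
recruits = random.sample(population, 2_000)
control, experimental = recruits[:1_000], recruits[1_000:]

# Individuals vary a lot, but the two group *averages* come out close.
spread = statistics.stdev(population)
gap = abs(statistics.mean(control) - statistics.mean(experimental))
```

The person-to-person spread stays large, but the gap between the group means is a small fraction of it — which is all randomization promises.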

Oh, and I’m not a psychologist but I’ve done my part as a guinea pig in a number of psych studies. There’s often a standard sort of questionnaire that all participants fill out, which should help researchers find out if there are big differences between groups.

First, for studies where such factors may be thought to be relevant, groups are often matched for age, sex, etc. Second, statistical methods take into account the possibility that differences (or similarities) may be due to chance. To be considered statistically significant, an observed difference must have less than a five percent probability of arising by chance alone if there were no real effect.

It just seems to me that the role of culture or upbringing is underacknowledged in experimentation, in favor of observable characteristics. These aspects could have profound effects on the way people behave or the way they respond to things. Any thoughts on this?

You might be right, but culture isn’t completely ignored. Many psych studies use psych 101 students as test subjects (out of convenience). Which means, of course, that results from a study of psych 101 students do not necessarily extend to the entire population. But interesting or controversial results from those limited studies are often repeated with different sample groups to see if the conclusions can apply to the “general population”.

Every so often, psychologists will test extremely disparate groups to look for universal traits. For example, I recently heard of a study on early childhood learning where the researchers tested both a group of German children as well as a group of Australian Aboriginal children. They found that some particular learning ability was present in both groups. (apologies for the vague recollection…)

As long as samples are selected randomly (or pseudo-randomly in a way that reasonably approximates true randomness), the laws of chance will apply. There is a certain possibility that samples will differ from one another by chance, but that possibility can be calculated (or reliably estimated), and taken into account. This isn’t specific to psychology; it applies to any random process, from molecular motion on up.

The above all deals with random variation. A different problem is systematic variation. For example, suppose you decide to look for differences between two groups defined by some characteristic (say, smokers and nonsmokers). You may find a difference between them, but it may not be for the reason you thought – the two groups may differ systematically on some other factor you hadn’t considered, and that may be what is really causing the difference.

If the samples (test groups) are chosen without bias, those factors merely introduce random noise to the data. The result is still valid, although it may take a larger sample to get a meaningful result.

For example, if you take 100 volunteers as test subjects and put all Caucasians in one group and all minorities in the other, of course that would introduce a systematic bias which would skew the results (i.e. the two groups would behave differently). But if you use a random method to split the volunteers into two groups, there is no systematic bias.
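The difference between a systematic split and a coin-flip split can be simulated directly (made-up volunteers and scores; the "background factor" is hypothetical): grouping by the factor bakes in a big gap, while random grouping leaves only noise.

```python
import random
import statistics

random.seed(2)

# Hypothetical volunteers: an outcome score that happens to correlate
# with some background factor (coded True/False).
volunteers = []
for _ in range(1_000):
    factor = random.random() < 0.5
    score = random.gauss(60 if factor else 50, 5)
    volunteers.append((factor, score))

# Systematic split: group by the background factor itself.
biased_a = [s for f, s in volunteers if f]
biased_b = [s for f, s in volunteers if not f]
biased_gap = statistics.mean(biased_a) - statistics.mean(biased_b)

# Random split: shuffling ignores the factor entirely.
random.shuffle(volunteers)
rand_a = [s for _, s in volunteers[:500]]
rand_b = [s for _, s in volunteers[500:]]
random_gap = abs(statistics.mean(rand_a) - statistics.mean(rand_b))
```

The biased split recovers the full factor-driven gap; the random split's gap is an order of magnitude smaller.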

Okay, I’m still having some difficulty processing the responses here because I feel like some taken-for-granted assumptions are being made in a lot of the theoretical statistical justifications being stated.

Here is my thinking: Say you have 2 groups of 100 people. I see each of these people as a composite of some 10,000-50,000 salient factors like personal characteristics, life experiences, preferences, brain patterns, etc. (many of these factors seem endogenous or inextricably related, by the way). You might think that these factors are not important, but as I see it, these are what distinguish one person from another, and ultimately define a person’s behavior as the output of a self-referential system that is constantly updating its own psychological processing through life experiences. Given the number of factors in each person, it seems near impossible to show that these two groups are really the same. Is there a flaw in my reasoning?

They don’t have to be “the same.” The groups merely have to have been chosen randomly with respect to those characteristics.

Really? I thought the randomness was a tool used to generate (or approximate) sameness.

It’s not so much that it generates sameness as that it makes the differences even out, given large enough sample sizes.

Okay, right. But should we assume that differences evening out will allow commensurability? Is there a proof for this?

Statistical tests take into account that there will be a certain amount of error in detecting differences between populations. These false positives and false negatives are known as Type I and Type II errors, respectively. In doing statistical tests, you decide what level of error is acceptable (generally a 1/20 or 1/100 chance of declaring a difference when it is really just a fluctuation).
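The Type I error rate can even be demonstrated by simulation (a sketch with made-up numbers): run many experiments where both groups really come from the same population, and the fraction of spurious "significant" differences lands near the chosen 5% level.

```python
import math
import random
import statistics

random.seed(3)

# Simulate many experiments where the null is true: both groups are
# drawn from the same population, so any "difference" is pure chance.
false_positives = 0
trials = 2_000
for _ in range(trials):
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 1) for _ in range(30)]
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / 30 + statistics.variance(b) / 30)
    if abs(diff / se) > 2.0:  # roughly the two-sided 5% cutoff
        false_positives += 1

rate = false_positives / trials  # should hover around 0.05
```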

Your argument is really saying that an experiment can’t be done on a single subject (or on very few subjects) and have it say anything meaningful about the population as a whole. Anyone doing properly conducted experiments (psychological or otherwise) knows that.

However, if you get a large enough random sample then the variations in the characteristics you are not interested in should balance out. You’ll get people with high IQs and people with low IQs, people that are overweight and people that aren’t, people that love soccer and people that can’t comprehend why everyone doesn’t consider it deathly boring (thinking of a certain Pit thread :D), etc., etc. All of these will even out if you have a sufficiently random sample.

Statistical tests can be done to determine the likelihood that what you are seeing is just due to chance (i.e. due to the uncontrolled variation in your sample rather than to what you are studying). If the statistical tests show that there is less than a 1 in 20 chance (usually; that rule is not cast in stone) that results this large would occur by chance alone, then the null hypothesis is rejected and the hypothesis in question is accepted.

ETA - Just to be clear: if all these characteristics that worry you are THAT important, then they will increase the variability of the results, which will make it harder to show statistically that the results are unlikely (less than 1 in 20) to be due to chance.
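That last point can be illustrated with a sketch (hypothetical effect sizes and noise levels, chosen for the example): the same real effect, buried under more uncontrolled person-to-person variability, yields a much smaller t statistic — and so needs a larger sample to reach significance.

```python
import math
import random
import statistics

random.seed(4)

def t_stat(effect, noise_sd, n=100):
    """Hypothetical experiment: group B's mean is shifted by `effect`,
    and `noise_sd` is all the uncontrolled person-to-person variation."""
    a = [random.gauss(0, noise_sd) for _ in range(n)]
    b = [random.gauss(effect, noise_sd) for _ in range(n)]
    diff = statistics.mean(b) - statistics.mean(a)
    se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
    return diff / se

# Same real effect, different amounts of uncontrolled variability:
t_quiet = t_stat(effect=2.0, noise_sd=2.0)   # effect easy to see
t_noisy = t_stat(effect=2.0, noise_sd=10.0)  # same effect, drowned in noise
```

The noisy version isn't wrong — it just needs far more subjects before the effect climbs out of the variability.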

Yeah, I know about the Type I and Type II errors… but I guess I was actually asking (and I realize that I might not be articulating this as well as I might) what principle gives us the license to compare those groups at all? That is, is it just an assumption that comparing the groups is fair game? It seems to me like there is an underlying assumption before you even get to the Type I and Type II stuff.

It’s not an assumption that the two groups were selected without bias. If you use a truly random method (a coin toss or a good random number generator), you can separate the test subjects into two groups without bias. It’s still possible that group B ends up with more Hispanic test subjects than group A, for example, and that would skew the result. But as the sample size is increased (or the test is repeated), the likelihood of this goes down. It still won’t be zero, but it’s at least well understood (i.e. you can calculate how likely it is that the experiment’s outcome was caused by a skewed sample).

Ok, I think I’m not stating my thought well enough. Let me try it this way:

Let’s say objects and animals could talk and fill out surveys. We form 2 groups of 100 completely random animals and objects selected from all over the earth. We can imagine the distinct possibility that every single one of the 200 objects total is unique, especially since there are billions of distinct objects on this planet. Yet, according to the principles of statistics described above, it is fair to compare these two groups because they were randomly sampled. This intuitively feels like a bit of an assumption to me. If everything is unique, on what basis can we say they are commensurable? I understand all the stuff about distributions of population characteristics, variance, etc. I guess the part I’m not getting is what allows us to assume that populations of completely different things can be compared at all just because they are randomly sampled?

Well, if you are trying to test for common characteristics of all animals on earth, that would be a perfectly valid way to do it. You could, say, expose group A to a 130-degree temperature, and leave group B in a 70-degree room, and ask them both if they are comfortable and relaxed. I’m guessing you will find a statistically significant result that group B are more comfortable.

If on the other hand, you play Mozart to group A and the Beatles to group B, you’ll probably find that there is no statistical difference between how comfortable and relaxed the two groups of animals are. That means the variation between individuals is larger than the response to the music.