Basically that. Why are there separate tests for matched pairs and for separate groups?
There is one (standard) t distribution with a parameter called degrees of freedom. All t tests construct a variable based on the data which (under the null hypothesis) has (at least approximately) the t distribution for the relevant number of degrees of freedom.
There are different ways to construct this variable depending on exactly how the data is related. If you have independent samples you use the unpaired t test. For example, if you’re trying to figure out if the average height of 20-year-olds is different from that of 40-year-olds, you might measure 30 of each and compare the average heights of the two groups. If, however, each of the 20-year-olds was the son of one of the 40-year-olds, it would be better to look at the paired father-son differences in heights.
Let x and y represent the height of a person in each group. The null hypothesis is that the groups are the same, so E[x] = E[y] and var[x] = var[y]. We’re interested in E[x-y] and whether or not it is positive. Under the null hypothesis, E[x-y] = 0 and var[x-y] = var[x] + var[y] - 2cov[x,y]. For the unpaired test cov[x,y] = 0. But for the paired father-son test presumably cov[x,y] > 0, so the variance in that test is smaller and it will be easier to reject the null hypothesis. In any case, whenever you can do a paired test you could also do an unpaired test, but the paired test is more powerful.
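To make the variance identity concrete, here is a small check in Python. The father and son heights are made-up numbers for illustration only, and sample_cov is a helper written just for this sketch.

```python
import statistics

# Made-up heights in cm; each son is listed in the same position as his father.
fathers = [170.0, 175.0, 180.0, 185.0, 178.0]
sons = [172.0, 174.0, 183.0, 186.0, 180.0]


def sample_cov(x, y):
    """Sample covariance with the usual n - 1 denominator."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


var_x = statistics.variance(fathers)
var_y = statistics.variance(sons)
cov_xy = sample_cov(fathers, sons)

# Variance of the paired differences, computed directly from the data.
diffs = [a - b for a, b in zip(fathers, sons)]
var_diff = statistics.variance(diffs)

print("var[x] + var[y]               :", var_x + var_y)
print("var[x] + var[y] - 2 cov[x,y]  :", var_x + var_y - 2 * cov_xy)
print("var[x-y] from the differences :", var_diff)
```

The last two printed values agree, and because the covariance is positive for these numbers, both are smaller than var[x] + var[y], which is what the unpaired test effectively works with.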
Note that your samples don’t have to be the same size for the unpaired test; the variance formula is then a bit different. Of course, for the paired test the samples must be the same size. But equal sample sizes are only a necessary condition for using a paired t test, not a sufficient one. The important thing is you use the paired test when you think there is some relation between individuals in the two groups.
This is off the top of my head without references, but I’ll give it a shot. One way to think of it: an unpaired test looks at the difference between the means of two groups, while a paired test looks at the mean of the pair-by-pair differences between the two groups.
An unpaired test could be done between two groups of people on a control and test diet, looking at weight gain.
A paired test may be done on water pH before and after treatment in a control and test group. If this is done in the environment, the basal pH in different locations may be vastly different, say 5.0 at the beach and 9.0 on the mountain top. Let’s say the control has no effect, but the treatment increases pH 20%.
In an unpaired test of before and after means, that increase may not be significant because your variance is huge.
But the increase of 0% versus 20% could be detectable if it’s a paired test.
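A rough sketch of the treated group in that pH example, in Python. The five site readings are invented for illustration (baseline pH from 5.0 to 9.0, each raised by exactly 20%), and the unpaired statistic uses the simple equal-sample-size standard error.

```python
import math
import statistics

# Invented pH readings at five sites, from the beach to the mountain top.
before = [5.0, 6.0, 7.0, 8.0, 9.0]
after = [6.0, 7.2, 8.4, 9.6, 10.8]  # each baseline raised by 20%
n = len(before)

# Paired: work with the per-site differences, so site-to-site spread drops out.
diffs = [a - b for a, b in zip(after, before)]
paired_t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Unpaired: compare the two group means, ignoring which site is which.
# With equal sample sizes this standard error matches the pooled-variance one.
se_unpaired = math.sqrt(
    statistics.variance(before) / n + statistics.variance(after) / n
)
unpaired_t = (statistics.mean(after) - statistics.mean(before)) / se_unpaired

print(f"paired t   = {paired_t:.2f} (df = {n - 1})")
print(f"unpaired t = {unpaired_t:.2f} (df = {2 * n - 2})")
```

With these numbers the paired statistic comes out near 9.9 and the unpaired one near 1.27. The two-sided 5% critical values are roughly 2.78 for 4 degrees of freedom and 2.31 for 8, so only the paired test would flag the increase.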
I’m sure someone will be along to explain it better or tell me I didn’t recall correctly.
OldGuy didn’t really help because I know what a t-test is so let me rewrite the question.
For t-tests there are two formulas, one if the data is paired like in a pre- and post-test situation and another formula if the group is unpaired like a control and treatment group. My question is why do we need two different formulas. Is it simply that with the paired we can relate one score to a corresponding score (like student A’s pretest and posttest) whereas with unmatched we can only compare the means (this was my original thought the same as dasmoocher)? Or is there something more to it? And could I use the unmatched t-test on a matched pair group and get the same answer? If not, how would it be different?
It seems to me that they are really two different tests with the same name and similar function not just two different formulas for the same test.
There are lots of t tests. Excel has three: paired/dependent, and independent is divided into separate calculations for whether the variances are equal or not. Unequal requires more calculations to correct for increased error in that situation.
With dependent/paired, a large source of variance is accounted for: individual differences and the like. If the data is times for a race, we know that Bob is a slow runner whether or not we give him Brawndo. We know that Alice runs a little faster. But if all the electrolytes work, then both will run noticeably faster. If we only give Bob Brawndo and Alice is control, he might run faster than her. But we have no way of knowing if that was due to treatment or not, thus we have to correct for that.
If you run both tests on the same data (try it!) the paired test will generally give “more” significance (lower p etc.). The dependent has higher power, the ability to detect a difference if it exists.
This won’t work if the sample sizes are unequal, of course, because some of the data is not paired to anything.
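The math is much simpler too: paired is the mean difference over the standard error of the difference scores (their SD divided by the square root of the number of pairs). Independent is the difference between means over a standard error built from a multiple-step pooled variance.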
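Written out in the usual textbook form, the two statistics look like this. The symbols are my own labels: d is the per-pair difference, s_d its sample standard deviation, and s_p^2 the pooled variance for the equal-variance independent test. Note that the paired denominator is the standard error, that is, the SD divided by the square root of the number of pairs.

```latex
% Paired (dependent) t test on n pairs, with d_i = x_i - y_i
t_{\text{paired}} = \frac{\bar{d}}{s_d / \sqrt{n}},
\qquad \text{df} = n - 1

% Independent t test with pooled variance (equal variances assumed)
s_p^2 = \frac{(n_x - 1)\, s_x^2 + (n_y - 1)\, s_y^2}{n_x + n_y - 2},
\qquad
t_{\text{indep}} = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}},
\qquad \text{df} = n_x + n_y - 2
```

The numerators are the same number whenever the data are paired, since the mean of the differences equals the difference of the means. Only the denominators and the degrees of freedom differ.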
Such as when you’re comparing, not the heights of a bunch of 20 year olds and an unrelated bunch of 40 year olds, but the heights of the same people at age 20 and at age 40. The second test is a lot more constrained, and the two tests answer different questions.
They are (slightly) different tests, but they both involve test statistics which follow a t-distribution under the null hypothesis (and all other relevant assumptions).
Sorry it didn’t help. The important thing is a t-statistic has a tabulated Student-t distribution. The formulas are simply there for constructing the variable that has this distribution. In both tests the t statistic is the difference between two numbers. But the difference between two random variables has a smaller variance if the variables are positively correlated. The difference in the two test constructions takes this into consideration.