How do I know if a statistical effect is significant?

I’m afraid you are misinformed. In statistics, the “population” is almost never just the small set you have in front of you. Rather, the population is the set of all potential students that might ever theoretically take the class. How do you know the difference in classroom GPA is not due to a series of random coincidences that put an unusually talented (or not) set of students in a single classroom? That is exactly what you are asking when you try to determine whether the difference between two classrooms is statistically significant.
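
For concreteness, here is roughly what that question looks like in R, with made-up numbers (the GPA vectors and sample sizes are invented purely for illustration):

    # Hypothetical per-student GPAs for two classrooms (invented numbers)
    class_a <- c(3.1, 3.4, 2.9, 3.6, 3.2, 3.0, 3.5, 3.3, 2.8, 3.4)
    class_b <- c(3.5, 3.2, 3.7, 3.4, 3.1, 3.6, 3.3, 3.8, 3.0, 3.4)

    # Welch two-sample t-test: how surprising would a mean difference this
    # large be if both classes were drawn from the same underlying population?
    t.test(class_a, class_b)

The p-value in that output is the standard answer to “could this just be coincidence?”, with all the caveats discussed below.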

I know what question people are asking when they want to know whether an effect is statistically significant, so I’m not sure why you’re pointing that out. And I get your point that there is a larger theoretical population out there than just the students in that class. However, since 1) that larger population is not strictly defined, and 2) the students who took the class are not a random sample from that larger population, the value of generating and presenting significance statistics may be severely limited.

I’m not saying that there’s a good alternative, but the lack of one does not mean we should rely too heavily on significance numbers to tell us which effects are real and which ones aren’t. In my view, the concept of statistical significance is overused in social analysis and applied in places where it is not appropriate, suggesting a degree of certainty about the outcomes of statistical analyses that is unwarranted.

Learning statistics can be hard or easy, depending on your training material (e.g. video tutorials, online articles, textbooks, a formal class, etc.) and your aptitude for math.

I do great at math (it’s just learning how to move the numbers around), and have had two different experiences with intro to statistics classes. My takeaway from that is to never rely on a textbook older than you are.

If you do want to learn statistics, I’d suggest searching online (e.g. YouTube, Vimeo) for some introductory videos, unless you’re more of the written type, in which case you can probably find plenty of articles or tutorials about it.

Do NOT attempt to use Wikipedia! For anything scientific, Wikipedia is a garble of formulas, technical jargon, and details irrelevant to anybody not in the relevant field.

This question can be answered even if we have data from the whole population (over space and time, the wholey-woley population*). Well, I assert it thus; if it isn’t true, can someone tell me why?

*But possibly not the whole modal population.

Whole population = I know for certain.
Sample with criteria met like sufficient power = I am pretty damn certain. Low p and all that.

It’s like Zeno’s paradox. You might never know for certain, but you can drive the probability that your result is just random chance down to something like your chances of winning the lottery.
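
To put a number on “sufficient power”: R’s built-in power.t.test will tell you what sample size that kind of certainty requires. A sketch, where the effect size and spread are pure assumptions:

    # How many students per class for an 80% chance of detecting a 0.1 GPA
    # difference, assuming a within-class SD of 0.5? (numbers assumed)
    power.t.test(delta = 0.1, sd = 0.5, sig.level = 0.05, power = 0.80)

    # Or fix n and solve for the power you would actually achieve:
    power.t.test(n = 30, delta = 0.1, sd = 0.5, sig.level = 0.05)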

So is Svejk right that statistical significance is not an applicable concept here? Is his or her view the mainstream view among people who do this kind of thing? Or is he making an argument against mainstream practice? Or what?

Svejk seems to be making two arguments, the second of which makes more sense than the first. In his second post he argues that statistical tests rely on certain assumptions that may not hold true in this situation; this is possible, although that doesn’t mean that the tests are useless or that you can’t take the possibility of violation of assumptions into account when you draw your conclusions. The argument in his first post, that inferential statistics aren’t even applicable because you’re not dealing with a sample, strikes me as just plain wrong (if that’s what his argument actually is – I may have misunderstood it).

It’s not that it’s not applicable, it’s that you shouldn’t read too much into the fact that a relationship is statistically significant.

The reliance on statistical significance to indicate true causation is a well-known issue, and I agree with Švejk that, especially in the social sciences, there is a tendency to latch on to statistical significance, often at the expense of a more reasoned and thoughtful methodology.

The idea is that if enough studies produce enough consistent results, then a relationship can be assumed until evidence indicates otherwise. Things like p-values, confidence intervals, and power analyses are all just tools, pieces of evidence to help guide a researcher. In graduate school this is well understood and these concepts are duly downplayed, but the truth is that audiences for journal publications want a quick conclusion, not a long discussion of methodology, analysis, and results.
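
As one illustration of the “just tools” point, a single R test already hands you several distinct pieces of evidence, each worth reading separately (the simulated data here are invented):

    set.seed(1)
    class_a <- rnorm(30, mean = 3.2, sd = 0.5)  # simulated GPAs, class A
    class_b <- rnorm(30, mean = 3.3, sd = 0.5)  # simulated GPAs, class B

    res <- t.test(class_a, class_b)
    res$p.value   # one tool: significance
    res$conf.int  # another: a plausible range for the true difference

    # a third: a crude standardized effect size (Cohen's d, equal-n pooling)
    (mean(class_a) - mean(class_b)) / sqrt((var(class_a) + var(class_b)) / 2)

None of those numbers is a verdict on its own; together they are the “pieces of evidence” in question.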

FWIW - I was able to teach myself R using mostly “R in a Nutshell”. I had tried to learn Visual Basic and Python in the past, but failed (as well as some other languages when I was younger).

Most people seem surprised by this, and there is a steep learning curve, but I wouldn’t shy away from R without giving it a shot.

Also found the examples here to be useful in giving me motivation to learn more.

http://gallery.r-enthusiasts.com/

This is true. I’ve always thought about the distinction between biological significance and statistical significance. From my perspective, statistical significance is necessary but not sufficient for me to even spend any time thinking about some particular result.

You think your results are “trending” towards significance? Don’t care, and don’t waste your breath trying to explain anything to me. Come back when you have data good enough to care about. You have a statistically significant result? Now you can explain to me why I should care about it.

Years ago, I was testing the effect of particular environmental and genetic factors on the lifespan of my favorite critter. The “wild-type” life span always varied between assays, and the effect was extremely (statistically) significant using standard tests in the field. But so what? All that tells me is that there are factors beyond my control that change from week to week, and that I needed to carefully control for this variation when determining the effect of a particular treatment.
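
The point generalizes: pile up enough observations and even a trivial batch-to-batch shift becomes wildly significant. A quick simulation sketch in R (all of the numbers are invented):

    set.seed(2)
    # Two assay batches whose true mean lifespans differ by a trivial ~1%
    batch_1 <- rnorm(5000, mean = 20.0, sd = 3)
    batch_2 <- rnorm(5000, mean = 20.2, sd = 3)

    t.test(batch_1, batch_2)$p.value  # tiny p-value despite a ~1% effect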

Getting back to the OP, let’s say he finds a statistically significant difference between class A’s 3.2 GPA and class B’s 3.3 GPA. So what? Is that difference meaningful enough to, say, decide whether to promote a particular teacher? Will that difference make a measurable difference in the lives of the students? That’s where you can no longer rely on statistical theory.

Well, what I was saying is that he’s not dealing with a random sample. I acknowledge that there is a larger population, but at the same time this population is not clearly delimited, and the cases in the sample were not randomly drawn from the theoretical population.

My issue is this. Statistical significance says something about whether something found in a sample is likely to exist in the larger population as well. If you don’t really know what that larger population is, then you have a problem. If your sample is your population, then obviously what you’re doing does not make much sense.

Let’s say, for instance, that you’re doing an analysis in which all US senators are cases. Your N is 100. You identify two groups within those 100 senators and compare the two according to some criterion. The first 50 score 2.1, the other 50 score 2.4. I contend that it makes no sense to say anything about whether this difference is statistically significant or not - your sample is your population, so even if the effect were tiny and marginal, it would still exist and could not be a product of sampling error in the first place. If you feed these data into SPSS or STATA, however, the computer will gladly produce some p-values for you, and you can then say that the difference between groups A and B is statistically significant, when this is completely meaningless from a statistical point of view.
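
To make that concrete: the software has no idea whether your 100 cases are a sample or the whole population, and it will print a p-value either way. A hypothetical R version of the senator comparison (scores invented):

    set.seed(3)
    group_a <- rnorm(50, mean = 2.1, sd = 0.4)  # all 50 senators in group A
    group_b <- rnorm(50, mean = 2.4, sd = 0.4)  # all 50 senators in group B

    mean(group_b) - mean(group_a)  # this IS the population difference...
    t.test(group_a, group_b)       # ...yet a p-value comes out anyway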

Two more points: 1) I did not say that you can’t use inferential statistics. If you want to do an ANOVA or calculate a chi-square or a Pearson’s rho or whatever, you can do that and try to meaningfully interpret the results. I’m just saying that calculating significance is a red herring here. 2) The analysis the OP suggests seems unlike the senatorial one I outlined, but it is still not a random sample, and I don’t think statistical significance is of much use in figuring out whether a 3.2 and a 3.3 on a 4-point scale are meaningfully different.

In this example, I think a lot depends on what the scores are supposed to represent. If it’s something like height, which is directly measurable with no measurement error, then yes, the difference is what it is. But suppose it’s a multiple-choice test that’s supposed to measure knowledge of the Constitution. Each senator’s observed score will be the net result of two components – their “true” knowledge of the Constitution, and an error component, which is the result of things like the particular set of items chosen to be on the test, the senator’s luck in making guesses on items he/she isn’t sure of the answer to, how much sleep the senator got last night, etc. These errors are distributed “randomly”. Thus, the difference in observed scores is what it is, but if you are trying to go from that to saying that one group of senators is more knowledgeable about the Constitution than the other – that is, that their true scores differ – you are making an inference, and statistical procedures are applicable.
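
A small simulation makes the true-score-versus-error distinction concrete; the score model and every number below are assumptions for illustration only:

    set.seed(4)
    true_a <- rnorm(50, mean = 2.1, sd = 0.3)  # group A's "true" knowledge
    true_b <- rnorm(50, mean = 2.4, sd = 0.3)  # group B's "true" knowledge

    # Observed score = true score + random error (guessing, sleep, item luck)
    obs_a <- true_a + rnorm(50, mean = 0, sd = 0.5)
    obs_b <- true_b + rnorm(50, mean = 0, sd = 0.5)

    # The observed gap is what it is; inferring that the TRUE scores differ
    # is the statistical question, since error alone could produce a gap:
    t.test(obs_a, obs_b)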

Bolding mine. Not sure what you’re saying here, or why you’re saying it.

  1. What do you mean by ‘statistical procedures’, which seems incredibly broad and unspecific?
    If you mean statistical significance, then no - it’s not applicable; it’s not going to tell you how much of the difference is caused by measurement error. You are right in pointing out that measurement error might cause that difference, and that that would be a reason to distrust the difference if it is small. But you can’t make that call on the basis of statistical significance, because it has nothing to do with measurement error.

  2. What do you mean by ‘inference’? And what does it have to do with whether or not ‘statistical procedures’ are applicable?
    Do you mean causal inference? I would be engaging in causal inference too if I were measuring something completely reliably and validly, with a 0 per cent chance of measurement error, and went on to interpret the results causally. Any causal narrative always relies on inference of some kind - sometimes more plausible than others, but inference nonetheless. But not all inference has to rely on statistics, and even less of it has to rely on making statements about statistical significance. That’s just not what statistical significance does for us. Also, there are statistical procedures, such as factor analysis, that are not applicable for testing causal relations of any kind.

Sorry to be vague. Instead of “statistical procedures” and “inference” I should perhaps have said inferential statistics and statistical inference. (As the cite says, these refer to the drawing of conclusions about systems affected by random variation, including observational error.) It is true, of course, that not all observational error is random; systematic observational error (bias) is a different animal.

The standard statistical procedures which are being discussed in this thread are appropriate for designed experiments, with controls and blinding and proper randomization and all that jazz. The description of the data in the OP seems to suggest that this is an observational study, which is considerably harder to analyze.
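
For contrast, the designed version of the OP’s comparison would begin with random assignment, which is exactly what licenses the standard tests. A minimal sketch with a hypothetical roster:

    set.seed(5)
    students <- paste0("student_", 1:60)        # hypothetical roster
    assignment <- sample(rep(c("A", "B"), 30))  # randomize students to classes

    table(assignment)                           # 30 per class, by construction
    head(data.frame(students, assignment))

In the OP’s actual data, students ended up in their classrooms through whatever non-random process schools use, which is why the observational caveat matters.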