Help with statistical testing

I am conducting a survey using a Likert scale that is interval rather than ordinal. Say the questionnaire is:

On a scale of 1 - 5 with 5 being most important, how important are the following subjects for life after high school:

Math
English
History
Science
PE
Electives

Since the data is on an interval scale, I imagine that I can use the mean and standard deviation and not an ordinal test like the Wilcoxon rank-sum.

So let’s say after compiling my data, I get:

Math m=3.1 s=1.1
English m=3.8 s=0.9
History m=1.3 s=0.6
Science m=3.5 s=1.7
PE m=1.2 s=0.6
Electives m=3.9 s=1.3

How would I analyze this to say “these subjects are considered important at a significance level of 0.05”? Or is there a different analysis/conclusion I should work towards?

I would just use analysis of variance. The importance is the response of interest, and the subject would be the treatment of interest. If everyone answers every question, then you could block on the respondent. Then you could do multiple comparisons to see which subjects are significantly different from one another. It would be close enough for government work.
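To make the blocking idea concrete, here is a minimal sketch in plain Python of the F statistic for subject (the treatment) with respondent as the block. The ratings, the three-subject subset, and the names `RATINGS` and `blocked_anova_f` are made up for illustration; they are not the original poster's data.

```python
# Hypothetical ratings: one row per respondent, one column per subject.
SUBJECTS = ["Math", "English", "PE"]
RATINGS = [
    [3, 4, 1],
    [4, 4, 2],
    [2, 3, 1],
    [3, 5, 1],
]


def blocked_anova_f(rows):
    """Return (F, df_treatment, df_error) for a randomized block design.

    Rows are blocks (respondents); columns are treatments (subjects).
    """
    n = len(rows)       # number of respondents
    k = len(rows[0])    # number of subjects
    grand = sum(sum(r) for r in rows) / (n * k)
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    row_means = [sum(r) / k for r in rows]

    # Partition the total sum of squares into treatment, block, and error.
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_treat = n * sum((m - grand) ** 2 for m in col_means)
    ss_block = k * sum((m - grand) ** 2 for m in row_means)
    ss_error = ss_total - ss_treat - ss_block

    df_treat = k - 1
    df_error = (n - 1) * (k - 1)
    f_stat = (ss_treat / df_treat) / (ss_error / df_error)
    return f_stat, df_treat, df_error


f_stat, df_treat, df_error = blocked_anova_f(RATINGS)
print(f"F({df_treat}, {df_error}) = {f_stat:.2f}")
```

For these made-up numbers F comes out at about 25.4 on (2, 6) degrees of freedom, well above the 0.05 critical value of roughly 5.14 from a standard F table. A statistics package would report the exact p-value; the sketch only shows where the F ratio comes from.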

I would use ANOVA. Judging by your data, the repeated-measures (within-subjects) version is best, since every participant rated all of the school subjects. If the assumption of normality is violated (there’s a test for that), a non-parametric test would be the alternative. The aforementioned Wilcoxon rank-sum is non-parametric (and it’s not just for ordinal data), but it won’t work here: it only compares two groups, and those groups must be independent, whereas this is a within-subjects design. In that case, something like the “Friedman repeated measures analysis of variance on ranks” would be best.
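As a rough illustration of what the Friedman test computes, here is a plain-Python sketch of the tie-corrected Friedman statistic. Each respondent's ratings are ranked across subjects, and the rank sums per subject are compared. The data and the function names (`average_ranks`, `friedman_statistic`) are hypothetical, and a real analysis would use a statistics package rather than this hand-rolled version.

```python
# Hypothetical ratings: one row per respondent, columns = Math, English, PE.
RATINGS = [
    [3, 4, 1],
    [4, 4, 2],   # Math and English are tied for this respondent
    [2, 3, 1],
    [3, 5, 1],
]


def average_ranks(values):
    """Rank values 1..k, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions are 0-based, ranks 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Tie-corrected Friedman chi-square statistic (df = k - 1).

    Undefined (division by zero) if every respondent gives identical
    ratings to all subjects.
    """
    n, k = len(rows), len(rows[0])
    ranked = [average_ranks(r) for r in rows]
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    chi2 = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)

    # Tie correction: each group of t tied values in a row contributes t(t^2 - 1).
    ties = 0
    for r in rows:
        for v in set(r):
            t = r.count(v)
            ties += t * (t * t - 1)
    correction = 1 - ties / (n * k * (k * k - 1))
    return chi2 / correction


print(f"Friedman chi-square = {friedman_statistic(RATINGS):.2f}")
```

With this toy table the statistic is 7.6 on 2 degrees of freedom, above the 0.05 chi-square critical value of about 5.99. With only four respondents the chi-square approximation is rough, so treat the comparison as illustrative only.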

Any decent statistics package (e.g. SPSS) will do the ANOVA you select and test normality, then if it fails, apply the appropriate nonparametric test.

If I understand correctly, you have a single group of participants rating a number of topics, and you want to know whether one topic is rated more or less highly than another. This means that “topic” is a within-subject variable (each participant rates each topic, so topic varies within each individual, not across individuals). So you can’t use a regular (between-subjects) ANOVA, because it isn’t designed to handle within-subject variables. Instead, you would use a repeated-measures ANOVA (in the multivariate approach, specifying topic as a within-subject factor, and no between-subject factors), or its nonparametric analogue, the Friedman ANOVA. Probably the main criterion for deciding which you should use is how skewed the distributions are. There’s no hard and fast rule for how much skewness is too much, but generally, if skewness is below -1.5 or over 1.5, a parametric test (such as ANOVA) may have problems (some people would use -1.0 and 1.0 as the cutoffs). In such an instance, you can either opt for a nonparametric test, or try a transformation (such as square or square root) that may bring the data closer to normal. (You would have to apply the same transformation to all the ratings.)
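For anyone who wants to check skewness by hand, here is a small sketch using the simple moment-based formula (third central moment divided by the 1.5 power of the second). The ratings are invented, and packages such as SPSS report a sample-size-adjusted version, so their values will differ somewhat from this one.

```python
def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (no small-sample adjustment).

    Undefined (division by zero) if all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5


# Hypothetical PE ratings piled up at the bottom of the 1-5 scale.
pe_ratings = [1, 1, 1, 1, 1, 1, 1, 2, 2, 3]
print(f"skewness = {sample_skewness(pe_ratings):.2f}")
```

These made-up ratings give a skewness of about 1.40, which would pass the looser ±1.5 cutoff but fail the stricter ±1.0 one. A topic with a mean near 1 on a 1 to 5 scale, like PE or History in the question, is the kind of distribution where this check matters.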

A significant result on the ANOVA or Friedman test would simply tell you that at least one topic was rated differently from another, not which topic, or in what way. To answer that question you have to do pairwise comparisons (paired t-tests or Wilcoxon signed-rank tests).

That said, I’d still probably start with an ANOVA type test so as to bound the overall false positive rate with respect to multiple comparisons.

So suppose I have 12 groups to compare. What would the results of an ANOVA test look like as opposed to all 66 possible t-tests?

All an overall ANOVA tells you is that at least one group is different from the others. Then you would run post hoc tests to look at all the comparisons that you’re interested in. Just running t tests is a bad idea because it inflates the false positive rate. With 66 comparisons at the 0.05 level, even if there are ZERO differences between groups, you would expect about three differences to appear significant by chance (66 * 0.05 = 3.3). T tests should only be used if you want to do very few of them, and they are planned in advance.
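The arithmetic behind that warning can be spelled out in a few lines. The familywise figure assumes the 66 tests are independent, which pairwise comparisons on the same groups are not, so read it as a ballpark. The Bonferroni threshold is my own addition as the simplest possible correction, not something proposed earlier in the thread.

```python
from math import comb

groups = 12
alpha = 0.05

# Number of distinct pairs among 12 groups.
n_pairs = comb(groups, 2)

# Expected number of false positives if no true differences exist.
expected_false_positives = n_pairs * alpha

# Chance of at least one false positive, ASSUMING independent tests.
familywise_error = 1 - (1 - alpha) ** n_pairs

# Bonferroni: per-test threshold that keeps the familywise rate at or below alpha.
bonferroni_alpha = alpha / n_pairs

print(f"pairs: {n_pairs}")
print(f"expected false positives: {expected_false_positives:.1f}")
print(f"familywise error rate (independence assumed): {familywise_error:.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.5f}")
```

Under that independence assumption, the chance of at least one spurious "significant" result is roughly 97%, which is why uncorrected t-tests across all pairs are not trustworthy.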

The ANOVA post hoc tests will control for this error. There are dozens of tests to choose from. One commonly used test is the Tukey test, which allows all pairwise (e.g. group 2 vs. group 4) comparisons. It is not recommended for complex comparisons (e.g. average of (groups 2 and 3) vs. group 4 alone).

If your overall ANOVA shows no significant result, pairwise comparisons are usually not pursued, since there is no evidence that any groups differ. (That is not the same as showing that all groups are equivalent.)