What is the term for this kind of statistical mistake?

Absolute · January 13, 2013, 9:33pm

Let’s say there is an industry of companies that make widgets. There are actually two types of widget, fluttering widgets and oscillating widgets, but the difference in functionality is subtle and not appreciated by most consumers, or anyone else. Most companies make both kinds of widget anyway, and they are sold in roughly equal numbers.

After receiving scattered reports of deaths due to widget use, the government commissions a study, and said study concludes that the use of widgets leads to a 40% greater chance of a heart attack in both the widget user and anyone nearby.

Widgets are quickly banned.

However, in reality, the risk of a heart attack is a peculiar consequence of the mode of operation of fluttering widgets only. Fluttering widgets in fact increase your risk of a heart attack by 80%, yet oscillating widgets are perfectly safe. However, because no one appreciated the significance of the two different types of widget, the study did not even gather data on which deaths were associated with which type of widget, and the most accurate statistical correlation was never discovered.
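The arithmetic behind the scenario can be sketched as follows. This is only an illustration: it assumes that users of both widget types share the same baseline risk and that exposure is split 50/50 (the post says only "roughly equal numbers"). The function name `pooled_excess_risk` is made up for this example.

```python
# Sketch: pooled excess risk as an exposure-weighted average of per-type excess risks.
# Assumptions (not from the study in the post): both user groups share the same
# baseline risk, and exposure is split 50/50 between the two widget types.

def pooled_excess_risk(excess_by_type, exposure_share):
    """Return the exposure-weighted average of the per-type relative risk increases."""
    total_share = sum(exposure_share.values())
    return sum(
        excess_by_type[kind] * exposure_share[kind] / total_share
        for kind in excess_by_type
    )

excess_by_type = {"fluttering": 0.80, "oscillating": 0.00}
exposure_share = {"fluttering": 0.5, "oscillating": 0.5}

pooled = pooled_excess_risk(excess_by_type, exposure_share)
print(f"pooled increase in heart-attack risk: {pooled:.0%}")
```

Under those assumptions, the 80% increase for fluttering widgets and the 0% increase for oscillating widgets average out to the single 40% figure the study reports.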

Is there a name for this kind of statistical error? “Choosing too broad a sampling variable”, or something?

Saint_Cad · January 13, 2013, 9:41pm

I would say ill-defined variables. As for the fallacy name, I would say this is a subset of hasty generalization.

Absolute · January 14, 2013, 1:57am

Is it really as simple as that? It is not simply an ill-defined variable; it is a failure to recognize that a variable exists.

Maserschmidt · January 14, 2013, 2:21am

Error in model specification?

Saint_Cad · January 14, 2013, 2:45pm

I don’t think that it is lack of recognition of the variable’s existence. For example, crooked tails are common in Siamese cats, so suppose I do a study of 100 cats (8 of which are Siamese) and find that 8% of my sample has crooked tails. Are you claiming that I do not recognize that there are Siamese and non-Siamese cats? Instead, I made my variable too general (all cats).

ultrafilter · January 14, 2013, 4:29pm

It sounds similar to either the ecological fallacy or the fallacy of division, but it’s not exactly the same as either.

cerberus · January 14, 2013, 5:36pm

It does not have a fancy name. It is a matter of not accounting for reasonably plausible sources of variability, in this case type of widget. In practice one stratifies or adjusts for these sources.
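A minimal sketch of what stratifying by widget type could look like, using made-up counts chosen only to match the thread's scenario (a 1% baseline risk is an assumption, as are the group sizes). The `Group` class and `risk_ratio` helper are hypothetical names for this example.

```python
from dataclasses import dataclass


@dataclass
class Group:
    people: int
    heart_attacks: int

    @property
    def risk(self) -> float:
        """Fraction of the group that had a heart attack."""
        return self.heart_attacks / self.people


def risk_ratio(exposed: Group, reference: Group) -> float:
    """Risk in the exposed group divided by risk in the reference group."""
    return exposed.risk / reference.risk


# Hypothetical counts, not data from any real study.
non_users = Group(people=10_000, heart_attacks=100)
users_by_type = {
    "fluttering": Group(people=10_000, heart_attacks=180),
    "oscillating": Group(people=10_000, heart_attacks=100),
}

# Pooled analysis: widget type is ignored, as in the study described above.
pooled_users = Group(
    people=sum(g.people for g in users_by_type.values()),
    heart_attacks=sum(g.heart_attacks for g in users_by_type.values()),
)
pooled_rr = risk_ratio(pooled_users, non_users)

# Stratified analysis: one risk ratio per widget type.
stratified_rr = {
    kind: risk_ratio(group, non_users) for kind, group in users_by_type.items()
}

print(f"pooled risk ratio: {pooled_rr:.2f}")
for kind, rr in stratified_rr.items():
    print(f"{kind} risk ratio: {rr:.2f}")
```

With these illustrative counts the pooled ratio sits between the two per-type ratios, which is the pattern stratification is meant to expose.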

FasterThanMeerkats · January 14, 2013, 5:40pm

Simpson’s paradox (see the Wikipedia article).

Average of total vastly different from average of subgroups.

Absolute · January 14, 2013, 6:45pm

Thank you for that. Following links on that page, I found omitted-variable bias, which I think is closest to the situation I described.

ultrafilter · January 14, 2013, 6:56pm

It’s definitely not Simpson’s paradox, which refers specifically to the situation described in the very first sentence of the article you linked to.

cerberus · January 14, 2013, 7:17pm

Right, Simpson’s requires the aggregate effect to flip relative to the subsets.
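The reversal criterion can be written as a small check. This is a rough sketch under a simplified definition (every subgroup effect has one sign and the aggregate has the opposite sign). The function name `is_simpson_reversal` and the second set of numbers are hypothetical.

```python
def is_simpson_reversal(subgroup_effects, aggregate_effect):
    """True if all subgroup effects share one sign and the aggregate has the opposite sign."""
    all_positive = all(effect > 0 for effect in subgroup_effects)
    all_negative = all(effect < 0 for effect in subgroup_effects)
    return (all_positive and aggregate_effect < 0) or (
        all_negative and aggregate_effect > 0
    )


# Widget scenario from this thread: +80% and 0% in the subgroups, +40% pooled.
widget_case = is_simpson_reversal([0.80, 0.00], 0.40)

# Hypothetical effects showing a genuine reversal: positive in both subgroups,
# negative once the subgroups are combined.
reversal_case = is_simpson_reversal([0.06, 0.04], -0.05)

print(widget_case, reversal_case)
```

In the widget scenario the pooled effect lies between the subgroup effects rather than flipping sign, so the check does not flag it.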

The error here is not considering the effect of widget type in the analysis.

The idea is that your setup is valid if you’ve correctly specified the underlying variables in an appropriate and complete manner, including the right variables and not including the wrong variables.

Model misspecification covers many problems.

Absolute · January 14, 2013, 8:41pm

Would you agree that “Omitted-variable bias”, linked above, is an accurate description?

cerberus · January 14, 2013, 9:01pm

The problem here is the omission of widget type in the analysis. Variable omission is the specific fault here.

I prefer a broader term encompassing incomplete or excessive adjustment to multiple terms describing every particular error, though. Too much jargon.

ultrafilter · January 14, 2013, 9:21pm

It’s an omitted variable issue for sure, but the word “bias” has a specific technical meaning in statistics that doesn’t fit here.
