What is the word meaning to fit or create statistics to something after the fact?

What is it called when one attaches (creates) statistics after the fact to explain some sort of curiosity? I know there’s a specific word for it, and I’m at a loss to remember it.

What I’m thinking of for an example is an excerpt from Kurt Vonnegut’s Slapstick. I’ll spoiler it, just in case:

[spoiler]As the new families began to investigate themselves, some statistical freaks were found. Almost all Pachysandras, for example, could play a musical instrument, or at least sing in tune. Three of them were conductors of major symphony orchestras. The widow in Urbana who had been visited by Chinese was a Pachysandra. She supported herself and her son by giving piano lessons out there.

Watermelons, on the average, were a kilogram heavier than members of any other family.

Three quarters of all Sulfers were female.

And on and on.

As for my own family: There was an extraordinary concentration of Daffodils in and around Indianapolis. My family paper was published out there, and its masthead boasted, “Printed in Daffodil City, U.S.A.”

Hi Ho.[/spoiler]
Hopefully I explained what I’m asking well enough, but probably not. My mind isn’t working at full speed tonight.

President Bush?

sorry

is it ad hoc?

retro- uuuuuuuuuuuuuuuuhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh - something.

I’ve heard it referred to as “massaging the numbers,” or something similar, before.

I remember a long (and boring, though it may actually have been fairly short) lecture many years ago about pseudopatterns. The gist of the whole thing was that it is always possible to find pseudopatterns in a random string, “after the fact.” Don’t think that’s quite what you want, though.

Could it be the Texas Sharpshooter Fallacy?

“The Texas sharpshooter is a fabled marksman who fires his gun randomly at the side of a barn, then paints a bullseye around the spot where the most bullet holes cluster.”

That’s definitely the idea that I want. Don’t know if the word is quite what I was thinking, but it definitely gets the point across, so thank you.

Interpolation.

When an analyst tries different specifications of a model until he reaches a desired conclusion, that is sometimes called data mining.

Then again, data mining also refers to comparing a huge set of variables against your variable of interest: pure randomness will assure that some subset of the variables will be correlated with one another, even if there is no underlying relationship.
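A quick sketch of that randomness point, in plain Python (the function name and all the parameters here are made up for illustration): if you scan enough totally unrelated random variables against a target, some of them will look correlated with it by chance alone.

```python
import random

def max_spurious_correlation(n_vars=200, n_obs=30, seed=0):
    """Scan many unrelated random variables against a random target
    and return the strongest (purely coincidental) correlation found."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]

    def corr(xs, ys):
        # Plain Pearson correlation coefficient.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / (vx * vy) ** 0.5

    best = 0.0
    for _ in range(n_vars):
        # Each candidate variable is pure noise, independent of the target.
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(corr(candidate, target)))
    return best
```

With 200 candidate variables and only 30 observations, the best coincidental correlation is typically quite strong, even though, by construction, nothing is related to anything.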

Data mining appears to be what Vonnegut was referring to. Baseball statisticians are often accused of indulging in this pastime. Those who scan the country for cancer “hot-spots” will typically find them for similar reasons.

If you find a relationship that appears to contradict your pet hypothesis, it can typically be explained away with an ad hoc explanation.

If you’d like to stick around, you will keep political pot shots out of this forum.

What manhattan said in Heh. (Politics in GQ) still applies.

Don’t do this again.

DrMatrix - GQ Moderator

That’s when you fit a curve between two known points. Just as extrapolating is when you “fit” a curve beyond a set of known points. Not what the OP is talking about.

Instead of ad hoc, I’d say post hoc.

I agree with Measure; the process of digging for statistics to support a desired conclusion is called data mining.

I have to admit I am a little confused by the OP. It has been many years since I took a statistics class, but it seems to me that all statistical analysis by necessity comes “after the fact.” One must first observe and classify an event or characteristic based on a sample group that is a subgroup of a given population. Then, by applying statistical formulas, one may predict the future probability of certain events, or the probable occurrence of a specific event among the population at large, based on the observed frequency of its occurrence in the sample group.

I fail to see how the text from Slapstick is relevant to the original question. He is not offering any statistics, but merely stating observed facts: all “X” are “Y”, most “A” are “B”, some “C” are “D”, and so on. Statistics would come into play if he attempted to predict future probabilities concerning “X”, “A”, or “C”.

As for the accuracy of statistical analysis, I remember the first thing my first Statistics Prof. said was “Figures lie, and Liars figure.”

On to philosophy (and there are many on this board more qualified than I to address this subject, Libertarian being just one example; I do not always agree with his reasoning, and often cannot follow it, but he has clearly forgotten more about logic than I ever learned).

I think the term many have referred to is post hoc, ergo propter hoc, loosely translated as “after this, therefore because of this.” A post hoc argument is a fallacy with the following form:
A occurs before B.
Therefore A is the cause of B.

This still does not seem applicable to either the original question or the text in the spoiler.

Sorry I can’t be of much help, just my 2 cents.

If it was anything hoc, it’d be post hoc.

FWIW, I have never heard “data mining” used in this sense. I don’t know the term for the OP’s question, but it seems to me it’s a case of not disclosing all the facts about the analysis.

Say you have some experiment, and you are looking for phenomena with a p-value of .05, the standard threshold in most scientific fields. If you look at 20 kinds of things about the experiment, on average one of them will be a false positive and give you that p-value. (That’s what the p-value means: if the effect weren’t real, we’d see a result this extreme only 5% of the time. Since that’s a low number, we conclude the effect is real.) Because you have not disclosed all the other things you looked at that didn’t come out the way you wanted, you are being intellectually dishonest.
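That “one in twenty” arithmetic is easy to simulate; here’s a rough sketch (function name and parameters are my own invention). Every null hypothesis is true by construction, yet each batch of 20 tests still averages about one “significant” result.

```python
import random

def average_false_positives(n_experiments=1000, n_tests=20, n=50, seed=1):
    """Simulate experiments in which every null hypothesis is TRUE.
    Each experiment runs `n_tests` two-sample comparisons on pure
    noise, calling a test significant when |z| > 1.96 (the usual
    two-sided p < .05 cutoff).  Returns the mean number of false
    positives per experiment: roughly 20 * 0.05 = 1."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_experiments):
        for _ in range(n_tests):
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            # The difference of two sample means has std sqrt(2/n),
            # so this z-statistic is standard normal under the null.
            z = (sum(a) / n - sum(b) / n) / (2 / n) ** 0.5
            if abs(z) > 1.96:
                total += 1
    return total / n_experiments
```

Reporting only the one test that “fired,” without the nineteen that didn’t, is exactly the dishonesty described above.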

Another facet of this is that analysis is done after the fact, but planning should be done beforehand. If you plan beforehand to only look at factor A, and afterwards start looking at A combined with all kinds of other things, you need to be aware of the trap above. If you start off wanting to look at all those factors, the experiment will need to be planned very differently. You need a much higher power, which usually means you need a much larger sample size, to properly examine all those things.

Clear as mud?

Isn’t it called ‘cherry-picking’?

Maybe criminal interpolation? :)

Are you thinking of “overfit?”

I’m not sure if it applies specifically to your situation, but I’ve heard the term curve fitting to mean (roughly) the process of making rules to guide future decisions on the basis of past outcomes.

The danger is that your rules will simply “fit the curve” of past outcomes and won’t actually predict future outcomes.

The term is applied a lot to the process of generating trading strategies. For example, just because over the past 20 years the price of pork bellies has gone up on Thursdays when it rains in Chile doesn’t mean that you should check the weather in Santiago before putting in your Thursday orders.
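The pork-bellies trap can be sketched in a few lines of Python (all names and numbers here are invented for illustration): search many meaningless binary “signals” for the one that best matched past price moves, then watch it do no better than a coin flip on fresh data.

```python
import random

def backtest_overfit(n_signals=500, n_days=60, seed=2):
    """Among many random 'signals' (rain in Chile, day of the week, ...),
    pick the one that best 'predicted' a random up/down price history,
    then score that winner on new, equally random data.
    Returns (in_sample_win_rate, out_of_sample_win_rate)."""
    rng = random.Random(seed)
    past = [rng.choice([0, 1]) for _ in range(n_days)]    # 1 = price up
    future = [rng.choice([0, 1]) for _ in range(n_days)]

    def win_rate(signal, outcome):
        # Fraction of days on which the signal matched the price move.
        return sum(s == o for s, o in zip(signal, outcome)) / len(outcome)

    best_sig, best_rate = None, 0.0
    for _ in range(n_signals):
        sig = [rng.choice([0, 1]) for _ in range(n_days)]
        r = win_rate(sig, past)
        if r > best_rate:
            best_sig, best_rate = sig, r
    return best_rate, win_rate(best_sig, future)
```

The in-sample winner looks impressive only because it was the best of 500 coin flips; out of sample it fits no curve at all.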