All you mathematicians, get your butts in here!

Yesterday morning a series of minor tornadoes went through our area (no serious damage, nobody hurt). My teenage daughter, after spending an hour or so in the morning crouching in the hallways at school, came home from school in the afternoon all agog because everyone had been talking about “The Day”, you know, Hitler’s birthday/Murrah Building/Columbine. “Oooh, spooky, and now here are these tornadoes…” It makes it worse because on this date four years ago, we DID have a serious tornado hereabouts. I tried to explain to her the concept of “probability”, what were the odds against things like this happening, but I didn’t know enough about how people calculate odds to be able to make any headway against “what everybody says”.

So, all you mathematicians–HELP! I’d like to be able to say to my daughter (and all her “buds”) that the odds against a tornado touching down on this particular day, April 20, are <<whatever>>. I’m supposed to be the grownup here, I’m supposed to know things like that. Help me explain this to her.

Or is there really somethin’ spooky goin’ on? :eek:

“Why, sometimes I’ve believed as many as six impossible things before breakfast!” - the White Queen

It’s not so much the odds of these hitting today, it’s the odds of them not hitting on some significant date.
It’s Fred Hoyle’s old golf-ball argument - you hit a golf ball, it lands on a blade of grass. What are the chances of it hitting that blade of grass? Astronomical, so it mustn’t have landed there by chance, there must be something more to it.
Old, old fallacious argument. You can’t retrospectively declare these things - bear in mind the huge number of coincidences that don’t happen every day.


Luther Blisset is Everyman.
So Smile.

Geez, that’s probably not a simple calculation …

But you might be interested in a map in which you can click on your area and see the probability over the past few years for the point you’ve clicked: Time Series of Annual Cycle of Tornado, Wind, and Hail Probability

Or a table of Probability of Tornado Occurrence in the Twenty Most Tornado-Prone States during March


jrf

This day obviously has some odd connections. The bombing of the Murrah Building was exactly two years after the Branch Davidians burned themselves up in Waco. Also, the Revolutionary War started on April 19th. I guess it’s reasonable to expect that throughout history significant events are going to randomly clump around certain days. Although the bombing was purposely done on the anniversary of Waco.


“It is not from the benevolence of the butcher, the brewer, or the baker that we expect our dinner, but from their regard to their own interest.” - Adam Smith

How about listing all the places where there WEREN’T tornados on that day?

Actually, given the population of the world and the geographic spread, there are bound to be zillions of such coincidences. I don’t know how you’d compute the odds of a tornado, since they’re seasonal, but…

The JOURNAL OF IRREPRODUCIBLE RESULTS at one point had a list of statistical correlations that were all VERY high, like the number of oysters taken from the oyster fields in Mexico with the rainfall on the plains in Spain (or whatever). The point was that the statistical correlation might lead you to think that two totally unrelated events were related. Statistics and probabilities can only go so far; there are LOTS of coincidences.

Remember the Kennedy/Lincoln list of coincidences?

Exactly. As someone who works with statistics quite a bit, I always find it annoying to see the common and flagrant misuse of statistics that occurs every day. Most of this is because the average person really doesn’t understand statistics that well.

First of all, correlation does not mean causality when there are other independent variables. I learned this in a very early statistics class where the professor statistically “proved” that lipstick causes breast cancer.

Consider the following analysis. Go survey a large number of people who have had breast cancer and find out how many of them use lipstick. Now, go survey a large number of people (of similar age, economic bracket and so on) who don’t have breast cancer and find out how many of them use lipstick.

You will find that many more of the people with breast cancer use lipstick than people who don’t. Therefore, there is a correlation between lipstick use and breast cancer.

Does this mean lipstick causes breast cancer? No. The analysis is flawed. Notice I never mentioned the gender of the people in the survey. The random sample with breast cancer will be mostly made up of women. The random sample of people without the cancer will contain both men and women (actually, slightly more men). Since women are more likely to wear lipstick than men, of course the sample that contains more women will have a higher percentage of lipstick users.
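For anyone who wants to see the arithmetic, here is a minimal sketch of the lipstick survey. All the numbers are made up for illustration: lipstick use depends only on gender (60% of women, 1% of men), never on cancer, and the two samples differ only in their gender mix.

```python
# Hypothetical rates, chosen only to illustrate the confounding.
# Lipstick use depends on gender alone, not on having cancer.
LIPSTICK_RATE = {"women": 0.60, "men": 0.01}

def lipstick_share(n_women, n_men):
    """Fraction of a sample that uses lipstick, given its gender mix."""
    users = n_women * LIPSTICK_RATE["women"] + n_men * LIPSTICK_RATE["men"]
    return users / (n_women + n_men)

# Hypothetical samples of 1000 people each.
cancer_sample = lipstick_share(n_women=990, n_men=10)    # almost all women
control_sample = lipstick_share(n_women=480, n_men=520)  # slightly more men

print(f"lipstick users among breast cancer patients: {cancer_sample:.1%}")
print(f"lipstick users among the comparison group:   {control_sample:.1%}")
```

The first sample shows roughly twice the lipstick use of the second, even though lipstick has no effect on anything in this toy model. The gap comes entirely from the gender mix.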

You can do a similar analysis to prove that after-shave lotion causes prostate cancer. The details are left as an exercise for the reader.

OK, this one is fairly obvious. But, this sort of thing happens all the time. By carefully selecting your samples and the variables you select you can imply a correlation between almost anything and anything. And it works for several reasons.

First, most people don’t realize how statistics really work. Second, most people also don’t think much beyond what they are told, especially if what they are being told reinforces beliefs they already have.

In my example above I’m sure at least a few of you didn’t catch the gender omission in my description. If the conclusion I was reaching was similar to one you already held you would probably have ignored it anyway.

Here’s a better example. Suppose I’m a computer manufacturer. I (obviously) want to sell computers. I do a survey of households with school age children that have home computers and those that do not. I find that a greater percentage of those households with computers have their children go on to college than those that do not. (Let’s say, for example, that 75% of children from homes with computers go on to college while only 50% of those without computers do.) I start running ads saying that “children with computers in their home are 50% more likely to go to college than those without”.

I probably sell quite a few computers this way. I may even convince the government to buy a bunch of my computers and make them available to schools and homes so more children have access to computers.

Valid survey? No. What is the economic situation of the households in my survey? The higher the household income, the more likely the household is to have a computer. Similarly, the higher the household income the more likely the children are to go to college. (College is expensive, you have to be able to afford it.) What is the real cause; computers or income? Correlation, yes; but causality?
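A rough sketch of how income alone could produce the 75% versus 50% figures from the ad. The household counts and the per-income college rates below are hypothetical, picked so that a computer makes no difference at all within either income group.

```python
# Hypothetical college-going rates by income. A computer changes nothing here.
COLLEGE_RATE = {"high_income": 0.9, "low_income": 0.4}

# Hypothetical survey: richer households are more likely to own a computer.
households = {
    "computer":    {"high_income": 700, "low_income": 300},
    "no_computer": {"high_income": 200, "low_income": 800},
}

def college_rate(group):
    """Overall share of children going to college in one survey group."""
    total = sum(group.values())
    going = sum(n * COLLEGE_RATE[income] for income, n in group.items())
    return going / total

with_computer = college_rate(households["computer"])
without_computer = college_rate(households["no_computer"])
ratio = with_computer / without_computer

print(f"with computer:    {with_computer:.0%}")
print(f"without computer: {without_computer:.0%}")
print(f"'more likely' factor for the ad: {ratio:.2f}")
```

The ad's "50% more likely" is just 75 divided by 50. In this toy data the whole gap comes from the income mix of the two groups, which is exactly the question the survey never asked.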

Want me to do the same survey to prove that children in single parent homes are less likely to go to college than those in two parent homes? I didn’t think so.

My point here is that these last two examples are probably topics that at least some of you have opinions on. Possibly strong opinions. If a survey backs up an opinion you already have, you are likely to accept it as true without questioning the details of how the conclusion was reached.

I could go into the psychology of statistics as well, but suffice to say that, in general, people attribute more validity to data that supports their point of view and less validity to data that contradicts it. (It is also much easier to cause a person to form an initial opinion than it is to change one they already have, but that is way outside our main topic here.)

However, psychology does lead into my third reason that people accept bad statistical analysis. There is a strong urge in people to see patterns in things. I once participated in a study where dot patterns were projected onto a screen briefly (about a second) then we were asked to write down what the patterns were. Some of them were obvious, but others were more obscure and a few were simple random collections of dots. We saw patterns in the majority of them and a lot of us saw the same pattern in the random dots. The results were interesting.

Tying this back to the tornadoes at the start of the thread… Assume that the occurrence of tornadoes is random and evenly distributed through the year (it isn’t) and that an average of, say, six tornadoes occur per year in your area. (I don’t know where Notthemama lives, and that number is probably low for Kansas and high for Alaska, but we’ll use it.)

So, with the numbers above, on any given day there is roughly a 1 in 60 chance of having a tornado in your area. That’s one every other month on the average. Pretty good odds, actually.
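The back-of-the-envelope figure above works out like this. It is only a sketch under the post's own simplifying assumption (six tornadoes a year, spread evenly over 365 days), not real climatology.

```python
DAYS_PER_YEAR = 365
tornadoes_per_year = 6  # the illustrative figure from the post, not real data

# Uniform assumption: every day of the year is equally likely to get one.
p_per_day = tornadoes_per_year / DAYS_PER_YEAR
one_in_n = 1 / p_per_day
avg_gap_days = DAYS_PER_YEAR / tornadoes_per_year

print(f"chance on any given day: {p_per_day:.4f} (about 1 in {one_in_n:.0f})")
print(f"average gap between tornadoes: about {avg_gap_days:.0f} days")
```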

Now, how many historical events do you think I can find for any day of the year? Checking a “this day in history” page for today (April 21) I find that today is supposedly the day Rome was founded by Romulus and Remus (753 BC), that the battle of San Jacinto occurred (Texan war of independence, 1836), the Red Baron was shot down (1918), Stalin’s daughter visited New York (1967) and the protests started in Tiananmen Square (1989). Today is also the birthday of Charlotte Bronte, Anthony Quinn and Queen Elizabeth II, just to name a few.

Oh yeah, it’s Good Friday too.

So, if an earthquake hits somewhere today, will it be because of Tiananmen Square? Doubtful. But someone, somewhere will note it. And a few people will believe it.

Finally, to answer the original question. Find the average number of tornadoes (or days with tornado warnings) per year that your area has had over the past few years. Divide that number by 365. That gives the rough odds of having a tornado (or a tornado warning) on any given day.

There is a list of historical occurrences for any given day.

Coincidences happen.

Sorry for the long rambling post. Hope this is useful to someone.


“Sometimes I think the web is just a big plot to keep people like me away from normal society.” — Dilbert

Thomas Sowell (the economist and conservative columnist) is fond of quoting a Dutch study that found a correlation between the stork population in the Netherlands and the human birth rate.

Work is the curse of the drinking classes. (Oscar Wilde)

The odds of getting 6 out of 6 numbers in the New York lottery are (according to the lotto people) 1:18,009,460. So what are the odds that on Wed Apr 19, 2000 they drew 08, 11, 14, 36, 44, 48? Well that is exactly what they drew. Weird, huh?
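A quick check of that figure. The pool size of 51 is an assumption on my part, inferred only from the fact that choosing 6 numbers out of 51 gives exactly the quoted odds; the thread itself doesn't state the game format.

```python
import math

NUMBERS_IN_POOL = 51  # assumption inferred from the quoted odds, not stated in the thread
NUMBERS_DRAWN = 6

# Number of distinct 6-number combinations, ignoring draw order.
combinations = math.comb(NUMBERS_IN_POOL, NUMBERS_DRAWN)

def chance_of(combo):
    """Every specific combination is exactly as likely as any other."""
    assert len(set(combo)) == NUMBERS_DRAWN
    return 1 / combinations

drawn = (8, 11, 14, 36, 44, 48)     # the draw quoted in the post
boring = (1, 2, 3, 4, 5, 6)         # looks "special", same probability

print(f"possible combinations: {combinations:,}")
print(chance_of(drawn) == chance_of(boring))
```

Some combination has to come up every draw, so a 1 in 18 million event happens every single time. It only looks weird if you named the numbers beforehand.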


Virtually yours,

“Feynman was wrong.
I understand Quantum Physics completely.
Anybody seen my drugs?” - A WallyM7™ .sig

Amusingly enough, you have exhibited the error you noted earlier in your post. Since specific weather conditions cause tornadoes, you are omitting an independent variable in your analysis. Depending on how carefully you measure the weather conditions, the chance of a tornado occurring on any given day ranges from near certainty to near impossibility.

A better measure, since day-of-year correlates well with seasonal weather conditions, is to note how many tornadoes historically occurred on each day of year.
If you measure 100 years, and you find 10 tornadoes occurred on, say, April 21, then it’s a pretty good bet you have a 10% chance of getting a tornado on that particular day.
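The day-of-year count described above could be sketched like this. The record of dates is made up purely for illustration, and `day_of_year_frequency` is a hypothetical helper name, not anything from a real weather library.

```python
from datetime import date

def day_of_year_frequency(tornado_dates, month, day, n_years):
    """Fraction of the n_years in which at least one tornado fell on month/day."""
    years_hit = {d.year for d in tornado_dates if (d.month, d.day) == (month, day)}
    return len(years_hit) / n_years

# Made-up record covering 1900 through 1999: an April 21 tornado once per decade.
records = [date(year, 4, 21) for year in range(1900, 2000, 10)]
# A few other made-up entries, including a second tornado on the same day in 1950.
records += [date(1950, 6, 3), date(1950, 4, 21), date(1987, 5, 12)]

freq = day_of_year_frequency(records, month=4, day=21, n_years=100)
print(f"share of years with an April 21 tornado: {freq:.0%}")
```

Counting years rather than tornadoes keeps a day with two tornadoes from being counted twice.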


Dr. Crane! Your glockenspiel has come to life!

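That’s why I called it “rough odds”. When I first mentioned calculating tornadoes that way earlier in the post I said:

“Assume that the occurrence of tornadoes is random and evenly distributed through the year (it isn’t)”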

I know there are seasonal variations due to prevailing weather but trying to take weather conditions into account made things more difficult than I really wanted to deal with so I took the easy way out.

What you are doing is what we call (where I work anyway) a year-over-year analysis. I think this is how the Farmer’s Almanac does their forecast. Anyway, to make it even more accurate instead of taking a simple average over the last x years you would need to compute a trend line to account for changing climatic conditions. (To account for local effects due to construction or land clearing, Global Warming, El Nino and that sort of thing.)

If I were doing a real analysis of this, I would note all occurrences of tornadoes within an x mile radius over the last y years and determine what the weather conditions were for the z days prior to each tornado warning (with x, y and z being determined during the data analysis stage). I would then find, over the same y years, each occurrence of those weather patterns over z days and determine which of them led up to a tornado and which didn’t, and from that determine that z days with a given weather pattern lead to tornadoes n percent of the time. I would then blend this number with a year-over-year trendline and probably a monthly and seasonal factor as well. (And I would still fail miserably at weather forecasting.)
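A very rough sketch of just the middle step of that procedure, the "n percent of the time" part. The pattern labels and the observations are invented, and a real analysis would need actual weather records plus the trend and seasonal blending, none of which is shown here.

```python
from collections import defaultdict

def tornado_rate_by_pattern(windows):
    """For each weather pattern, the share of z-day windows followed by a tornado.

    `windows` is a list of (pattern_label, tornado_followed) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # pattern -> [windows seen, tornadoes after]
    for pattern, tornado_followed in windows:
        counts[pattern][0] += 1
        counts[pattern][1] += int(tornado_followed)
    return {pattern: hits / seen for pattern, (seen, hits) in counts.items()}

# Invented observations: one entry per z-day window in the historical record.
windows = [
    ("warm_humid_front", True),
    ("warm_humid_front", False),
    ("warm_humid_front", False),
    ("warm_humid_front", True),
    ("cool_dry", False),
    ("cool_dry", False),
]

rates = tornado_rate_by_pattern(windows)
for pattern, rate in rates.items():
    print(f"{pattern}: tornado followed {rate:.0%} of the time")
```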

But yes, you are right, as stated my answer does fall into the same trap I noted earlier. Glad to see you were paying attention. :slight_smile:

Actually, it’s amazing how many people in the field fall into the same trap. Where I work I’m the senior analyst on a project which attempts to forecast customer patterns up to 15 months in advance. We do quite well; our impact is measured in millions per month.

A few years ago our summer forecasts for Atlanta were off. Badly. After spending hours in an analysis session trying to figure out the problem someone suddenly said “Wait a minute… Weren’t the Olympics in Atlanta a few years ago?”. This was quickly followed by the sound of many people pounding their heads on the table. Lots of embarrassed people that day. (We added an adjustment factor for our summer 1996 numbers and everything was suddenly much better!)

So, it’s an easy trap. But I plead innocent in this case since I knew my numbers were off and had stated my simplifying assumption earlier.

And that’s my story and I’m sticking to it. :stuck_out_tongue:



Er, excuse me, but …

The Murrah building bombing was (intentionally) on the same day as the Branch Davidian fiasco - that day was April 19.

The Columbine attack was (intentionally) on the same day as Hitler’s birthday - that day was April 20.

If there’s some conspiracy of fates here, they’re having coordination problems. :slight_smile:

In my world, ‘black’ and ‘white’ are merely extremes in the spectrum of ‘grey’.

Michael Shermer’s excellent book Why People Believe Weird Things goes into some detail on why people are willing to believe stuff that just doesn’t hold rational water. Some of the reasons have been mentioned above, and are tied directly to how our brains work (the finding of patterns, for example). Here are some other possibilities.

In Part 1, Chapter 3, Shermer lists several “errors of thought” that lead people to erroneous conclusions. For example, there’s the dictum, “Heresy Does Not Equal Correctness,” which might be described as the “they all laughed at so-and-so, and he was right” explanation.

The applicable concepts in this case are, as I see it, the following.

Theory Influences Observations. If you firmly believe something beforehand, your analysis will tend to be skewed. (Compare in Gould’s Mismeasure of Man how skull volumes were inconsistently calculated by packing grains more or less tightly between races.) If you start with the idea that the universe “makes” things happen in a related way, you’ll filter out the counterexamples. Selective memory is very, very powerful.

Anecdotes Do Not Make a Science. Should be self-explanatory. This contributes greatly to selective memory.

Burden of Proof. Your daughter has asserted a wildly unlikely phenomenon. It’s not up to you to disprove it; it’s up to her to prove it.

Rumors Do Not Equal Reality. Related to “Anecdotes” above.

And the most relevant principles:

Coincidence and Representativeness. Related to selective memory. Consider: Every now and then, you’ll think of a friend of yours, and occasionally you might even reach for the phone to call that person – and the phone will ring, and it’ll be that person. Evidence of psychic powers? Hardly. You just don’t remember the ten thousand other times you thought of that person and they didn’t call you that instant.

Hasty Generalization. Shermer defines this as an “improper induction,” that is, you have a couple of data points, and you incorrectly use them to extrapolate a larger truth. Bigotry uses this a lot.

The Need for Certainty. The world is big, and complicated, and resists analysis. Part of our lizard-brain functioning demands patterns for survival, so whenever we think we’ve figured out something important about how the world works, we tend to latch onto it.

And the grand finale:

The Unexplained Is Not Inexplicable. Your daughter wants to know why this very strange bit of synchronicity might have happened. You aren’t able to provide an instant answer by calculating the probability of weather phenomena coinciding with particular dates. She incorrectly reads this as proof for her position, and comes away with a supernatural hypothesis.

If you couldn’t tell, I have very high regard for Shermer’s book. You don’t even need to worry about the specific probabilities and calculations to identify where one’s thinking has gone wrong, or the traps that one is likely to fall into when casting about for an answer. You just need to know how the human brain functions, and you can point to these fallacies as a sort of meta-explanation, instead of being sucked into the specific-information debate.

Check out Shermer’s book. It’s a short course in rational thinking. Highly recommended.


Movie Geek Central – Reviews, news, analysis, and more! http://moviegeek.homestead.com

In response to Cervaise, no, of course the human brain isn’t rational. If it were, we might make the right decision a little more often, but it’d take so long to make that decision that we’d be effectively paralyzed. The (illogical) rules that we do, in fact, use to make decisions may give the wrong answer sometimes, but they’re much faster than the rigorous line of reasoning. What makes a person truly wise is knowing when to use the heuristic, top-of-the-head answer, and when to use reason.


“There are only two things that are infinite: The Universe, and human stupidity-- and I’m not sure about the Universe”
–A. Einstein

Another common misconception is that if the lottery goes over X dollars (where 1/X is the chance of a single $1 ticket winning) then buying a lottery ticket has a positive expectation. According to a former co-worker, even professional statisticians make this mistake.

The reason for this misconception is left as an exercise for the reader. :slight_smile:



I can think of two off the top of my head…

  1. Lottery prizes are (in the vast majority of cases) not lump-sum distributions. The money is given out over a (usually) 20+ year period. Therefore, you are comparing the present value of the cost of the ticket ($1) against the future value of the lottery payoff (X, in your example).

  2. Any single person is not guaranteed to have the entire winnings to themselves. The winnings may be distributed between two (or more) winners, which reduces X.

Any others?
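Here is a rough sketch of how those two effects can eat up an apparently positive expectation. Only the 1 in 18,009,460 odds come from earlier in the thread. The jackpot size, the 20 equal annual installments, the 5% discount rate and the number of tickets sold are all hypothetical, and the model treats the number of other winners as Poisson distributed, which is my simplifying assumption. Taxes and smaller prizes are ignored.

```python
import math

ODDS = 18_009_460          # 1-in-N chance per $1 ticket, as quoted earlier in the thread
JACKPOT = 20_000_000       # hypothetical advertised jackpot, deliberately larger than ODDS
YEARS = 20                 # hypothetical number of equal annual installments
DISCOUNT_RATE = 0.05       # hypothetical annual discount rate
TICKETS_SOLD = 10_000_000  # hypothetical number of other tickets in play

# The naive view: advertised jackpot times the chance of winning.
naive_ev = JACKPOT / ODDS

# Effect 1: the jackpot is paid over time, so discount each installment
# back to today (first payment now, the rest at yearly intervals).
installment = JACKPOT / YEARS
present_value = sum(installment / (1 + DISCOUNT_RATE) ** t for t in range(YEARS))

# Effect 2: the prize may be split. If the number of other winners N is
# Poisson with mean lam, the expected share E[1 / (1 + N)] is (1 - e^-lam) / lam.
lam = TICKETS_SOLD / ODDS
expected_share = (1 - math.exp(-lam)) / lam

adjusted_ev = present_value * expected_share / ODDS

print(f"naive expected value per $1 ticket:    ${naive_ev:.2f}")
print(f"adjusted expected value per $1 ticket: ${adjusted_ev:.2f}")
```

With these made-up inputs the naive figure comes out above a dollar and the adjusted one well below it. Different inputs will move the numbers, but both adjustments always push in the same direction.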

