Help with statistics - cancer cluster?

Four women working together were diagnosed with breast cancer in the same year. Assuming that this is a random occurrence, and that the likelihood of an individual woman being diagnosed is 1/173 per year (from here), how do I calculate the odds of this happening?

Four women out of how many total in the workplace? You can’t calculate the probability without that information.

I don’t know. Say the total number is n.

It’s not easy to do by hand. Basically, this is a binomial probability problem. There are four quantities you need to know:

n = the number of trials. (In this case, this is the number of women in the workplace… i.e., the total number of women who could have gotten cancer.)

x = the number of “successes”. (The terms “success” and “failure” are typically used to refer to the two outcomes of a binomial procedure, with “success” generally being used to denote the outcome you’re trying to find the probability of. In this case, “success” is cancer, so x is 4 – the number of women who actually did get cancer.)

p = the probability of “success” on any given trial. (In this case, p is 1/173, or approximately .00578.)

q = the probability of “failure” on any given trial. This is simply 1 - p.

To find P(x) – the probability of getting exactly x “successes” – you use the binomial probability formula:

P(x) = [n! / ((n - x)! * x!)] * p^x * q^(n-x),

where * means multiplication and ^ means exponentiation. (Note also that zero factorial equals 1 by definition.)
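For anyone who would rather let a computer do the arithmetic, here is a minimal Python sketch of that formula. The function name `binomial_pmf` and the group size of 20 are my own assumptions for illustration; `math.comb` (Python 3.8+) supplies the n! / ((n - x)! * x!) part.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(exactly x successes in n independent trials, each with success probability p)."""
    q = 1.0 - p  # probability of "failure" on a single trial
    return comb(n, x) * p**x * q**(n - x)

# n = 20 is a hypothetical office size; p = 1/173 is the figure from the question.
print(binomial_pmf(4, 20, 1 / 173))
```

Note that this treats each woman as an independent trial with the same p, which is the same simplification the formula itself makes.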

But what you really want is the probability of four or more “successes”. The way to do this by hand would be to find P(3), P(2), P(1), and P(0), add them together, and subtract the result from 1. (Then, if you really want odds rather than a probability, you can easily convert.)
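The subtract-from-1 step and the conversion to odds could be sketched like this in Python. The names `prob_at_least` and `odds_against` are hypothetical helpers, and n = 20 is again just an assumed group size.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(k or more successes) = 1 - [P(0) + P(1) + ... + P(k-1)]."""
    q = 1.0 - p
    below = sum(comb(n, x) * p**x * q**(n - x) for x in range(k))
    return 1.0 - below

def odds_against(prob):
    """Convert a probability to odds against, e.g. 0.2 becomes 4 (4 to 1 against)."""
    return (1.0 - prob) / prob

# Assumed group of 20 women, with p = 1/173 from the question.
p_tail = prob_at_least(4, 20, 1 / 173)
print(f"P(4 or more) = {p_tail:.3g}, odds about {odds_against(p_tail):,.0f} to 1 against")
```

For a probability this small, the odds against and the "1 in N" figure are nearly the same number, so the distinction matters little here.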

As I said, this would be pretty tedious to do by hand. Fortunately, these days it’s not hard to find technology that will calculate binomial probabilities for you.

Thank you for that.
Statistics was never my strong suit.

I found an online calculator that seems to do what I want.

Looks like, for a group of 20 women, the probability of this occurring is about 1 in 203,000, which is far longer odds than I would have thought.
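As a rough cross-check on that calculator result, the short Python sketch below prints "1 in N" figures for a few group sizes. Only n = 20 comes from this thread; the other sizes are hypothetical. If I have the arithmetic right, the 203,000 figure corresponds to exactly four cases among 20 women, and four or more comes out slightly more likely, roughly 1 in 199,000.

```python
from math import comb

P_DIAGNOSIS = 1 / 173  # annual per-woman figure quoted in the question

def pmf(x, n, p):
    """Binomial probability of exactly x cases among n women."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def one_in(prob):
    """Express a probability as the N in '1 in N'."""
    return 1 / prob

for n in (10, 20, 50, 100):  # hypothetical office sizes, apart from 20
    exactly_4 = pmf(4, n, P_DIAGNOSIS)
    four_plus = 1 - sum(pmf(x, n, P_DIAGNOSIS) for x in range(4))
    print(f"n={n:>3}: exactly 4 -> 1 in {one_in(exactly_4):,.0f}; "
          f"4 or more -> 1 in {one_in(four_plus):,.0f}")
```

The answer is quite sensitive to n, which is why the size of the workplace matters so much.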

Thank you again for pointing me in the right direction.

ETA: for all the pedants out there, I realize that this is not a rigorous treatment of the problem, since the ages of the women, race, and medical histories have not been controlled for.

These factors are all rolled into that 1/173 number. You need to know exactly who that 1/173 applies to.

Well, in this case, I’m sure that the 1/173 probability doesn’t apply to all women in the office, since that figure is for 40-year-olds. But, it’s close enough.

There’s another problem here. Why did you pick that particular set of four women? You almost certainly picked them after the fact, because they all happened to get cancer. But there are a very large number of possible sets of four women you could have looked at, and most of them didn’t get cancer.

Put another way: Even completely random processes end up with clumps, somewhere or another. If you pick out the location of a clump based on the fact that it has a clump in it, then the fact that there’s a clump there tells you absolutely nothing.
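To put a rough number on the clumping point, here is a small Python sketch of the chance that at least one office out of many sees four or more cases in a year purely by chance. Everything about the setup is an assumption for illustration: the office counts are made up, every office is taken to have 20 women, and offices are treated as independent.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(k or more cases among n women), via the complement of P(0..k-1)."""
    q = 1 - p
    return 1 - sum(comb(n, x) * p**x * q**(n - x) for x in range(k))

def prob_some_cluster(num_offices, k=4, n=20, p=1 / 173):
    """P(at least one of num_offices independent offices sees k or more cases in a year)."""
    p_one = prob_at_least(k, n, p)
    return 1 - (1 - p_one) ** num_offices

for offices in (1, 1_000, 100_000):  # hypothetical counts, for illustration only
    print(f"{offices:>7,} offices: {prob_some_cluster(offices):.3g}")
```

The chance for any one office chosen in advance is tiny, but the chance that some office somewhere has such a cluster grows quickly with the number of offices being (implicitly) searched.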

I know that.
The daughter of a friend of ours was commenting that four women in her medical office all were diagnosed with breast cancer in the same year, and that she thought it was because of the x-ray machine in the office. I wanted to find out just how unlikely it was for this to occur just by random chance. I said that while it would probably be worth having the x-ray machine(s) inspected, it seemed unlikely that they were the cause, since typically x-ray induced cancers take many years to develop.
But, as you note, people see patterns everywhere…