Statistical / metrics analysis question

Background: A fundraising company signs people up for a monthly donation program. We (a non-profit) pay the company a fee upon the donor’s first payment, and collect the monthly donation (electronically, independent of the fundraising company) once a month for as long as the donor remains in the program. The donor can cancel at any time, but if we have collected one payment from the donor, we cannot obtain a credit for the fee paid to the fundraising company. It takes approximately seven to nine months for the amount collected to recoup the associated fees and processing costs.

Is the program worth it? It is only a few months old, so we do not have very much of a history to work with, but as time goes on our data will take on more significance. One of the tools we’d like to use is the average number of months donors stay with the program. I am turning to the TM in hopes of finding some way of getting numbers that have some value.

It is a fairly simple exercise to calculate the average number of months active. But I am not sure this number is very meaningful. I have folks who cancel after three months, and folks who have been with the program for three months and have no intention of canceling. Do I assign different weights to each category? What do I use for the weights? We will eventually have people renew their agreement at the end of the year; do we keep averaging in 13, 14, 15 months? Should I keep track of two different averages: the avg. number of months active, and the avg. number until cancellation?

As you can see, I really don’t have much of an idea as to where to go from here. I am pretty sure I can’t just take a simple average, but don’t know which path will take me to the right answer. Any suggestions?

Thanks,

Rhythmdvl

I think you’d be fine with a simple arithmetic mean. Basically, what you want to know is (correct me if I’m wrong): Does the total cost exceed the total revenue of the program?

If that’s the question, an unweighted mean will do fine. All you have to do is multiply this average by the number of participants, times the individual donation, and you’ve got a good revenue estimate.

The problem, as I think you’ve gathered, is that it’s hard to say how long someone is going to last before they’ve quit. I mean, averaging the number of months until quitting across all the quitters doesn’t take into account the loyalists.

I think maybe you could come up with a number of different models. First, average the number of months each quitter lasted. Then assign an arbitrary guess at the lifespan of each non-quitter. Maybe start with a figure 50% higher than the average lifespan of the quitters. Average them together (weighting each participant equally; probably not weighting the sum of quitters equally with the sum of loyalists).
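A rough sketch of that blended average, in case it helps to see it written out. The function name, the 1.5 factor as a default, and the sample numbers are all made up for illustration; the loyalist lifespan is a guess, not something the data tells you.

```python
def blended_mean_months(quitter_months, n_loyalists, loyalist_factor=1.5):
    """Mean lifespan when loyalists are assigned a guessed lifespan.

    quitter_months:  months each cancelled donor lasted
    n_loyalists:     number of donors still active
    loyalist_factor: guessed multiple of the quitter mean (an assumption)
    """
    quitter_mean = sum(quitter_months) / len(quitter_months)
    loyalist_guess = loyalist_factor * quitter_mean
    total_months = sum(quitter_months) + n_loyalists * loyalist_guess
    # Every participant counts once, quitter or loyalist.
    return total_months / (len(quitter_months) + n_loyalists)


# Made-up numbers: four quitters and six donors who are still active.
quitters = [2, 3, 3, 4]
estimate = blended_mean_months(quitters, n_loyalists=6)
print(estimate)
```

Rerunning it with a few different values of `loyalist_factor` shows how sensitive the answer is to the guess.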

This gives you two questions (you give me one, I give you two; what a big help I am :P ). 1. Is the program worthwhile with this loyalist lifespan estimate? 2. Is this estimate realistic?

What you are looking for is a “drop out” equation (sometimes called extinction studies). I doubt this is linear. You might find something like 40% drop out (stop donating) within 6 months and 80% drop out after 24 months or whatever. This is not linear.

One possible way would be to set up a likelihood parameter. Sort of like “If a donor has lasted 3 months, how likely are they to drop out before the 9 month recoup timeframe?” You can do the same for 6 months or whatever. You would have to use prior data, assuming you have some.
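Here is a minimal sketch of that likelihood calculation, assuming you already have retention fractions from prior data. The `survival` table and the function name are hypothetical; the numbers are invented.

```python
def prob_drop_before(survival, lasted, horizon):
    """P(cancel before `horizon` months | still active at `lasted` months).

    survival[m] is the fraction of donors still active after m months.
    """
    return 1.0 - survival[horizon] / survival[lasted]


# Invented retention fractions, standing in for prior data.
survival = {0: 1.0, 3: 0.8, 6: 0.6, 9: 0.5}

# "If a donor has lasted 3 months, how likely are they to drop out
# before the 9 month recoup timeframe?"
p = prob_drop_before(survival, lasted=3, horizon=9)
print(p)
```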

However, drop out studies are common. I imagine some articles have been published about this. Try ERIC or another journal search program. I don’t know if any studies deal with the sign up companies, but if the donor doesn’t know they exist, I don’t see how the company’s involvement would affect patterns.

Oops. My percentages were backwards.

Further, you should consider the increase in the number of donors. However, this is not incredibly important if none of them stay long enough for you to reap the benefit. Work the extinction angle first.

In that the relationship is not linear, I would avoid the mean.

The average time in for quitters is a reasonable idea, but, unless you have prior data, this would be tough. This does, however, get back to the idea that you have to look at quitters. They are the ones who will determine if the program is worth it.

It goes like this.

The program is worth it if…

  1. H0: donors stay longer than 7 months (the recoup time) (unlikely that the program will be responsible for this unless they somehow find more dedicated donors).

  2. H0: you get an increased number of donors who stay longer than 7 months (possible, more donors mean more money, but one might expect the extinction rate to be similar in both groups - or even higher in the solicited donor group [as opposed to the volunteer donor group]).

  3. H0: the number of solicited donors who stay longer than the recoup time donate enough to cover the money lost by solicited donors who stay less than the recoup time. This is the most likely scenario for study.
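The third scenario comes down to simple bookkeeping, which can be sketched like this. The donation amount, the fee, and the list of months are invented, and processing costs are left out to keep it short.

```python
def program_net(months_active, monthly_donation, fee_per_donor):
    """Total donations collected minus total fees paid.

    Only donors who made at least one payment belong in `months_active`,
    since the fee is owed on the first payment.
    """
    collected = sum(m * monthly_donation for m in months_active)
    fees = fee_per_donor * len(months_active)
    return collected - fees


# Invented figures: a 160 fee and a 20 monthly gift give an 8 month recoup time.
months = [2, 3, 12, 15, 10]
net = program_net(months, monthly_donation=20.0, fee_per_donor=160.0)
print(net)
```

A positive result means the long stayers covered the losses on the early quitters, at least for the donors counted so far.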

The most important issue is whether you wish to engage in a strict statistical test, or whether you wish to simply engage in data mining. If you’re data mining, then you’re just looking at what is a likely conclusion, and there are no hard and fast rules to follow. If, however, you want a solid statistical conclusion, then there are strict rules to follow.

  1. Decide what you expect the distribution to be. This is where you have to guess, so without any previous experience this is going to be a large source of error. Probably the simplest guess would be that it is exponential: that is, each month a member will have probability p of canceling, where p is constant.

  2. Decide on a statistic. In this case, the most important issue is the average number of months spent in the program. So you should probably use that as your statistic.

  3. Decide on a method of calculating the statistic. If you use the exponential model, someone who is in the program and hasn’t canceled can be expected, on the average, to stay in as many more months as a new member can be expected to stay in total. So after adding up the total number of months that everyone spent in the program, instead of dividing by the number of people in the program, divide by the number of people that canceled. Call this number t bar.

  4. Decide on a decision criterion. Here you need to decide what probability of being wrong (that is, having a type I error) you’re willing to accept (this probability is called alpha). You can then calculate what the acceptable range for the statistic is by finding the value of p such that the probability of getting a type I error is less than alpha.

  5. Take data, calculate t bar, and decide whether that is in the acceptable range.

As for the issue of what to do with people who have been in for more than a year, that depends on whether there is a fee charged to you for a renewal. If not, then you should make no distinction between numbers more than twelve and less than twelve.
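The t bar calculation from step 3 can be sketched as follows. The data layout (one list of months so far, one list of cancelled flags) and the sample values are assumptions for illustration, and the estimate only makes sense if the constant-p model is roughly right.

```python
def t_bar(months_so_far, cancelled):
    """Total donor-months divided by the number of cancellations.

    months_so_far[i]: months donor i has spent in the program so far
    cancelled[i]:     True if donor i has cancelled
    """
    n_cancelled = sum(cancelled)
    if n_cancelled == 0:
        raise ValueError("no cancellations yet; t bar is undefined")
    return sum(months_so_far) / n_cancelled


# Invented data: six donors, three of whom have cancelled.
months_so_far = [3, 3, 5, 2, 6, 4]
cancelled = [True, False, False, True, False, True]

estimate = t_bar(months_so_far, cancelled)
naive_mean = sum(months_so_far) / len(months_so_far)
print(estimate, naive_mean)
```

The naive mean comes out lower because it treats donors who are still active as if they had already finished.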

With a (discrete) exponential model, the expected number of months that I calculated was (2p-1)/p, which doesn’t make sense because it can be negative. I’m still trying to figure that out.
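One way to sanity-check a formula like this is to sum the series numerically. The sketch below assumes a particular convention that may not match the calculation above: every donor is active for at least one month, and the chance of month k being the last one is (1 - p)^(k - 1) * p. Under that convention the mean is 1/p, which is never negative.

```python
def expected_months(p, max_months=2000):
    """Numerically sum k * P(N = k) with P(N = k) = (1 - p)**(k - 1) * p.

    The series is truncated at `max_months`; for p around 0.1 the
    leftover tail is far too small to matter.
    """
    return sum(k * (1 - p) ** (k - 1) * p for k in range(1, max_months + 1))


p = 0.1  # invented monthly cancellation probability
approx = expected_months(p)
print(approx, 1 / p)
```

If cancellation is allowed before the first month is counted, the mean shifts down by one to (1 - p)/p, so it is worth pinning down which convention the model uses.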

Rhythmdvl, I think the data you’ll want to look at is the fraction of people who are around after X months, timing every person from when they started, not when your company began the program. Divide the number of people around after X months by the total number of people who started at least X months ago. So if X = 6 months, the people who started 5 or fewer months ago don’t contribute to either total.
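As a sketch of that bookkeeping, suppose each donor is recorded as a pair: months since sign-up, and months active (equal to months since sign-up if they are still in the program). That record layout, the function name, and the data are all made up for illustration.

```python
def retention_fraction(donors, x):
    """Fraction of donors still around after x months.

    donors: list of (months_since_signup, months_active) pairs.
    Only donors who started at least x months ago are counted at all.
    """
    eligible = [d for d in donors if d[0] >= x]
    if not eligible:
        return None  # nobody has been around long enough to say
    still_around = [d for d in eligible if d[1] >= x]
    return len(still_around) / len(eligible)


# Invented records: (months since sign-up, months active).
donors = [(8, 8), (8, 3), (7, 7), (6, 2), (6, 6), (4, 4), (3, 1), (2, 2)]

curve = {x: retention_fraction(donors, x) for x in range(1, 10)}
print(curve)
```

With X = 6, the three donors who started fewer than 6 months ago drop out of both totals.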

What you’ll get is a curve which starts at 1 for X = 1, and tapers off as X increases. This is something you could try fitting with an exponential, or whatever the shape suggests to you. Once you have a model, you can estimate your return by integrating the curve you have of the actual percentages, using the model to extrapolate the data to higher X than you have data for.
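A minimal sketch of the fit-and-extrapolate step, assuming an exponential shape a * exp(-k * X) and a plain least-squares line through the logs of the fractions. The 120 month cutoff and the sample curve are invented, and the "integral" is taken as a month-by-month sum.

```python
import math


def fit_exponential(xs, fractions):
    """Least-squares line through (x, ln fraction); returns (a, k) for a*exp(-k*x).

    All fractions must be greater than zero.
    """
    ys = [math.log(f) for f in fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def expected_months(xs, fractions, horizon=120):
    """Sum the observed fractions, then extend with the fitted curve to `horizon`."""
    a, k = fit_exponential(xs, fractions)
    observed = sum(fractions)
    extrapolated = sum(a * math.exp(-k * x)
                       for x in range(max(xs) + 1, horizon + 1))
    return observed + extrapolated


# Invented curve: 10% of the remaining donors drop out each month.
xs = [1, 2, 3, 4, 5, 6]
fractions = [1.0, 0.9, 0.81, 0.729, 0.6561, 0.59049]

a, k = fit_exponential(xs, fractions)
total = expected_months(xs, fractions)
print(a, k, total)
```

Multiplying `total` by the monthly donation gives an estimated return per donor to compare against the fee. If the plotted fractions do not look like a straight line on a log scale, a different curve would be a better choice.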