This could be a GQ, but the answers will most likely be opinion rather than fact.

The issue: I want to average a series of terms which are evenly time-spaced, but I want the weight of the average to be biased toward more recent values (give less weight to older values).

An example might be a system that measures your vehicle’s fuel economy every second and displays an average that is closer to what it is right now, giving less weight to the values from 100 miles back.

I jabbed at numbers and came up with a geometric decay algorithm that seems to work across the board for decay rates greater than 1. Where “n” is the number of terms and “m” is the decay rate, the first (most recent) term is multiplied by a factor “f”:

$$f = \frac{m^{n}}{m^{n+1} - n}\,(m - 1)$$

The second term is then multiplied by $f/m$, the third by $f/m^2$, and so on, down to the last term, which is multiplied by $f/m^n$. Put more simply, f multiplies the first value and is divided by m once more for each subsequent value, so the factors shrink as you move down the list; together they are meant to sum to 1.
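A minimal Python sketch of the weighting scheme described above, assuming the values are listed newest first. Each factor is the previous one divided by m, and rather than relying on a closed form, the raw factors $1, 1/m, 1/m^2, \dots$ are simply normalized so they sum to 1. The function names are hypothetical. For a list of N values this makes the newest value's weight $f = m^{N-1}(m-1)/(m^N - 1)$ whenever $m \ne 1$, which `first_factor` evaluates as a cross-check; with $m = 1$ the scheme reduces to a plain average.

```python
import math
from typing import Sequence


def geometric_weights(count: int, m: float) -> list[float]:
    """Weights for `count` values ordered newest first.

    Each weight is the previous one divided by m, and the list is
    normalized so the weights sum to 1. m = 1 gives a plain average.
    """
    if count < 1:
        raise ValueError("count must be a positive integer")
    if m <= 0:
        raise ValueError("m must be positive")
    raw = [m ** -k for k in range(count)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_recent_average(values_newest_first: Sequence[float], m: float) -> float:
    """Average that favors the most recent values (listed first)."""
    weights = geometric_weights(len(values_newest_first), m)
    return sum(w * v for w, v in zip(weights, values_newest_first))


def first_factor(count: int, m: float) -> float:
    """Closed form for the newest value's weight f (valid only for m != 1)."""
    return m ** (count - 1) * (m - 1) / (m ** count - 1)


if __name__ == "__main__":
    w = geometric_weights(5, 1.5)
    print([round(x, 4) for x in w])
    print(math.isclose(w[0], first_factor(5, 1.5)))
    print(weighted_recent_average([30.0, 32.0, 35.0, 40.0, 41.0], 1.5))
```

Normalizing the raw factors directly sidesteps any error in a hand-derived closed form, at the cost of a sum over the list.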

All well and good. It should work for any temporally regular ordered set of numbers. My quandary lies with how to determine the decay rate m. I feel that it should be some number between 1 and 2 (2 is a pretty high decay rate) that is somehow derived from the number of terms n, but I cannot decide whether, or when, m should increase or decrease as the list gets longer.
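One way to make m less abstract (an illustration, not a rule for choosing it) is to translate it into a half-life. Because each factor is the previous one divided by m, a weight halves every $\ln 2 / \ln m$ terms. With one reading per second, $m = 2$ halves the weight every second, while $m = 2^{1/60} \approx 1.0116$ halves it once a minute. The helper names below are hypothetical.

```python
import math


def half_life(m: float) -> float:
    """Number of terms after which a weight has fallen to half (needs m > 1)."""
    if m <= 1:
        raise ValueError("m must be greater than 1 for the weights to decay")
    return math.log(2) / math.log(m)


def m_for_half_life(terms: float) -> float:
    """Decay rate m whose weights halve every `terms` terms."""
    if terms <= 0:
        raise ValueError("terms must be positive")
    return 2 ** (1 / terms)


if __name__ == "__main__":
    for m in (2.0, 1.5, 1.1, 1.01):
        print(f"m = {m}: weight halves every {half_life(m):.1f} terms")
    print(f"half-life of 60 terms -> m = {m_for_half_life(60):.4f}")
```

After five half-lives a weight is down to 1/32 of the newest one, so terms much older than that barely move the result; that is one way to think about how m and n interact.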

It seems to me that the nature of the data and your goals with it will have a big influence on the “best” approach.

You are presumably averaging sequential points because a single measurement is too noisy. If so, how noisy is each measurement, and how much less noisy would you like your estimate to become through averaging?
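To put a number on "how much less noisy": if the measurement noise is independent from reading to reading with the same standard deviation σ (an assumption; real sensor noise is often correlated), a weighted average whose weights sum to 1 has noise standard deviation $\sigma \sqrt{\sum_k w_k^2}$. A small sketch comparing equal weights with geometric ones; the function names are hypothetical.

```python
import math


def noise_reduction_factor(weights: list[float]) -> float:
    """Ratio of the averaged noise std. dev. to a single reading's std. dev.

    Assumes independent, equal-variance noise and weights that sum to 1.
    """
    return math.sqrt(sum(w * w for w in weights))


def normalized(raw: list[float]) -> list[float]:
    """Scale raw weights so they sum to 1."""
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    n = 100
    equal = normalized([1.0] * n)
    geometric = normalized([1.1 ** -k for k in range(n)])
    print(f"equal weights:    {noise_reduction_factor(equal):.3f}")
    print(f"geometric, m=1.1: {noise_reduction_factor(geometric):.3f}")
```

Under these assumptions, equal weights over 100 readings cut the noise to 0.1 of a single reading, while geometric weights with m = 1.1 only cut it to about 0.22: favoring recent values costs some noise suppression.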

You are presumably de-weighting older points because the value of interest changes with time. What is the typical rate of change? Are there times when things can change dramatically very quickly, and is that important to capture? For gradual changes, is the change typically correlated in either sign or magnitude from one measurement to the next, or not?

Alternatively, for the fuel economy question, the approach depends on one’s goals. If we want to report a running estimate of the overall fuel economy for a given journey, then older and newer data should be treated on an equal footing. If we want to report an estimate of instantaneous fuel economy (e.g., to convey how bad flooring the accelerator pedal is), then we should only look at the most recent measurement, unless there is a noise issue that we want to tamp down.

So, I don’t think there’s a general guiding principle here. It will depend very much on the details of the application.

(Aside: I didn’t understand your formula’s goal, as it appears to give positive or negative values for f depending on what I set n and m to.)

Well, averaging reduces a dataset to a number, so, yeah, that would be less noisy.

It is a general application approach. I have not really delved into dealing with outliers or trend lines. Mostly, the single result is what is of interest.

n has to be a positive integer, because it is the number of terms in the list. m can be 1 or greater; a value below 1 makes this algorithm go off the rails. If you wanted reverse decay, giving more weight to older terms, you would use a workable decay rate and run the sequence from back to front.

Again, this is a pretty basic thing, as I am not an actual statistical analyst by trade.

But is that the goal? If so, the optimum weighting scheme depends on how much noise there is from point to point vs. how much true underlying variation there is from point to point, and on whether you care more about reducing noise or emphasizing instantaneous readings or capturing recent typical behavior or … .

There’s not going to be a general answer here. I’m not trying to dodge the question but rather looking for more details, because the answer must be tied to the context.

Your fuel economy example is an easy case where the broader context matters. A hybrid vehicle, for instance, might report instantaneous fuel economy, some rolling average over the past five minutes, a refuel-to-refuel average, or a vehicle lifetime average. These are all interesting for different purposes.
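To make the point concrete, here is a sketch that computes several of these figures from one stream of readings. It assumes, as in the original example, one fuel-economy reading per second, so a five-minute window is the last 300 readings. Simple arithmetic means are used for brevity; for an overall figure, total distance divided by total fuel is the more faithful measure. The function name and data are hypothetical.

```python
from statistics import mean


def summaries(readings: list[float], window: int = 300) -> dict[str, float]:
    """Several averages of the same readings (oldest first, one per second)."""
    if not readings:
        raise ValueError("need at least one reading")
    return {
        "instantaneous": readings[-1],
        "last_5_minutes": mean(readings[-window:]),
        "whole_series": mean(readings),
    }


if __name__ == "__main__":
    # Hypothetical data: 20 minutes of steady cruising, then 2 minutes of hard acceleration.
    data = [40.0] * 1200 + [15.0] * 120
    for name, value in summaries(data).items():
        print(f"{name:>14}: {value:.1f}")
```

The same data gives 15.0, 30.0 and about 37.7: three defensible answers to three different questions.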

The formula appears to be missing a parenthesis. Assuming the fix doesn’t change the middle factor, m = 1.2 and n = 10 (as an example) yield a negative value for f.
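The check is easy to reproduce. This sketch reads the posted formula as $f = m^n(m-1)/(m^{n+1} - n)$, i.e. with the missing parenthesis closed so the middle factor is unchanged. For $m = 1.2$ and $n = 10$ the denominator is $1.2^{11} - 10 \approx 7.43 - 10 < 0$, so f comes out negative. For comparison, it also evaluates $m^n(m-1)/(m^{n+1} - 1)$, the factor that makes the $n + 1$ factors $f, f/m, \dots, f/m^n$ sum to 1 (from the geometric-series sum). That corrected form is a suggestion, not something posted in the thread, and the function names are hypothetical.

```python
def f_as_posted(m: float, n: int) -> float:
    """The formula as posted, with the missing parenthesis closed after '- n'."""
    return m ** n * (m - 1) / (m ** (n + 1) - n)


def f_normalized(m: float, n: int) -> float:
    """First factor that makes f, f/m, ..., f/m**n sum to 1 (m != 1)."""
    return m ** n * (m - 1) / (m ** (n + 1) - 1)


def factor_sum(f: float, m: float, n: int) -> float:
    """Sum of the n + 1 factors f, f/m, ..., f/m**n."""
    return sum(f / m ** k for k in range(n + 1))


if __name__ == "__main__":
    m, n = 1.2, 10
    for name, f in (("as posted", f_as_posted(m, n)), ("normalized", f_normalized(m, n))):
        print(f"{name:>10}: f = {f:.4f}, factors sum to {factor_sum(f, m, n):.4f}")
```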

In Statistical Process Control, we often use the classic EWMA (Exponentially Weighted Moving Average). It gives less and less weight to the data as they age. A single parameter, lambda (λ), between 0 and 1, is selected to decide how rapidly older data are discounted:

$$z_t = \lambda x_t + (1 - \lambda)\, z_{t-1}$$

where $x_t$ is the newest observation and $z_t$ is the updated average.
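A minimal sketch of the standard EWMA update $z_t = \lambda x_t + (1 - \lambda) z_{t-1}$. SPC usually starts from the process target; here the starting value defaults to the first observation, which is an assumption, and the `start` parameter and function name are hypothetical. Unrolling the recursion, an observation k steps old carries weight $\lambda(1-\lambda)^k$, so this is the same geometric decay as the scheme in the question with $m = 1/(1-\lambda)$, computed incrementally without storing the history.

```python
from typing import Iterable, Optional


def ewma(data: Iterable[float], lam: float, start: Optional[float] = None) -> list[float]:
    """Exponentially weighted moving average, z_t = lam*x_t + (1 - lam)*z_{t-1}.

    `start` is z_0; if omitted, the first observation is used as the first average.
    """
    if not 0 < lam <= 1:
        raise ValueError("lambda must be in (0, 1]")
    out: list[float] = []
    z = start
    for x in data:
        # Blend the new observation with the running average.
        z = x if z is None else lam * x + (1 - lam) * z
        out.append(z)
    return out


if __name__ == "__main__":
    readings = [30.0, 31.0, 29.5, 40.0, 41.0, 39.5]
    print(ewma(readings, lam=0.2))
    print(ewma(readings, lam=0.2, start=30.0))
```

Small λ discounts old data slowly (heavy smoothing); λ = 1 ignores history entirely and simply returns the latest reading.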

Another thing you could do (I agree that what you should do very much depends on what you want to achieve) is regard your moving average as a sort of low-pass filter, and tweak the parameters to produce the desired frequency response (e.g., cut-off frequency).
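As a sketch of that idea: any set of averaging weights $w_k$ applied to the k-th most recent sample acts as a finite impulse response (FIR) filter, and its gain at frequency ν (in cycles per sample) is $\left|\sum_k w_k e^{-2\pi i \nu k}\right|$. The helper below evaluates that gain and scans for the -3 dB point (gain $1/\sqrt{2}$ relative to a DC gain of 1), one common definition of the cut-off frequency. It assumes the weights sum to 1; the function names and the scan resolution are assumptions.

```python
import cmath
import math
from typing import Optional


def gain(weights: list[float], freq: float) -> float:
    """Magnitude response of a weighted average at `freq` cycles per sample.

    weights[k] multiplies the sample k steps in the past.
    """
    return abs(sum(w * cmath.exp(-2j * math.pi * freq * k) for k, w in enumerate(weights)))


def cutoff_frequency(weights: list[float], steps: int = 10_000) -> Optional[float]:
    """Lowest frequency up to Nyquist (0.5) where the gain drops below 1/sqrt(2)."""
    threshold = 1 / math.sqrt(2)
    for i in range(steps + 1):
        freq = 0.5 * i / steps
        if gain(weights, freq) < threshold:
            return freq
    return None  # the gain never falls to -3 dB below Nyquist


if __name__ == "__main__":
    n = 20
    equal = [1 / n] * n
    raw = [1.2 ** -k for k in range(n)]
    total = sum(raw)
    geometric = [w / total for w in raw]
    print("equal weights   cutoff:", cutoff_frequency(equal))
    print("geometric m=1.2 cutoff:", cutoff_frequency(geometric))
```

Tweaking n, m (or λ) and re-running the scan shows directly how each choice moves the cut-off.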