The average of the values is 39. But that’s not correct - the average should account for the missing readings between 2/6/21 6:00 and 2/7/21 6:00. If you interpolate those missing readings, the average is 33, which is what I’m looking for.
Is there a function or method that can do this?
And I’ve simplified things with the hourly spacing - the actual timestamps are quite irregular. I’m looking for something like a weighted average, I guess.
Excel’s TREND function might do what you want - it fits a linear curve to the data you have, and then calculates the value of the curve at the mean time (allowing for any typos I’ve made)
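If you ever want to try the same idea outside Excel, here is a minimal Python sketch of it (a least-squares straight line evaluated at the mean time), using the hours and dollar values from the worked example later in the thread; treat it as an illustration rather than a drop-in solution:

```python
import numpy as np

# Readings from the worked example further down the thread:
# hours elapsed since the first reading, and the dollar values.
t = np.array([0.0, 24.0, 25.0, 26.0])
v = np.array([20.0, 44.0, 46.0, 45.0])

# Fit a straight line v ~ slope*t + intercept by least squares,
# then evaluate that line at the mean time.
slope, intercept = np.polyfit(t, v, 1)
estimate = slope * t.mean() + intercept
print(estimate)
```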
Instead of a simple moving average, you could attempt a more general type of resampling where the input impulses are fed into an appropriate low-pass filter. You can use Python or Octave or whatever to design the filter coefficients, and store them in your Excel worksheet once you are satisfied with it.
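If you go that route, here is a minimal sketch of designing low-pass FIR coefficients in Python with scipy; the sample rate, cutoff, and tap count below are placeholder assumptions, not values taken from this thread:

```python
from scipy.signal import firwin

fs = 24.0      # assumed samples per day after regularizing the data (placeholder)
cutoff = 1.0   # pass variations slower than about one cycle per day (placeholder)
numtaps = 31   # filter length: longer means a sharper cutoff but more smoothing lag

# Low-pass FIR coefficients; once you're happy with them you could paste them
# into a worksheet column and apply them to a sliding window of samples.
coeffs = firwin(numtaps, cutoff, fs=fs)
print(coeffs)
```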
Two thoughts:
1:
Irregular time spacing is going to make a hash of most Excel functions, which work more on a “collection of samples” mentality. The TREND function specifically has this problem.
They really assume time is not one of the variables. So you may need more processing outside Excel to shape the time-based data into sample-based data, or use a macro-based solution to insert fake sample rows and generate plausible data values for the faked samples (there’s a sketch of that idea after these two thoughts).
2:
You know the nature of the real process you’re sampling, but linear interpolation is only meaningful if that’s how the real data behaves.
Take for example something like outdoor temperature intended to be sampled hourly, but with a bunch of missing samples. Between samples taken at 10am and 4pm on a single day, on most days you can interpolate linearly and not be too far off in those intervening hours.
But if you take a sample at noon on 2 days in a row you’re going to be falsely interpolating a very warm night. Or take two samples at 2am and you’ll be falsely interpolating a very cool day.
As long as your sampling density is greater than the Nyquist rate of the signal you are trying to reconstruct, you should be OK even if the sampling is not at precisely regular intervals. But you still need to know the nature of your process and design things appropriately; if you need to know the hourly temperature variation then sampling it once a day will not be enough.
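Here’s the sketch promised above for the “insert fake sample rows” idea, done outside Excel in Python/Pandas (mentioned again further down the thread). The values are the ones from the worked example at the end of the thread, and the timestamps are reconstructed from it:

```python
import pandas as pd

# Irregular readings: values from the worked example at the end of the thread.
readings = pd.Series(
    [20.0, 44.0, 46.0, 45.0],
    index=pd.to_datetime(["2021-02-06 06:00", "2021-02-07 06:00",
                          "2021-02-07 07:00", "2021-02-07 08:00"]),
)

# Put the data on a regular hourly grid (empty hours become NaN), then fill
# the gaps with time-based linear interpolation - i.e. the "plausible data
# values for the faked samples".
hourly = readings.resample("1H").mean().interpolate(method="time")

print(hourly.mean())  # about 33 for these numbers, the figure the OP wants
```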
You haven’t defined what it is you are trying to do with this data. Do you want population statistics (e.g. a mean, variance, et cetera) or do you want a time-varying trend like a running average or a daily mean temperature profile? As LSLGuy notes, if you are going to interpolate values, you really need to understand the mechanism behind the trend, and in any case filling in ‘fictitious’ values contaminates an experimental data set: the analysis will just output whatever trend you feed into it.
Ok, you’re forcing me to show the mundane nature of this. It’s from a game, so it is pseudo-random ups and downs of a stock market simulation.
It is the Stock Market mini-game in Cookie Clicker. You can read all about it here, but here is a synopsis of what the game is doing with its phony stock prices:
Market Behavior
Stable (12.5% chance)
Slow Rise (25% chance)
Slow Fall (25% chance)
Fast Rise (12.5% chance)
Fast Fall (12.5% chance)
Chaotic (12.5% chance)
So, you’re trying to predict future market behavior based upon the current trend? Then you want some kind of a weighted moving average filter. Given that the data is irregular you’ll have to use some kind of conditional function, or else downsample by averaging over intervals to get a regular time series, e.g. averaging samples over the period of a day so you just have one value per day.

This will quickly become cumbersome in Excel because it really isn’t intended to be used for this kind of analysis, and if this is going to be done extensively you’d really be better off moving to a statistical package or using statistical functions in Python/Pandas, R, or Matlab/Statistics Toolbox, which have built-in functions to do this kind of trend analysis. This is probably biting off more than you expected, but if you really want to do exploratory data analysis (which is what you are doing here) then it is well worth investing the time in learning one of these tools rather than trying to force a spreadsheet into being a full-fledged data analysis tool.
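As a rough illustration of that downsampling suggestion in Python/Pandas (the file name and column names here are made up):

```python
import pandas as pd

# Hypothetical export of the irregular readings: columns "time" and "price".
df = pd.read_csv("stock_log.csv", parse_dates=["time"]).set_index("time")

# One value per day: the mean of whatever samples happened to fall in that day.
daily = df["price"].resample("1D").mean()

# A simple moving-average style trend on the regularized series,
# e.g. a 7-day rolling mean (the window length is an arbitrary choice).
trend = daily.rolling(window=7, min_periods=1).mean()
print(trend.tail())
```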
No. The behavior is random, although any one stock is in an unknown mode at any one time, such as “Slow Rise”. I’m not trying to figure that out.
Each of the 16 stocks has a defined “Resting Point”, kind of a gravitational center that it wants to move back to, similar to how Amazon.com or Berkshire Hathaway are always going to have a higher price per share than other less prestigious stocks. I’m trying to figure out what that is based upon average performance. I also look at Min and Max, but I was curious about how to average the readings properly considering I can’t sample at the same time every day or hour.
So I think it’s just a matter of calculating the area under the graph for each interval, summing up those areas, and then dividing by the total time period for all intervals.
Let’s say your value is in bucks and you have timestamps to the nearest hour.
For each gap between the timestamps you have a start value, an end value, and a time period, and we will calculate the area under that segment by making a rectangle and a triangle.
So for the first period
start time is t0, start value is 20
end time is t1, end value is 44
t1 - t0 is 24 hrs
Area is 24 * 20 + (1/2 * 24 * (44 - 20))
So that’s 768 dollar hours (not dollars per hour)
Divide by 24 hrs and you get an average of 32 dollars for the first interval.
Now we don’t want to average the averages; we want to sum up the areas and then divide by total time.
So we keep that 768 dollar hours and add the second interval
t2 - t1 is 1 hr, start value and end value are 44 and 46, so we get 45 dollar hours for t1 to t2.
t3 - t2 is 1 hr, start and end values are 46 and 45, so that’s 45.5 dollar hours.
Total dollar hours is 858.5 and total time is 26 hrs, for an average of 33.02.
Should be straightforward in Excel: do a running total of elapsed time (the native Excel time units are fine, since it will all get divided out if you format the timestamps appropriately) and a column for dollar hours per interval with a running total of that, then divide one by the other.
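For what it’s worth, if you ever do step outside Excel, the same area-under-the-graph calculation is a one-liner with numpy’s trapezoidal rule; these are the numbers from the worked example above:

```python
import numpy as np

t = np.array([0.0, 24.0, 25.0, 26.0])   # hours elapsed since the first reading
v = np.array([20.0, 44.0, 46.0, 45.0])  # dollar values at those times

area = np.trapz(v, t)                   # 858.5 dollar hours
average = area / (t[-1] - t[0])         # 858.5 / 26, roughly 33.02
print(area, average)
```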