Trendlines in an Excel graph

I am trying to analyse some data (website visitor numbers) using an Excel graph, and am exploring the trendline function at the moment. The data has been fairly consistent over the last three months (apart from a dip over the festive season), between 80 and 120 visits per day during the week, and between 40 and 60 visits per day over the weekend. What I am wondering is which of the trendlines that Excel provides (linear, logarithmic, polynomial, power, exponential, moving average) would best suit this sort of data.

I’m not a statistician at all, and am hoping that there will be someone out there who can help…

Many thanks
Grim

What are you trying to graph? How number of visitors changes over the weeK? It is entirely dependant on what you want to show. They say there are three kinds of lies, white lies, damned lies, and statistics, and it is the many possible ways that data can be interpreted that suggest that. Without more info I don’t think we can help you, except to say I think you definately do not want to use linear… if I understand what you are trying to do at all. Fill us in and I will try to help.

I’m hoping to show some sort of general trend (to be used for prediction of future usage) in the data, but because of the massive variation between weekday usage and weekend usage, the graph looks like a mountain range and it is difficult to (visually) see what trend(s) there might be…

Sample of data:


Date	 Total
01/11/2004	106
02/11/2004	121
03/11/2004	110
04/11/2004	101
05/11/2004	95
06/11/2004	59
07/11/2004	60
08/11/2004	129
09/11/2004	105
10/11/2004	87
11/11/2004	93
12/11/2004	102
13/11/2004	59
14/11/2004	68
15/11/2004	98
16/11/2004	104
17/11/2004	101
18/11/2004	119
19/11/2004	100

The graph as it looks at the moment.

Thanks
Grim

Given the large variations within each week, I’d just graph weekly totals. However, there also appear to be some seasonal differences, particularly with low figures in late December. If there are seasonal variations, then you need to look at more than a year’s worth of data to decide if there is any long-term trend.

Right click on the graph and select “Add trend line”. A moving average might work well for you but it’ll depend on what the data points are meant to represent.

In Office 2003, the feature Grey describes is used by clicking the graph, then clicking the Chart menu and choosing Add trendline…

When I first saw your data, I thought as Giles did, assuming you were using US dates, and this was data across a year. But now I can see your date format is dd/mm/yyyy. The high amount of variation isn’t so unexpected across a week.

If you think that weekends and weekdays are different, you should be tracking them separately. You should also do a test to see if there’s evidence that they really are different. My books are at home, so I’ll get more detail later.

Actually if you just right click on a data point in the graph you can get the Trendline option.

Actually if you just right click on a data point in the graph you can get the Trendline option.

That makes sense - I’ll give it a try…

Thanks - any advice would be much appreciated.

It is the Trendline function in Excel that Grey mentioned that I am playing with…

Grim

What the Excel trendline does is produce an equation that best matches the chosen data. But you can’t trust it blindly, and you have to use it with care. If in doubt, go to the “Options” menu and check the “Display equation on chart” and “Display R-squared value on graph”. The equation will be for the trendline, and the R-squared value is an indication of how accurately the trendline fits the original data. R-squared = 1 is a perfect fit; I find that anything less than 0.97 (say) isn’t quite good enough for my purposes.

You have to choose the appropriate type of equation too. If it’s a straight line, choose linear. If it’s a curve with one knee, try logarithmic, power, exponential and polynomial(=2) and see which one gives the best R-squared value. If you’ve got 2 knees or more, choose polynomial, and make the polynomial order = (number of knees +1).

Looking at your graph I think you need to reorder the data before you can find the trends you’re looking for. Making the distinction between weekdays and weekends is a good suggestion. What you’re really looking for is a moving average to be able to predict future trends. Be careful what you set the period to, it will affect the result (low period gives closer tracking, high period gives smoother trendline). If you’ve got fluctuations during the week, how about splitting the data into 7 sets, one for each day? You could overlay them on the same graph - that’d give you quite a lot of information in one picture.

I forgot about this thread last night, but I’ll email myself a reminder to look at it tonight.

The concern is that, unless the weekends and the weekdays have the same underlying distribution, it’s possible that their trendlines will be different. That makes your analysis a bit tougher, especially if you don’t discover it.

Fridgemagnet, one of the first things that was mentioned about R[sup]2[/sup] in my intro linear regression class is that low values don’t necessarily imply bad models. Unfortunately, MS Excel doesn’t supply a lot of the tests that you’d use to do regression, so that may be about as good as it gets.

Ultrafilter is quite correct, you won’t get a high R-squared value from grimpixie’s data as there’ll be quite a high random component. I’m an engineer who’s had to use Excel trendlines in anger to measure crucial coefficients in electronic characteristics, where there’s always an underlying equation that obeys physical laws. For this, an R-squared that’s as close to 1 as possible is absolutely necessary. Grimpixie’s data may have an underlying trend, but it’ll have a large random element too, and so here the R-squared value won’t be quite as meaningful.

I defer to ultrafilter’s greater knowledge of stats. We were never introduced to the concept of “intro linear regression” at school, yet alone an entire class.

Grimpixie - any chance of linking to the raw data? It’d be interesting to crunch it and see what pops out.

And besides, I think Excel, while decent in some respects, is grossly underpowered to handle the sort of data analysis you need.

Sorry about not getting back to this thread earlier, but I was busy last night.

Let m[sub]1[/sub] be the mean number of visitors over the weekdays, m[sub]2[/sub] the mean number of visitors over the weekend days, n[sub]1[/sub] the number of weekdays, n[sub]2[/sub] the number of weekend days, s[sub]1[/sub] the standard deviation of visitors over the weekdays, and s[sub]2[/sub] the standard deviation of visitors over the weekend days.

The estimator for the difference in means is m[sub]1[/sub] - m[sub]2[/sub], and the standard error is sqrt(s[sub]1[/sub][sup]2[/sup]/n[sub]1[/sub] + s[sub]2[/sub][sup]2[/sup]/n[sub]2[/sub]). There’s a 95% probability that the true difference in means lies in the interval (m[sub]1[/sub] - m[sub]2[/sub] - .4013 * sqrt(s[sub]1[/sub][sup]2[/sup]/n[sub]1[/sub] + s[sub]2[/sub][sup]2[/sup]/n[sub]2[/sub]), m[sub]1[/sub] - m[sub]2[/sub] + .4013 * sqrt(s[sub]1[/sub][sup]2[/sup]/n[sub]1[/sub] + s[sub]2[/sub][sup]2[/sup]/n[sub]2[/sub])).

Here’s a link to an excel file with the data in - crunch away Fridge

ultra - just :eek:

Grim

I ran the test, and the approximate 95% confidence interval for the difference in means is (41, 43). It’s very unlikely that the two datasets are from the same underlying solution, so you need to analyze them separately.

I, erm, made some squiggly lines in pretty colours:

[image noborder]http://members.lycos.co.uk/fmagnet/hpbimg/websitevisittrendlines.gif[/image]

The clearest results with Excel and grimpixie’s data seem to come from using a moving average of 7 days (given that there’s periodic dips at the weekend, 7 days is a good figure). Just for amusement I’ve included a trendline using a 6th order polynomial (the highest order that Excel allows). You can see from the equation that it really only needs to be 4th order, but that sort of trendline is meaningless for this type of data anyway. Predicting the future with it will just get you into trouble!

I did a bit of other crunching as well, just to see what emerged. I split the weekends and weekdays, and though the weekends have fewer visits they do kind of predict the weekday ahead. I guess info for your site goes out on a Friday/Saturday? Here’s the [A HREF=“http://members.lycos.co.uk/fmagnet/hpbimg/websitevisitstats.xls”]Excel file[/A]. You can see the effect of choosing a larger rolling average - it smooths out the trendline, but doesn’t track the raw data as well.

A pox on Guest mode! Sorry, you’ll have to follow the links the hard way

Fridge’s links: Graph; Excel sheet

(format for linking is {url=whatever}text{/url} but with square brackets instead of curly, and we can’t post images - I don’t think being a guest has that much to do with it)

Thanks for all that - looks like you had fun!!! :slight_smile:

I think the explanation for the weekend dips is that this is a site that most people access from/for work-related information. The data gets updated on a Friday afternoon…FWIW. We’ve recently changed the structure of our site, and what was a separate site with its own URL has been incorporated into the main site, although the URL still redirects to the proper place - these are the stats for those visitors accessing the pages through the old URL, and I was wanting to see whether there was a drop off as people found thier way around the new site, or whether the old (memorable) URL was still retaining its popularity.

Looks to me as if the general trend is gradually downwards…

Grim