I’m engaged in a discussion at work that is considering growth trends for some numbers we measure. A coworker, some years ago, stated the growth trend has been 9.8% per year. Based on this, another coworker calculated a 5-year prediction, compounding the 9.8% each year.
However, I’ve just looked at the data for the last 7 years. I plotted the data on a graph and to my eye the growth looks linear. If so, compounding will cause the prediction to be too high. But it’s only 7 years, so maybe any exponential curve is too slight to see.
Is there a way to mathematically determine whether a bunch of data points trend linearly or exponentially?
Assuming it’s linear, it’s easy to calculate the slope and from that calculate any projections. What if it is exponential? How do I calculate the percentage?
A quick-and-dirty way to answer this question is to take the logarithm of each data point, and plot that vs. time; if the data is truly exponential, then you should get a straight line. In other words, if the data actually follows the trend x(t) = a·e^(b·t), for some constants a and b, then plotting x'(t) = ln x(t) should give you x'(t) = ln(a) + b·t, which is a straight line. You can then figure out the growth rate of the exponential from the slope of this line. You can even do a linear fit to both x' and x and see which fits the data better.
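For concreteness, here's a minimal sketch of that check in Python with NumPy, using a made-up set of seven yearly measurements (the values array is purely hypothetical; substitute your own numbers):

[code]
import numpy as np

t = np.arange(7)                                           # years 0..6
values = np.array([100.0, 110, 121, 130, 142, 155, 168])   # hypothetical data

# Straight-line fit to the raw data: x(t) ~ m*t + c
m, c = np.polyfit(t, values, 1)
linear_pred = m * t + c

# Straight-line fit to the logged data: ln x(t) ~ b*t + ln(a)
b, ln_a = np.polyfit(t, np.log(values), 1)
exp_pred = np.exp(ln_a + b * t)

# Whichever fit leaves the smaller residual sum of squares matches the data better
print("linear RSS:     ", np.sum((values - linear_pred) ** 2))
print("exponential RSS:", np.sum((values - exp_pred) ** 2))

# If the exponential wins, each year multiplies the value by e^b,
# so the annual growth rate is e^b - 1 (e.g. 0.098 for 9.8%)
print("implied annual growth rate:", np.exp(b) - 1)
[/code]

That last line also answers the "how do I calculate the percentage" part of the question: if x(t) = a·e^(b·t), the year-over-year ratio is e^b, so the growth rate is e^b − 1.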
Be advised, though, that this technique is rather back-of-the-envelope, and unless there was a substantial difference in the quality of the two linear fits, I’d view any answer you get as tentative. There are more sophisticated statistical techniques out there to answer this question, which I really should be more versed in but unfortunately am not; hopefully a real statistician will happen by shortly.
Neter et al. (the bible of basic linear models) give a test for lack of fit, but it's only applicable when there are replicate observations at the same x values, and (IME) the data has to be pretty damn non-linear for the test to reject a linear model. And you shouldn't be using standard regression models anyway, as you're dealing with measurements repeated over time.
That said, here’s something Q&D you can try if no one else comes up with the right answer. If your data really is linear, then the error in predicting the dependent variable from the independent variable shouldn’t correlate with the dependent variable. So if you run a correlation, and they do correlate, you’ve got some evidence that it’s non-linear. If they don’t correlate, I don’t think you can conclude anything.
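If it helps, here's a rough sketch of that check in Python (same hypothetical values array as in the earlier example); it fits the line, takes the prediction errors, and correlates them with the dependent variable as described:

[code]
import numpy as np
from scipy import stats

t = np.arange(7)
values = np.array([100.0, 110, 121, 130, 142, 155, 168])   # hypothetical data

# Fit the straight line and compute the prediction errors (residuals)
m, c = np.polyfit(t, values, 1)
residuals = values - (m * t + c)

# Correlate the residuals with the dependent variable, per the check above
r, p = stats.pearsonr(residuals, values)
print("correlation of residuals with y:", r, "  p-value:", p)

# With only seven points it's also worth just looking at the residuals:
# an exponential trend forced through a straight line tends to leave a
# bowed pattern (positive at both ends, negative in the middle)
print("residuals:", np.round(residuals, 2))
[/code]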
Bivariate linear regression will give you a linear fit to your data, against which you can check the covariance or correlation coefficient and its associated probability. This wouldn't actually mean that the data are linear in the sense of falling close to a line, but that there is a linear trend in the distribution. Here is a simple explanation of linear regression by the popular method of least squares.
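For example, SciPy's linregress does the whole fit in one call and reports the correlation coefficient and its p-value alongside the slope (hypothetical numbers again):

[code]
from scipy import stats

t = list(range(7))
values = [100.0, 110, 121, 130, 142, 155, 168]   # hypothetical data

fit = stats.linregress(t, values)
print("slope:", fit.slope)          # estimated change per year, in the original units
print("intercept:", fit.intercept)
print("r:", fit.rvalue)             # correlation coefficient
print("p-value:", fit.pvalue)       # probability of a slope this extreme if there were no trend
[/code]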
If by eyeball the data appears linear, then frankly it probably is, though with a dataset of only seven points a more complex trend may not be apparent either by eye or by curve fitting. With an alleged fit that compounds at almost 10% a year, however, I'd expect to see a distinctly nonlinear trend almost immediately.
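Just to put a number on that, here's a quick back-of-the-envelope calculation in Python of how far a 9.8%-compounded curve bows away from the straight line through its endpoints over seven yearly points (no real data involved):

[code]
import numpy as np

t = np.arange(7)
exponential = 1.098 ** t                       # 9.8% compounded each year, starting at 1

# Straight line through the first and last points of the compounded curve
chord = 1 + (exponential[-1] - 1) * t / 6

# How far the compounded curve sags below that line, as a fraction of its value
print(np.round((chord - exponential) / exponential, 3))
[/code]

The sag peaks at roughly 4% of the value around the middle years, so how obvious the curvature looks on a plot depends a good deal on how noisy the measurements are.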
I’m ashamed to say that I don’t follow your reasoning here. Why is regression fitting the data inappropriate?
The standard linear regression model is y_i = β_0 + β_1·x_i + ε_i, where each ε_i is a normal random variable and ε_i is independent of ε_j for i ≠ j. Time series data are notable for violating that independence assumption, and the linear model is not robust to that violation. I guess it's a little strong to say that you can't use regression models, but you shouldn't be using the standard model, as it won't work well.
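As a quick way to see whether that independence assumption is in trouble for a particular data set, you can fit the line anyway and check the residuals for serial correlation; here's a minimal Python sketch (hypothetical data, and with only seven points it's a crude check at best):

[code]
import numpy as np

t = np.arange(7)
values = np.array([100.0, 110, 121, 130, 142, 155, 168])   # hypothetical data

# Ordinary least-squares line, fit despite the caveat above
m, c = np.polyfit(t, values, 1)
e = values - (m * t + c)

# Lag-1 autocorrelation of the residuals: successive errors should look
# unrelated to each other if the independence assumption holds
lag1 = np.corrcoef(e[:-1], e[1:])[0, 1]
print("lag-1 residual autocorrelation:", lag1)

# The closely related Durbin-Watson statistic; values near 2 suggest
# no strong serial correlation
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print("Durbin-Watson statistic:", dw)
[/code]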