Polynomial Regression help - Calculation of co-efficients

I suppose I should start off and state that I am 43 and this is not homework help.

I need some help understanding how to calculate the co-efficients of a polynomial regression in the formula y(x) = ax^2 + bx + c

I’m writing a PHP program for fun and want to be able to plot a polynomial best fit curve based on input data. I can do linear, exponential, logarithmic, etc, but I’m stuck when it comes to a 2nd order polynomial.

All of the stuff I have googled and read up on confuses me with its terminology, so if you can respond in a way that assumes I’ve forgotten all of my calculus it would be appreciated.

Let’s take some simple data to commence.

x             y
1             100
2             50
3             33
4             25
5             20

If use Excel, I get the following formula:
y = 7.0714x2 - 60.929x + 150.6

and a confidence level of:
R2 = 0.9776

What I simply cannot undestand is where 7.0713, 60.929 and 150.6 are coming from.

Can anyone tell me, in plain English, what I need to do to get these co-efficients?

If you google “Polynomial Regression” you’ll find plenty of sites that explain this. There are also plenty of books, for the hard-copy enthusiast. I learned mine out of Philip R. Bevington’s data Reduction and Error Analysis for the Physical Sciences, which does a good job, although there are plenty of others.
Basically, you’re using the method of least squares * to determine which polynnomial “best” fits the data you have. Imagine that you have your data points as physical points on a board – imagine that they are nails, physically driven into the board. Now imagine that you’ve chosen a set of coefficients and made a rigid wire model of the polynomial function and placed it on this board. Now attach rubber bands or springs between the model and the data points. The method of least squares is the one that will vary those coefficients to find the one that has the overall smallest amount of stretching in the springs (in terms of the smallest amount of potential energy, added up over all the springs). It effectively changes the coefficients to get the best result.

I’d go into this more, but that would start bringing in the math, which you can get elsewhere. But this “least square” method (which does the spring trick by adding up the squares of the distances between the points and the function, albeit only in the vertical direction) does the best job of drawing what we intuitively think of as the “best fit” between data points and data. It only really does a good and believable job if the function matches the data pretty closely. See a book like Press et al’s Numerical Recipes for information on when the fit is believable and when it isn’t.
It’s good to understand it, rather than relying on a canned least square program, because then you can do variations like weighting the individual points, or writing least square routines for situations where you have a specific fitting function in mind (like you think it ought to be a sinusoidal variation. Or you believe that it ought to be a quadratic with no linear term, or something like that)

*Of course, the model’s a little unphysical – the springs have to be constrained to move only vertically, and the model has to stay in place without shifting left, right, up, or down, or rotating. And when you begin varyingm, you’re varying the coefficients, which changes the shape of that rigid model. But you get the idea.

You can write out the formulas for least squares regression without any kind of advanced math, but if you know a little bit of linear algebra, there’s a trick you can use to make all kinds of complicated regression pretty straightforward. Let X be a matrix with all of your input variables, and let y be a matrix with all of your output variables. Then the least squares weights w are given by (X[sup]t[/sup]X)[sup]-1[/sup]X[sup]t[/sup]y, and the predicted value for the observation x[sub]i[/sub] is just the dot product of x[sub]i[/sub] and w.

The nice thing about this formula is that even though it’s derived for linear regression with no constant term, it can be used to handle a large class of regression problems. All you have to do is to add the appropriate columns to X. In the case of quadratic regression, each row of X is of the form [1 a a[sup]2[/sup]].

If you can do a linear fit, do one for the terms x and x^2. That is, create an estimate in terms of x and x^2. If it helps you see what you’re doing, create a new variable w which you set equal to x^2, and then do a linear fit in terms of x and w.

Thanks for the replies.

I’ve spent the last two days looking further into this and I remain (unfortunately) as confused as I was when I started.

I don’t understand this from ultrafilter: (X[sup]t[/sup]X)[sup]-1[/sup]X[sup]t[/sup]y

Nor do I understand this from Napier: create an estimate in terms of x and x^2

Is there anyone who could take the sample data and show me the maths required as step-by-step examples?

It’s been too long since I did something as complex as this and I need a bit of hand holding.

The aim of a least-squares fit is to define a function f(x) such that the sum of the square of the distance between f(x[sub]i[/sub]) and y[sub]i[/sub] for each data point (x[sub]i[/sub], y[sub]i[/sub]) is minimized. To fit a second-order polynomial fit, then, you want to find the coefficients A, B, and C for the equation f(x) = Ax[sup]2[/sup] + Bx + C. You find the formula for the sum of the squares of the difference and from that generate 3 new formulas: the partial derivative with respect to A, B, and C. You set those 3 equal to zero. You now have 3 equations and 3 unknowns, which you can then solve for.

Using your example, you want to minimize the equation
(A + B + C - 100)[sup]2[/sup] + (4A + 2B + C - 50)[sup]2[/sup] + (9A + 3B + C - 33)[sup]2[/sup] + (16A + 4B + C - 25)[sup]2[/sup] + (25A + 5B + C - 20)[sup]2[/sup]

Taking the derivative of that equation with respect to A, B, and C gives you the three equations:

2(A + B + C - 100) + 8(4A + 2B + C - 50) + 18(9A + 3B + C - 33) + 32(16A + 4B + C - 25) + 50(25A + 5B + C - 20) = 0
2(A + B + C - 100) + 4(4A + 2B + C - 50) + 6(9A + 3B + C - 33) + 8(16A + 4B + C - 25) + 10(25A + 5B + C - 20) = 0
2(A + B + C - 100) + 2(4A + 2B + C - 50) + 2(9A + 3B + C - 33) + 2(16A + 4B + C - 25) + 2(25A + 5B + C - 20) = 0

Which simplify to:
1958A + 450B + 110C = 2994
450A + 110B + 30C = 998
110A + 30B + 10C = 456

And solving those 3 for A, B, and C gives you (according to Matlab):
A = 7.0714285714287, B = -60.9285714285719, C = 150.6000000000008

Which is pretty much what Excel gave you.

Oh, and explaining what ultrafilter said, if you know linear algebra, you construct two matrices X and Y where each row of X is (1 x[sub]i[/sub] x[sub]i[/sub][sup]2[/sup]) and each row of Y is y[sub]i[/sub]. So, for your example, you get the matrices:

X =
1  5 25
1  4 16
1  3  9
1  2  4
1  1  1

Y =

The notation X[sup]t[/sup] denotes the transpose of X, i.e.

X[sup]t[/sup] =
1   1   1   1   1
5   4   3   2   1
25 16   9   4   1

Using the rules of linear algebra:

X[sup]t[/sup] * X =
 5  15  55
15 55  225
55 225 979

(X[sup]t[/sup] * X)[sup]-1[/sup] =
 4.60 -3.30  0.50
-3.30  2.67 -0.429
 0.50 -0.429 0.0714

X[sup]t[/sup]  * Y =

(X[sup]t[/sup] * X)[sup]-1[/sup] * X[sup]t[/sup]  * Y =

Which, again, are the coefficients of the best parabola to fit your data.

Punoqllads, you are my hero.

From everything I’ve read so far, your first explanation makes perfect sense.
Seeing “how” it is done with real numbers helped to clarify a lot of the theory I’d been recently reading.

I’m not quite sure on the 2nd explanation, so I’ll go read up some more on that and see if that translates to a simpler (in code) approach.

Thanks to all who replied, it has been really helpful.

I thought this was a math forum but it is not. Anyway, maybe someone can answer this:

the matrix X and Y as given in this example seem to be filled upside down. The array x is defined as:

yet X is filled as:
1 5 25
1 4 16
1 3 9

also Y is turned upside down. Why?
Then I have a more specific question:

  1. I have a data array with for instance 1000 data points.
  2. I calculate a higher order fit only for 100 datapoints within that array.

This is straightforward.
Now I want to calculate the fit for these 100 datapoints but run it over the entire array of 1000 points and save the end values of the 100 data point fit. So you calculate the fit of the first 100 points of the 1000 point array, save the end values, then move 1 point, calculate it again etc, over the entire array.

This is simple to implement but very slow. I am wondering if there is not a trick for this so that you do not have to calculate the entire 100 point fit every step of the iteration.


I recommend you get your hands on a Numerical Analysis textbook. The techniques you are looking for are covered in most of them, including the one by Burden and Faires (partially available on Google Books) that I’m familiar with. Chapter 3 covers polynomial interpolation, while Chapter 6 covers solutions of linear equations. The book includes pseudocode for numerical algorithms, as do most books on this topic.

thanks. I was hoping that someone with advanced math skills could confirm that my question is possible. With tha I mean that whether a math technique is available that uses results of a prior fit to calculate a new fit in the case as I have shown that you calculate a moving fit and save the end points of each fit during each iterative calculation.


I don’t really have the skills to help you, unfortunately. I just took a beginner course on this as part of my degree requirements, and your initial question sounds like a few of the techniques described in the book. I’m not sure I even understand what you’re asking here, though it makes me think of the Divided Differences techniques to find an interpolating polynomial (I could be way off base…as I said, I’m not even sure what you’re looking for).

Sorry! Good luck!

The order in which the data points (rows of X and Y) appear is not important, as long as they appear in the same order in each one.

If I understand correctly you want to perform the first fit using data points 1-100, the second fit using points 2-101, the third using 3-102, and so on; in each case the fit you want is the one described by ultrafilter and Punoqllads, but you just want to find the results more efficiently than by performing all of these calculations each time.

I’m not sure what you mean by “save the end values of the 100 data point fit” though.

There is a relatively simple technique that lets you avoid having to recalculate X[sup]t[/sup]*X and X[sup]t[/sup]*Y for each fit; you still have to compute the matrix inverse (X[sup]t[/sup]*X)[sup]-1[/sup] for each fit, however. Whether that is a large improvement depends on the dimension of your fit and on the number of data points per fit. I can describe it if that’s what you are looking for.

Unless you have really stringent time or processor constraints or a very long datapoint window, though, it’s hard to see this taking a long time even if you do it the brute-force way. (For example, on the computer I’m using now, MATLAB did 901 100-datapoint quadratic fits, as I think you’re describing in your question, in 70 milliseconds.) What reasons do you have to think that this optimization needs to be done?

Here it sounds like you might want the fit from points 2-101 to depend on the fit from points 1-100, which is different from how I read your earlier post. If you want that then you have to describe how the later fits depend on the earlier ones.

thanks for all the replies, will read in more detail tomorrow. But indeed Omphaloskeptic this is what I want to do. Calculate a fit from point 1-100 and save the value of the fit and the error at point 100 in another array(s). Then shift 1 point and calculate the fit from 2-101 and save the value (fit + error) at point 101 in separate arrays. So if your array is 1000 points it has to do this fit + shift 900 times.

The reason is that I am programming a fit for trading stocks. There is an indicator called the COG (center of gravity indicator), see for a chart:


this is basicly a least squares fit of the financial data of the last 100 or 150 points. But only the last point is of use. The chart looks great but for every new data point that comes in the chart repaints. Therefor I want to calculate static points for the historic prices so I can do some trade setup tests.

But I basicly know already that what I want is not possible.

to explain a bit more I made a chart.


the middle chart shows a 3-rd order fit for the last 100 data points. The chart below that shows the last points of this fit iterated 100 times each time 1 point shifted. Since in real time only the last point is a valid point to use for the sake of backtesting I would need to calculate this. But it is very slow for say 1000 points or more.

A few questions which will help in optimization:[ol][]How are you doing this calculation (what programming language, what kind of computer)?[]How slow is “very slow” (as I said earlier, a similar calculation only took me tens of milliseconds, and not on a particularly fast computer)?[*]How do you know that it’s this calculation that’s the slow part (as opposed to, I don’t know, I/O, or drawing the graph, or whatever else you’re doing)?[/ol]

If you want to do a linear model for time series data, you should be looking into ARMA models, and maybe add in some GARCH components as well.

[quote=“Omphaloskeptic, post:17, topic:432840”]

A few questions which will help in optimization:[ol][li]How are you doing this calculation (what programming language, what kind of computer)?[]How slow is “very slow” (as I said earlier, a similar calculation only took me tens of milliseconds, and not on a particularly fast computer)?[]How do you know that it’s this calculation that’s the slow part (as opposed to, I don’t know, I/O, or drawing the graph, or whatever else you’re doing)?[/ol][/li][/QUOTE]


I am programming in a package called Amibroker on a i7 computer. A simple example with 5 points. For each step I add a new data point and remove the one not used.

step 1
1 2 3 4 5 polynomial fit
5 store last point of fit

step 2
2 3 4 5 6 polynomial fit
6 store last point of fit

step 3
3 4 5 6 7 polynomial fit
7 store last point of fit


so what I was hoping is that when going from step 1 to 2 one would not need to calculate the polynomial fit entirely by using segments of the fit calculated in step 1. But I think that this is not possible, it has been confirmed by someone strong in math.

to calculate this can be very slow when working with intraday data. For instance minute data has around 24*60=1440 bars of data each day. If you want to calculate the “static fit” as I described for a few weeks of data (e…g 2 weeks = 2 * 5 * 24 * 60 = 14400 bars) then you can imagine it has to calculate a fit of 100 points 144 times. For 6 months of data it will have to do this fit close to 1750 times. It slows things down :slight_smile:

thanks will have a look