Polynomial Regression help - Calculation of coefficients

While there is a way to avoid doing redundant work when 99 of the 100 points will be the same from calculation to calculation, my programmer’s intuition is that reworking your code to do so will produce minimal performance improvement. Normally the matrix inversion is what takes the most time, and that won’t be affected. Speeding up a part of your program that takes, say, 10% of the computation time by 99% is less effective than speeding up a part that takes 80% of the computation time by a mere 15%. Before making further modifications, I highly recommend profiling your code to see which parts are taking the most time. That being said…
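If you happen to be working in Python, the standard-library profiler gives a quick breakdown. A minimal sketch only; fit_polynomial and the sample data are hypothetical stand-ins for your own routine and points:

[code]
import cProfile
import pstats

import numpy as np

def fit_polynomial(points, degree=2):
    # Hypothetical stand-in for your actual fitting routine.
    xs, ys = zip(*points)
    return np.polyfit(xs, ys, degree)

points = [(float(i), float(i * i)) for i in range(100)]

# Profile one call and print the ten functions with the largest
# cumulative time; that is where optimization effort should go.
cProfile.run("fit_polynomial(points)", "fit_stats")
pstats.Stats("fit_stats").sort_stats("cumulative").print_stats(10)
[/code]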

As Omphaloskeptic notes, in constructing the matrix X and the vector Y, it doesn’t matter what order the points are in as long as you’re consistent. You can think of the matrix X[sub]i[/sub] as equal to the matrix X[sub]i-1[/sub] but with the row (1 x[sub]i[/sub] x[sub]i[/sub][sup]2[/sup] … x[sub]i[/sub][sup]n[/sup]) substituted for the row (1 x[sub]i-k[/sub] x[sub]i-k[/sub][sup]2[/sup] … x[sub]i-k[/sub][sup]n[/sup]), where n is the degree of the polynomial you’re trying to calculate and k is the number of points you’re sampling to generate that polynomial. Instead of doing the full matrix multiplications X[sub]i[/sub][sup]t[/sup] * X[sub]i[/sub] and X[sub]i[/sub][sup]t[/sup] * Y[sub]i[/sub], you can compute how swapping those rows makes the products differ from X[sub]i-1[/sub][sup]t[/sup] * X[sub]i-1[/sub] and X[sub]i-1[/sub][sup]t[/sup] * Y[sub]i-1[/sub]. But you’re still left with inverting a matrix.
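For reference, the full recomputation you’d be trying to avoid looks something like this. A minimal NumPy sketch, not your actual code; fit_poly and the sample data are just for illustration:

[code]
import numpy as np

def fit_poly(xs, ys, n):
    """Least-squares polynomial fit of degree n via the normal equations."""
    # Row i of X is (1, x_i, x_i^2, ..., x_i^n).
    X = np.vander(xs, n + 1, increasing=True)
    XtX = X.T @ X                    # (n+1) x (n+1)
    XtY = X.T @ np.asarray(ys)
    # Solving is cheaper and more accurate than explicitly inverting XtX.
    return np.linalg.solve(XtX, XtY)

coeffs = fit_poly([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 7.0, 13.0], 2)
print(coeffs)  # approximately [1, 1, 1], i.e. y = 1 + x + x^2
[/code]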

Thanks for your help and earlier input. It is interesting stuff.

I was toying around with the numbers, and realized that for computing a polynomial of order n, each entry at row i, column j (0 <= i,j <= n) in the matrix X[sup]t[/sup] * X is equal to the sum of x[sup]i + j[/sup] across all sampled points, and entry i of the vector X[sup]t[/sup] * Y is equal to the sum of y * x[sup]i[/sup] across all sampled points. So replacing point (x[sub]k[/sub], y[sub]k[/sub]) with point (x[sub]m[/sub], y[sub]m[/sub]) is simply subtracting x[sub]k[/sub][sup]i + j[/sup] and adding x[sub]m[/sub][sup]i + j[/sup] to each matrix entry, and subtracting y[sub]k[/sub] * x[sub]k[/sub][sup]i[/sup] and adding y[sub]m[/sub] * x[sub]m[/sub][sup]i[/sup] to each vector entry.
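Here’s a rough sketch of that bookkeeping in Python with NumPy (the class and method names are just for illustration): keep the power sums S[p] = sum of x[sup]p[/sup] for p = 0 … 2n and T[i] = sum of y * x[sup]i[/sup] for i = 0 … n, and then swapping one point is only O(n) work:

[code]
import numpy as np

class RollingPolyFit:
    """Maintains X^t X and X^t Y through running power sums so that
    replacing one sample point avoids rebuilding the full products."""

    def __init__(self, points, n):
        self.n = n
        self.S = np.zeros(2 * n + 1)  # S[p] = sum of x^p over all points
        self.T = np.zeros(n + 1)      # T[i] = sum of y * x^i over all points
        for x, y in points:
            self._update(x, y, +1.0)

    def _update(self, x, y, sign):
        powers = x ** np.arange(2 * self.n + 1)
        self.S += sign * powers
        self.T += sign * y * powers[: self.n + 1]

    def replace(self, old_point, new_point):
        self._update(*old_point, -1.0)  # subtract x_k^p and y_k * x_k^i
        self._update(*new_point, +1.0)  # add x_m^p and y_m * x_m^i

    def coefficients(self):
        idx = np.arange(self.n + 1)
        XtX = self.S[idx[:, None] + idx[None, :]]  # (X^t X)[i][j] = S[i + j]
        return np.linalg.solve(XtX, self.T)        # still have to solve here
[/code]

One caveat I’m fairly sure of: running power sums like these can get numerically shaky for high degrees or widely spread x values, so it’s worth checking the results against a full fit before trusting the shortcut.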

Yes, this is the technique I alluded to earlier, allowing you to avoid recalculating X[sup]t[/sup] * X and X[sup]t[/sup] * Y each time. (This is also the method used by old calculators to do least-squares regression without much memory.)

But unless your fits are to a large number of data points, just doing the multiplications should not be very time-consuming.