Polynomial Regression help - Calculation of coefficients

While there is a way to avoid doing redundant work when 99 of the 100 points will be the same from calculation to calculation, my programmer’s intuition is that reworking your code to do so will produce minimal performance improvement. Normally the matrix inversion is what takes the most time, and that won’t be affected. Speeding up a part of your program that takes, say, 10% of the computation time by 99% is less effective than speeding up a part that takes 80% of the computation time by a mere 15%. Before making further modifications, I highly recommend profiling your code to see which parts are taking the most time. That being said…
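If you happen to be working in Python, the standard-library profiler gives a quick breakdown. A minimal sketch only; fit_polynomial and the sample data are hypothetical stand-ins for your own routine and points:

[code]
import cProfile
import pstats

import numpy as np

def fit_polynomial(points, degree=2):
    # Hypothetical stand-in for your actual fitting routine.
    xs, ys = zip(*points)
    return np.polyfit(xs, ys, degree)

points = [(float(i), float(i * i)) for i in range(100)]

# Profile one call and print the ten functions with the largest
# cumulative time; that is where optimization effort should go.
cProfile.run("fit_polynomial(points)", "fit_stats")
pstats.Stats("fit_stats").sort_stats("cumulative").print_stats(10)
[/code]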

As Omphaloskeptic notes, in constructing the matrix X and the vector Y, it doesn’t matter what order the points are in as long as you’re consistent. You can think of the matrix X[sub]i[/sub] as equal to the matrix X[sub]i-1[/sub] but with the row (1 x[sub]i[/sub] x[sub]i[/sub][sup]2[/sup] … x[sub]i[/sub][sup]n[/sup]) substituted for the row (1 x[sub]i-k[/sub] x[sub]i-k[/sub][sup]2[/sup] … x[sub]i-k[/sub][sup]n[/sup]), where n is the degree of the polynomial you’re trying to calculate and k is the number of points you’re sampling to generate that polynomial. Instead of doing the full matrix multiplications X[sub]i[/sub][sup]t[/sup] * X[sub]i[/sub] and X[sub]i[/sub][sup]t[/sup] * Y[sub]i[/sub], you can compute how swapping those rows makes the products differ from X[sub]i-1[/sub][sup]t[/sup] * X[sub]i-1[/sub] and X[sub]i-1[/sub][sup]t[/sup] * Y[sub]i-1[/sub]. But you’re still left with inverting a matrix.
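For reference, the full recomputation you’d be trying to avoid looks something like this. A minimal NumPy sketch, not your actual code; fit_poly and the sample data are just for illustration:

[code]
import numpy as np

def fit_poly(xs, ys, n):
    """Least-squares polynomial fit of degree n via the normal equations."""
    # Row i of X is (1, x_i, x_i^2, ..., x_i^n).
    X = np.vander(xs, n + 1, increasing=True)
    XtX = X.T @ X                    # (n+1) x (n+1)
    XtY = X.T @ np.asarray(ys)
    # Solving is cheaper and more accurate than explicitly inverting XtX.
    return np.linalg.solve(XtX, XtY)

coeffs = fit_poly([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 7.0, 13.0], 2)
print(coeffs)  # approximately [1, 1, 1], i.e. y = 1 + x + x^2
[/code]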

Thanks for your help and earlier input. It is interesting stuff.

I was toying around with the numbers, and realized that for computing a polynomial of order n, each entry at row i, column j (0 <= i,j <= n) in the matrix X[sup]t[/sup] * X is equal to the sum of x[sup]i + j[/sup] across all sampled points, and entry i of the vector X[sup]t[/sup] * Y is equal to the sum of y * x[sup]i[/sup] across all sampled points. So replacing point (x[sub]k[/sub], y[sub]k[/sub]) with point (x[sub]m[/sub], y[sub]m[/sub]) is simply subtracting x[sub]k[/sub][sup]i + j[/sup] and adding x[sub]m[/sub][sup]i + j[/sup] to each matrix entry, and subtracting y[sub]k[/sub] * x[sub]k[/sub][sup]i[/sup] and adding y[sub]m[/sub] * x[sub]m[/sub][sup]i[/sup] to each vector entry.
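Here’s a rough sketch of that bookkeeping in Python with NumPy (the class and method names are just for illustration): keep the power sums S[p] = sum of x[sup]p[/sup] for p = 0 … 2n and T[i] = sum of y * x[sup]i[/sup] for i = 0 … n, and then swapping one point is only O(n) work:

[code]
import numpy as np

class RollingPolyFit:
    """Maintains X^t X and X^t Y through running power sums so that
    replacing one sample point avoids rebuilding the full products."""

    def __init__(self, points, n):
        self.n = n
        self.S = np.zeros(2 * n + 1)  # S[p] = sum of x^p over all points
        self.T = np.zeros(n + 1)      # T[i] = sum of y * x^i over all points
        for x, y in points:
            self._update(x, y, +1.0)

    def _update(self, x, y, sign):
        powers = x ** np.arange(2 * self.n + 1)
        self.S += sign * powers
        self.T += sign * y * powers[: self.n + 1]

    def replace(self, old_point, new_point):
        self._update(*old_point, -1.0)  # subtract x_k^p and y_k * x_k^i
        self._update(*new_point, +1.0)  # add x_m^p and y_m * x_m^i

    def coefficients(self):
        idx = np.arange(self.n + 1)
        XtX = self.S[idx[:, None] + idx[None, :]]  # (X^t X)[i][j] = S[i + j]
        return np.linalg.solve(XtX, self.T)        # still have to solve here
[/code]

One caveat I’m fairly sure of: running power sums like these can get numerically shaky for high degrees or widely spread x values, so it’s worth checking the results against a full fit before trusting the shortcut.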

Yes, this is the technique I alluded to earlier, allowing you to avoid recalculating X[sup]t[/sup] * X and X[sup]t[/sup] * Y each time. (This is also the method used by old calculators to do least-squares regression without much memory.)

But unless your fits are to a large number of data points, just doing the multiplications should not be very time-consuming.