Linear least squares optimization

Dr.Strangelove · December 12, 2015, 1:36am

I need to compute a set of coefficients c that best minimizes e for:
x[sub]n[/sub] - e = c[sub]1[/sub]x[sub]n-1[/sub] + c[sub]2[/sub]x[sub]n-2[/sub] + … + c[sub]l[/sub]x[sub]n-l[/sub]

I have quite a lot of data points (millions) which I can easily access in C++, though (at least with my skill level) not MatLab. I’ve been able to perform a decent analysis in MatLab on short sets of data points, but I’d like to go further.

The algorithms for doing this analysis seem to involve pretty heavy matrix math, such as a [n, m] * [m, n] matrix multiply.

The answer doesn’t have to be perfect, but I want good coverage of my data set. I don’t want to have to load everything into memory at once if possible. Anyone know of techniques that converge on the right answer given enough samples? A progressive technique that operates one sample at a time would be optimal (especially if it works in a single pass).

Also, are there any techniques that minimize a different falloff function, such as exponential? Minimizing the squares isn’t necessarily optimal for my purposes.

septimus · December 12, 2015, 10:10am

This isn’t my area of expertise but the key step for this minimization is just to invert a lxl matrix where l is the number of coefficients. The millions of data points need not be present in the memory; just make a single pass through them to build the matrix.

I’ll e-mail you the C code I use if you wish.

Derleth · December 12, 2015, 1:55pm

God gives burdens, also LAPACK.

Main download site. It ships with C APIs, which you can use from C++.

Example code.

Derleth · December 12, 2015, 2:06pm

More example code.

Intel’s documentation.

As for not loading everything into RAM, I dunno. Buy an SSD and transfer the data to that? There’s probably a clever solution which is elegant, efficient, and wrong the first few dozen times, which is why we lean on library code so heavily in the first place. Sidestepping the problem with hardware is a reasonable option at this point.

Derleth · December 12, 2015, 2:17pm

This is probably a bit opaque even to CS geeks.

I mean, load the data onto an SSD, mmap(2) it all into your address space, and let the OS take care of moving pieces of it in and out of RAM with the normal page cache mechanisms. That’s why we have OS kernels, to abstract away these kinds of details. It will entail a performance hit relative to the tuned-and-optimized code you haven’t written and probably shouldn’t try to write, but if the data’s on an SSD it shouldn’t be that bad.

(mmap(2) is a Unix system call. Windows has… something… hand-wave… check MSDN.)

Dr.Strangelove · December 12, 2015, 8:25pm

Aha! That you told me it was possible led me to the solution.

I was thinking that I needed to do the A[sup]T[/sup]A multiply in memory, but it’s really just the repeated sum of a matrix with x[sub]i[/sub]x[sub]j[/sub] as elements. That’s easy enough, and from there I can do the inversion and the rest. Thanks!

Dr.Strangelove · December 14, 2015, 4:08am

BTW, I meant to thank you as well, even if I didn’t use that technique. Windows does in fact support memory-mapped files (and my SSD happens to be particularly fast).

I ended up coding it all up in C++, using the Eigen library for the matrix math. Nice and easy to use. The results match Matlab (more or less), so I’m pretty confident in the results.

I do find the results very curious. I was expecting the coefficients to go down roughly monotonically with their distance from the predicted sample. And this is true to a broad extent (the closest two samples dominate), but there are some interesting exceptions.

Topic		Replies	Views
Polynomial Regression help - Calculation of co-efficients Factual Questions	23	10228	May 31, 2011
Reverse engineering the coordinate system by distance Factual Questions	19	1431	April 17, 2018
Statistics (Functions) Factual Questions	18	1252	October 25, 2000
Curvefitting by sums of various powers of x, y etc Factual Questions	39	3222	June 11, 2011
Is there an algorithm for this optimization problem? Factual Questions	14	2052	March 31, 2013

Linear least squares optimization

Related topics