[FWIW here is my source material, now that I cleaned it up:]
CALCULATION OF A LORENTZIAN CURVE FROM SUMS
From Kolb page 82
Premise: Y(X) is an unknown Lorentzian peak function that fits a set of (X,Y) observations with coefficient of determination Z:
Y = 1/((A*(X+B)^2)+C)
Problem: Calculate A, B, C, and Z given the set of (X,Y) values.
First, calculate the following sums. For example, D is the sum of all the X values. Note that L is the sum of the number 1 for all observations and therefore equals the number of observations. After calculating these sums we may dispose of all the original (X,Y) pairs, as they are no longer needed. In fact, if the (X,Y) pairs are given to us over time, we need never remember them all; we need only accumulate the sums over time.
D = Sum X
E = Sum X^2
F = Sum 1/Y
G = Sum 1/Y^2
H = Sum X/Y
I = Sum X^2/Y
J = Sum X^3
K = Sum X^4
L = Sum 1
Second, to simplify things, calculate the following intermediate results.
M = (EL)-(DD)
N = (LI)-(EF)
P = (LJ)-(DE)
Q = (LH)-(DF)
R = (LK)-(EE)
S = (MN)-(PQ)
T = (MR)-(PP)
U = S/T
V = (Q-(PU))/M
W = (F-(VD)-(U*E))/L
Finally, calculate A, B, C and Z accordingly:
A = U
B = V/(2U)
C = W-(V^2/(4U))
Z = ((WF)+(VH)+(U*I)-(F^2/L))/(G-(F^2/L))
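In case it helps anyone follow the recipe, here is my own quick Python translation of the above (the variable names follow Kolb’s letters; the test data is just something I invented, generated from a known Lorentzian so the answers can be checked):
[code]
def fit_lorentzian(points):
    """Fit Y = 1/(A*(X+B)^2 + C) from (x, y) pairs using only running sums."""
    # Accumulate Kolb's sums; the raw pairs never need to be stored.
    D = E = F = G = H = I = J = K = L = 0.0
    for x, y in points:
        D += x
        E += x**2
        F += 1.0 / y
        G += 1.0 / y**2
        H += x / y
        I += x**2 / y
        J += x**3
        K += x**4
        L += 1.0

    # Intermediate results.
    M = E*L - D*D
    N = L*I - E*F
    P = L*J - D*E
    Q = L*H - D*F
    R = L*K - E*E
    S = M*N - P*Q
    T = M*R - P*P
    U = S / T
    V = (Q - P*U) / M
    W = (F - V*D - U*E) / L

    # Final results.
    A = U
    B = V / (2.0*U)
    C = W - V**2 / (4.0*U)
    Z = (W*F + V*H + U*I - F**2/L) / (G - F**2/L)
    return A, B, C, Z

# Invented check data from Y = 1/(2*(X-3)^2 + 5), i.e. A=2, B=-3, C=5.
data = [(x, 1.0/(2.0*(x - 3.0)**2 + 5.0)) for x in range(-10, 11)]
print(fit_lorentzian(data))   # ~ (2.0, -3.0, 5.0, 1.0) up to rounding
[/code]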
Thank you Napier, that’s really cool! I don’t know what it’s called and I haven’t seen it before, so I’m of absolutely no help to you (except I can reassure you that Lorentz-with-a-t is the profile’s namesake) but you can bet I’ll be using that one!
Depending on what exactly the sources of your broadening are, a Lorentzian may even be more accurate than a Gaussian. If you have a high enough SNR, you could try fitting a convolution of an arbitrary Gaussian with an arbitrary Lorentzian, which will give you more information about what specific broadening mechanisms you have (some things cause Lorentzian broadening, and some cause Gaussian).
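If you do try that, here’s a rough sketch in Python of what I mean, using SciPy’s built-in Voigt profile (the convolution of a Gaussian with a Lorentzian). The parameterization and the synthetic data are just my own illustrative choices, not anything from a real spectrum:
[code]
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import voigt_profile   # convolution of a Gaussian and a Lorentzian

def voigt(x, amp, x0, sigma, gamma):
    # amp scales the (area-normalized) Voigt profile centered at x0;
    # sigma is the Gaussian width, gamma the Lorentzian half-width.
    return amp * voigt_profile(x - x0, sigma, gamma)

rng = np.random.default_rng(3)
x = np.linspace(-5.0, 5.0, 400)
y = voigt(x, 1.0, 0.3, 0.4, 0.6) + rng.normal(0.0, 0.002, x.size)   # fake noisy peak

popt, pcov = curve_fit(voigt, x, y, p0=[1.0, 0.0, 0.5, 0.5])
print(popt)   # recovered amp, center, Gaussian sigma, Lorentzian gamma
[/code]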
It has no name as far as I can tell; it’s just a very clever linear-time algorithm for fitting the curve. If it’s been published anywhere else, I would think it would be in a stats or CS journal, so you could try JSTOR or the ACM Digital Library (I tried both of these but couldn’t find anything). Does the author cite anything in reference to it?
In general I suppose it’s an example of dynamic programming, though getting from the definition of dynamic programming to that particular algorithm is not trivial.
Hmm yeah I’m an ME undergrad student and I was just wondering if I should have any clue as to what you all are talking about…? Or is this masters/PhD level stuff you all are talking about here? Or perhaps I just haven’t taken any of the real advanced classes yet… I hope I haven’t suddenly missed months of some class I was supposed to take.
Well, folks, I’m not sure what level stuff we are talking about after all!
On a calculator of decades ago, if you wanted to calculate a standard deviation, the method that requires the deviation from the mean for each observation isn’t usable, because you would have to calculate the mean from all the observations before you could start over to calculate the standard deviation. That is, you have to have access to all the observations at the same time. A calculator that can remember, say, 50 values can’t do that with 1000 observations. So they have the Sigma registers that hold sums from which to calculate what you want, and they eat the observations so that nothing is left but their cumulative effect on those registers.
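In modern terms, the Sigma registers amount to something like this (a rough Python sketch; the observations are just made-up numbers):
[code]
import math

# Only three running totals are kept: n, sum of x, and sum of x^2.
# Each observation updates them and is then forgotten.
n = sum_x = sum_x2 = 0.0
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:   # observations arrive one at a time
    n += 1
    sum_x += x
    sum_x2 += x * x

mean = sum_x / n
# Sample standard deviation straight from the sums -- no second pass over the data.
# (For huge n or tiny variance this one-pass formula can lose precision.)
std = math.sqrt((sum_x2 - sum_x**2 / n) / (n - 1))
print(mean, std)
[/code]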
I figured Kolb’s book was written in that vein, to help owners of HP 41C calculators do fancier statistics. Since I was a 41C enthusiast, into synthetic programming and whatnot, I had to have it. I also figured that what he was doing was mainly adapting a common technique to the 41C.
Now I think maybe he was very original, or at least was adapting a technique that isn’t very common.
Web searching his book turns up numerous papers that reference it, but nothing about where he got his method from. In the book itself he talks about how regressions work, but not in much detail.
So it’s news to me that this isn’t the way everybody else was already solving these problems. And I’m glad I happened to buy the book 24 years ago!
I don’t know how well-known this technique is in general; I also was introduced to it through old scientific calculators (a TI in my case, I think, though I later got that old RPN religion).
This method of minimizing the squared error by calculating particular intermediate statistics was solved, at least for the special case of the linear least-squares fit, by Legendre and Gauss. (I don’t know who first applied this method as a computer algorithm running in limited memory, however.) Here’s a translation (PDF) of Legendre’s method, which notes that the equations to be solved have coefficients which are sums of various functions of the data points. Legendre notes that the method generalizes; since this preceded modern matrix notation, he doesn’t write an explicit general solution, but the general method is clear. This method is precisely what is used to derive the normal equations for the general least-squares solution.
I can’t tell by reading your posts whether you understand how the calculations you posted above for the Cauchy-Lorentz curve (post#21) are derived, but they follow the same method:
The curve to be fit has the form
y=1/(a(x + b)[sup]2[/sup] + c);
this is rewritten as
(1/y) = ax[sup]2[/sup] + (2ab)x + (ab[sup]2[/sup]+c) = d + ex + fx[sup]2[/sup].
Now a least-squares fit is performed to find the regression coefficients (d,e,f). The normal equations give
[d]
[e] = (A[sup]T[/sup]A)[sup]-1[/sup]A[sup]T[/sup]b
[f]
where
A = ( 1 , x[sub]i[/sub] , x[sub]i[/sub][sup]2[/sup] )
b = ( 1/y[sub]i[/sub] )
contain the observations in columns (that is, A has 1s in the first column, the data values x[sub]i[/sub] in the second column, and the values x[sub]i[/sub][sup]2[/sup] in the third column). The crucial point is that the quantities A[sup]T[/sup]A and A[sup]T[/sup]b can be written using just a few intermediate statistics, after which the raw data is no longer needed:
[ S1 Sx Sx[sup]2[/sup] ]
[ Sx Sx[sup]2[/sup] Sx[sup]3[/sup]] = A[sup]T[/sup]A
[Sx[sup]2[/sup] Sx[sup]3[/sup] Sx[sup]4[/sup]]
and
[ S(1/y) ]
[ S(x/y) ] = A[sup]T[/sup]b
[ S(x[sup]2[/sup]/y)]
(here S(f(x,y)) means the sum of all values f(x[sub]i[/sub],y[sub]i[/sub]) over the data; apologies for the ugly formatting). I haven’t checked that inverting A[sup]T[/sup]A gives the results you quote, but this is just algebra. Note that only eight values are required: S1, Sx, Sx[sup]2[/sup], Sx[sup]3[/sup], Sx[sup]4[/sup], S(1/y), S(x/y), and S(x[sup]2[/sup]/y); these are your values L,D,E,J,K,F,H,I. (G is only needed to compute the residual.)
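To make that concrete, here is a small numpy sketch of my own (not from Kolb) that builds A[sup]T[/sup]A and A[sup]T[/sup]b from exactly those eight sums, solves for (d,e,f), and converts back to (a,b,c); the synthetic data is invented just to check the algebra:
[code]
import numpy as np

x = np.linspace(-10.0, 10.0, 41)
y = 1.0 / (2.0*(x - 3.0)**2 + 5.0)          # synthetic data with a=2, b=-3, c=5

# The eight intermediate statistics (S(1/y^2) is only needed for the residual).
S1, Sx, Sx2, Sx3, Sx4 = len(x), x.sum(), (x**2).sum(), (x**3).sum(), (x**4).sum()
Sinvy, Sxinvy, Sx2invy = (1/y).sum(), (x/y).sum(), (x**2/y).sum()

AtA = np.array([[S1,  Sx,  Sx2],
                [Sx,  Sx2, Sx3],
                [Sx2, Sx3, Sx4]])
Atb = np.array([Sinvy, Sxinvy, Sx2invy])

d, e, f = np.linalg.solve(AtA, Atb)         # 1/y ~ d + e*x + f*x^2
a = f
b = e / (2.0*a)
c = d - a*b**2
print(a, b, c)                              # comes out near 2, -3, 5
[/code]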
This works as long as you can write the equation to be least-squares minimized as a linear function of the regression coefficients (here (d,e,f)); in particular, it works for any polynomial, though as seen from the Lorentz example above, it’s not restricted to polynomials.
However, we made two questionable assumptions when we inverted the equation to get a polynomial in 1/y. The first problem is that an uncertainty of dy in y translates to an uncertainty of about dy/y[sup]2[/sup] in 1/y, so if all of our uncertainties in y are about the same size but the measured values of y differ by quite a bit, the resulting uncertainties in 1/y are not going to be the same size. This is easy to fix in the usual least-squares way, by scaling the equations so that all of the variances are the same. The second problem is that if the errors in y are unbiased and Gaussian, say, the errors in 1/y are biased and no longer Gaussian. It is not strictly correct to minimize the squared error in these modified equations. Fixing this requires a nonlinear least-squares approach, which is typically ugly.
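As a crude illustration of the first fix (my own sketch; the noise level and data are invented): if every y has roughly the same absolute uncertainty, the uncertainty in 1/y scales like 1/y[sup]2[/sup], so weighting each point of the quadratic fit in 1/y by y[sup]2[/sup] puts the transformed equations back on an equal footing.
[code]
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-10.0, 10.0, 201)
y = 1.0 / (2.0*(x - 3.0)**2 + 5.0) + rng.normal(0.0, 1e-4, x.size)   # equal noise in y

# Quadratic fit of 1/y, unweighted versus weighted; np.polyfit expects weights
# proportional to 1/sigma, and sigma(1/y) ~ sigma_y / y^2, hence w = y^2.
f0, e0, d0 = np.polyfit(x, 1.0/y, 2)
f1, e1, d1 = np.polyfit(x, 1.0/y, 2, w=y**2)

for d, e, f in [(d0, e0, f0), (d1, e1, f1)]:
    a, b = f, e/(2.0*f)
    print(a, b, d - a*b**2)      # compare both estimates against the true (2, -3, 5)
[/code]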
>I can’t tell by reading your posts whether you understand how the calculations…
Wow, Omph, no I didn’t understand how, and your post is fascinating. Kolb’s book introduction sounded a little like this but did not mention nearly enough clues to get me started. What you explained is clear enough that I can get a sort of feel for it even though I’m terrible at math! Thanks!
Strictly speaking, fitting a polynomial of order n to n+1 points is not a fit in the normal sense, but a polynomial which passes exactly through each point. It will not smooth or remove noise at all! :eek: I would be very careful in interpreting such a high-order polynomial, as the extrema would most likely have little or no physical meaning.
An alternative way to handle such smoothing problems, which generally gives a more physical result, is to slide a fit of a low-order polynomial (e.g. order 2) through sequential groups of more than 3 data points across the whole data array. Each polynomial is used to obtain a smoothed data value by calculating its value at the central point of the group. You can get more insight from available descriptions of the Savitzky-Golay method, for instance.
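For anyone who wants to try it without writing the sliding fit by hand, SciPy has it built in; a minimal sketch (the window length and polynomial order here are arbitrary illustrative choices, and the signal is invented):
[code]
import numpy as np
from scipy.signal import savgol_filter

x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + np.random.default_rng(1).normal(0.0, 0.1, x.size)   # noisy signal

# Slide a quadratic fit over an 11-point window; each output value is the
# fitted polynomial evaluated at the center of its window.
y_smooth = savgol_filter(y, window_length=11, polyorder=2)
[/code]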
Polynomial curve-fitting is garbage unless there’s an actual underlying physical or informational law that corresponds to the order that you’re fitting with.
Meaning - if you’re plotting the position of something accelerating under gravitational force, a second-order polynomial fit is OK. A fourth-order fit is garbage. You’re lying.
I’m reminded of a cranky old engineering professor I had back in school. He used to excoriate anyone who used Excel to make crazy-ass fits with 6th-order polynomials. He’d call them out in front of the class and bellow, “WITH ENOUGH POLYNOMIAL ORDERS YOU CAN FIT A CHARGING RHINO!”
…meaning, yeah, the curve may look nice, but you aren’t making or describing anything useful.
If the data is really that crazy, you should just draw the curve by hand. It’s a bit scientifically dishonest to use a 17th-order polynomial (or whatever) to fit a data set unless you actually have a theory that predicts a 17th-order relationship.
CalMeacham’s example makes sense. He had data points that defined a real surface (at least I think that’s what he had). It really did have hills and valleys. And, barring any further information, it’s reasonable to assume the surface was approximately continuous and smooth. The 17th-order fit allowed him to mathematically define the whole surface based on those points.
But his example is probably the exception that proves the rule.
But would that crazy 17th-order fit predict anything about a different but similar surface? Might a different surface of a similar type be fit best with a 13th-order fit? Maybe it needs a 20th-order fit?
Unless there’s something about the surfaces being worked with that suggests a 17th-order relationship, and that suggestion is tested, all CalMeacham is doing is making a pretty curve. Which is fine for a PowerPoint, I guess, but it’s not good science.
(Not trying to call you out, CalMeacham, just playing Devil’s Advocate to billfish678 :D)
The 17 hills/valleys suggest the 17th order relationship. It is what it is. Remember this is a surface, not some variable relationship like temp vs pressure or something.
Let me give an example of why a pretty curve would be important here.
Let’s say he wants to model the diffraction/reflection of light from this surface. The way to do that would involve breaking that surface into lots of little pieces and doing some “fancy” optical calculations. There are two ways to do this. One is a linear interpolation between the points. The other is using the 17th-order curve fit. Now, think about this for a minute. What do you think is more likely: that the surface actually IS a series of upright and inverted pyramids defined by peaks, valleys, and straight lines, or that it’s actually a semi-smooth curve with the peaks and valleys just being where the height measurements were taken?
And I should note that if you modeled the diffraction using the linear method, you would probably have added odd “noise” effects due to its nature.
So, in this example, it’s probably actually not just wishful thinking, but the right thing to do. Of course, to be thorough, you’d want to do the modeling both ways and compare.
And again, I agree with your point that most of the time, using a higher order than you have reason to expect from the physics is baloney.
In this case we are modeling THIS surface. We aren’t trying to come up with a model for all surfaces.
I get your point. But I still have a few questions (and maybe neither of us can answer them.)
The first is - is there an actual mathematical relationship or property that can explain whatever fit CalMeacham or whoever comes up with? If not, what is the value of the fit other than connecting the data points?
I get and grant that not all phenomena are explained by the simple relationships we all learn in Physics 101. But without an underlying theory or equation, what is the value of a curve that simply exists to fit a given data set?
Second - we’ve been talking about polynomial fits. You brought up a linear fit. But are there other options? I don’t know a damn thing about surface mathematics, but would an exponential or log fit work better? A power fit?
This is turning out to be a really interesting question, so thanks for some insightful replies, all.
It’s late and I am about done for the night. But let me throw this out. Let’s say you collect a bunch of data. The data points are extremely smooth/not noisy. But the curve that fits them is some rather high-order polynomial. You don’t really know why, but again, it is what it is. And let’s say you collect this data multiple times. And besides some low-level noise, the curve is always the same.
Now, you have a computer model where this data is the input. To me, in general it makes sense to use this high-order fit as input even if you don’t know WHY it’s a high-order function. That’s the value of the curve. It represents reality as best as you can tell from your data.
Throw in lots of “but this,” “but that,” “make sure of the data,” and other gotchas, because it’s my bedtime.
Data that, when plotted, almost forms a nice smooth line/curve by itself, rather than looking more like a shotgun blast that needs some stats run on it to get a curve that only roughly reflects the data trend because the data is all over the place.
Yeah, Chronos, at least back in the earlier days of modeling and curve drawing the ole cubic spline was better than linear interpolation and good enough for producing a more “smooth” curve connecting the data points.
I’ve used polynomials of that high an order before. In my case, I had thousands of points of data as a function of angle, and the recorded angle had noise at a low level. There were only a few digits of accuracy recorded, so the individual numbers weren’t very accurate. Also, I was not using the very ends of the data, where a polynomial fit is less trustworthy.
Another option would be a fit using sines and cosines over the range of the data, with a couple added functions to allow for discontinuities in value and slope at the ends.
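Something like this, roughly (my own toy construction, not the actual fit I used back then): a design matrix of sines and cosines over the data range, plus a few low-order polynomial columns so the fit isn’t forced to be periodic in value and slope at the ends.
[code]
import numpy as np

x = np.linspace(0.0, 1.0, 500)
y = np.exp(0.5*x) + 0.01*np.random.default_rng(4).normal(size=x.size)   # toy data

n_harmonics = 4
cols = [np.ones_like(x), x, x**2]            # extra terms for the end discontinuities
for k in range(1, n_harmonics + 1):          # harmonics over the data range
    cols.append(np.cos(2*np.pi*k*x))
    cols.append(np.sin(2*np.pi*k*x))
A = np.column_stack(cols)

coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)   # linear least squares
y_fit = A @ coeffs
print(np.abs(y - y_fit).max())                   # worst-case residual
[/code]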
A cubic spline would be appropriate if you had data you believed was correct at each point, and you needed to interpolate between those points, but not in my example above, or in CalMeacham’s example if I’m understanding it correctly.
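To make the contrast concrete, here is a tiny SciPy example (invented points): the cubic spline reproduces every supplied point exactly, which is what you want for trusted data and exactly what you don’t want for noisy data.
[code]
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
y = np.array([1.0, 2.7, 0.9, 3.1, 2.0])          # assumed exact at each point

spline = CubicSpline(x, y)
x_fine = np.linspace(0.0, 5.0, 101)
y_fine = spline(x_fine)                           # smooth interpolation between points
print(np.allclose(spline(x), y))                  # True: the spline does no smoothing
[/code]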
I’d like to clarify something on the basis of how I understand the collection of methods in Kolb’s book.
What we get using the methods he documents is not a polynomial in X that approximates a distribution density function f(x).
Instead, we get multiple expressions that each give a single value for the dataset, and these multiple values are estimates of the parameters of (nonpolynomial) distribution functions such as the Cauchy function.
If we apply these methods to a dataset that was artificially generated using the distribution function we are modeling, we would get back the exact same parameter values we used in generating the dataset, to as many digits as are free from rounding errors.
I am interested (or was, lo, these four years passing) in fitting datasets from physical measurements, and in fact in using this as the first pass before starting an iterative fit, for speed. I was trying to write code in assembly and Forth to fit distribution curves on the fly, using what a CPU would consider integer math and a metrologist would consider fixed-point math. The methods we are discussing are not iterative, and they appear to be very fast ways of estimating distribution curve parameters. Of course, if there is significant noise in real physically measured data, and not just a rounding error around the 20th place, then these methods might produce significantly inaccurate parameter estimates. So, when I move on from the first phase of each analysis, I would start an iterative process, but even that would not be polynomial.
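For the record, here is roughly what that two-phase scheme looks like in Python (obviously not the assembly/Forth version; the Lorentzian, noise level, and data are all invented for illustration): a fast non-iterative first pass seeds an iterative nonlinear fit on y itself.
[code]
import numpy as np
from scipy.optimize import curve_fit

def lorentz(x, a, b, c):
    return 1.0 / (a*(x + b)**2 + c)

rng = np.random.default_rng(2)
x = np.linspace(-10.0, 10.0, 201)
y = lorentz(x, 2.0, -3.0, 5.0) + rng.normal(0.0, 1e-4, x.size)   # noisy measurements

# Phase 1: non-iterative closed-form estimate via a quadratic fit of 1/y.
f, e, d = np.polyfit(x, 1.0/y, 2)
a0, b0 = f, e/(2.0*f)
c0 = d - a0*b0**2

# Phase 2: iterative refinement in y, starting from that estimate.
(a, b, c), _ = curve_fit(lorentz, x, y, p0=[a0, b0, c0])
print((a0, b0, c0), (a, b, c))
[/code]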
In the end, these methods would give bizarre and wrong parameter estimates if applied to the wrong distribution. The physics of the situation comes in when choosing the right distribution to target. For the kind of problem I was dealing with, there isn’t any relevant issue of choosing the right degree of a polynomial.
Of course, other kinds of problems mandate all kinds of interesting consideration of polynomial degree, so I don’t mean to piddle on a fun discussion of that.