
View Full Version : Curvefitting by sums of various powers of x, y etc

Napier
04-24-2007, 08:32 AM
There is a kind of regression analysis, or more particularly a kind of calculation for regression coefficients, based on creating various summations. For example, you can fit a Cauchy distribution curve through a set of (x,y) points by summing x, x^2, 1/y, 1/y^2, x/y, x^2/y, x^3, x^4, and 1 (the sum of 1 for all points being N). The coefficients of the curve are then functions of the sums. I have a collection of these methods in the book Curve Fitting for Programmable Calculators, by William M. Kolb.

What is the name for this kind of method?

Where can I find more such calculations, and perhaps supporting information?

CalMeacham
04-24-2007, 09:02 AM
It's a polynomial fit. I've used them often enough. One time I used a 17th order polynomial to fit observed data because it gave such a good and useful fit.

It's tantalizing to do all your fits this way, but you should always keep an eye out for the form that would most naturally fit the observed data with the fewest terms. Sometimes genuine physical insight comes that way (Planck's fit to the blackbody curve, for instance).

Also, if the curve isn't pretty close to the existing data, don't use that fit. An ideal least-squares fit is one that does a pretty good job of following the actual data, allowing for noise. I had to yell at a colleague who was using a quadratic fit to widely-spaced points that it wasn't following particularly closely. What was the point? The better fit was a cubic, since we knew that the data was supposed to be asymmetrical, and a cubic is the lowest-order asymmetric polynomial (and it fit pretty well in that region; I was later able to demonstrate theoretically that a cubic was the lowest-order approximation).

A nice thing about polynomial fits is that you can easily perform the math using linear algebra to determine the coefficients.
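As a sketch of that linear-algebra route: build a Vandermonde matrix of powers of x and solve the least-squares problem directly. The cubic and the noise level below are invented purely for illustration.

```python
import numpy as np

# Hypothetical noisy observations following a cubic trend.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1 + 2*x - 3*x**2 + 4*x**3 + rng.normal(0.0, 0.01, x.size)

# Vandermonde matrix: one column per power of x.
A = np.vander(x, 4, increasing=True)        # columns: 1, x, x^2, x^3
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
# coeffs comes back close to [1, 2, -3, 4]
```

The same machinery works for any basis that is linear in the unknown coefficients, which is the point of the sums-based methods discussed below.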

Stranger On A Train
04-24-2007, 10:45 AM
It's a polynomial fit. I've used them often enough. One time I used a 17th order polynomial to fit observed data because it gave such a good and useful fit.

:confused: I have to ask: what would you fit with a 17th order polynomial that would be a valid fit, and how many data points did you have?

I've done regression fits up to fourth order (for backfitting a model of a differential control system) and I've used p-element structural FE codes that go up to order 9, but I have a hard time envisioning anything like a real world 17th order system. Is this an economic model, perhaps?

Stranger

ultrafilter
04-24-2007, 10:52 AM
:confused: I have to ask: what would you fit with a 17th order polynomial that would be a valid fit, and how many data points did you have?

17th-order models work really well when you have 18 observations.

CalMeacham
04-24-2007, 10:56 AM
I have to ask: what would you fit with a 17th order polynomial that would be a valid fit, and how many data points did you have?

It was an engineering problem -- we were looking at the deviation of a surface from perfectly flat. The measurement method gave us a very noisy trace overlying the generally smooth and slowly-varying bumps and wiggles that were the true surface profile. I simply used a polynomial fit to look at that underlying function in order to get the true surface deviations from flatness, and I needed an order as high as 17th to get all the bumps in there. A polynomial fit was actually easier and quicker than averaging, and had much less residual noise. If the thing we were measuring had been half as long, I could've used a ninth order polynomial.

Stranger On A Train
04-24-2007, 11:12 AM
17th-order models work really well when you have 18 observations.

Sure, you can make a 17th order polynomial fit with 18 data points, but would it mean anything and/or be useful? The point of doing a curve fit, after all, is that you end up with a function that is easily differentiable or otherwise capable of being interpreted, and more importantly, functions as an accurate model for interpolation between points or (perhaps) extrapolation beyond the data set. A 17th order polynomial, however, is likely to look like the Andes, bouncing up and down.

I'm not doubting CalMeacham's application of such a high order regression, mind you, since I have no idea what kind of data he was working with and what type of behavior he was modeling, but I just can't imagine anything in my own experience in which such a system would be valid.

Pointless engineering anecdote: I was once working with a freshly minted engineer, teaching him the ins and outs of a then-popular commercial p-element code, and then sent him on his way to analyze a simple bracket that I'd previously worked on and was in production. Now, this code had an automatic meshing system that was, to be generous, a little wonky, and it would produce meshes that I considered error-prone no matter what limits you put on it: high angles, bad edge ratios, poor skewness and Jacobian values, et cetera. The vendor who then owned the code assured users that this wasn't a problem because it would just ramp up the p-order of the elements and this would "fix" the problem. :rolleyes: Anyway, Junior E does his analysis and comes back with massive stress risers all over the place which exceed the material elastic limits by a factor of 4 or 5. He immediately calls up the engineer in charge of the product line to notify him, and a whole hullabaloo goes on before I get involved. (I think I was out of the office when he did this, but whatever.)

Anyway, we go back and look at his model, and it turns out that every element in the high stress areas was badly shaped and had been ramped up to order 8 or 9, which means that all of the calculations at the mid-side nodes of the elements were wildly varying, with enormous strain energies; in short, it was all so much garbage. When we went back and ran the thing, restraining the edge orders to no more than order 4, it looked like silk, with nary a dramatic stress riser to be seen, so all of his high stress results were an artifact of a bad curve fit (and one that the software should have caught, but didn't--I've since become rather skeptical of that particular code). I then instructed him on the use of the manual meshing tools, as well as how to examine all the outputs instead of just stress results before drawing any conclusions about the validity of the stress values. Oh, and then I had to go to a meeting and explain why finite element analysis is an art of approximation rather than a strict science, which caused the VP of engineering to decry all the money spent on "useless" analysis tools. Good times.

Stranger

Stranger On A Train
04-24-2007, 11:17 AM
It was an engineering problem -- we were looking at the deviation of a surface from perfectly flat. The measurement method gave us a very noisy trace overlying the generally smooth and slowly-varying bumps and wiggles that were the true surface profile. I simply used a polynomial fit to look at that underlying function in order to get the true surface deviations from flatness, and I needed an order as high as 17th to get all the bumps in there. A polynomial fit was actually easier and quicker than averaging, and had much less residual noise. If the thing we were measuring had been half as long, I could've used a ninth order polynomial.

So you were using the fit to generate a distribution of the deviations? Hmmm...I'd guess the variations aren't much, so you're not likely to get huge artificial peaks and troughs, but I'm not sure why you wouldn't just use the data to get a Gaussian distribution directly. I'm not questioning your method; I'm just curious as to why you did it this way.

Stranger

Kevbo
04-24-2007, 11:37 AM
...functions as an accurate model for interpolation between points or (perhaps) extrapolation beyond the data set.

<bolding mine>

Fitted polynomials usually, and certainly more often than not, diverge wildly just beyond the data space used for the fitting. As a result, it is almost never a good idea to extrapolate.

Exceptions I would make:

When the underlying mechanism is known to be of the same order as the fitted polynomial, and the fitting was done only to smooth noisy data.

When the extrapolation is used only to suggest the direction for gathering additional data, so that validity of the extrapolation will be tested experimentally.

Chronos
04-24-2007, 11:55 AM
The problem with a high-order polynomial fit is that it'll generally go crazy just outside of the range of the data points, or sometimes even in between the data points. When you've got 17th order terms, even a tiny amount of noise can blow up into huge differences in the fitted curve.

Now, you will occasionally see high-order power law fits, but those include only the highest order, and so have very few parameters. So they're still generally pretty stable (if a power law is what's most appropriate for the data).
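That blow-up is easy to demonstrate. Here is a short sketch using Runge's classic example, 1/(1 + 25x^2) (the function choice is mine, not from the thread): a degree-17 polynomial through 18 evenly spaced samples oscillates violently near the ends of the interval even though the data itself is perfectly smooth.

```python
import numpy as np

# Runge's function: smooth, bounded by 1 on [-1, 1].
f = lambda x: 1.0/(1.0 + 25.0*x**2)

# Degree-17 polynomial through 18 equally spaced samples.
nodes = np.linspace(-1.0, 1.0, 18)
coeffs = np.polyfit(nodes, f(nodes), 17)    # passes through every node

# Worst-case deviation between the nodes.
dense = np.linspace(-1.0, 1.0, 2001)
max_err = np.max(np.abs(np.polyval(coeffs, dense) - f(dense)))
# The function never exceeds 1, yet max_err is far larger than 1
# near the endpoints -- the Runge phenomenon.
```

Chebyshev-spaced nodes or a lower-order fit both tame this, which is one reason blind high-order fitting is risky.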

Napier
04-24-2007, 12:03 PM
Wait, wait, wait.

I'm asking about the calculation method specifically of summing various integer powers of the variables, and then calculating the parameters of a fitted curve as functions of the sums.

Isn't there a name for this approach per se?

Where might I find a list of curves and their parameters as functions of the sums?

CalMeacham
04-24-2007, 12:30 PM
So you were using the fit to generate a distribution of the deviations? Hmmm...I'd guess the variations aren't much, so you're not likely to get huge artificial peaks and troughs, but I'm not sure why you wouldn't just use the data to get a Gaussian distribution directly. I'm not questioning your method; I'm just curious as to why you did it this way.

Stranger

No, you misunderstand -- I've got a surface deviation which consists of a relatively small number of gentle rises and troughs. The 17th order polynomial fits those. A Gaussian is a single peak, and wouldn't at all fit the observed surface form.

As a rule, an nth order polynomial can contain up to n-1 extrema (both peaks and valleys). I had a surface with about eight high points, and needed 17th order to make sure I had enough ups and downs to get all of them represented. It worked quite well.

CalMeacham
04-24-2007, 12:36 PM
I'm asking about the calculation method specifically of summing various integer powers of the variables, and then calculating the parameters of a fitted curve as functions of the sums.

Isn't there a name for this approach per se?

Where might I find a list of curves and their parameters as functions of the sums?

I've never heard of a special name for it. My oldest book simply calls it "Least Squares Fit to a Polynomial"; that's Philip Bevington's "Data Reduction and Error Analysis for the Physical Sciences". Press et al. in "Numerical Recipes" just call it Least-Squares Fitting. You can call it Chi-Square fitting, because you're minimizing the value of the deviation function chi squared.

As for "A List of Curves and their Parameters", I'm not sure what you mean. There are plenty of books giving the infinite (or finite) sum polynomial forms of various functions. Look at Abramowitz and Stegun, or Gradshteyn and Ryzhik, or the CRC Handbook.

Stranger On A Train
04-24-2007, 12:59 PM
Fitted polynomials, usually, and certainly more often than not, diverge wildly just beyond the data space used for the fitting. As a result, it is almost never a good idea to extrapolate.

Well, yeah, that's why I qualified it with a "perhaps" and a further statement as to the wildly varying nature of higher order polynomial functions. You can (often) make valid extrapolations from linear fits, and (sometimes) from parabolics, provided with both that you have some confidence in the application of the fit to phenomena outside the range of existing data. Beyond that, I wouldn't give much credence to any higher order fit unless the phenomena could analytically be described in terms of a high order polynomial, which is pretty rare in nature.

Napier, sorry about the hijack. I don't have an answer for you, and I'm not sure I even understand your original question properly.

Stranger

L. G. Butts, Ph.D.
04-24-2007, 01:06 PM
No, you misunderstand -- I've got a surface deviation which consists of a relatively small number of gentle rises and troughs. The 17th order polynomial fits those. A Gaussian is a single peak, and wouldn't at all fit the observed surface form.

Sounds like a trigonometric interpolation might have been better for this analysis.

This is one of the things that drives me nuts in my job. My coworkers, many of whom hold PhDs in physics, EE, etc., insist on using high order polynomial interpolation without understanding how polynomials behave near the end data points or, worse, outside the range of data points. This, combined with their tendency to use 6th order polynomials to fit data that theoretically should be quadratic, Gaussian, or some other function (typically sinc^2), drives me nuts. It's not that hard to do a nonlinear fit.

Stranger On A Train
04-24-2007, 01:16 PM
No, you misunderstand -- I've got a surface deviation which consists of a relatively small number of gentle rises and troughs. The 17th order polynomial fits those. A Gaussian is a single peak, and wouldn't at all fit the observed surface form.

As a rule, an nth order polynomial can contain up to n-1 extrema (both peaks and valleys). I had a surface with about eight high points, and needed 17th order to make sure I had enough ups and downs to get all of them represented. It worked quite well.

A normal distribution isn't intended to fit the data onto a physical curve; it gives you a fit to a continuous probability distribution from which you can get information about the overall stochastic behavior of the system (standard deviation, variances, et cetera). I guess I'm assuming that you wanted to collect information for statistical process control or some other generalization of the data, but it sounds like you wanted a representation of this specific surface, so maybe we're talking at cross purposes. At any rate, I'll drop it as it's a complete tangent to the question posed by the o.p.

Stranger

Stranger On A Train
04-24-2007, 01:28 PM
This is one of the things that drives me nuts in my job. My coworkers, many of whom hold PhDs in physics, EE, etc..., insist on using high order polynomial interpolation without understanding how polynomials behave near the end data points or worse, outside the range of data points. This combined with their tendency to use 6th order polynomials to fit data that theoretically should be quadratic, Gaussian, or some other function (typically sinc^2) drives me nuts. It's not that hard to do a nonlinear fit.

"But if it fits my data set, it must be more accurate, right?"

Actually, what I love is when people start talking about confidence intervals with single data sets of population 3. "You realize there's no way to calculate a confidence interval from this, right?" "Well, see, you just multiply the estimated probability by some number I just pulled out of my ass, and then wave the standard deviation around until you get dizzy, and it all works out."

Stranger

Omphaloskeptic
04-24-2007, 04:18 PM
There is a kind of regression analysis, or more particularly a kind of calculation for regression coefficients, based on creating various summations. For example, you can fit a Cauchy distribution curve through a set of (x,y) points by summing x, x^2, 1/y, 1/y^2, x/y, x^2/y, x^3, x^4, and 1 (the sum of 1 for all points being N). The coefficients of the curve are then functions of the sums. I have a collection of these methods in the book Curve Fitting for Programmable Calculators, by William M. Kolb.

What is the name for this kind of method?

Where can I find more such calculations, and perhaps supporting information?

These are (as others have said) probably least-squares/Chi-squared fits to a polynomial fit equation -- for a Cauchy distribution, it can be taken to be a polynomial (in {x, 1/y}) of the form a + bx + cx^2 = 1/y. In particular, these sums are coefficients of the normal equations (http://en.wikipedia.org/wiki/Linear_least_squares) A^T A x = A^T b (derived by taking partial derivatives to minimize the squared error) for this polynomial fit.

To get the equations for the regression coefficients, you just compute the symbolic matrix inversion x = (A^T A)^-1 A^T b in your normal-equation coefficients. (Note that symbolic matrix inversion rapidly gets ugly; a numerical matrix inversion or matrix solution is much faster when you have lots of coefficients, so there's little point to deriving the general form.)

illoe
04-25-2007, 01:11 AM
Do you mean something like this (http://links.jstor.org/sici?sici=0025-5718(197701)31%3A137%3C214%3AAFHCUS%3E2.0.CO%3B2-W); are you seeking a reference giving small-integer-coefficient fits for common functions? The rationale given in the link is minimization of keystrokes on a handheld device, and I could see how it would be handy for exponential integrals and the like.

Or do you have a function/data set that you want to explore with some technique more elegant than brute-force double precision least-squares or spline fitting?

Or is there really a way to get information about a function/data set (like skew or higher moments) by looking at sums over small integer powers? I'd find that fascinating and would love to hear more about it (I played around with your example for a while, but didn't see anything arresting, though I may be missing something by not understanding what you mean by "the sum of 1 for all points being N.")

Napier
04-25-2007, 07:14 AM
OK, let me see if I can explain this. I am NOT trying to explain y(x) by creating a polynomial in x whose value is close to observations of y.

In my present example, I am more specifically trying to fit a Lorentzian (sp?) peak function to a spectrum measured by an instrument. The spectrum as it is reported has a peak in it, that looks like a bell curve. I would like to fit a Lorentzian to that, because it is much faster to calculate than a Gaussian and would produce equivalently valuable results. So I find that if I accumulate the sums I described, including a sum that is equal to the number of observations (that is the sum of 1 and is usually written as N rather than SIGMA 1), I can estimate the amplitude of the Lorentzian as one function of the sums, the width parameter as another function, the centroid (where in x is the peak?) as a third function, and the coefficient of determination as a fourth function. I don't need any of the original data to do these estimations. The functions I refer to are complicated but reduce to rationals if I have several layers of substitution. I didn't invent this method, I found it in the book cited.

Napier
04-25-2007, 07:19 AM
Maybe I should have explained one thing more carefully:

The peak in the raw data is not the same thing as the dispersion of multiple attempts to measure a single parameter. Don't think of a histogram of imperfect measurements of one absolutely accurate but unknowable number. If I had a magical way of making the peak narrower, it would just spoil the whole thing, not give me some more accurate result.

Rather, the physical process that generates the spectrum involves various phenomena that contribute to peak broadening. These phenomena are part of what is interesting about the physical system. The peak is not wide because something is wrong and imperfect - it's wide because of the very thing I want to study in the end.

Napier
04-25-2007, 09:06 AM
[FWIW here is my source material, now that I cleaned it up:]

CALCULATION OF A LORENTZIAN CURVE FROM SUMS
From Kolb page 82

Premise: Y(X) is an unknown Lorentzian peak function that fits a set of (X,Y) observations with coefficient of determination Z:

Y = 1/((A*(X+B)^2)+C)

Problem: Calculate A, B, C, and Z given the set of (X,Y) values.

First, calculate the following sums. For example, D is the sum of all the X values. Note that L is the sum of the number 1 for all observations and therefore equals the number of observations. After calculating these sums we may dispose of all the original (X,Y) pairs as they are no longer needed. In fact if the (X,Y) pairs are given to us over time, we need never remember them all, we need only to accumulate the sums over time.

D = Sum X
E = Sum X^2
F = Sum 1/Y
G = Sum 1/Y^2
H = Sum X/Y
I = Sum X^2/Y
J = Sum X^3
K = Sum X^4
L = Sum 1

Second, to simplify things, calculate the following intermediate results.

M = (E*L)-(D*D)
N = (L*I)-(E*F)
P = (L*J)-(D*E)
Q = (L*H)-(D*F)
R = (L*K)-(E*E)
S = (M*N)-(P*Q)
T = (M*R)-(P*P)
U = S/T
V = (Q-(P*U))/M
W = (F-(V*D)-(U*E))/L

Finally, calculate A, B, C and Z accordingly:

A = U
B = V/(2*U)
C = W-((V^2)/(4*U))
Z = ((W*F)+(V*H)+(U*I)-(F^2/L))/(G-(F^2/L))
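For anyone who wants to try it, the recipe above translates almost line-for-line into code. One caveat in this sketch: the C step is coded as W - V^2/(4*U), which is the form that expanding A*(X+B)^2 + C actually requires (C = W minus A*B^2 with A = U and B = V/(2*U)).

```python
def lorentzian_fit_from_sums(points):
    """Fit Y = 1/(A*(X+B)^2 + C) using only running sums of the data,
    following the recipe above.  Returns (A, B, C, Z).
    Assumes every Y value is nonzero."""
    D = E = F = G = H = I = J = K = L = 0.0
    for x, y in points:          # accumulate, then forget the point
        D += x
        E += x**2
        F += 1/y
        G += 1/y**2
        H += x/y
        I += x**2/y
        J += x**3
        K += x**4
        L += 1
    # Intermediate results, exactly as in the listing above.
    M = E*L - D*D
    N = L*I - E*F
    P = L*J - D*E
    Q = L*H - D*F
    R = L*K - E*E
    S = M*N - P*Q
    T = M*R - P*P
    U = S/T
    V = (Q - P*U)/M
    W = (F - V*D - U*E)/L
    # Unwind into the Lorentzian parameters.
    A = U
    B = V/(2*U)
    C = W - V**2/(4*U)           # from expanding A*(X+B)^2 + C
    Z = (W*F + V*H + U*I - F*F/L)/(G - F*F/L)
    return A, B, C, Z
```

Feeding it points generated from a known Lorentzian recovers A, B, and C to rounding error, with Z essentially equal to 1.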

illoe
04-25-2007, 10:14 AM
Thank you Napier, that's really cool! I don't know what it's called and I haven't seen it before, so I'm of absolutely no help to you (except I can reassure you that Lorentz-with-a-t is the profile's namesake) but you can bet I'll be using that one!

Chronos
04-25-2007, 11:32 AM
I would like to fit a Lorentzian to that, because it is much faster to calculate than a Gaussian and would produce equivalently valuable results.

Depending on what exactly the sources of your broadening are, a Lorentzian may even be more accurate than a Gaussian. If you have a high enough SNR, you could try fitting a convolution of an arbitrary Gaussian with an arbitrary Lorentzian, which will give you more information about what specific broadening mechanisms you have (some things cause Lorentzian broadening, and some cause Gaussian).

carterba
04-25-2007, 11:58 AM
It has no name as far as I can tell; it's just a very clever linear-time algorithm for fitting the curve. If it's been published anywhere else, I would think it would be in a stats or CS journal, so you could try JSTOR or the ACM Digital Library (I tried both of these but couldn't find anything). Does the author cite anything in reference to it?

In general I suppose it's an example of dynamic programming (http://en.wikipedia.org/wiki/Dynamic_programming), though getting from the definition of dynamic programming to that particular algorithm is not trivial.

BrandonR
04-25-2007, 02:13 PM
Hmm yeah I'm an ME undergrad student and I was just wondering if I should have any clue as to what you all are talking about...? Or is this masters/PhD level stuff you all are talking about here? Or perhaps I just haven't taken any of the real advanced classes yet... I hope I haven't suddenly missed months of some class I was supposed to take.

Napier
04-25-2007, 03:02 PM
Well, folks, I'm not sure what level stuff we are talking about after all!

On a calculator of decades ago, if you wanted to calculate a standard deviation, the method that requires the deviation of each observation from the mean wasn't usable, because you would have to calculate the mean from all the observations before starting over to compute the deviations. That is, you have to have access to all the observations at the same time. A calculator that can remember, say, 50 values can't do that with 1000 observations. So they have the Sigma registers that hold sums from which to calculate what you want, and they eat the observations so that nothing is left but their cumulative effect on those registers.
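A minimal sketch of that Sigma-register idea (class and method names are invented here): each observation updates a few running sums and is then thrown away, yet mean and standard deviation remain computable at any time.

```python
import math

class SigmaRegisters:
    """Sketch of a calculator's Sigma registers: n, sum of x, sum of x^2."""
    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def sigma_plus(self, x):         # like the calculator's Sigma+ key
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x*x

    def mean(self):
        return self.sum_x / self.n

    def stdev(self):                 # sample standard deviation
        n = self.n
        return math.sqrt((self.sum_x2 - self.sum_x**2 / n) / (n - 1))
```

(Numerically, this one-pass formula can lose precision when the mean is large relative to the spread; Welford's update avoids that, but the Sigma registers of the era worked exactly this way.)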

I figured Kolb's book was written in that vein, to help owners of HP 41C calculators do fancier statistics. Since I was a 41C enthusiast, into synthetic programming and whatnot, I had to have it. I also figured that what he was doing was mainly adapting a common technique to the 41C.

Now I think maybe he was very original, or at least was adapting a technique that isn't very common.

Web searching his book turns up numerous papers that reference it, but nothing about where he got his method from. In the book itself he talks about how regressions work, but not in much detail.

So it's news to me that this isn't the way everybody else would already have used to solve these problems. And I'm glad I happened to buy the book 24 years ago!

Omphaloskeptic
04-25-2007, 06:06 PM
So they have the Sigma registers that hold sums from which to calculate what you want, and they eat the observations so that nothing is left but their cumulative effect on those registers.

I also figured that what he was doing was mainly adapting a common technique to the 41C. [...] Web searching his book turns up numerous papers that reference it, but nothing about where he got his method from. In the book itself he talks about how regressions work, but not in much detail.

So it's news to me that this isn't the way everybody else would already have used to solve these problems. And I'm glad I happened to buy the book 24 years ago!

I don't know how well-known this technique is in general; I also was introduced to it through old scientific calculators (a TI in my case, I think, though I later got that old RPN religion).

This method of minimizing the squared error by calculating particular intermediate statistics was solved, at least for the special case of the linear least-squares fit, by Legendre and Gauss. (I don't know who first applied this method as a computer algorithm running in limited memory, however.) Here's a translation (http://www.stat.ucla.edu/history/legendre.pdf) (PDF) of Legendre's method, which notes that the equations to be solved have coefficients which are sums of various functions of the data points. Legendre notes that the method generalizes; since this preceded modern matrix notation, he doesn't write an explicit general solution, but the general method is clear. This method is precisely what is used to derive the normal equations for the general least-squares solution.

I can't tell by reading your posts whether you understand how the calculations you posted above for the Cauchy-Lorentz curve (post#21) are derived, but they follow the same method:

The curve to be fit has the form

y = 1/(a(x + b)^2 + c);

this is rewritten as

1/y = (ab^2 + c) + (2ab)x + (a)x^2 = d + ex + fx^2.

Now a least-squares fit is performed to find the regression coefficients (d, e, f). The normal equations give

[d]
[e] = (A^T A)^-1 A^T b
[f]

where

A = ( 1 , x_i , x_i^2 )
b = ( 1/y_i )

contain the observations in columns (that is, A has 1s in the first column, the data values x_i in the second column, and the values x_i^2 in the third column). The crucial point is that the quantities A^T A and A^T b can be written using just a few intermediate statistics, after which the raw data is no longer needed:

[ S(1)   S(x)    S(x^2) ]
[ S(x)   S(x^2)  S(x^3) ] = A^T A
[ S(x^2) S(x^3)  S(x^4) ]

and

[ S(1/y)   ]
[ S(x/y)   ] = A^T b
[ S(x^2/y) ]

(here S(f(x,y)) means the sum of all values f(x_i, y_i) over the data). I haven't checked that inverting A^T A gives the results you quote, but this is just algebra. Note that only eight values are required: S(1), S(x), S(x^2), S(x^3), S(x^4), S(1/y), S(x/y), and S(x^2/y); these are your values L, D, E, J, K, F, H, I. (G is only needed to compute the residual.)

This works as long as you can write the equation to be least-squares minimized as a linear function of the regression coefficients (here (d,e,f)); in particular, it works for any polynomial, though as seen from the Lorentz example above, it's not restricted to polynomials.

However, we made two questionable assumptions when we inverted the equation to get a polynomial in 1/y. The first is that an uncertainty of dy in y translates to an uncertainty of about dy/y^2 in 1/y, so if all of our uncertainties in y are about the same size but the measured values of y differ by quite a bit, the resulting uncertainties in 1/y are not going to be the same size. This is easy to fix in the usual least-squares way, by scaling the equations so that all of the variances are the same. The second problem is that if the errors in y are unbiased and Gaussian, say, the errors in 1/y are biased and no longer Gaussian. It is not strictly correct to minimize the squared error in these modified equations. Fixing this requires a nonlinear least-squares approach, which is typically ugly.
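This derivation can be checked numerically: assemble A^T A and A^T b from the sums alone, solve for (d, e, f), and unwind them into the Lorentzian parameters. The parameter values a0, b0, c0 below are arbitrary test inputs, not anything from the thread.

```python
import numpy as np

# Generate exact data from a known Lorentzian (arbitrary parameters).
a0, b0, c0 = 1.5, 0.5, 2.0
x = np.linspace(-2.0, 2.0, 41)
y = 1.0/(a0*(x + b0)**2 + c0)

S = lambda v: float(np.sum(v))          # a "sigma register"

# A^T A and A^T b built purely from running sums, as derived above.
ATA = np.array([[S(x**0), S(x),     S(x**2)],
                [S(x),    S(x**2),  S(x**3)],
                [S(x**2), S(x**3),  S(x**4)]])
ATb = np.array([S(1.0/y), S(x/y), S(x**2/y)])

d, e, f = np.linalg.solve(ATA, ATb)     # fits 1/y = d + e*x + f*x^2

# Unwind: f = a, e = 2ab, d = a*b^2 + c.
a, b, c = f, e/(2*f), d - e**2/(4*f)
```

With exact data the recovered (a, b, c) match (a0, b0, c0) to rounding error, confirming that the sums carry everything the normal equations need.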

Napier
04-26-2007, 07:27 AM
>I can't tell by reading your posts whether you understand how the calculations...

Wow, Omph, no I didn't understand how, and your post is fascinating. Kolb's book introduction sounded a little like this but did not mention nearly enough clues to get me started. What you explained is clear enough that I can get a sort of feel for it even though I'm terrible at math! Thanks!

Miks
06-10-2011, 03:49 PM
No, you misunderstand -- I've got a surface deviation which consists of a relatively small number of gentle rises and troughs. The 17th order polynomial fits those. A Gaussian is a single peak, and wouldn't at all fit the observed surface form.

As a rule, an nth order polynomial can contain up to n-1 extrema (both peaks and valleys). I had a surface with about eight high points, and needed 17th order to make sure I had enough ups and downs to get all of them represented. It worked quite well.

Strictly speaking, fitting a polynomial of order n to n+1 points is not a fit in the normal sense, but a polynomial which passes exactly through each point. It will not smooth or remove noise at all! :eek: I would be very careful in interpreting such a high-order polynomial, as the extrema would most likely have little or no physical meaning.

An alternative way to handle such smoothing problems, which generally gives a more physical result, is to slide a low-order polynomial fit (e.g. order 2) along the data, fitting it to a sequential group of more than 3 data points at a time. Each polynomial is used to obtain a smoothed data value by calculating its value at the central point of the group. You can get more insight from available descriptions of the Savitzky-Golay method, for instance.
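A bare-bones sketch of that sliding quadratic scheme (not the full Savitzky-Golay machinery; the endpoint handling here is deliberately naive, leaving the first and last few points unsmoothed):

```python
import numpy as np

def sliding_quadratic_smooth(y, window=7):
    """Fit a quadratic to each odd-length window of the data and keep
    its value at the window's central point.  Endpoints are left
    unsmoothed in this sketch."""
    half = window // 2
    t = np.arange(window) - half            # local abscissa, 0 at center
    y = np.asarray(y, dtype=float)
    out = y.copy()
    for i in range(half, len(y) - half):
        coeffs = np.polyfit(t, y[i - half:i + half + 1], 2)
        out[i] = np.polyval(coeffs, 0.0)    # quadratic's value at center
    return out
```

Because a local quadratic fit reproduces quadratic data exactly, this smoother passes trends of degree 2 or less through untouched while averaging down higher-frequency noise.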

GameHat
06-10-2011, 09:15 PM
It's a polynomial fit. I've used them often enough. One time I used a 17th order polynomial to fit observed data because it gave such a good and useful fit.

Polynomial curve-fitting is garbage unless there's an actual underlying physical or informational law that corresponds to the order that you're fitting with.

Meaning - if you're plotting the position of something accelerating under gravitational force, a second-order polynomial fit is OK. A fourth-order fit is garbage. You're lying.

I'm reminded of a cranky old engineering professor I had back in school. He used to excoriate anyone who used Excel to make crazy-ass fits with 6th-order polynomials. He'd call them out in front of the class and bellow, "WITH ENOUGH POLYNOMIAL ORDERS YOU CAN FIT A CHARGING RHINO!"

...meaning, yeah, the curve may look nice, but you aren't making or describing anything useful.

If the data is really that crazy, you should just draw the curve by hand. It's a bit scientifically dishonest to use a 17th-order polynomial (or whatever) to fit a data set unless you actually have a theory that predicts a 17th-order relationship.

billfish678
06-10-2011, 09:27 PM
If the data is really that crazy, you should just draw the curve by hand. It's a bit scientifically dishonest to use a 17th-order polynomial (or whatever) to fit a data set unless you actually have a theory that predicts a 17th-order relationship.

Yes and no.

CalMeacham's example makes sense. He had data points that defined a real surface (at least I think that's what he had). It really did have hills and valleys. And, barring any further information, it's reasonable to assume the surface was approximately continuous and smooth. The 17th order fit allowed him to mathematically define the whole surface based on those points.

But his example is probably the exception that proves the rule.

GameHat
06-10-2011, 09:56 PM
Yes and no.

CalMeacham's example makes sense. He had data points that defined a real surface (at least I think that's what he had). It really did have hills and valleys. And, barring any further information, it's reasonable to assume the surface was approximately continuous and smooth. The 17th order fit allowed him to mathematically define the whole surface based on those points.

But his example is probably the exception that proves the rule.

But would that crazy 17th-order fit predict anything about a different but similar surface? Might a different surface of a similar type be fit best with a 13th-order fit? Maybe it needs a 20th-order fit?

Unless there's something about the surfaces being worked with that suggests a 17th-order relationship, and that suggestion is tested, all CalMeacham is doing is making a pretty curve. Which is fine for a PowerPoint, I guess, but it's not good science.

(Not trying to call you out, CalMeacham, just playing Devil's Advocate to billfish678 :D)

billfish678
06-10-2011, 10:21 PM
Unless there's something about the surfaces being worked with that suggests a 17th-order relationship, and that suggestion is tested, all CalMeacham is doing is making a pretty curve. Which is fine for a PowerPoint, I guess, but it's not good science.

The 17 hills/valleys suggest the 17th order relationship. It is what it is. Remember this is a surface, not some variable relationship like temp vs pressure or something.

Let me give an example why a pretty curve would be important here.

Let's say he wants to model the diffraction/reflection of light from this surface. The way to do that would involve breaking that surface into lots of little pieces and doing some "fancy" optical calculations. There are two ways to do this. One is a linear interpolation between the points. The other is using the 17th-order curve fit. Now, think about this for a minute. What do you think is more likely: that the surface actually IS a series of upright and inverted pyramids defined by peaks, valleys, and straight lines, or that it's actually a semi-smooth curve with the peaks and valleys just being where the height measurements were taken?

And I should note that if you modeled diffraction using the linear method, you would probably introduce odd "noise" effects due to its piecewise nature.

So, in this example, it's probably not just wishful thinking, but the right thing to do. Of course, to be thorough, you'd want to do the modeling both ways and compare.

And again, I agree with your point that most of the time, using a higher order than you have reason to expect from the physics is baloney.

In this case we are modeling THIS surface. We aren't trying to come up with a model for all surfaces.
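The two options named above can be sketched side by side. This is purely illustrative (a 1-D "height profile" standing in for the real 2-D surface, with an assumed smooth shape); it shows why the smooth fit can beat connect-the-dots between the measured points:

```python
import numpy as np

# Assumed smooth "surface profile", sampled coarsely like height
# measurements taken at a handful of points.
xs = np.linspace(0.0, np.pi, 9)   # measurement locations
hs = np.sin(xs)                   # measured heights (assumed smooth truth)

# Evaluate both reconstructions on a fine grid, against the truth.
x_fine = np.linspace(0.0, np.pi, 200)
h_true = np.sin(x_fine)

h_lin = np.interp(x_fine, xs, hs)                          # option 1: linear interp
h_poly = np.polynomial.Polynomial.fit(xs, hs, 8)(x_fine)   # option 2: high-order fit

err_lin = np.max(np.abs(h_lin - h_true))
err_poly = np.max(np.abs(h_poly - h_true))
```

For a genuinely smooth profile, the polynomial tracks the surface far better between the sample points than the "pyramids" of linear interpolation do, which is exactly the scenario where the high-order fit earns its keep.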

GameHat
06-10-2011, 10:48 PM
The 17 hills/valleys suggest the 17th order relationship. It is what it is. Remember this is a surface, not some variable relationship like temp vs pressure or something.

Let me give an example why a pretty curve would be important here.

Let's say he wants to model the diffraction/reflection of light from this surface. The way to do that would involve breaking that surface into lots of little pieces and doing some "fancy" optical calculations. There are two ways to do this. One is a linear interpolation between the points. The other is using the 17th-order curve fit. Now, think about this for a minute. What do you think is more likely: that the surface actually IS a series of upright and inverted pyramids defined by peaks, valleys, and straight lines, or that it's actually a semi-smooth curve with the peaks and valleys just being where the height measurements were taken?

And I should note that if you modeled diffraction using the linear method, you would probably introduce odd "noise" effects due to its piecewise nature.

So, in this example, it's probably not just wishful thinking, but the right thing to do. Of course, to be thorough, you'd want to do the modeling both ways and compare.

And again, I agree with your point that most of the time, using a higher order than you have reason to expect from the physics is baloney.

In this case we are modeling THIS surface. We aren't trying to come up with a model for all surfaces.

Hm, great response, billfish678

I get your point. But I still have a few questions (and maybe neither of us can answer them.)

The first is - is there an actual mathematical relationship or property that can explain whatever fit CalMeacham or whoever comes up with? If not, what is the value of the fit other than connecting the data points?

I get and grant that not all phenomena are explained by the simple relationships we all learn in Physics 101. But without an underlying theory or equation, what is the value of a curve that simply exists to fit a given data set?

Second - we've been talking about polynomial fits. You brought up a linear fit. But are there other options? I don't know a damn thing about surface mathematics, but would an exponential or log fit work better? A power fit?

This is turning out to be a really interesting question, so thanks for some insightful replies, all.

billfish678
06-10-2011, 11:01 PM
I get and grant that not all phenomena are explained by the simple relationships we all learn in Physics 101. But without an underlying theory or equation, what is the value of a curve that simply exists to fit a given data set?


It's late and I am about done for the night. But let me throw this out. Let's say you collect a bunch of data. The data points are extremely smooth/not noisy. But the curve that fits them is some rather high-order polynomial. You don't really know why, but again it is what it is. And let's say you collect this data multiple times. And besides some low-level noise, the curve is always the same.

Now, you have a computer model where this data is the input. To me, in general it makes sense to use this high-order fit as input even if you don't know WHY it's a high-order function. That's the value of the curve. It represents reality as best as you can tell from your data.

Throw in lots of "but this," "but that," "make sure of the data," and other gotchas, because it's my bedtime.

ultrafilter
06-10-2011, 11:48 PM
The data points are extremely smooth/not noisy.

What does this even mean?

Chronos
06-11-2011, 01:06 AM
Second - we've been talking about polynomial fits. You brought up a linear fit. But are there other options? I don't know a damn thing about surface mathematics, but would an exponential or log fit work better? A power fit?

Were it up to me, I'd probably use cubic spline interpolation there.
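For the curious, a natural cubic spline can be hand-rolled in a few lines. This is an illustrative sketch only (in practice you'd reach for a library routine such as scipy.interpolate.CubicSpline); it solves the standard tridiagonal system for the second derivatives at the knots:

```python
import numpy as np

def natural_cubic_spline(xk, yk):
    """Return an evaluator for the natural cubic spline through (xk, yk)."""
    xk = np.asarray(xk, dtype=float)
    yk = np.asarray(yk, dtype=float)
    n = len(xk) - 1
    h = np.diff(xk)

    # Linear system for the knot second derivatives m_i, with the
    # "natural" boundary condition m_0 = m_n = 0.
    A = np.zeros((n + 1, n + 1))
    r = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0
    for i in range(1, n):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        r[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i]
                      - (yk[i] - yk[i - 1]) / h[i - 1])
    m = np.linalg.solve(A, r)

    def s(x):
        x = np.asarray(x, dtype=float)
        i = np.clip(np.searchsorted(xk, x) - 1, 0, n - 1)
        t = x - xk[i]
        return (yk[i]
                + t * ((yk[i + 1] - yk[i]) / h[i]
                       - h[i] * (2.0 * m[i] + m[i + 1]) / 6.0)
                + t**2 * m[i] / 2.0
                + t**3 * (m[i + 1] - m[i]) / (6.0 * h[i]))

    return s
```

Unlike a single high-order polynomial, the spline passes through every knot exactly and stays smooth without the wild oscillations between points.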

billfish678
06-11-2011, 07:45 AM
What does this even mean?

Data that, when plotted, almost forms a nice smooth line/curve by itself, rather than looking like a shotgun blast that needs some stats run on it to get a curve that only roughly reflects the trend, because the data is all over the place.

Yeah, Chronos, at least back in the earlier days of modeling and curve drawing, the ole cubic spline was better than linear interpolation and good enough for producing a more "smooth" curve connecting the data points.

ZenBeam
06-11-2011, 08:15 AM
I've used polynomials of that high an order before. In my case, I had thousands of points of data as a function of angle, and the recorded angle had noise at a low level. There were only a few digits of accuracy recorded, so the individual numbers weren't very accurate. Also, I was not using the very ends of the data, where a polynomial fit is less trustworthy.

Another option would be a fit using sines and cosines over the range of the data, with a couple added functions to allow for discontinuities in value and slope at the ends.

A cubic spline would be appropriate if you had data you believed was correct at each point, and you needed to interpolate between those points, but not in my example above, or in CalMeacham's example if I'm understanding it correctly.
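The sine/cosine idea above amounts to ordinary linear least squares with a trigonometric design matrix. A minimal sketch (the number of harmonics is a free parameter I chose for illustration, and the extra end-condition functions ZenBeam mentions are omitted):

```python
import numpy as np

def trig_fit(x, y, n_harmonics):
    """Least-squares fit of y by sines/cosines over the range of x.

    Returns the fitted values at the sample points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Map the data range onto one full period.
    w = 2.0 * np.pi * (x - x.min()) / (x.max() - x.min())
    # Design matrix: constant term plus cos/sin harmonic pairs.
    cols = [np.ones_like(x)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(k * w))
        cols.append(np.sin(k * w))
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef
```

Because the model is linear in the coefficients, this solves in one shot, just like the polynomial case, only with a basis better suited to oscillatory data.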

Napier
06-11-2011, 09:00 AM
I'd like to clarify something on the basis of how I understand the collection in Kolb's book.

What we get using the methods he documents is not a polynomial in X that approximates a distribution density function f(x).

Instead, we get multiple expressions that each give a single value for the dataset, and these multiple values are estimates of the parameters of (nonpolynomial) distribution functions such as the Cauchy function.

If we apply these methods to a dataset that was artificially generated using the distribution function we are modeling, we would get back the exact same values we used in generating the dataset, to as many digits as are free from rounding errors.

I am interested (or was, lo, these four years passing) in fitting datasets from physical measurements, and in fact using this as the first pass before starting an iterative fit, for speed. I was trying to write code in assembly and Forth to fit distribution curves on the fly, using what a CPU would consider integer math and a metrologist would consider fixed-point math. The methods we are discussing are not iterative, and they appear to be very fast ways of estimating distribution curve parameters. Of course, if there is significant noise in real physically measured data, and not just a rounding error around the 20th place, then these methods might produce significantly inaccurate parameter estimates. So, when I move on from the first phase of each analysis, I would start an iterative process, but even that would not be polynomial.
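To make the summation method concrete: for a Cauchy-shaped curve y = a/(b + (x - c)^2), the reciprocal 1/y is an exact quadratic in x, so the sums listed in the opening post are just the normal equations of a quadratic least-squares fit of 1/y against x (the 1/y^2 sum is not needed for the coefficients themselves). A sketch, with parameter names a, b, c that are my own, not Kolb's:

```python
import numpy as np

def fit_cauchy_by_sums(x, y):
    """Estimate a, b, c in y = a / (b + (x - c)**2) from summations only."""
    x = np.asarray(x, dtype=float)
    z = 1.0 / np.asarray(y, dtype=float)   # 1/y is quadratic in x
    n = len(x)
    # Normal equations built from the sums N, x, x^2, x^3, x^4,
    # 1/y, x/y, x^2/y -- the same quantities the tables accumulate.
    S = np.array([
        [n,             x.sum(),        (x**2).sum()],
        [x.sum(),       (x**2).sum(),   (x**3).sum()],
        [(x**2).sum(),  (x**3).sum(),   (x**4).sum()],
    ])
    t = np.array([z.sum(), (x * z).sum(), (x**2 * z).sum()])
    A, B, C = np.linalg.solve(S, t)        # 1/y ~= A + B*x + C*x^2
    # Back out the curve parameters from the quadratic coefficients.
    c = -B / (2.0 * C)
    a = 1.0 / C
    b = A / C - c**2
    return a, b, c

# Noise-free data generated from the model recovers the parameters
# essentially exactly, as described above.
xs = np.linspace(-5.0, 5.0, 41)
ys = 2.0 / (1.5 + (xs - 0.5)**2)
a, b, c = fit_cauchy_by_sums(xs, ys)
```

Note that the whole estimate is one pass of accumulation plus a 3x3 solve, which is why the approach suits calculators and fixed-point code, and why noise in real measurements feeds straight through into the parameter estimates.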

In the end, these methods would be bizarre and wrong for parameter estimation of the wrong distribution. The physics of the situation concerns choosing the right goal. There isn't a relevant issue concerning choosing the right degree of a polynomial, for the kind of problem I was dealing with.

Of course, other kinds of problems mandate all kinds of interesting consideration of polynomial degree, so I don't mean to piddle on a fun discussion of that.