Curve fitting by sums of various powers of x, y, etc.

There is a kind of regression analysis, or more particularly a kind of calculation for regression coefficients, based on creating various summations. For example, you can fit a Cauchy distribution curve through a set of (x,y) points by summing x, x^2, 1/y, 1/y^2, x/y, x^2/y, x^3, x^4, and 1 (the sum of 1 for all points being N). The coefficients of the curve are then functions of the sums. I have a collection of these methods in the book Curve Fitting for Programmable Calculators, by William M. Kolb.
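To make the flavor of it concrete, here is a rough Python sketch of the kind of calculation I mean. Kolb gives closed-form formulas in the sums rather than code, so this is only my reconstruction of the idea, writing the Cauchy curve so that 1/y is quadratic in x:

```python
import numpy as np

def cauchy_sums_fit(x, y):
    # Fit the linearized form 1/y = a + b*x + c*x^2 using only the
    # accumulated sums I listed above (the 1/y^2 sum is only needed
    # for the goodness-of-fit figure, so it is omitted here).
    x = np.asarray(x, dtype=float)
    z = 1.0 / np.asarray(y, dtype=float)

    N = len(x)                                  # the "sum of 1" = number of points
    Sx, Sx2, Sx3, Sx4 = (np.sum(x ** k) for k in (1, 2, 3, 4))
    Sz, Sxz, Sx2z = np.sum(z), np.sum(x * z), np.sum(x ** 2 * z)

    # Normal equations of the least-squares problem, written in the sums.
    A = np.array([[N,   Sx,  Sx2],
                  [Sx,  Sx2, Sx3],
                  [Sx2, Sx3, Sx4]])
    rhs = np.array([Sz, Sxz, Sx2z])
    return np.linalg.solve(A, rhs)              # the coefficients a, b, c
```

As far as I can tell, the closed forms printed in the book amount to carrying out that 3x3 solve symbolically.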

What is the name for this kind of method?

Where can I find more such calculations, and perhaps supporting information?

It’s a polynomial fit. I’ve used them often enough. One time I used a 17th order polynomial to fit observed data because it gave such a good and useful fit.
It’s tantalizing to do all your fits this way, but you should always keep an eye out for the form that would most naturally fit the observed data with the fewest terms. Sometimes genuine physical insight comes that way (Planck’s fit to the blackbody curve, for instance).

Also, if the curve isn’t pretty close to the existing data, don’t use that fit. An ideal least-squares-type fit is one that does a pretty good job of following the actual data, allowing for noise. I had to yell at a colleague who was using a quadratic fit to widely spaced points because it wasn’t following them particularly closely. What was the point? The better fit was a cubic, since we knew the data was supposed to be asymmetrical, and a cubic is the lowest-order asymmetric polynomial (and it fit pretty well in that region; I was later able to demonstrate theoretically that a cubic was the lowest-order approximation).
A nice thing about polynomial fits is that you can easily perform the math using linear algebra to determine the coefficients.
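For instance, in Python the whole thing is one matrix and one least-squares solve; just a sketch (np.polyfit will do the same job for you):

```python
import numpy as np

def polyfit_via_linear_algebra(x, y, degree):
    # Build the Vandermonde matrix whose columns are x^degree ... x^0,
    # then solve the least-squares problem for the coefficients.
    V = np.vander(np.asarray(x, dtype=float), degree + 1)
    coeffs, *_ = np.linalg.lstsq(V, np.asarray(y, dtype=float), rcond=None)
    return coeffs   # highest power first, same convention as np.polyfit
```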

:confused: I have to ask: what would you fit with a 17th-order polynomial that would be a valid fit, and how many data points did you have?

I’ve done regression fits up to fourth order (for backfitting a model of a differential control system) and I’ve used p-element structural FE codes that go up to order 9, but I have a hard time envisioning anything like a real-world 17th-order system. Is this an economic model, perhaps?

Stranger

17th-order models work really well when you have 18 observations.

It was an engineering problem – we were looking at the deviation of a surface from perfectly flat. The measurement method gave us a very noisy trace overlying the generally smooth and slowly-varying bumps and wiggles that were the true surface profile. I simply used a polynomial fit to look at that underlying function in order to get the true surface deviations from flatness, and I needed an order as high as 17th to get all the bumps in there. A polynomial fit was actually easier and quicker than averaging, and had much less residual noise. If the thing we were measuring had been half as long, I could’ve used a ninth-order polynomial.

Sure, you can make a 17th order polynomial fit with 18 data points, but would it mean anything and/or be useful? The point of doing a curve fit, after all, is that you end up with a function that is easily differentiable or otherwise capable of being interpreted, and more importantly, functions as an accurate model for interpolation between points or (perhaps) extrapolation beyond the data set. A 17th order polynomial, however, is likely to look like the Andes, bouncing up and down.

I’m not doubting CalMeacham’s application of such a high order regression, mind you, since I have no idea what kind of data he was working with and what type of behavior he was modeling, but I just can’t imagine anything in my own experience in which such a system would be valid.

Pointless engineering anecdote: I was once working with a freshly minted engineer, teaching him the ins and outs of a then-popular commercial p-element code, and then sent him on his way to analyze a simple bracket that I’d previously worked on and was in production. Now, this code had an automatic meshing system that was, to be generous, a little wonky, and it would produce meshes that I considered error-prone despite whatever limits you put on it: high angles, bad edge ratios, poor skewness and Jacobian values, et cetera. The vendor who then owned the code assured users that this wasn’t a problem because it would just ramp up the p-order of the elements and this would “fix” the problem. :rolleyes: Anyway, Junior E does his analysis and comes back with massive stress risers all over the place which exceed the material elastic limits by a factor of 4 or 5. He immediately calls up the engineer in charge of the product line to notify him, and a whole hullabaloo goes on before I get involved. (I think I was out of the office when he did this, but whatever.)

Anyway, we go back and look at his model, and it turns out that every element in the high stress areas was badly shaped and had been ramped up to order 8 or 9, which means that all of the calculations at the mid-side nodes of the elements were wildly varying, with enormous strain energies; in short, it was all so much garbage. When we went back and ran the thing, restraining the edge orders to no more than order 4, it looked like silk, with nary a dramatic stress riser to be seen, so all of his high stress results were an artifact of a bad curve fit (and one that the software should have caught, but it didn’t–I’ve since become rather skeptical of that particular code). I then instructed him on the use of the manual meshing tools, as well as how to examine all the outputs instead of just stress results before drawing any conclusions about the validity of the stress values. Oh, and then I had to go to a meeting and explain why finite element analysis is an art of approximation rather than a strict science, which caused the VP of engineering to decry all the money spent on “useless” analysis tools. Good times.

Stranger

So you were using the fit to generate a distribution of the deviations? Hmmm…I’d guess the variations aren’t much, so you’re not likely to get huge artificial peaks and troughs, but I’m not sure why you wouldn’t just use the data to get a Gaussian distribution directly. I’m not questioning your method; I’m just curious as to why you did it this way.

Stranger

<bolding mine>

Fitted polynomials, usually, and certainly more often than not, diverge wildly just beyond the data space used for the fitting. As a result, it is almost never a good idea to extrapolate.

Exceptions I would make:

When the underlying mechanism is known to be of the same order as the fitted polynomial, and the fitting was done only to smooth noisy data.

When the extrapolation is used only to suggest the direction for gathering additional data, so that validity of the extrapolation will be tested experimentally.

The problem with a high-order polynomial fit is that it’ll generally go crazy just outside of the range of the data points, or sometimes even in between the data points. When you’ve got 17th order terms, even just a tiny amount of noise can blow up into huge differences in the fitted curve.
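You can see this with a toy example in a few lines of NumPy (made-up data, nothing to do with anyone’s actual measurement):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 18)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.01, size=x.size)  # tiny noise

p = np.polyfit(x, y, 17)     # 18 points, 17th order: essentially exact interpolation
print(np.polyval(p, 0.5))    # inside the data range: stays near the data
print(np.polyval(p, 1.05))   # just past the end: typically orders of magnitude off the data scale
```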

Now, you will occasionally see high-order power law fits, but those include only the highest-order term, and so have very few parameters. So they’re still generally pretty stable (if a power law is what’s most appropriate for the data).

Wait, wait, wait.

I’m not asking about curve-fitting in general, or about polynomial fitting in particular.

I’m asking about the calculation method specifically of summing various integer powers of the variables, and then calculating the parameters of a fitted curve as functions of the sums.

Isn’t there a name for this approach per se?

Where might I find a list of curves and their parameters as functions of the sums?

No, you misunderstand – I’ve got a surface deviation which consists of a relatively small number of gentle rises and troughs. The 17th order polynomial fits those. A Gaussian is a single peak, and wouldn’t at all fit the observed surface form.

As a rule, an nth order polynomial can contain up to n-1 extrema (both peaks and valleys). I had a surface with about eight high points, and needed 17th order to make sure I had enough ups and downs to get all of them represented. It worked quite well.
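If it helps, here’s a toy version of the idea in Python, with a made-up profile standing in for the real measurement (which was nowhere near this tidy):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 500)                       # position along the surface
profile = 0.05 * np.sin(3 * np.pi * x) + 0.02 * np.cos(2 * np.pi * x)  # gentle bumps
trace = profile + rng.normal(scale=0.02, size=x.size)  # noisy measured trace

p = np.polyfit(x, trace, 17)           # high enough order to follow all the bumps
smooth = np.polyval(p, x)              # estimate of the true surface profile
print(np.max(np.abs(smooth - profile)))   # residual error of the smoothed estimate
print(smooth.max() - smooth.min())        # peak-to-valley flatness deviation
```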

I’ve never heard of a special name for it. My oldest book simply calls it “Least Squares Fit to a Polynomial”. That’s Philip Bevington’s “Data Reduction and Error Analysis for the Physical Sciences”. Press et al. in “Numerical Recipes” just call it Least-Squares Fitting. You can call it chi-square fitting, because you’re minimizing the value of the deviation function chi-squared.
As for “A List of Curves and their Parameters”, I’m not sure what you mean. There are plenty of books giving the infinite (or finite) sum polynomial forms of various functions. Look at Abramowitz and Stegun, or Gradshteyn and Ryzhik, or the CRC Handbook.

Well, yeah, that’s why I qualified it with a “perhaps” and a further statement as to the wildly varying nature of higher order polynomial functions. You can (often) make valid extrapolations from linear fits, and (sometimes) from parabolics, provided in both cases that you have some confidence in the application of the fit to phenomena outside the range of existing data. Beyond that, I wouldn’t give much credence to any higher order fit unless the phenomena could analytically be described in terms of a high order polynomial, which is pretty rare in nature.

Napier, sorry about the hijack. I don’t have an answer for you, and I’m not sure I even understand your original question properly.

Stranger

Sounds like a trigonometric interpolation might have been better for this analysis.

This is one of the things that drives me nuts in my job. My coworkers, many of whom hold PhDs in physics, EE, etc., insist on using high order polynomial interpolation without understanding how polynomials behave near the end data points or, worse, outside the range of data points. This, combined with their tendency to use 6th order polynomials to fit data that theoretically should be quadratic, Gaussian, or some other function (typically sinc^2), drives me nuts. It’s not that hard to do a nonlinear fit.
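For what it’s worth, a direct nonlinear fit in Python/SciPy is only a few lines. A sketch with made-up data and a Gaussian, just to show the shape of it:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, mu, sigma, offset):
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2) + offset

# Made-up noisy data standing in for a measured peak.
rng = np.random.default_rng(2)
x = np.linspace(-5, 5, 200)
y = gaussian(x, 2.0, 0.3, 1.2, 0.1) + rng.normal(scale=0.05, size=x.size)

p0 = [y.max() - y.min(), x[np.argmax(y)], 1.0, y.min()]  # rough initial guesses
params, cov = curve_fit(gaussian, x, y, p0=p0)
print(params)   # fitted amp, mu, sigma, offset
```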

A normal distribution isn’t intended to fit the data onto a physical curve; it gives you a fit to a continuous probability distribution from which you can get information about the overall stochastic behavior of the system (standard deviation, variances, et cetera). I guess I’m assuming that you wanted to collect information for statistical process control or some other generalization of the data, but it sounds like you wanted a representation of this specific surface, so maybe we’re talking cross purposes. At any rate, I’ll drop it as it’s a complete tangent to the question posed by the o.p.

Stranger

“But if it fits my data set, it must be more accurate, right?”

Actually, what I love is when people start talking about confidence intervals with single data sets of population 3. “You realize there’s no way to calculate a confidence interval from this, right?” “Well, see, you just multiply the estimated probability by some number I just pulled out of my ass, and then wave the standard deviation around until you get dizzy, and it all works out.”

Stranger

These are (as others have said) probably least-squares/chi-squared fits to a polynomial fit equation – for a Cauchy distribution, it can be taken to be a polynomial (in {x, 1/y}) of the form a + bx + cx^2 = 1/y. In particular, these sums are the coefficients of the normal equations A^T A x = A^T b (derived by taking partial derivatives to minimize the squared error) for this polynomial fit.

To get the equations for the regression coefficients, you just compute the symbolic matrix inversion x = (A^T A)^(-1) A^T b in your normal-equation coefficients. (Note that symbolic matrix inversion rapidly gets ugly; a numerical matrix inversion or matrix solution is much faster when you have lots of coefficients, so there’s little point to deriving the general form.)
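If you do want the printed closed forms, a computer algebra system will grind them out for you. Here’s a SymPy sketch for the three-coefficient (Cauchy) case, with symbols standing in for the accumulated sums (my notation, not Kolb’s):

```python
import sympy as sp

# Symbols standing in for the accumulated sums: N = sum of 1, Sxk = sum of x^k,
# Sz = sum of 1/y, Sxz = sum of x/y, Sx2z = sum of x^2/y.
N, Sx, Sx2, Sx3, Sx4, Sz, Sxz, Sx2z = sp.symbols('N Sx Sx2 Sx3 Sx4 Sz Sxz Sx2z')

A = sp.Matrix([[N,   Sx,  Sx2],
               [Sx,  Sx2, Sx3],
               [Sx2, Sx3, Sx4]])
rhs = sp.Matrix([Sz, Sxz, Sx2z])

a, b, c = A.solve(rhs)     # closed-form coefficients of 1/y = a + b*x + c*x^2
print(sp.simplify(a))      # already fairly unwieldy for a 3x3 system
```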

Do you mean something like this; are you seeking a reference giving small-integer-coefficient fits for common functions? The rationale given in the link is minimization of keystrokes on a handheld device, and I could see how it would be handy for exponential integrals and the like.

Or do you have a function/data set that you want to explore with some technique more elegant than brute-force double precision least-squares or spline fitting?

Or is there really a way to get information about a function/data set (like skew or higher moments) by looking at sums over small integer powers? I’d find that fascinating and would love to hear more about it (I played around with your example for a while, but didn’t see anything arresting, though I may be missing something by not understanding what you mean by “the sum of 1 for all points being N.”)

OK, let me see if I can explain this. I am NOT trying to explain y(x) by creating a polynomial in x whose value is close to observations of y.

In my present example, I am more specifically trying to fit a Lorentzian (sp?) peak function to a spectrum measured by an instrument. The spectrum as it is reported has a peak in it that looks like a bell curve. I would like to fit a Lorentzian to that, because it is much faster to calculate than a Gaussian and would produce equivalently valuable results. So I find that if I accumulate the sums I described, including a sum that is equal to the number of observations (that is the sum of 1 and is usually written as N rather than SIGMA 1), I can estimate the amplitude of the Lorentzian as one function of the sums, the width parameter as another function, the centroid (where in x is the peak?) as a third function, and the coefficient of determination as a fourth function. I don’t need any of the original data to do these estimations. The functions I refer to are complicated but reduce to rationals if I have several layers of substitution. I didn’t invent this method, I found it in the book cited.
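To make that concrete: writing the Lorentzian as y = A / (1 + ((x - x0)/w)^2) makes 1/y a quadratic a + bx + cx^2 in x, and once the sums give me a, b, and c, the amplitude, centroid, and width fall out by completing the square. Roughly like this (my own algebra; the book’s printed formulas may be arranged differently):

```python
import numpy as np

def lorentzian_params(a, b, c):
    # Back out A (amplitude), x0 (centroid) and w (width) of
    # y = A / (1 + ((x - x0)/w)**2) from the quadratic 1/y = a + b*x + c*x**2.
    x0 = -b / (2.0 * c)               # centroid: where 1/y is smallest
    disc = 4.0 * a * c - b * b        # must be positive for a real peak
    A = 4.0 * c / disc                # peak amplitude
    w = np.sqrt(disc) / (2.0 * c)     # width parameter
    return A, x0, w
```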

Maybe I should have explained one thing more carefully:

The peak in the raw data is not the same thing as the dispersion of multiple attempts to measure a single parameter. Don’t think of a histogram of imperfect measurements of one absolutely accurate but unknowable number. If I had a magical way of making the peak narrower, it would just spoil the whole thing, not give me some more accurate result.

Rather, the physical process that generates the spectrum involves various phenomena that contribute to peak broadening. These phenomena are part of what is interesting about the physical system. The peak is not wide because something is wrong and imperfect - it’s wide because of the very thing I want to study in the end.