I am currently trying to graph some kinetics data for a first order reaction. (First order means A --> B.) I have measured the concentrations of A and B at regular intervals as the reaction proceeds. In general chemistry we were all taught that for a first order reaction you plot the log of the concentration vs. time to get a linear plot whose slope gives the rate constant. This makes sense to me.
Using Excel I plotted this line and calculated the slope. I showed this to one of the professors, and he said that they no longer plot log concentration A vs. time because it doesn't weight the points evenly. So I sent him the data, and he plotted concentration A vs. time and, using a program called Logger Pro, got a rate constant from an exponential fit that was different from the one I had calculated with Excel.
Thinking I understood, I went back to Excel and plotted concentration A vs. time. When I calculated the line of best fit using the exponential model, I got the same rate constant I had when I did it linearly.
I think I need a better program for analyzing data, but I am not certain what I am looking for. I am not certain why two different programs come up with two different lines of best fit. Nor can I mathematically make sense of why graphing log concentration A vs. time isn't just as good as plotting concentration A vs. time.
My guess would be that it’s a matter of different measurement uncertainties. If your measurements of concentration are all, say, to +/- 0.01M (i.e., all having the same constant error) then the errors in the logarithms will be different. (The larger x is, the smaller the difference log(x+0.01)-log(x) is.) So, as your professor said, measurements at larger concentrations (hence having smaller errors) should be more heavily weighted than measurements at lower concentrations. Excel is probably assuming equal uncertainties in the linearized least-squares solution; its “exponential” fit is probably just taking the logarithm internally, like you did manually the first time.
What you want is a fitting program that understands and accepts measurements with different errors. Then, if you like, you can fit log(x) vs. time (x your concentration) by fitting a line to your measured values of log(x), but now with errors calculated as above by computing the difference log(x+u)-log(x), where u is the measurement uncertainty. (Notice that if u << x, a Taylor expansion gives log(x+u)-log(x) ≈ u/x, so the error in log(x) is roughly the relative error in x.)
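In case it helps, here is a rough sketch of that weighted fit in Python/NumPy. The times, concentrations, and the 0.01 uncertainty are just made-up placeholders for your data:

[code]
import numpy as np

# Hypothetical example data: time points and measured concentrations of A,
# each with the same absolute uncertainty u (e.g. +/- 0.01 M).
t = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])      # time
conc = np.array([1.00, 0.61, 0.37, 0.22, 0.14, 0.08])  # [A], made-up values
u = 0.01                                                # absolute uncertainty in each [A]

# Uncertainty in log([A]) propagated from the absolute uncertainty u:
# exactly log(x+u) - log(x), or approximately u/x when u << x.
sigma_log = np.log(conc + u) - np.log(conc)

# Weighted straight-line fit of log([A]) vs. t.
# np.polyfit takes weights of 1/sigma (not 1/sigma**2) for Gaussian errors,
# so points with small uncertainty (large concentration) count more.
slope, intercept = np.polyfit(t, np.log(conc), 1, w=1.0 / sigma_log)

print("rate constant k ≈", -slope)
[/code]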
If you know a little matrix algebra, I recommend that you learn about least-squares fitting. It’s a good thing to understand.
As Omphaloskeptic said (as I was typing my reply): It all has to do with exactly what sum-of-squared error (SSE) you’re minimizing. If your observed values are y[sub]i[/sub] and predicted values are z[sub]i[/sub], ideally you would like to minimize the sum of (y[sub]i[/sub] - z[sub]i[/sub])[sup]2[/sup]. However, if you plot the logs of values and fit a straight line, you are minimizing instead the sum of (log y[sub]i[/sub] - log z[sub]i[/sub])[sup]2[/sup]. This weights the errors differently, but is easier because there are nice closed-form formulas for linear regression. It turns out that Excel takes this way out when you ask it for an exponential regression.
I’m sure there are programs available that do exponential regression the “right” way, but you can get Excel to do it with a bit of work. What you need to do is use Excel’s Solver tool. (If you don’t see Solver in the Tools menu, choose the Add-ins menu item to have Excel install it.) Here is a too-brief description of how to do it: Put the coefficients of your exponential function (the ones you’re trying to determine) in a couple of cells, and use these values to calculate in another cell the SSE. Then ask Solver to minimize the SSE by modifying the coefficients. You’ll need to put in reasonable values of the coefficients as an initial guess.
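If you'd rather do the same minimization outside Excel, this is roughly what Solver would be doing, sketched in Python with SciPy; the data and the starting guesses are invented, and the model here is the plain y = a·e[sup]bx[/sup]:

[code]
import numpy as np
from scipy.optimize import minimize

# Made-up measurements (stand-ins for the real data).
t = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([1.00, 0.61, 0.37, 0.22, 0.14, 0.08])

def sse(params):
    """Sum of squared errors for the exponential model y = a*exp(b*t)."""
    a, b = params
    return np.sum((y - a * np.exp(b * t)) ** 2)

# Reasonable initial guesses, just as Solver needs a starting point.
result = minimize(sse, x0=[1.0, -0.05])
a_fit, b_fit = result.x
print("a =", a_fit, " rate constant k =", -b_fit)
[/code]

This minimizes the true SSE directly, rather than the SSE of the logs, so it can give a different rate constant than the log-linearized fit.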
I just reviewed the basics of least squares fitting and I’ve figured out that my matrix algebra is weak. I understand that we are minimizing SUM((f(x) - y)^2). Excel is doing its “exponential” minimization by minimizing SUM((f(x) - log y)^2) with f linear, which weights the errors unevenly. I need a program that will minimize SUM((f(x) - y)^2) by manipulating the unknowns a, b, c in f(x) = a*EXP(bx) + c.
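For concreteness, something like this sketch (in Python with SciPy; the numbers are placeholders for my real data) is what I have in mind:

[code]
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    """Exponential decay with a nonzero offset: f(x) = a*exp(b*x) + c."""
    return a * np.exp(b * x) + c

# Placeholder data; substitute the real time/concentration columns.
x = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([1.05, 0.66, 0.42, 0.27, 0.19, 0.13])

# curve_fit minimizes SUM((y_i - f(x_i))**2) over a, b, c directly
# (no logarithm taken), starting from the initial guess p0.
(a, b, c), cov = curve_fit(model, x, y, p0=[1.0, -0.05, 0.05])
print("a =", a, "b =", b, "c =", c)
[/code]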
Is this the right way to do it, or just the best way?
Is that “c” really there, or is it just y=a e[sup]bx[/sup]? If there really is a nonzero steady-state value, then taking log(y) is probably not what you wanted to do; log(y) is not a linear function of x unless c is zero.
But in general, assuming that the measurement uncertainties are in y alone, and are the same absolute size for each measurement, then this technique will certainly give you the exact minimum-squared-error for the given parametric curve. If the uncertainties are not the same for each measurement, then what you want to minimize is the weighted sum of each squared difference, SUM(((y[sub]i[/sub]-f(x[sub]i[/sub]))/d[sub]i[/sub])[sup]2[/sup]) where d[sub]i[/sub] is the uncertainty in measurement i.
However, f(x) is not a linear function of the coefficients to be fit; this makes performing the minimization somewhat tricky in practice (depending a lot on just how ugly f is). The usual matrix technique is to try to manipulate the curve into a form g(y)=f(x)=a[sub]1[/sub]f[sub]1[/sub]+…+a[sub]m[/sub]f[sub]m[/sub] which is linear in the coefficients a[sub]i[/sub] which we want to find. (For example, this includes polynomial fits, with f[sub]n[/sub]=x[sup]n-1[/sup] and g(y)=y; it also includes your fit, with g(y)=log(y), as long as c=0.) In doing so, the measurement uncertainties change from d[sub]i[/sub] to approximately e[sub]i[/sub]=g(y[sub]i[/sub]+d[sub]i[/sub])-g(y[sub]i[/sub]), so the new goal is to minimize SUM(((g(y[sub]i[/sub])-f(x[sub]i[/sub]))/e[sub]i[/sub])[sup]2[/sup]), the sum weighted by the size of each uncertainty. This can be written as a matrix equation and solved efficiently.
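As a rough illustration, the weighted, linearized version of your fit (the c=0 case only) can be set up as a matrix problem like this in Python/NumPy, with made-up data standing in for yours:

[code]
import numpy as np

# Weighted linearized fit for y = a*exp(b*t) (so c = 0):
# take g(y) = log(y), which is linear in log(a) and b, and weight each
# point by its transformed uncertainty e_i = g(y_i + d_i) - g(y_i).
t = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])   # placeholder times
y = np.array([1.00, 0.61, 0.37, 0.22, 0.14, 0.08])  # placeholder concentrations
d = np.full_like(y, 0.01)                           # uncertainty of each measurement

g = np.log(y)
e = np.log(y + d) - np.log(y)   # propagated uncertainties in g(y)

# Design matrix for g(y) = log(a) + b*t; divide each row (and g) by e_i
# so that ordinary least squares minimizes SUM(((g(y_i) - f(t_i)) / e_i)**2).
A = np.column_stack([np.ones_like(t), t]) / e[:, None]
coeffs, *_ = np.linalg.lstsq(A, g / e, rcond=None)
log_a, b = coeffs
print("a =", np.exp(log_a), " b =", b)
[/code]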
Yes, I believe the C is in there. I measure my concentrations by integrating the peaks in an NMR spectrum. If the computer that measures the integration does not find the baseline correctly, I can actually get “negative” concentration values at low concentration. It is not too far off from zero, but Excel pukes at those numbers so I just don’t include them.
If C is nonnegligible, the parameter optimization is inherently nonlinear. While you still could use a matrix least-squares linearization, you would have to check to make sure your linear approximations are valid. It’s probably better to use a more general solution technique like the Excel Solver that Topologist recommends, or make your own. (You could implement a gradient-descent or similar method pretty easily, since you have a relatively simple functional form.)
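A bare-bones gradient-descent sketch for your f(x) = a·e[sup]bx[/sup] + c, in Python; the data, starting guesses, step size, and iteration count are all placeholders and would need tuning for real data:

[code]
import numpy as np

# Minimize SSE(a, b, c) = SUM((y - (a*exp(b*x) + c))**2) by gradient descent.
x = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])   # placeholder data
y = np.array([1.05, 0.66, 0.42, 0.27, 0.19, 0.13])

a, b, c = 1.0, -0.05, 0.0   # initial guess
step = 1e-4                 # fixed step size; a real implementation would adapt this

for _ in range(200_000):
    r = y - (a * np.exp(b * x) + c)   # residuals
    # Partial derivatives of SSE = sum(r**2) with respect to a, b, c.
    grad_a = -2.0 * np.sum(r * np.exp(b * x))
    grad_b = -2.0 * np.sum(r * a * x * np.exp(b * x))
    grad_c = -2.0 * np.sum(r)
    a -= step * grad_a
    b -= step * grad_b
    c -= step * grad_c

print("a =", a, "b =", b, "c =", c)
[/code]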