Function of a curve from a graph

Is it possible to draw a curve graph in Excel and from it deduce the function that defines the graph?

If Excel is not the software to do it, what should I be looking at (preferably open-source)?

Excel has a very limited number of functions it can fit - only 4 or 5, from memory. I only ever use it for linear, but there are a few others.

If you only need a one-off, you can download trial versions of programs such as SPSS or Origin that will work for a month. They are both relatively user-friendly (relative to other stats packages, which is setting a low standard) and will each fit dozens more kinds of curves and determine the best fit. You also have forcing options for the curves, so you can force the fit through specific points or restrict the x and y limits, etc.

As Blake said, Excel will construct simple functions for you. Type in your data as (x, y) pairs, and graph them as a scatter plot. Then insert a “trendline” into the graph; Excel gives you a handful of options: linear, polynomial, exponential, logarithmic, or power function. You can also display both the function and the R[sup]2[/sup] value, but beware: Excel seems to like using a minimum of significant figures, which is aggravating for higher-order polynomials.

If you right-click on the equation and select “Format data labels” you can change the constants to scientific notation with additional sig figs under the “Number” tab. My apologies if you are aware of this and merely meant it is annoying to fix it.

Oho! I didn’t know that, actually, so thanks for the info.

If you literally want to use the curve graph (a graph of a curve, I mean), you can use a program called WinDig to digitize the graphic. It follows along pixel by pixel and outputs a text file. I think it is freeware but am not sure.

It should be noted that there are two different operations one might be doing, in “converting a curve to a function”. If your initial curve consists of some sort of collected data, which you believe represents a pattern of some sort combined with “noise” (which may of course simply be a more complicated pattern that you don’t understand), then what you probably want is a fit to the data. In this case, you already have a formula for the pattern you expect to see, and you’re adjusting the values for some unknown parameters in the formula to make it match the data as closely as possible. For instance, if you have reason to believe that your data points should lie approximately along a straight line, then you’re fitting the formula for a line, y = mx + b, where m (the slope) and b (the y intercept) are parameters of the line that you don’t know in advance.
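For the curious, here’s roughly what that straight-line fit looks like in code. This is just a minimal sketch in Python using numpy’s polyfit (the data values are made up for illustration):

import numpy as np

# made-up noisy data that should lie roughly along a line
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 10.9])

# least-squares fit of y = m*x + b (a degree-1 polynomial)
m, b = np.polyfit(x, y, 1)
print(f"y = {m:.3f}*x + {b:.3f}")

polyfit adjusts m and b to minimize the total squared distance between the line and the data, which is exactly the “match the data as closely as possible” step described above.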

A best-fit curve does not need to exactly pass through all of the data points (the noise might have moved points away from where they “should” be), and in fact may not pass through any of them. If you do want to pass exactly through all of your data points (if, for instance, you know the values of the points extremely well, but don’t have a good idea of what kind of formula should describe the graph), then you want to use a process called interpolation, which basically just connects the dots. The simplest form of interpolation is linear interpolation, and consists of just drawing a straight line between every pair of points. If you want a smoother curve, you can use something called cubic spline interpolation, which draws a segment of a cubic (third-order polynomial) curve between adjacent points, in such a way that the different segments join together smoothly at the data points. Effectively, this gives you a different formula for every segment between points. Other interpolation methods are possible, but these two are the simplest, and good enough for most applications, so you seldom see any others.
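If you want to try both kinds of interpolation without doing the algebra yourself, numpy and scipy have them built in. A minimal sketch (the data values are invented):

import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

spline = CubicSpline(x, y)    # smooth piecewise-cubic curve through the points

print(np.interp(1.5, x, y))   # linear interpolation: just connects the dots
print(spline(1.5))            # cubic spline: smooth curve through the same points

Both pass exactly through every data point; they only differ in what they do between the points.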

If you’re trying to estimate a function given noisy observations, it’s best to stick to relatively simple functions unless you have a very large number of observations. Otherwise, you won’t have enough information in your dataset to distinguish between complex models, and you can end up with something that predicts the observations you know very well, but completely fails at new observations.

The classic example of this is that, given N data points, you can always find a polynomial of order N-1 that exactly passes through all of the data points, but that polynomial will almost always be a terrible mess, swinging around wildly everywhere in between the data points. For a good fit, the number of parameters in your fitting function should always be significantly smaller than your number of data points.
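You can watch this happen with a quick experiment - a sketch in Python (the data are invented): take 8 noisy samples of a straight line, then compare a degree-7 polynomial forced through all of them against a plain linear fit, evaluated between the sample points.

import numpy as np

rng = np.random.default_rng()
x = np.linspace(0, 7, 8)
y = 2 * x + 1 + rng.normal(0, 0.5, size=8)    # "really" a line, plus noise

wild = np.polyval(np.polyfit(x, y, 7), 6.5)   # degree 7: forced through every noisy point
sane = np.polyval(np.polyfit(x, y, 1), 6.5)   # degree 1: averages the noise away

print(wild, sane)    # the true value at x = 6.5 is 2*6.5 + 1 = 14

Run it a few times and the high-order fit will usually be the one that wanders off, especially near the ends of the data.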

Isn’t that basically the Nyquist frequency, but going in the other direction? In other words, Nyquist says that to sample a frequency, you need to sample at twice that frequency. In this example, to fit a function, we want our function to have a factor (or more likely, an order of magnitude or more) fewer wiggles than we have sample points.

Exactly. For the record, linear interpolation and cubic splines are really terrible unless you have data without noise.

There’s very likely some relation, although I’m not familiar enough with the underlying information theory to say exactly what it is.

It depends on the data. If my data points are {(0.000 , 1.000), (1.000 , 3.000), (2.000 , 7.000), (3.000 , 13.000)} , then I’m going to be pretty confident that my relationship is y = x[sup]2[/sup] + x + 1, even though I barely have more data points than the order of my polynomial, since that polynomial fits the data points so incredibly well. The usual heuristic is to take some measure of the goodness of your fit (such as the R value), and then divide that by the number of degrees of freedom in your fitting function: This penalizes overly-complicated functions, but still allows them if they match the data well enough.

There are other, better techniques for model selection, but something like this is probably fine in most cases.
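To make the example above concrete, here’s the quadratic check in Python: fitting a degree-2 polynomial to those four points recovers y = x[sup]2[/sup] + x + 1 with essentially zero residual.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 7.0, 13.0])

coeffs = np.polyfit(x, y, 2)
print(coeffs)   # ~[1.0, 1.0, 1.0], i.e. y = x**2 + x + 1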

This may be too complicated for what you want…

Simfit is a free, open-source Windows/Linux package for simulation, curve fitting, statistics, and plotting, using a library of models or user-defined equations.

http://www.simfit.man.ac.uk/

Disclaimer: I actually know Bill Bardsley but don’t ask me how to use the software, I only built his computer.

A sort of write-up on the software:

http://www.walkingrandomly.com/?cat=29

lurcio

Thanks everyone for your suggestions! Just to clarify, this isn’t for a statistics study (though I will certainly remember this thread if I ever do one). The data points are not observed samples, but pre-determined numbers.

The purpose of this is that I am trying to produce a curve with diminishing returns (or steep growth) for a game (maybe I should have put this up front, sorry about that!) - for example, a skill value of 10 has an effect of 10, but a skill value of 20 has an effect of 25 - so the effect doesn’t increase along a straight line but along a curve.

So my idea was to plot key points in Excel and from them generate a generic function, because this is ultimately going to be in a computer program. So rather than coding a look-up table, I would like to find the function instead.

It sounds like a power or exponential function is what you need. Try fitting those to your data in Excel. In case you are interested, both of the following functions will fit the two data points you mentioned, and give you an increasing slope (i.e. an upward curving line):

y = 0.4765 * x^1.3219
y = 4 * e^(0.0916 * x)

where, in both cases, y = effect and x = skill value

Happy gaming! :slight_smile:
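If this ends up in code rather than Excel, you can recover those constants yourself: a power law y = b*x^c is a straight line on log-log axes, so a plain linear fit on the logs finds b and c. A minimal sketch in Python, using the two (skill, effect) points above:

import numpy as np

skill = np.array([10.0, 20.0])
effect = np.array([10.0, 25.0])

# log(y) = log(b) + c*log(x): a power law is a straight line in log-log space
c, log_b = np.polyfit(np.log(skill), np.log(effect), 1)
b = np.exp(log_b)
print(f"effect = {b:.4f} * skill**{c:.4f}")   # ~0.4765 * skill**1.3219

The same trick with log(y) against plain x recovers the exponential form instead.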

Often, I think, the simplest function would be the easiest to use.

For example, try this:

y = A + B*x^C

Where A would be the base effect (which might often be zero), B is the scaling, and C is the shape of the curve. C between 0 and 1 will give you diminishing returns, C = 1 is a linear function, and C over 1 will give increasing returns. Tweak C one way or another until the curve looks right to you.

If you want nice integers, then add the INT function:

y = INT(A + B*x^C)

Here, you might want A to be at least 0.5, as the INT function chops off the trailing decimal places rather than rounding.
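In program code that might look like the following (a sketch; the parameter values are just starting points to tweak):

def effect(skill, a=0.5, b=1.0, c=1.2):
    # c between 0 and 1: diminishing returns; c = 1: linear; c > 1: increasing returns
    # adding a >= 0.5 before truncating makes the chop behave like rounding
    return int(a + b * skill**c)

print(effect(10), effect(20))

For positive values, Python’s int() chops the decimals just like Excel’s INT, so the same 0.5 trick applies.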

As others have said, be wary of interpolating (estimating values between data points) data from a fitted curve. Don’t even think about extrapolating (estimating values above or below the data set) from such.

Hi all, back with another math question :slight_smile:

Let’s say I want to generate a random number from 1 to 100, but with a higher probability of getting a number in a certain region (say 40 to 50). How do I go about doing this? (I have learned what a normal distribution is, but never learned the maths behind it.)

Excel has an actual normal distribution curve you could use. From here, you can use

=NORMINV(RAND(), mean, standard_dev)

with a mean of 50 and a standard deviation of whatever you feel is appropriate. Be aware, though, that the tails on a bell curve are infinitely long, so you’ll have to chop off things less than 0 and greater than 100. (ETA: I doubt the imprecision discussed on that page concerns you.)
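Outside Excel, the same idea is a few lines of Python: draw from a normal distribution and re-roll anything that falls outside the range. A minimal sketch (the standard deviation of 20 is just an example to tune):

import random

def biased_roll(mean=50.0, sd=20.0, lo=1, hi=100):
    # re-draw until the sample lands in [lo, hi], since the bell curve's tails go on forever
    while True:
        n = round(random.gauss(mean, sd))
        if lo <= n <= hi:
            return n

print([biased_roll() for _ in range(10)])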

If you want something quick and dirty to increase the probability in one area, just create a random number from, say, 0 to 110 and remap the numbers 100-110 to the range 40-50 with an IF statement (i.e., IF(A1>100,A1-60,A1)).
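The same trick in Python, for reference (a sketch of the remapping idea):

import random

def weighted_roll():
    n = random.randint(0, 110)        # roll over a widened range
    return n - 60 if n > 100 else n   # 101-110 fold back onto 41-50, doubling that region's odds

print([weighted_roll() for _ in range(10)])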