Is there a formula for subtracting a baseline from a signal when the two don’t have the same X values?

I’m talking about if you have a signal like:
X Y
1 65
3 54
5 63
7 55
9 54

and a baseline like:
X Y
2 10
4 23
6 35
8 20
10 22

It would be relatively easy to linearly interpolate between each pair of baseline values to find the baseline Y for each signal X (e.g. for X = 1 the baseline should be _). I’m just wondering whether this is the most accurate method. A more accurate method might be to interpolate from the baseline X closest to the signal X (e.g. if the baseline points were at 1 and 4, and the signal point was at 2, you’d calculate it from the 1, not the 4).

There are an infinite number of solutions possible here. So you need to start with the assumptions you have about the nature of the signal and the baseline - which is itself clearly a signal of some form.

In general you would create an interpolating function for both signal and baseline, and subtract one from the other at exactly the point where you want the final value, whether or not that point is a signal sample point. The problem is that there are infinitely many ways of creating an interpolating function. You narrow them down with your assumptions about the signal. Is it periodic? If so, maybe an FFT is a good idea. You could look at the whole raft of possible splines as another option. A simple cubic spline would be a good start.
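As a minimal sketch of the simplest version of this idea, the snippet below builds a piecewise-linear interpolating function for the baseline from the question and subtracts it at each signal X. The helper name `interp_linear` is hypothetical, and extending the end segment to cover X = 1 (which lies outside the baseline range) is an assumption, not something the data justifies.

```python
from bisect import bisect_right

def interp_linear(x, xs, ys):
    """Piecewise-linear interpolation of (xs, ys) at x; xs must be ascending.

    Assumption: outside [xs[0], xs[-1]] the nearest end segment is
    extended, i.e. the value is linearly extrapolated.
    """
    i = bisect_right(xs, x) - 1          # index of the left neighbour
    i = max(0, min(i, len(xs) - 2))      # clamp to a valid segment
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = ys[i], ys[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

signal_x = [1, 3, 5, 7, 9]
signal_y = [65, 54, 63, 55, 54]
baseline_x = [2, 4, 6, 8, 10]
baseline_y = [10, 23, 35, 20, 22]

# Evaluate the baseline at each signal X and subtract.
corrected = [
    y - interp_linear(x, baseline_x, baseline_y)
    for x, y in zip(signal_x, signal_y)
]
print(corrected)
```

Only the first value depends on the extrapolation assumption; the other four are ordinary interpolation between neighbouring baseline points.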

If these are discrete samples from a continuous space, you can get into sampling and decimation algorithms that, in their simplest form, apply a low-pass filter to the data and provide samples at intermediate points. A finite impulse response filter with a few taps would do. I suspect this is what you actually want.

After this you could look at heuristic methods that push the final results towards some defined “betterness” by tweaking the parameters of your interpolation mechanism. Again, you need some assumptions about the nature of the signal and baseline in order to define a betterness property. If you went this far, things like maximum entropy could be used. This is probably well past where you need to be.

Francis Vaughan gives an excellent summary. Rightly or wrongly, an assumption is often made that the signal is band-limited and sampled above its Nyquist rate; in such a case the Fourier method, in which the sinc function is used as the interpolation function, gives an exact interpolation.
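For illustration, here is a rough sketch of sinc interpolation applied to the baseline from the question. It assumes uniformly spaced samples, and the function names are made up for this example. The exactness mentioned above holds for an infinite band-limited record; with only five samples the sum is truncated, so values between samples are an approximation.

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def sinc_interp(x, x0, step, ys):
    """Sum of sinc kernels centred on uniformly spaced samples.

    ys[n] is the sample at x0 + n*step. Truncated to the samples given.
    """
    return sum(
        y * sinc((x - (x0 + n * step)) / step)
        for n, y in enumerate(ys)
    )

baseline_y = [10, 23, 35, 20, 22]   # samples at x = 2, 4, 6, 8, 10

# At a sample position the sum reproduces the sample (up to rounding).
at_sample = sinc_interp(4, 2, 2, baseline_y)
# Between samples it gives the truncated-sum estimate.
between = sinc_interp(5, 2, 2, baseline_y)
```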

A very popular method is to use a certain cubic spline as a finite approximation to the sinc function. In your example, only four points (x=2, x=4, x=6 and x=8) would be used to interpolate for 4 < x < 6. This is called “cubic interpolation.” A source of confusion is that “cubic spline interpolation” is a completely different (and much slower) method, though each is based on splines of cubics!
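A small sketch of the four-point scheme, assuming the kernel meant here is the common cubic convolution kernel with parameter a = -0.5, which on uniformly spaced samples is the same as a Catmull-Rom segment. The function names are hypothetical, and the sketch only handles positions that have two baseline points on each side.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Value between p1 and p2 at fraction t in [0, 1], using four samples."""
    return 0.5 * (
        2 * p1
        + (-p0 + p2) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
        + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3
    )

def cubic_interp(x, x0, step, ys):
    """Four-point cubic interpolation of uniformly spaced samples.

    ys[n] is the sample at x0 + n*step. Raises ValueError near the ends,
    where fewer than two neighbours exist on one side.
    """
    i = int((x - x0) // step)            # index of the left neighbour
    if i < 1 or i + 2 > len(ys) - 1:
        raise ValueError("need two samples on each side of x")
    t = (x - (x0 + i * step)) / step     # fractional position in the interval
    return catmull_rom(ys[i - 1], ys[i], ys[i + 1], ys[i + 2], t)

baseline_y = [10, 23, 35, 20, 22]        # samples at x = 2, 4, 6, 8, 10

# Baseline at x = 5 uses the points at x = 2, 4, 6 and 8.
baseline_at_5 = cubic_interp(5, 2, 2, baseline_y)
corrected_at_5 = 63 - baseline_at_5      # signal Y at x = 5 is 63
```

Signal points at x = 1, 3 and 9 fall outside the region this sketch covers; handling the ends needs an extra assumption such as padding or dropping to linear interpolation there.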

To add to the above, you also have to consider whether the baseline data points have exact values, or also have noise in their values. For example, maybe you know that the baseline data should be close to quadratic. Then you’d want to use a best fit quadratic as your baseline, even though it won’t go through any of the values exactly. If you used an interpolation that went through every point exactly, you’d just be fitting the noise.
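To make the quadratic case concrete, here is a sketch of a least-squares quadratic fit in plain Python, applied to the baseline from the question purely as an illustration (nothing in the thread says that baseline is actually quadratic). The helper `fit_quadratic` is hypothetical; it solves the normal equations directly, which is fine for a handful of points but not numerically robust for large or badly scaled data.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit y ~ a*x^2 + b*x + c; returns (a, b, c)."""
    # Power sums: S[k] = sum(x^k), T[k] = sum(y * x^k)
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal-equation matrix for the unknowns (c, b, a)
    A = [[float(S[i + j]) for j in range(3)] + [float(T[i])] for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for k in range(col, 4):
                A[r][k] -= f * A[col][k]
    # Back substitution
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = A[i][3] - sum(A[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = s / A[i][i]
    c, b, a = coef
    return a, b, c

signal_x = [1, 3, 5, 7, 9]
signal_y = [65, 54, 63, 55, 54]
baseline_x = [2, 4, 6, 8, 10]
baseline_y = [10, 23, 35, 20, 22]

a, b, c = fit_quadratic(baseline_x, baseline_y)
# The fitted curve is defined everywhere, so evaluate it at each signal X.
corrected = [y - (a * x * x + b * x + c) for x, y in zip(signal_x, signal_y)]
```

Unlike an interpolant, the fitted curve does not pass through the baseline points, which is the point of the approach when those points are noisy.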

You really need to take into account any knowledge of the signal you have to best fit/interpolate your data.

Hmm, thanks for your replies. A quick question: Origin (software) is able to do this, e.g. http://www.originlab.com/forum/topic.asp?TOPIC_ID=9190. Does anyone know what method it uses? Can you choose?

From this page it appears three interpolation methods are available. I’m not familiar with “cubic B-spline”; I’ll guess it’s related to “cubic spline interpolation.”