Chi-square test used in fitting two sets of data

I am a research assistant, and I'm working on a problem (not homework) that I was hoping to get some help with.

Here’s the basic setup: I have two data sets, which we’ll call the source spectrum and the template spectrum. Each data set has its own errors. I modify the template spectrum, normalize it to the source spectrum, and then calculate the chi-square goodness-of-fit statistic. This is done in a loop to find the modification that minimizes chi-square.
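A minimal Python/NumPy sketch of that loop. It rests on several assumptions that are not in the post: both spectra share the same uniform wavelength grid (so "area" is just the sum of the fluxes), the template modification is a hypothetical linear tilt with a single `slope` parameter, and the minimization is a plain grid search. For brevity the chi-square here is computed point by point rather than in bins, with source and template errors combined in quadrature.

```python
import numpy as np

def normalize_to_source(src_flux, tpl_flux, tpl_err):
    """Scale the template so its total flux (area on a uniform grid) equals
    the source's; the template errors scale by the same factor."""
    scale = np.sum(src_flux) / np.sum(tpl_flux)
    return scale * tpl_flux, scale * tpl_err

def tilt(values, wave, slope):
    """Hypothetical template modification: multiply by a linear tilt."""
    x = (wave - wave.mean()) / (wave.max() - wave.min())
    return values * (1.0 + slope * x)

def fit_slope(wave, src_flux, src_err, tpl_flux, tpl_err, slopes):
    """Grid search: modify, normalize, compute chi-square, keep the minimum."""
    best_chi2, best_slope = np.inf, None
    for slope in slopes:
        t_flux, t_err = normalize_to_source(
            src_flux, tilt(tpl_flux, wave, slope), tilt(tpl_err, wave, slope))
        chi2 = np.sum((src_flux - t_flux) ** 2 / (src_err ** 2 + t_err ** 2))
        if chi2 < best_chi2:
            best_chi2, best_slope = chi2, slope
    return best_chi2, best_slope

# Synthetic, noise-free test case (made-up numbers for illustration only).
wave = np.linspace(4000.0, 7000.0, 500)
tpl_flux = 1.0 + 0.5 * np.exp(-((wave - 5500.0) / 300.0) ** 2)
tpl_err = np.full_like(wave, 0.05)
src_flux = 2.0 * tilt(tpl_flux, wave, 0.3)   # source = rescaled, tilted template
src_err = np.full_like(wave, 0.1)

chi2, slope = fit_slope(wave, src_flux, src_err, tpl_flux, tpl_err,
                        np.linspace(-1.0, 1.0, 21))
print(slope)   # close to 0.3, the tilt used to build the source
```

A real fit would more likely use a proper optimizer than a grid, but the grid keeps the modify, normalize, compare cycle easy to follow.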

I calculate chi-square by dividing the data sets into 50 bins. For each bin, I average the source flux (S) and the template flux (T).

Then chi-square = sum over bins of [(S - T)^2 / (total error)^2]
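The binning and the chi-square sum could look like this in Python with NumPy. This is a sketch assuming the source and template are sampled on the same grid and that each of the 50 bins is simply a run of consecutive points (`np.array_split` makes the bins as equal in size as possible). How to compute the total error is exactly the open question, so here it is just an input.

```python
import numpy as np

def bin_means(flux, n_bins=50):
    """Average consecutive points into n_bins (nearly) equal-sized bins."""
    chunks = np.array_split(np.asarray(flux, dtype=float), n_bins)
    return np.array([chunk.mean() for chunk in chunks])

def chi_square(S, T, total_err):
    """chi-square = sum over bins of (S - T)^2 / (total error)^2."""
    S, T, total_err = (np.asarray(a, dtype=float) for a in (S, T, total_err))
    return float(np.sum((S - T) ** 2 / total_err ** 2))

# Example: 100 points -> 50 bins of 2 points each.
S = bin_means(np.arange(100.0))          # first bin is mean(0, 1) = 0.5
T = bin_means(np.arange(100.0) + 0.2)    # template offset by 0.2 everywhere
print(chi_square(S, T, np.full(50, 0.1)))  # about 200: each bin adds (0.2/0.1)^2 = 4
```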

The problem I am having is in calculating the total error. When I look this up online, it is usually assumed that a data set is being fit to a model or a function, which has no error. Furthermore, I haven’t been able to find anything that tells me how to combine the errors within each bin.

So the questions are:

  1. Do I just propagate the error of each point in a bin through the calculation of the mean to get the source/template error?
    (i.e. source error for a single bin = sum [(source error)^2 / N^2]?)
  2. Do I add the source error and the template error in quadrature to get the total error?

I hope this is intelligible and not too long. Thank you very much to anyone who is able to help me with this problem!

I presume you would add them in quadrature, assuming that the errors in both are independent (if they’re not independent, then you have to know exactly how they’re related), since that’s how independent errors always add. But I’m not familiar with the subtleties of this particular sort of problem.

Recall that the chi-squared test assumes that you have independent samples from a unit-variance Gaussian distribution. Let’s assume that your data points all follow [approximately] Gaussian distributions. The random variables S and T are then averages of Gaussians, so they’re Gaussian too; for N independent points in a bin, the variance of the mean is the sum of the individual variances divided by N^2, i.e., errors add in quadrature. (Your formula above gives the variance rather than the error, so it is missing a square root: err(S)^2 = sum[err(S_i)^2] / N^2, i.e., err(S) = sqrt(sum[err(S_i)^2]) / N.)
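A sketch of the per-bin error of the mean, assuming independent Gaussian points and binning by runs of consecutive points (the function name `bin_mean_errors` is made up for illustration). For a bin of N points with errors sigma_i, the variance of the mean is sum(sigma_i^2) / N^2, so the error is sqrt(sum(sigma_i^2)) / N. The Monte Carlo check at the end draws many fake bins and compares the scatter of their means with that formula.

```python
import numpy as np

def bin_mean_errors(err, n_bins=50):
    """Error of each bin mean: sqrt(sum of sigma_i^2) / N, for independent points."""
    chunks = np.array_split(np.asarray(err, dtype=float), n_bins)
    return np.array([np.sqrt(np.sum(c ** 2)) / c.size for c in chunks])

# Equal errors reduce to sigma / sqrt(N): 200 points of sigma = 2 -> 50 bins of N = 4.
print(bin_mean_errors(np.full(200, 2.0))[:3])   # [1. 1. 1.]

# Monte Carlo check for one bin with unequal errors.
rng = np.random.default_rng(0)
sigmas = np.array([1.0, 2.0, 3.0, 4.0])
means = rng.normal(0.0, sigmas, size=(200_000, sigmas.size)).mean(axis=1)
print(means.std(), np.sqrt(np.sum(sigmas ** 2)) / sigmas.size)  # both about 1.37
```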

Now (S-T) is the difference of two Gaussians, so it’s a Gaussian too; and its error is the quadrature sum of the errors of S and T. (In the case where T is known exactly, this reduces to just the error in S, as expected.)
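The quadrature sum for the error of S - T, as a short NumPy sketch (the name `total_error` is hypothetical). It assumes the source and template errors are independent. A quick Monte Carlo draw confirms that the scatter of the difference of two independent Gaussians with errors 3 and 4 comes out near 5.

```python
import numpy as np

def total_error(err_S, err_T):
    """Error of S - T for independent S and T: quadrature sum."""
    err_S = np.asarray(err_S, dtype=float)
    err_T = np.asarray(err_T, dtype=float)
    return np.sqrt(err_S ** 2 + err_T ** 2)

print(total_error(3.0, 4.0))   # 5.0
print(total_error(3.0, 0.0))   # 3.0 (template known exactly)

# Monte Carlo check: scatter of the difference of two independent Gaussians.
rng = np.random.default_rng(1)
diff = rng.normal(0.0, 3.0, 200_000) - rng.normal(0.0, 4.0, 200_000)
print(diff.std())              # close to 5
```

The per-bin `total_error` array is what goes into the denominator of the chi-square sum.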

So the short answers are Yes and Yes; the caveats are that this assumes independent samples (not at all obvious, especially if the data has been oversampled, smoothed, or similarly processed) and that everything is Gaussian (usually not too far wrong if the errors are relatively small).

Could you elaborate on the following statement:

Are the sources of error purely statistical and uncorrelated from bin to bin? When you “modify the template”, depending on the details, the template could acquire bin-to-bin correlations, possibly requiring a generalization of your chi-square to include covariances instead of just variances.
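With correlations, the error-weighted sum of squared residuals generalizes to chi-square = r^T C^-1 r, where r = S - T is the vector of bin residuals and C is their covariance matrix (C_S + C_T if source and template are independent of each other). Below is a NumPy sketch. The `smoothing_matrix` function is a hypothetical 3-point running mean, used only to show how a linear "modification" A of the template turns a diagonal covariance into one with off-diagonal terms, via C_out = A C_in A^T.

```python
import numpy as np

def chi_square_cov(S, T, cov_S, cov_T):
    """chi-square = r^T C^-1 r with r = S - T and C = cov_S + cov_T."""
    r = np.asarray(S, dtype=float) - np.asarray(T, dtype=float)
    C = np.asarray(cov_S, dtype=float) + np.asarray(cov_T, dtype=float)
    return float(r @ np.linalg.solve(C, r))   # solve instead of forming C^-1

def smoothing_matrix(n):
    """Hypothetical template modification: 3-point running mean (edges use 2 points)."""
    A = np.zeros((n, n))
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 2, n)
        A[i, lo:hi] = 1.0 / (hi - lo)
    return A

n = 5
S, T = np.ones(n), np.zeros(n)
cov_S = np.diag(np.full(n, 0.04))              # uncorrelated source errors
A = smoothing_matrix(n)
cov_T = A @ np.diag(np.full(n, 0.01)) @ A.T    # template covariance after smoothing

print(chi_square_cov(S, T, cov_S, np.zeros((n, n))))  # diagonal case: about 125
print(chi_square_cov(S, T, cov_S, cov_T))             # includes the correlations
```

When both covariance matrices are diagonal, this reduces to the usual sum of (S - T)^2 / (total error)^2.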

Also, if the two spectra will always have identical areas because you normalize one to the other, don’t forget to account for the loss of an extra degree of freedom. You’ll have (50) - (number of template parameters being adjusted) - (one for the normalization constraint) degrees of freedom.
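A tiny sketch of that bookkeeping; the function names are hypothetical and the count of 2 template parameters in the example is made up. Dividing chi-square by the degrees of freedom gives the reduced chi-square, which should be near 1 for a good fit with correctly estimated errors. If SciPy is available, `scipy.stats.chi2.sf(chi2, dof)` gives the probability of getting a chi-square at least that large when the model and errors are correct.

```python
def degrees_of_freedom(n_bins, n_template_params, normalized=True):
    """Bins minus fitted template parameters, minus one more if the
    normalization forces the two spectra to have equal areas."""
    return n_bins - n_template_params - (1 if normalized else 0)

def reduced_chi_square(chi2, dof):
    """Chi-square per degree of freedom."""
    return chi2 / dof

dof = degrees_of_freedom(50, 2)     # hypothetical: 2 template parameters
print(dof)                          # 47
print(reduced_chi_square(94.0, dof))  # 2.0
```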