Chi-square test used in fitting two sets of data

starryspice · November 14, 2006, 3:38pm

I am a research assistant, and I have a problem I’m working on (not homework) that I was hoping to get some help with.

Here’s the basic set-up of the problem: I have two sets of data, one we’ll call the source spectrum and the other the template spectrum. Each data set has its own errors. I modify the template spectrum and then normalize it to the source spectrum. Then I calculate the chi-square goodness-of-fit statistic. This is done in a loop so as to minimize the chi-square.
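The loop described above might look roughly like the following. This is only a minimal sketch under stated assumptions: "normalizing" is taken to mean scaling the template so its total flux (area) matches the source, and the minimization is a plain grid scan over one template parameter. `modify_template`, `chi_square`, and `scan_template` are hypothetical placeholders, not anything from the original post; a real analysis might use a proper minimizer instead of a scan.

```python
import numpy as np

def normalize_to_source(template, source):
    """Scale the template so its total flux (area) matches the source's (assumed normalization)."""
    template = np.asarray(template, dtype=float)
    return template * (np.sum(source) / np.sum(template))

def scan_template(source, template, params, modify_template, chi_square):
    """Grid-scan a (hypothetical) template parameter; keep the value with the lowest chi-square.

    modify_template(template, p) and chi_square(S, T) are caller-supplied placeholders
    for whatever the real analysis does.
    """
    best_p, best_chi2 = None, np.inf
    for p in params:
        # Modify first, then normalize, following the order described in the post.
        trial = normalize_to_source(modify_template(template, p), source)
        chi2 = chi_square(source, trial)
        if chi2 < best_chi2:
            best_p, best_chi2 = p, chi2
    return best_p, best_chi2

# Toy usage: the source is exactly 5 * template**2, so the scan should pick p = 2.
template = np.array([1.0, 2.0, 3.0])
source = 5.0 * template ** 2
p, c = scan_template(source, template, [1, 2, 3],
                     lambda t, q: t ** q,
                     lambda s, t: float(np.sum((s - t) ** 2)))
# p == 2, c == 0.0
```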

I calculate the chi-square by dividing up the data sets into 50 bins. For each bin, I average the source flux (S) and the template flux (T).

Then chi-square = sum [(S - T)^2 / (total error)^2 ]
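As a rough illustration of the binning step and the chi-square sum, here is a sketch that assumes the source and template are sampled on the same wavelength grid and that "50 bins" means 50 contiguous chunks of (nearly) equal length, which is what `np.array_split` produces. The per-bin `total_err` is left as an input, since that is exactly the open question.

```python
import numpy as np

def bin_average(flux, nbins=50):
    """Split a 1-D spectrum into nbins contiguous bins and return each bin's mean."""
    chunks = np.array_split(np.asarray(flux, dtype=float), nbins)
    return np.array([chunk.mean() for chunk in chunks])

def chi_square(S, T, total_err):
    """chi-square = sum over bins of (S - T)^2 / (total error)^2."""
    S = np.asarray(S, dtype=float)
    T = np.asarray(T, dtype=float)
    total_err = np.asarray(total_err, dtype=float)
    return float(np.sum((S - T) ** 2 / total_err ** 2))
```

Bin sizes from `np.array_split` differ by at most one point when the spectrum length is not a multiple of 50.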

The problem I am having is in calculating the total error. When I look this up online, it is usually assumed that a data set is being fit to a model or a function, which has no error. Furthermore, I haven’t been able to find anything that will tell me how to combine the errors in each bin.

So the questions are:

Do I just propagate the error of each point in a bin through the calculation of the mean to get the source/template error?
(i.e. source error for a single bin = sum [(source error)^2 / N^2]?)
Do I add the source error and the template error in quadrature to get the total error?

I hope this is intelligible and not too long. Thank you very much to anyone who is able to help me with this problem!

Chronos · November 14, 2006, 9:01pm

I presume you would add them in quadrature, assuming that the errors in both are independent (if they’re not independent, then you have to know exactly how they’re related), since that’s how independent errors always add. But I’m not familiar with the subtleties of this particular sort of problem.

Omphaloskeptic · November 14, 2006, 9:31pm

starryspice:

I calculate the chi-square by dividing up the data sets into 50 bins. For each bin, I average the source flux (S) and the template flux (T).

Then chi-square = sum [(S - T)^2 / (total error)^2 ]

The problem I am having is in calculating the total error. When I look this up online, it is usually assumed that a data set is being fit to a model or a function, which has no error. Furthermore, I haven’t been able to find anything that will tell me how to combine the errors in each bin.

So the questions are:

Do I just propagate the error of each point in a bin through the calculation of the mean to get the source/template error?
(i.e. source error for a single bin = sum [(source error)^2 / N^2]?)

Do I add the source error and the template error in quadrature to get the total error?

Recall that the chi-squared test assumes that each normalized residual (S - T)/(total error) is an independent sample from a zero-mean, unit-variance Gaussian distribution. Let’s assume that your data points all follow [approximately] Gaussian distributions. The random variables S and T are then averages of Gaussians, so they’re Gaussian too; the variance of a bin average is the sum of the individual point variances divided by N^2, i.e., errors add in quadrature and the result is scaled by 1/N. (Your formula above is the squared error, so it is only missing a square root: err(S)^2 = sum[err(S_i)^2] / N^2, i.e., err(S) = sqrt(sum[err(S_i)^2]) / N.)
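Propagating the per-point errors through the bin average can be sketched like this, assuming the points within a bin are independent and that the flux and error arrays share the same grid. For a mean of N points the variance is the sum of the individual variances divided by N^2, so with equal per-point errors sigma the bin error comes out as sigma/sqrt(N). The function name and the contiguous-bin layout are illustrative assumptions.

```python
import numpy as np

def bin_mean_and_error(flux, err, nbins=50):
    """Per-bin mean flux and the 1-sigma error of that mean.

    Assumes the N points in a bin are independent, so
    err(mean)^2 = sum(err_i^2) / N^2, i.e. err(mean) = sqrt(sum(err_i^2)) / N.
    """
    flux_bins = np.array_split(np.asarray(flux, dtype=float), nbins)
    err_bins = np.array_split(np.asarray(err, dtype=float), nbins)
    means = np.array([f.mean() for f in flux_bins])
    errors = np.array([np.sqrt(np.sum(e ** 2)) / e.size for e in err_bins])
    return means, errors
```

For example, eight points of flux 1.0 and error 2.0 split into two bins give bin errors of 2.0/sqrt(4) = 1.0.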

Now (S - T) is the difference of two Gaussians, so it’s Gaussian too; if S and T are independent, its error is the quadrature sum of the errors of S and T. (In the case where T is known exactly, this reduces to just the error in S, as expected.)
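Combining the binned source and template errors in quadrature and feeding them into the chi-square might look like the sketch below, assuming the two are independent. One assumption worth flagging: if the template has been rescaled by a normalization factor, its per-bin errors should be scaled by the same factor before they are combined (this ignores any uncertainty in the factor itself).

```python
import numpy as np

def total_error(err_S, err_T):
    """Quadrature sum of independent source and template errors: sqrt(err_S^2 + err_T^2)."""
    return np.hypot(err_S, err_T)

def chi_square(S, T, err_S, err_T):
    """chi-square of binned source vs. template, using the combined per-bin error."""
    S = np.asarray(S, dtype=float)
    T = np.asarray(T, dtype=float)
    return float(np.sum((S - T) ** 2 / total_error(err_S, err_T) ** 2))
```

With a template error of zero, `total_error` returns the source error unchanged, matching the "T known exactly" limit.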

So the short answers are Yes and Yes; the caveats are that this assumes independent samples (not at all obvious, especially if the data has been oversampled, smoothed, or similarly processed) and that everything is Gaussian (usually not too far wrong if the errors are relatively small).

Pasta · November 15, 2006, 8:46am

Could you elaborate on the following statement:

Are the sources of error purely statistical and uncorrelated from bin to bin? When you “modify the template”, depending on the details, the template could acquire bin-to-bin correlations, possibly requiring a generalization of your chi-square to include covariances instead of just variances.
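The covariance generalization mentioned here is usually written as chi-square = r^T C^{-1} r, with r the vector of bin residuals S - T and C the total covariance matrix. A minimal sketch follows, assuming source and template are independent of each other so their covariance matrices simply add; how to obtain the template covariance depends on how the template is modified and is not addressed here. With diagonal matrices it reduces to the ordinary sum over bins.

```python
import numpy as np

def chi_square_cov(S, T, cov_S, cov_T):
    """Generalized chi-square r^T C^{-1} r, with r = S - T and C = cov_S + cov_T.

    Adding the two covariance matrices assumes source and template are independent
    of each other; off-diagonal terms carry the bin-to-bin correlations.
    """
    r = np.asarray(S, dtype=float) - np.asarray(T, dtype=float)
    C = np.asarray(cov_S, dtype=float) + np.asarray(cov_T, dtype=float)
    # Solve C x = r rather than forming C^{-1} explicitly (numerically more stable).
    return float(r @ np.linalg.solve(C, r))
```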

Also, if the two spectra will always have identical areas because you normalize one to the other, don’t forget to account for the loss of an extra degree of freedom. You’ll have 50 - (number of template parameters being adjusted) - 1 (for the normalization constraint) degrees of freedom.
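The degree-of-freedom bookkeeping is simple arithmetic; here is a sketch in which the number of adjusted template parameters is a hypothetical input. Dividing chi-square by the degrees of freedom gives the reduced chi-square, which should be near 1 when the residuals are consistent with the quoted errors. For a p-value, the same chi-square and dof could be passed to a chi-square survival function such as `scipy.stats.chi2.sf`.

```python
def degrees_of_freedom(n_bins, n_template_params, area_normalized=True):
    """Bins minus fitted template parameters, minus one more for the normalization constraint."""
    return n_bins - n_template_params - (1 if area_normalized else 0)

def reduced_chi_square(chi2, dof):
    """chi-square per degree of freedom."""
    if dof <= 0:
        raise ValueError("need more bins than fitted parameters")
    return chi2 / dof

# With 50 bins and, say, 2 adjusted template parameters plus the area constraint:
dof = degrees_of_freedom(50, 2)
```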
