Pearson R coefficient. Help!

Angua · March 22, 2006, 3:13pm

I’m currently in the process of trying to establish the degree of correlation between two variables L and S, and I’ve been using the Pearson R coefficient. This has worked perfectly fine for L and T and L and N, (where T and N are other variables, which are different to S), and I’ve been getting sensible values.

However, when I try to establish the degree of correlation between L and S, something wierd happens, and I get a coefficient of 3.6! I’ve triple checked that I’ve done the calculation properly (I have done it properly), I’ve checked the formula I have for calculating the coefficient on data that is perfectly correlated (it works in that case), and I’ve done the calculation by hand to double check the spreadsheet (which matches what the spreadsheet says). I’ve also taken a look at the scatter plot of S versus L, and this is where this gets particularly interesting.

It turns out that there’s one datapoint that has a particularly high S and a particularly large L, i.e., its a total outlier. So, I’ve recalculated the coefficient without the outlier, and I end up with a perfectly reasonable correlation coefficient of 0.77. Now, could the outlier cause the meaningless coefficient I obtained initially, or is there something more afoot?

Thanks for any help anyone can shed on this, my advisor’s as confused as I am.

nivlac · March 22, 2006, 3:48pm

The Pearson’s correlation is always between -1 and 1, so if you’re getting anything outside that range, either (1) you performed the calculation incorrectly, or (2) whatever you used to do the calculation has a bug. You’ve apparently eliminated (1), so what tool are you using to do the calculation? It could be a problem with the way the tool is handling the outlier value. Note that even outliers cannot throw the correlation outside the valid range, if the formula has been properly coded and potential overflow situations are handled correctly. It might help if you post the values for L and S and the code that’s doing the calculation.

Angua · March 22, 2006, 4:04pm

I’m actually doing the calculation myself “by hand” in the spreadsheet. I get the spreadsheet to calculate Sum(L), Sum(S), Sum(LS), Sum(L[sup]2[/sup]) and Sum(S[sup]2[/sup]). Then I use the spreadsheet to calculate the coefficient. I realise now that this is probably a baaad idea, and I should just check to see if the spreadsheet will do it automatically.

I have just checked in fact, and the spreadsheet can actually calculate the Pearson R coefficients for me. Using the inbuilt tool (I’m using OpenOffice), it seems to give me a number that looks more correct. I think I may have been mis-coding or something.

Topic		Replies	Views
Math/statistics folks: please answer a question about Correlation Factual Questions	28	1516	October 2, 2007
Statisticians: would this look reasonably smart, stupid, or simply nonsensical? In My Humble Opinion	2	1100	January 2, 2012
Correlation and percent of error Factual Questions	5	167	February 24, 2025
Statistical Analysis Consultants Factual Questions	10	1051	June 8, 2013
Need help with a statistics problem Factual Questions	12	441	February 17, 2022

Pearson R coefficient. Help!

Related topics