Linear regression: r vs. R^2

Saint_Cad · February 11, 2014, 9:06pm

r gives you the strength of the correlation, which I guess is an empirical sort of measure (weak, strong or neither), while R^2 gives you a quantitative result: R^2 = 0.75 means that 75% of the variation in y is accounted for by the variation in x. I am having a hard time wrapping my head around that concept.

Assuming a simple linear model using Pearson’s r, it seems to me that r ~ s(residuals). But if that’s the case, then what is the interpretation of the variance s^2(residuals), and why would that give me something along the lines of var(y | var(x)) = R^2 * var(x), i.e. the variance of y given the variance of x = R^2 * variance of x? If R^2 = 0.75, is it a reasonable interpretation that 25% of the variance in y [and is that s^2(residuals)?] is independent of x?

Lastly, let’s say I’m plotting x = net calories eaten vs. y = weight gain and I get R^2 = 0.6. What is the interpretation of that? That 40% of weight gain/loss is independent of calories? Or is there no real-life interpretation?

OldGuy · February 11, 2014, 9:27pm

I think the easiest way to think of it is R^2 = 1 - var(residuals)/var(y), so that R-squared is the fraction of the original variation that is explained.
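A minimal sketch of that formula, in case it helps to see the numbers. The data values and the helper names (`fit_line`, `r_squared`, `pearson_r`) are made up for illustration and are not from any particular library. It fits an ordinary least-squares line, computes R^2 as 1 - var(residuals)/var(y), and compares that with the square of Pearson's r; for a simple one-variable linear fit with an intercept the two should agree.

```python
import math
import statistics

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y):
    """R^2 = 1 - var(residuals) / var(y) for the least-squares line."""
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return 1 - statistics.pvariance(residuals) / statistics.pvariance(y)

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data: a noisy upward trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 2.9, 2.7, 4.8, 4.1, 6.5, 6.0, 8.3]

r = pearson_r(x, y)
r2 = r_squared(x, y)
print(f"r = {r:.4f}, r squared = {r * r:.4f}, R^2 = {r2:.4f}")
```

The same ratio works whether you use population or sample variances, as long as you use the same one on top and bottom.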

As for your example, more calories might explain all of the weight gain, though some of the calories are needed just for maintenance, so the relation won’t go through the origin. Also, the relation might not be linear, so even if all the weight gain is explained by calorie intake, the R-squared would be less than one, as there would be residuals around the regression line.

Isilder · February 11, 2014, 9:28pm

There is no real-life interpretation; it is more of an error margin.
"It's giving you this value with 40% error", which can be either + or -.
But in the statistics world, they use a benchmark for a good correlation, e.g. 0.9 or better.

It depends on the topic. Perhaps you are looking for the best and then wondering if that's as good as you can do.

Machine_Elf · February 11, 2014, 9:37pm

The residuals are accounted for by variables other than x (and so, yes, are independent of x). Those other variables could be measurement noise, or just variables you didn’t keep track of.

In your weight-loss example, 60% of your weight loss is accounted for by reduced calorie intake. The remaining 40% may be due to increased physical activity. Or maybe 20% is accounted for by increased physical activity, and another 20% by a prolonged period of illness. Or reduced salt intake, resulting in less water retention. Or a crappy scale. Or a bunch of other factors that are not reduced calorie intake.

OldGuy · February 11, 2014, 11:25pm

I’ll point out again, the residuals are not necessarily independent of x; they are only uncorrelated with x, as this is a linear relation. There might be a perfect quadratic fit between y and x, for example, so that there is no other variable (other than x^2, of course) that matters. An R-squared less than one simply means that y is not perfectly linear in x.
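A small sketch of the quadratic case, using made-up data where y is exactly x^2 with no noise at all (the helper name `fit_line` is just for illustration). The straight-line fit leaves residuals whose covariance with x is zero, yet every residual is completely determined by x, and R^2 still comes out below one.

```python
import statistics

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up data: y depends on x perfectly, but not linearly.
x = [float(i) for i in range(7)]
y = [xi ** 2 for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# R^2 of the straight-line fit.
r2 = 1 - statistics.pvariance(residuals) / statistics.pvariance(y)

# Covariance between the residuals and x (zero, up to rounding).
mx = statistics.mean(x)
mres = statistics.mean(residuals)
cov = sum((xi - mx) * (ei - mres) for xi, ei in zip(x, residuals)) / len(x)

print(f"R^2 = {r2:.4f}, cov(residuals, x) = {cov:.2e}")
# Each residual is still an exact function of x: x^2 - (intercept + slope*x).
print(residuals)
```

So "uncorrelated with x" here only says the line has already soaked up all of the linear part; the leftover curve is still all x.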

ultrafilter · February 12, 2014, 3:20am

No. The quantity 1 - R^2 is the ratio of the variance of the residuals to the variance of the original data. It has nothing to do with a measurement of error.
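For what it's worth, here is a sketch of why that ratio lines up with r in the simple case, assuming an ordinary least-squares line with an intercept. The symbols are my own notation: \hat{y} is the fitted value, e = y - \hat{y} the residual, b the fitted slope, and s_x, s_y the standard deviations of x and y.

```latex
\begin{align*}
\operatorname{Var}(y) &= \operatorname{Var}(\hat{y}) + \operatorname{Var}(e)
  && \text{(residuals are uncorrelated with the fitted values)} \\
R^2 &= \frac{\operatorname{Var}(\hat{y})}{\operatorname{Var}(y)}
     = 1 - \frac{\operatorname{Var}(e)}{\operatorname{Var}(y)} \\
\operatorname{Var}(\hat{y}) &= b^2 \operatorname{Var}(x)
     = \left( r \, \frac{s_y}{s_x} \right)^{2} s_x^2
     = r^2 \operatorname{Var}(y)
  && \text{(since } b = r \, s_y / s_x \text{)} \\
R^2 &= r^2
\end{align*}
```

So with R^2 = 0.75, the fitted line carries 0.75 * var(y) and the residuals carry the remaining 0.25 * var(y).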

Svejk_1 · February 12, 2014, 3:54am

Just so you know, R and its squared form are both equally quantitative as well as equally empirical. In other words, the first sentence of your OP makes no sense.
