 # Linear regression: r vs. R^2

r gives you the strength of the correlation, which I guess is an empirical sort of measure (weak, strong, or neither), while R^2 gives you a quantitative result: R^2 = 0.75 means that 75% of the variation in y is accounted for by the variation in x. I am having a hard time wrapping my head around that concept.

Assuming a simple linear model using Pearson's r, it seems to me that r ~ s(residuals), but if that's the case, then what is the interpretation of the variance s^2(residuals), and why would that give me something along the lines of var(y | var(x)) = R^2 * var(x), i.e. the variance of y given the variance of x equals R^2 times the variance of x? If R^2 = 0.75, is it a reasonable interpretation that 25% of the variance in y [and is that s^2(residuals)?] is independent of x?

Lastly, let's say I have x = net calories eaten vs. y = weight gain and I get R^2 = 0.6. What is the interpretation of that? Is 40% of weight gain/loss independent of calories? Or is there no real-life interpretation?

I think the easiest way to think of it is R^2 = 1 - var(residuals)/var(y), so that R-squared is the fraction of the original variation explained.
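To make that concrete, here is a small numerical sketch (using numpy, with made-up data) showing that for a simple linear fit this definition agrees with the square of Pearson's r:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=1.0, size=200)  # linear signal plus noise

# Fit y = a*x + b by least squares and compute the residuals
a, b = np.polyfit(x, y, 1)
residuals = y - (a * x + b)

r = np.corrcoef(x, y)[0, 1]                     # Pearson's r
r2_from_var = 1 - np.var(residuals) / np.var(y)  # 1 - var(residuals)/var(y)

print(r**2, r2_from_var)  # the two values agree up to floating point
```

So "fraction of variation explained" and "r squared" are literally the same number in the simple linear case.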

As for your example, more calories might explain all of the weight gain, though some of the calories are needed just for maintenance, so the relation won't go through the origin. Also, the relation might not be linear, so even if all of the weight gain is explained by calorie intake, the R-squared would be less than one, as there would be residuals around the regression line.

There is no real-life interpretation; it is more of an error margin.
"It's giving you this value with 40% error" … which can be either + or -.
But in the statistics world, they use a benchmark for a good correlation, e.g. 0.9 or better.

It depends on the topic… perhaps you are looking for the best fit and then wondering if that's as good as you can do.

The residuals are accounted for by variables other than x (and so, yes, are independent of x). Those other variables could be measurement noise, or just variables you didn’t keep track of.

In your weight-loss example, 60% of your weight loss is accounted for by reduced calorie intake. The remaining 40% may be due to increased physical activity. Or maybe 20% is accounted for by increased physical activity and another 20% by a prolonged period of illness. Or reduced salt intake, resulting in less water retention. Or a crappy scale. Or a bunch of other factors that are not reduced calorie intake.

I'll point out again: the residuals are not necessarily independent of x, they are only uncorrelated with x, since this is a linear relation. There might be a perfect quadratic fit between y and x, for example, so that there is no other variable (other than x^2, of course) that matters. An R-squared less than one simply means that y is not perfectly linear in x.
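This is easy to demonstrate numerically. In the sketch below (numpy, synthetic data), y is an exact function of x (y = x^2, no noise at all), yet the linear R-squared is near zero, and the residuals are uncorrelated with x even though they are completely determined by it:

```python
import numpy as np

x = np.linspace(-1, 1, 201)
y = x**2                      # perfect quadratic relation, zero noise

a, b = np.polyfit(x, y, 1)    # best *linear* fit
residuals = y - (a * x + b)

r2 = 1 - np.var(residuals) / np.var(y)
corr_res_x = np.corrcoef(x, residuals)[0, 1]

print(r2)          # far below 1 despite a perfectly deterministic relation
print(corr_res_x)  # ~0: residuals are uncorrelated with x, not independent of it
```

Because x is symmetric about zero here, the best linear fit is flat, so almost none of the (purely x-driven) variation is "explained" by the linear model.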

No. 1 - R^2 is the ratio of the variance of the residuals to the variance of the original data. It has nothing to do with a measurement of error.

Just so you know, r and its squared form are both equally quantitative as well as equally empirical. In other words, the first sentence of your OP makes no sense.