Linear regression/correlation terminology?

Machine_Elf · September 8, 2012, 11:37pm

I’m having trouble expressing an idea in plain English. Here are the technical details:

Suppose I have a set of data pairs, (X,Y), and I perform linear regression to determine a relationship:

Y = A_1 * X + B_1

There is a correlation coefficient, R^2, that goes along with this.

Now suppose I take some action to mitigate the effect of X on Y. I collect a new set of data pairs, repeat the regression, and establish a new relationship:

Y = A_2 * X + B_2

There is again a correlation coefficient, but one that is not substantially different from the earlier one.

The important difference is that my mitigating action has resulted in A_2 < A_1.

How do I express this change in plain English?

“The mitigating action has reduced the correlation between X and Y.” This seems wrong, since “correlation” would refer to R^2, which could very well be the same in both cases.

“The mitigating action has reduced the association between X and Y.” Is “association” an accepted term for describing the slope of a linear regression like this?

“The mitigating action has reduced the sensitivity of Y to X.” A bit clumsy, but does this accurately express that a change in slope has occurred?

Any other suggestions, or knowledge of how this sort of concept is expressed by statisticians?

polar_bear · September 8, 2012, 11:53pm

The fact that a1 and a2 are different doesn’t have anything to do with the strength of the correlation, but with the size of the effect. So it seems your ‘action’ just made the ‘spread’ between the observed numbers smaller.

R^2 isn’t the correlation btw (r is); it’s the proportion of variance explained by your model. In essence it tells you how much of the variation in Y can be explained by X.
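To make the slope-versus-correlation distinction concrete, here is a minimal sketch using made-up data (not the original poster's). It assumes the "mitigating action" halves every deviation of Y from the intercept, which is only one of many ways a slope could shrink. The helper name `fit_line` and all numbers are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (slope, intercept, r)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return slope, intercept, r

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [i / 10 for i in range(100)]

# Made-up "before" data: true slope 2.0, intercept 1.0, plus noise.
ys_before = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]

# Made-up "after" data: every deviation from the intercept is halved,
# so the slope is halved and the scatter shrinks in the same proportion.
ys_after = [1.0 + 0.5 * (y - 1.0) for y in ys_before]

a1, b1, r1 = fit_line(xs, ys_before)
a2, b2, r2 = fit_line(xs, ys_after)
print(f"before: slope={a1:.3f}, r={r1:.3f}, R^2={r1 ** 2:.3f}")
print(f"after:  slope={a2:.3f}, r={r2:.3f}, R^2={r2 ** 2:.3f}")
```

Because the "after" values are a positive rescaling of the "before" values, the fitted slope drops by half while r (and therefore R^2) stays the same. That is the situation where "reduced correlation" would be the wrong description and "reduced effect" or "reduced sensitivity" fits.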

Senegoid · September 9, 2012, 12:35am

I suspect that this is the best of the three statements you’ve suggested. (We’re all assuming here that A_1 and A_2 are both positive, I gather.) The lesser slope of the A_2 line says that Y is not as much affected by X as in the A_1 case.

So maybe something like: The mitigating action has reduced the effect of X upon Y.
Or maybe: With the mitigating action, Y is less affected by X. (or: . . . by changes in X.)

ultrafilter · September 9, 2012, 4:10pm

You’re fitting the wrong model. Your data points are (x_i, y_i, 0) for the observations before your change, and (x_j, y_j, 1) for the observations afterward. The model you want to fit should be of the form y = ax + bz + c, where z is the indicator of whether the current observation is before or after the change. What you’re interested in is the estimate of b and whether it’s significant.

And as observed above, R^2 is the square of the correlation coefficient.

Edit: This sort of model is known in the literature as an analysis of covariance, or ANCOVA.

naita · September 9, 2012, 5:19pm

“The mitigating action has reduced the effect of X on Y”

Machine_Elf · September 9, 2012, 9:28pm

Agreed; the strength of the correlation is expressed in R (or R^2). So how do I express the change in size of the effect in plain English?

“The association between Y and X was mitigated by <mitigating action>”?

This is not accurate. The model you posit has the product bz as a static offset that gets added to the y-intercept. For the two models I posit:

Y = A_1 * X + B_1
or
Y = A_2 * X + B_2

the Y-intercepts B_1 and B_2 could conceivably be identical; I’m specifically interested in how to express, in words, the fact that A_2 is less than A_1 (assuming both are positive). Not in such mechanical/mathematical language as that, but more in a way that describes the change in the relationship between the dependent/independent variables.

Think of how you would present this idea in the discussion section of a manuscript you’re writing for publication; that’s what I’m after.

ultrafilter · September 10, 2012, 1:18am

OK, yes, the model should be y = ax + bxz + c (+ dz), not as above. The problem with the way you’re framing the model is that unless you have exactly the same set of x values for both sets of data, you can’t know whether whatever differences you’re observing in the estimates of a are due to the change you made or to the difference in the x values. If you do fit an analysis of covariance model, which is absolutely bog-standard, you can just say that the interaction term was/was not significant. If your audience is at all statistically literate, they’ll know what this means.
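A rough sketch of fitting the interaction model y = ax + bxz + c + dz on pooled data follows. The data are invented for illustration, and the helpers `solve` and `ols` are hypothetical names written here to avoid any third-party dependency. The sketch only estimates the coefficients; it does not compute standard errors or a significance test for b, which a statistics package would normally report.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols(rows, ys):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Made-up data: slope 2.0 before (z = 0), slope 1.2 after (z = 1),
# same intercept, and different random x values in each group.
data = []
for z, true_slope in ((0, 2.0), (1, 1.2)):
    for _ in range(200):
        x = rng.uniform(0, 10)
        y = true_slope * x + 3.0 + rng.gauss(0, 1.0)
        data.append((x, z, y))

# Design columns: x, x*z, 1, z  ->  coefficients a, b, c, d.
rows = [[x, x * z, 1.0, float(z)] for x, z, _ in data]
ys = [y for _, _, y in data]
a, b, c, d = ols(rows, ys)

print(f"a     (slope before)    = {a:.3f}")
print(f"b     (change in slope) = {b:.3f}")
print(f"a + b (slope after)     = {a + b:.3f}")
```

Setting z = 0 gives y = ax + c and setting z = 1 gives y = (a + b)x + (c + d), so in the original poster's notation A_1 = a and A_2 = a + b. A negative estimate of b corresponds to A_2 < A_1, which is the quantity the "interaction term" statement refers to.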

Machine_Elf · September 10, 2012, 3:23am

You are focusing on extraneous details and completely ignoring the question that is of primary interest to me. If you come up with an answer to that question, please share.

Nametag · September 11, 2012, 12:59am

I agree that if you don’t want to call it “slope,” “sensitivity” is your best bet (as in sentence #3).

R^2 is called the coefficient of determination.

Maserschmidt · September 11, 2012, 1:05am

The mitigating action has dampened the impact of X on Y.
