endogeneity problem vs. mediating variable

Is there a difference between stating that in a statistical model one variable mediates the impact of another variable (perhaps not completely) on the outcome vs. stating that there is an “endogeneity problem” in the model?

I ask as I’m wondering if these are the same problem (potentially at least) just expressed differently in different disciplines or jargon. It seems that solutions to the mediating variable problem are explained quite differently than solutions to the “endogeneity problem”, but this might also be an example of how the issue is treated by sociologists/public health writers vs. economists (and their ilk).

Thanks.

I would say that there is a difference. An endogeneity problem refers to any left-over variation in the dependent variable that is not explained by variables in the model, and that might conceivably be explained by a variable not included in the model. Of course, this extra variable might impact the effect sizes of the variables already in the model but this need not be the case. A mediating variable, the way I understand it, refers to a specific model (possibly with theoretical underpinnings) in which the effect of one independent variable on the dependent variable is contingent upon the level of another independent variable. For instance, you might specify a model in which income affects the likelihood that someone is going to vote, but only in women and not in men - thus the variable ‘gender’ mediates the effect of income on the likelihood to vote.

Of course, if you did not include the variable gender, you would find a much smaller effect for income. This means you do have an endogeneity problem - you’re missing an important variable - but including a mediating variable is only one possible solution to the endogeneity problem.

Hope this helps

Hmm. Is that what an endogeneity problem is? I think it’s when a dep. variable influences another dep. variable.

Hoo boy. Every time I think about endogeneity, I need to review my notes. And I’ve never read a treatment of mediating variables (is that part of Biostatistics?).

Caveats stated, I am guessing that they are different.

Say in reality A ->causes-> X ->causes-> Y. That would make X a mediating variable, right? (Or not?) If so, this is different from endogeneity.
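To make the chain above concrete, here is a minimal simulation sketch. It assumes a linear A -> X -> Y chain with independent normal noise; the path strengths (0.8 and 0.5), the sample size, and all function names are made up for illustration. Under those assumptions, regressing Y on A alone should recover roughly the product of the two paths, and adding X as a regressor should push the coefficient on A toward zero.

```python
import random

def cov(a, b):
    """Covariance of two equal-length lists (divides by n)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def simulate_chain(n=20000, a_to_x=0.8, x_to_y=0.5, seed=0):
    """Draw data from an assumed chain A -> X -> Y with independent noise."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [a_to_x * ai + rng.gauss(0.0, 1.0) for ai in a]
    y = [x_to_y * xi + rng.gauss(0.0, 1.0) for xi in x]
    return a, x, y

def slope_simple(y, a):
    """OLS slope of y on a alone (with intercept)."""
    return cov(a, y) / cov(a, a)

def slope_controlling(y, a, x):
    """Coefficient on a in the OLS regression of y on a and x (with intercept)."""
    saa, sxx, sax = cov(a, a), cov(x, x), cov(a, x)
    say, sxy = cov(a, y), cov(x, y)
    return (sxx * say - sax * sxy) / (saa * sxx - sax ** 2)

a, x, y = simulate_chain()
total_effect = slope_simple(y, a)           # should be near 0.8 * 0.5
direct_effect = slope_controlling(y, a, x)  # should be near 0 when X carries the whole effect
print(f"Y on A alone:        {total_effect:.3f}")
print(f"Y on A, holding X:   {direct_effect:.3f}")
```

In this toy setup nothing is correlated with an error term, so both regressions are fine as regressions; they simply answer different questions (total effect vs. effect not passing through X).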

Endogeneity occurs when you’re taking a system that essentially involves multiple equations, and just estimating it with one. Put another way, running these two regressions involves no endogeneity problem:

Y = X + ep1
Y = A + ep2

Here’s an example of endogeneity:

C = b1 + b2*Y
Y = C + I

b1 and b2 are parameters. Consumption (C) is an affine function of income (Y): higher income people spend more. But income is total spending, which in this model is consumption plus investment (I)! If you simplify this into one equation and run OLS, the coefficient estimates are likely to be biased.

C = b1 + b2*Y + e

Estimate this with OLS and b2 will be biased, because Y is correlated with e.

change e -> changes C (via C = b1 + b2*Y + e) -> changes Y (via Y = C + I) -> changes C, etc.
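The feedback loop can be checked with a small simulation. This is only a sketch under assumed values: b1 = 10, b2 = 0.6, and investment and the error term drawn as independent normals with standard deviation 1 (none of these numbers come from the cited lecture). Solving the two equations for Y gives Y = (b1 + I + e) / (1 - b2), which is how the data are generated below.

```python
import random

def cov(a, b):
    """Covariance of two equal-length lists (divides by n)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def simulate_economy(n=20000, b1=10.0, b2=0.6, seed=0):
    """Generate (C, Y, I) satisfying C = b1 + b2*Y + e and Y = C + I."""
    rng = random.Random(seed)
    cons, income, invest = [], [], []
    for _ in range(n):
        i = rng.gauss(5.0, 1.0)        # investment, assumed exogenous
        e = rng.gauss(0.0, 1.0)        # shock to consumption
        y = (b1 + i + e) / (1.0 - b2)  # reduced form for income
        c = y - i                      # identity Y = C + I
        cons.append(c)
        income.append(y)
        invest.append(i)
    return cons, income, invest

def ols_slope(dep, reg):
    """OLS slope of dep on reg (with intercept)."""
    return cov(reg, dep) / cov(reg, reg)

cons, income, invest = simulate_economy()
b2_ols = ols_slope(cons, income)
print(f"true b2 = 0.600, OLS estimate = {b2_ols:.3f}")
```

With equal variances for I and e, the OLS slope converges to (b2 + 1) / 2 rather than b2, so with these assumed values the estimate should land near 0.8 instead of 0.6, and a larger sample does not help.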

Cite: !Powerpoint! : www.ucd.ie/economics/staff/ldelaney/Lecture%203.ppt

I think you’re right, it means that the variable (presumably the independent variable) is dependent on some of the others. It is mostly used in economic (formal) models though - hence the name endogenous variable - in statistical models this problem is called multicollinearity and isn’t too much of a problem; the main effect is that it inflates the standard errors, which leads to less significant relationships. You can check for this by having your statistical program compute the VIF (variance inflation factor), and if it is too high (different sources claim either 4 or 10 is too high) you can either mention it as an aside or try to find an instrumental variable, which is a variable that is highly correlated with the one that is influenced by the other independent variables, but not with the other independent variables. I must warn that this is mostly a trick used by econometricians to get around this problem, and it is questionable whether this is a good idea if you are testing a theory (and thus use data that is based on the theory).

Good luck!!

In economics multicollinearity is an entirely different problem - and one which as you say isn’t especially serious. OLS will still be unbiased and a best linear estimator, though the variances of the coefficients on the collinear variables will grow.

So: independent variables correlated with one another: manageable.
Independent variable correlated with the error term: consider instrumental variables or a simultaneous equation system.
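As a sketch of the instrumental-variables route for the consumption example earlier in the thread: if investment I is assumed to be uncorrelated with the error e (an assumption of this toy model, not something the data can confirm), it can serve as an instrument for income Y. With a single instrument the IV estimate of b2 is cov(I, C) / cov(I, Y). The parameter values and function names below are made up for illustration.

```python
import random

def cov(a, b):
    """Covariance of two equal-length lists (divides by n)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def simulate_system(n=20000, b1=10.0, b2=0.6, seed=0):
    """Generate (C, Y, I) from C = b1 + b2*Y + e and Y = C + I."""
    rng = random.Random(seed)
    cons, income, invest = [], [], []
    for _ in range(n):
        i = rng.gauss(5.0, 1.0)        # instrument: drawn independently of e
        e = rng.gauss(0.0, 1.0)        # error term in the consumption equation
        y = (b1 + i + e) / (1.0 - b2)  # income solved from both equations
        cons.append(y - i)             # consumption from Y = C + I
        income.append(y)
        invest.append(i)
    return cons, income, invest

def ols_estimate(dep, reg):
    """OLS slope of dep on reg; biased here because reg is correlated with e."""
    return cov(reg, dep) / cov(reg, reg)

def iv_estimate(dep, reg, instrument):
    """Simple IV (Wald) estimator with one regressor and one instrument."""
    return cov(instrument, dep) / cov(instrument, reg)

cons, income, invest = simulate_system()
print(f"true b2      = 0.600")
print(f"OLS estimate = {ols_estimate(cons, income):.3f}")
print(f"IV estimate  = {iv_estimate(cons, income, invest):.3f}")
```

Under these assumptions the IV estimate should sit close to the true 0.6 while OLS stays well above it. The whole approach rests on the instrument really being uncorrelated with the error, which is exactly the part that has to be argued from theory.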