I have a friend who is taking a stats course in grad school. I took the same course many moons ago, but I can’t remember my statistics well enough to answer her question.

She is running a multiple regression on a survey of job satisfaction. The variables range from 1 to 5. In running the regression, her group got a constant of -0.25. She wants to know if that’s possible, given that the variables are positive numbers. I know it’s possible, especially given that the constant is so close to zero, but I’m having a tough time with the reasoning.

Yes. The regression determines the slope of the line and projects where it would intercept the y axis. Since the intercept is only a projection to the point where the predictors equal zero, and zero isn’t a valid value on a 1-5 scale, nothing stops it from coming out less than zero. At least the way I’m being taught, you can’t really do multiple regression (at least not without more know-how than I’m getting in this class) on a variable that is not continuous. So she may be violating an assumption by using a categorical variable scored 1-5.

The short answer: yes, you can do multiple regression with ordinal variables, and yes, an intercept south of the origin is certainly possible. If this makes no intuitive sense to your friend, her stats package should let her suppress the intercept, but that forces the fit through the origin and will change the slope estimates, so it’s usually safer to leave it in.

Think of it this way: you’re creating a model $Y = \beta_0 X + \beta_1$ with values of $\beta_0$ and $\beta_1$ that minimize the sum of the squared error in predicting Y from X. In this particular case, $\beta_0 X$ overshoots Y by a bit, so you need to adjust downward with a negative value for $\beta_1$.
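Here is a minimal sketch of that idea in Python. The data are made up for illustration (they are not from the friend’s survey), and `fit_line` is just a hypothetical helper name. Every value sits between 1 and 5, yet the least-squares intercept still comes out negative because the fitted slope is a little steeper than 1.

```python
# Hypothetical survey-style data: every value lies between 1 and 5.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 5, 5]

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
# prints: slope = 1.10, intercept = -0.10
```

The intercept is simply where the line would land if you extended it back to X = 0, which is outside the 1-5 range of the data, so nothing forces it to be positive.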

There are some issues related to doing regression on categorical data, but this isn’t one of them.

Sure, it’s possible, as others have already replied. What your friend should do is check the statistical significance of the constant (intercept) term by looking at its p-value. If the p-value is large, meaning the intercept is not significantly different from zero, she could consider excluding the intercept from the model. The software should have a simple option to do that.
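For what it’s worth, here is a rough sketch of what that check looks like by hand, using made-up one-predictor data (not the friend’s survey). It computes the t-statistic for the intercept, which is what the software turns into a p-value: a t-statistic close to zero goes with a large p-value. It also fits the same data with the intercept dropped, to show that the slope does not stay the same when you do that.

```python
import math

# Hypothetical data on a 1-5 scale, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 5, 5]
n = len(xs)

# Ordinary least-squares fit with an intercept.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual variance uses n - 2 degrees of freedom (two fitted parameters).
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
s2 = sum(r * r for r in residuals) / (n - 2)

# Standard error and t-statistic of the intercept.
se_intercept = math.sqrt(s2 * (1 / n + mean_x ** 2 / sxx))
t_intercept = intercept / se_intercept

# Same data with the intercept suppressed (line forced through the origin).
slope_origin = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

print(f"intercept = {intercept:.3f}, t = {t_intercept:.3f}")
print(f"slope with intercept = {slope:.3f}, without = {slope_origin:.3f}")
```

With these numbers the intercept’s t-statistic is nowhere near the usual cutoffs, so it would not count as significantly different from zero, and dropping it nudges the slope from about 1.10 to about 1.07.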