Well, hold on to your hat, because I’m going to present some more multivariate work. That’s stat-speak for “more than one explanatory variable”.
We looked at bigfoot sightings above and saw they were related to population. Then we looked at the scatterplot. Washington State was a big outlier, and we think we know what was going on: that’s where Mr. Foot’s legend began. That outlier is going to distort our estimates of the other variables’ effects on bigfoot sightings. So let’s take it out.
In fact, let’s take out all the states in the Pacific Northwest. That will tell us where Bigfoot prefers to go on vacation, a well-defined problem. I could simply drop them from the sample, but if I construct a dummy variable for each NW state (Washington, Oregon and California), equal to 1 for that state and 0 for every other, we’ll be able to see the effect of each. So let’s do that:
. reg bigfoot pop percap area Washington Oregon California
      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  6,    43) =   55.68
       Model |  533279.453     6  88879.9089           Prob > F      =  0.0000
    Residual |  68641.1267    43  1596.30527           R-squared     =  0.8860
-------------+------------------------------           Adj R-squared =  0.8701
       Total |   601920.58    49  12284.0935           Root MSE      =  39.954
------------------------------------------------------------------------------
     bigfoot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pop |   .0000114   1.24e-06     9.22   0.000     8.92e-06    .0000139
      percap |  -.0037421   .0011204    -3.34   0.002    -.0060016   -.0014825
        area |   7.91e-06   .0000657     0.12   0.905    -.0001246    .0001405
  Washington |    513.566   40.54832    12.67   0.000     431.7925    595.3395
      Oregon |   179.8493    40.4818     4.44   0.000     98.20997    261.4886
  California |   38.62057   54.10497     0.71   0.479    -70.49249    147.7336
       _cons |   148.6066   40.74161     3.65   0.001      66.4433    230.7699
------------------------------------------------------------------------------
There’s a lot of gobbledygook above, but I want the reader to focus on the P>|t| column: if it’s less than .05, the variable is statistically significant (at the 5% level). Population matters a lot. Washington State is a huge outlier, with about 514 more sightings than the other variables would predict, and Oregon is pretty big as well at about 180. California, not so much: its coefficient is statistically insignificant.
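For the curious, here’s a rough sketch of why a dummy variable does the same job as dropping a state. It’s in Python rather than Stata, the six data points are made up for illustration (they are not the bigfoot data), and `ols` is a little helper written just for this example. With a dummy for the outlier, the slope matches the one you get by deleting the outlier, and the dummy’s coefficient is simply the gap between the outlier and the line fitted to everyone else.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up data: five well-behaved points plus one big outlier (the last one).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 40.0]
dummy = [0, 0, 0, 0, 0, 1]  # 1 only for the outlier

# (a) Ignore the problem: regress y on x using all six points.
b_naive = ols([[1, xi] for xi in x], y)

# (b) Drop the outlier from the sample.
b_drop = ols([[1, xi] for xi in x[:-1]], y[:-1])

# (c) Keep the outlier but give it its own dummy variable.
b_dummy = ols([[1, xi, di] for xi, di in zip(x, dummy)], y)

print("slope, all points:     ", round(b_naive[1], 3))
print("slope, outlier dropped:", round(b_drop[1], 3))
print("slope, with dummy:     ", round(b_dummy[1], 3))
print("dummy coefficient:     ", round(b_dummy[2], 3))
```

That’s the sense in which the Washington coefficient can be read as how far Washington sits from what the rest of the states predict.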
But the more interesting variable is per capita income: states outside of the NW with higher income per person (after controlling for state population and area) tend to have fewer bigfoot sightings. The coefficient (size of the effect) is negative, and the t-stat and P-value indicate statistical significance. Mr. Foot is a man of the people!
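If you want to see where the t and P>|t| columns come from, the t-stat is just the coefficient divided by its standard error. Here’s a small Python sketch (not part of the original Stata session) that redoes the division for the table above. The cutoff of 2.02 is an approximation of the two-sided 5% critical value for 43 degrees of freedom; Stata computes exact P-values instead.

```python
# Coefficient and standard error for each row, copied from the Stata output.
rows = {
    "pop": (0.0000114, 1.24e-06),
    "percap": (-0.0037421, 0.0011204),
    "area": (7.91e-06, 0.0000657),
    "Washington": (513.566, 40.54832),
    "Oregon": (179.8493, 40.4818),
    "California": (38.62057, 54.10497),
    "_cons": (148.6066, 40.74161),
}

# Approximate two-sided 5% critical value for a t distribution with 43 df.
T_CRIT = 2.02

t_stats = {name: coef / se for name, (coef, se) in rows.items()}
significant = [name for name, t in t_stats.items() if abs(t) > T_CRIT]

for name, t in t_stats.items():
    flag = "significant" if name in significant else "not significant"
    print(f"{name:<12}{t:8.2f}  {flag}")
```

The pop row comes out a hair different from the printed 9.22 because the table displays rounded coefficients. Everything else lines up, and area and California are the two that fall short of the cutoff.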
Now I can present a little mischief. I know that red states tend to be poorer than blue states on average. So it wouldn’t surprise me if there’s a (spurious) relationship between the Romney share of the vote and bigfoot sightings, after controlling for population. And sure enough, there is:
. reg bigfoot pop romney area Washington Oregon California
      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  6,    43) =   49.87
       Model |  526290.819     6  87715.1366           Prob > F      =  0.0000
    Residual |  75629.7606    43  1758.83164           R-squared     =  0.8744
-------------+------------------------------           Adj R-squared =  0.8568
       Total |   601920.58    49  12284.0935           Root MSE      =  41.938
------------------------------------------------------------------------------
     bigfoot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pop |   .0000112   1.30e-06     8.63   0.000     8.56e-06    .0000138
      romney |   158.3455    63.8499     2.48   0.017      29.5799    287.1111
        area |  -.0000256   .0000716    -0.36   0.722      -.00017    .0001188
  Washington |   515.5367   42.76579    12.05   0.000     429.2913    601.7821
      Oregon |   199.0281   42.92973     4.64   0.000      112.452    285.6041
  California |   49.75546   56.87845     0.87   0.387    -64.95087    164.4618
       _cons |  -63.10261   33.59612    -1.88   0.067    -130.8556    4.650425
------------------------------------------------------------------------------
Note that I took out per capita income and replaced it with the Romney vote share. What’s happening is that the red state effect is confounded with the poorer state effect. Or maybe Bigfoot loves Romney! To sort out this pressing issue, we need to include all the relevant variables: otherwise we will have what’s known as omitted variable bias. Let’s correct the problem by including both Romney and per capita income:
. reg bigfoot pop percap romney area Washington Oregon California
      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  7,    42) =   47.88
       Model |  534890.426     7   76412.918           Prob > F      =  0.0000
    Residual |  67030.1538    42  1595.95604           R-squared     =  0.8886
-------------+------------------------------           Adj R-squared =  0.8701
       Total |   601920.58    49  12284.0935           Root MSE      =  39.949
------------------------------------------------------------------------------
     bigfoot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pop |   .0000115   1.24e-06     9.27   0.000     9.02e-06     .000014
      percap |  -.0030513   .0013145    -2.32   0.025     -.005704   -.0003986
      romney |   71.69758   71.36263     1.00   0.321    -72.31803    215.7132
        area |  -.0000114   .0000685    -0.17   0.869    -.0001496    .0001268
  Washington |   517.6563   40.74777    12.70   0.000     435.4239    599.8886
      Oregon |   187.5284   41.19269     4.55   0.000     104.3982    270.6586
  California |   42.84906   54.26251     0.79   0.434    -66.65712    152.3552
       _cons |   88.21014   72.61709     1.21   0.231    -58.33708    234.7574
------------------------------------------------------------------------------
Per capita income and population remain statistically significant. The Romney share of the vote drops to insignificance, though its coefficient is still positive, possibly reflecting the bucolic nature of reddish states.
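For anyone who wants to see the omitted variable bias in numbers, here’s a back-of-the-envelope Python sketch using the coefficients from the last two tables. In least squares, the Romney coefficient from the regression without income equals the one from the regression with income, plus the income coefficient times a “link” term: the Romney coefficient you’d get from regressing per capita income on the other explanatory variables. That auxiliary regression isn’t shown here, so the sketch backs the link out from the published numbers. The units are an assumption: the output doesn’t say, but the sketch treats percap as dollars and romney as a 0-to-1 vote share.

```python
# Romney coefficient without per capita income (second regression).
b_romney_short = 158.3455
# Romney and per capita income coefficients with both included (third regression).
b_romney_long = 71.69758
b_percap_long = -0.0030513

# Part of the short-regression Romney coefficient that really belongs to income.
bias = b_romney_short - b_romney_long

# Implied coefficient on romney in a regression of percap on the other
# explanatory variables (backed out from the identity, not estimated directly).
implied_link = bias / b_percap_long

print(f"bias in the Romney coefficient: {bias:.2f}")
print(f"implied link between romney and percap: {implied_link:,.0f}")
```

Under those unit assumptions, a 10-point higher Romney share goes with roughly $2,800 lower per capita income, other things equal. That’s the “red states tend to be poorer” story, and it accounts for more than half of the Romney coefficient in the regression that left income out.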