Sorry it took so long; here goes:

Your questions seem to center on R[sup]2[/sup], p-values and F values, and what they show about the “correctness” of a model. I’ll try to tackle each in turn.

R[sup]2[/sup] - This is the proportion of variance in the outcome variable that is explained by the regression model. It’s good if it’s a big number (close to 1). As you alluded to in your post, a reasonable model that only accounts for a teeny bit of the variance is not that super.
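To make that concrete, here is a minimal Python sketch of how R[sup]2[/sup] falls out of a one-predictor regression. The data and the helper names (`fit_line`, `r_squared`) are made up for illustration; your stats package does the same arithmetic for you.

```python
# Made-up data for illustration only: y is roughly 2 * x plus a little noise.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Proportion of the variance in y that the fitted line accounts for."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    # Total variation in y around its mean.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Variation the line fails to account for.
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


print(f"R^2 = {r_squared(x, y):.3f}")
```

Because the made-up points sit almost exactly on a straight line, the result comes out very close to 1. Noisier data would pull it toward 0.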

p-value and F – This is where things get interesting. Statistics is necessary because we can’t measure everyone in a specific population and have to make do with a sample of it. So we have to wonder whether the results from our sample are truly representative of that population. This is the crux of significance testing and confidence intervals.

<skip part about sum of squared residuals and F ratio calculation.>

In regression, we draw a hypothetical line to “predict” the outcome variable from a host of predictors. The line does not fit perfectly, so we have to measure how well it fits. The line is drawn to have the best fit. The fit is best when the “sum of squares” (the sum of the squared distances between the actual values and the line) is as small as possible. (Let’s leave it at that for now.)
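If you want to see the “as small as possible” part in action, here is a small Python sketch. The data are invented, and `sum_of_squares` is just a name I picked. It works out the best-fit line for one predictor and then nudges it to show that every nudge makes the sum of squares bigger.

```python
# Made-up data for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]


def sum_of_squares(slope, intercept):
    """Sum of squared vertical distances between the points and a candidate line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Closed-form least-squares slope and intercept for a single predictor.
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
best_slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
              / sum((xi - mean_x) ** 2 for xi in x))
best_intercept = mean_y - best_slope * mean_x
best_ss = sum_of_squares(best_slope, best_intercept)

# Nudge the line a little in each direction: every nudge makes the fit worse.
for d_slope, d_intercept in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5)]:
    nudged_ss = sum_of_squares(best_slope + d_slope, best_intercept + d_intercept)
    print(f"slope {d_slope:+.1f}, intercept {d_intercept:+.1f}: "
          f"{nudged_ss:.3f} vs best {best_ss:.3f}")
```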

Suppose we took many, many samples from the population and ran a regression for each. Now let’s suppose that we used the measure of best fit to calculate an “F ratio” for each one. If we plot those F ratios (calculated from the sums of squares), we get a distribution of values. When the predictors really have nothing to do with the outcome (this is the null hypothesis that comes up below), that distribution looks like (if coding works) this:

```
c |    *
  |   * *
o |  *   *
  |  *    *
u | *       *
  | *         *
n | *            *
  |*                 *
t |*                         *
  |*_______________________________*____
  0                               ~50
               F-ratio
```

It’s called the “F-distribution”, since statisticians are so darned creative.
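You can grow one of these yourself. The sketch below is only an illustration under assumptions I picked: one predictor, 20 cases per sample, 5000 samples, and data drawn so that the predictor truly has no relationship to the outcome. It computes the F ratio for each sample and prints a crude text histogram.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def f_ratio(x, y):
    """F ratio for a one-predictor regression: explained vs. leftover variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_model = sxy ** 2 / sxx        # sum of squares explained by the line
    ss_resid = ss_total - ss_model   # sum of squares left over
    # Model has 1 df (one predictor); residual has n - 2 df.
    return (ss_model / 1) / (ss_resid / (n - 2))


def one_null_sample(n=20):
    """Draw x and y independently, so the predictor is worthless by construction."""
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]
    return f_ratio(x, y)


f_values = [one_null_sample() for _ in range(5000)]

# Crude text histogram, one row per unit of F, one star per 50 samples.
for lo in range(8):
    count = sum(lo <= f < lo + 1 for f in f_values)
    print(f"{lo}-{lo + 1} | {'*' * (count // 50)}")
```

Most of the F ratios pile up near zero, with a long thin tail out to the right. That is the same lopsided shape as the sketch above.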

Since we take 1 sample (or very few) and find one set of values, we are finding one point in the distribution. The question now becomes:

“Is this value an expected value or is it a rarity?”

The important part is whether the value is a rare event. The question can be seen as:

“Is my value so far out that it comes from a different distribution, or is it just an extreme value from this one?”

We want to see whether our value reflects something real in the population or is just due to chance. We often set the “chance value” (alpha) to .05 (the flip side of a 95% confidence level). This means that if chance alone were at work, our value would land in the lower 95% of the distribution 95% of the time. The “critical” F-ratio value cuts the distribution into that 95% and the remaining 5%. It is shown below:

```
c |    *
  |   * *
o |  *   *
  |  *    *
u | *       *            |
  | *         *          |
n | *            *       |
  |*                 *   |
t |*                     |   *
  |*_____________________|__________*____
  0                      |        ~50
         F-ratio      F_crit
```

The actual value of F[sub]crit[/sub] depends on the degrees of freedom (df) in the calculation of the ratio (I won’t explain df here, unless you want me to).
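To show how F[sub]crit[/sub] moves with the df, here is a rough Python sketch that approximates it by simulation instead of looking it up. The function names and the two df pairs are my own picks for illustration. It uses the standard fact that an F value is a ratio of two chi-square values, each divided by its df.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable


def simulated_f(df_model, df_resid):
    """One draw from the F-distribution, built as a ratio of scaled chi-squares."""
    # A chi-square with k degrees of freedom is a Gamma(k / 2, scale=2) variable.
    top = random.gammavariate(df_model / 2, 2) / df_model
    bottom = random.gammavariate(df_resid / 2, 2) / df_resid
    return top / bottom


def approx_f_crit(df_model, df_resid, alpha=0.05, draws=20000):
    """Approximate the value that cuts off the upper `alpha` share of the distribution."""
    values = sorted(simulated_f(df_model, df_resid) for _ in range(draws))
    return values[int((1 - alpha) * draws)]


for df_model, df_resid in [(1, 18), (3, 120)]:
    print(f"df = ({df_model}, {df_resid}): F_crit is roughly "
          f"{approx_f_crit(df_model, df_resid):.2f}")
```

Printed F tables give about 4.41 for df (1, 18) and about 2.68 for df (3, 120) at alpha = .05, so the simulated values should land near those. They will wobble a little from run to run if you change the seed.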

If we find a value of F for our sample that is **greater** than F[sub]crit[/sub], our F value lies out in the extreme 5% tail of the distribution. (If chance alone were at work, there would be less than a 5% chance of getting an F this large.) Because we are testing a **null hypothesis** (that the predictors have no real relationship with the outcome), which we want to reject, we want to find an F value larger than the critical value.

The answer to your second question about “how large is large? What’s large enough?” with respect to the F value depends on the degrees of freedom. As a very rough rule of thumb, 10 or better is encouraging, but the real test is whether it beats F[sub]crit[/sub] for your df.

Since the F[sub]crit[/sub] changes with each set of df and each distribution, it’s kind of tough to check each time. Most software packages do it for you and express the result as a p-value.

p-value – the proportion of the distribution *more extreme than* your test value. Since you have alpha = .05, we want our F ratio to have less than 5% of the distribution to its right ( < 5% “error”). This is why p-values less than .05 are good. If p is less than .05, less than 5% of the distribution is more extreme than your value, i.e., your value lies beyond F[sub]crit[/sub], out in the 5% tail.
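Here is the same idea as a Python sketch, again by simulation rather than the exact calculation your software does. The names, the df of (1, 18), and the three example F values are assumptions I made up for illustration.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable


def simulated_f(df_model, df_resid):
    """One draw from the F-distribution, built as a ratio of scaled chi-squares."""
    # A chi-square with k degrees of freedom is a Gamma(k / 2, scale=2) variable.
    top = random.gammavariate(df_model / 2, 2) / df_model
    bottom = random.gammavariate(df_resid / 2, 2) / df_resid
    return top / bottom


def approx_p_value(f_observed, df_model, df_resid, draws=20000):
    """Share of the null F-distribution at or beyond the observed F."""
    hits = sum(simulated_f(df_model, df_resid) >= f_observed for _ in range(draws))
    return hits / draws


alpha = 0.05
for f_observed in [1.0, 3.0, 10.0]:
    p = approx_p_value(f_observed, 1, 18)
    verdict = "reject the null" if p < alpha else "do not reject the null"
    print(f"F = {f_observed:5.2f}: p is roughly {p:.3f} -> {verdict}")
```

With these df, an F of 1 or 3 leaves well over 5% of the distribution to its right, while an F of 10 leaves well under 1%. That is why only the last one counts as significant at alpha = .05.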

p-values are good for testing models. The p-values for individual predictors show the “worth” of that predictor in the model; you can consider removing a predictor if the p-value > .05 (if theory permits).

If you need more, let me know.