Testing a claim about a proportion

Please someone explain this to me. I have done this problem twice (with different values). I have no problem with the math part but I absolutely can’t understand the reasoning at the end.

These are the numbers and results from the second time I worked the problem. I got everything right up 'til the last part both times.

99 polygraph tests.
22 have wrong results.
77 have correct results.
.01 significance level.

Test the claim that such polygraph tests are correct less than 80% of the time.

Based on the results, should polygraph tests be prohibited as evidence in trials?

A) Identify the null and alternative hypotheses: Null: p=.8, ALT: p<.8 - correct.

B) Identify the test statistic: -.55 correct

C) Identify the p-value: .2912 correct

D) Identify the conclusion about the null hypothesis and the final conclusion that addresses the claim.

Fail to reject the null hypothesis - correct

There is not significant evidence to support the claim that polygraph tests are correct less than 80% of the time - correct.
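For reference, the test statistic and p-value above can be reproduced with a short script. This is just a sketch using the Python standard library; the inputs (n = 99, x = 77 correct, p0 = .80) come straight from the problem statement.

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement
n = 99       # total polygraph tests
x = 77       # correct results
p0 = 0.80    # proportion claimed under the null hypothesis

p_hat = x / n                          # sample proportion, about 0.778
se = sqrt(p0 * (1 - p0) / n)           # standard error under H0
z = (p_hat - p0) / se                  # test statistic, about -0.55
p_value = NormalDist().cdf(z)          # left-tail p-value, about 0.29

print(round(z, 2), round(p_value, 4))
```

Note that computing the p-value from the unrounded z gives about .2902, while looking up z = -0.55 in a Z table (as the textbook apparently does) gives .2912; both round to .29 and lead to the same decision at the .01 level.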

This is where I have issues -

Based on the sample proportion of correct results, polygraph tests (do not) appear to have a high degree of reliability that would justify the use of a polygraph in court, so polygraph results should be (prohibited) as evidence in trials.

The parts in the parentheses are the correct answers and are the parts I can’t seem to get right. I failed to reject the null hypothesis so I don’t know whether or not they have less than an 80% correct rate. Is 80% considered bad? How the hell am I supposed to know? Is this some common knowledge? If I had rejected the null hypothesis, that would mean that they are correct less than 80% of the time - yes? But, I didn’t so how am I supposed to put 2 and 2 together to determine that they shouldn’t be allowed in court?

Moderator comment: congodwarf, you’ve posted this question in the forum dedicated to Chicago-related topics. Since it’s more a general question (math, statistics, etc), I’m moving it to the forum called “General Questions.”

Imagine a different problem. You are charged with determining whether a certain coin is fair, so H0: P(Heads) = .5. Flip a real-life quarter a few times and you will likely get a proportion different from .5. Would a reasonable person then call the coin rigged? No. But a reasonable person would also not yet call the coin fair. They do not accept H0; they merely fail to reject it, because there is not enough evidence either way.
In your problem, 'enough evidence' is determined by alpha. By setting a = .01, you are saying that 'enough evidence' means a 1% or lower chance of your sample occurring if H0 is true, in which case H0 is probably false. I hope this helps.
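That coin analogy can be made concrete with a quick simulation (my own hypothetical sketch, not part of the problem): flip a genuinely fair coin a handful of times and the observed proportion will usually miss .5, yet that miss is not evidence the coin is rigged.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Flip a truly fair coin 20 times and look at the observed proportion.
flips = [random.random() < 0.5 for _ in range(20)]
p_hat = sum(flips) / len(flips)

# p_hat is almost never exactly 0.5, even though the coin IS fair.
print(p_hat)
```

The sampling noise in a small number of flips is exactly why a sample proportion off the hypothesized value, by itself, settles nothing.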

Wow. That’s the second time I’ve done that this week. I’m sorry to make extra work for you Dex. Thanks for moving it.

jgatlin - Thanks for the explanation. I’m sorry to say I’m still confused.
Based on the significance level that I was given, the null hypothesis was not rejected. Are you saying it’s a Type II error? Again, how do I know? I wasn’t given any criteria which stated the cutoff for a Type II error. Based on my significance level and p-value, I made the right decision. Also, according to the program, I made the right decision.

So, why did the second half flip and say basically, you’re right but you’re wrong?
I am going to feel really foolish when I eventually understand what the issue is, I know I am. In my defense, I spent 12 hours studying for this test yesterday and my brain is fizzled.

I may not see any follow up answers until after the test but I still appreciate them. I’m just going to hope I don’t have to make that kind of decision on the test.

I think you’re starting with the burden of proof on the wrong side. To accept a polygraph in court, you need positive evidence it works, rather than merely a lack of evidence that it doesn’t work.

Since the only evidence you have is inconclusive either way, you should not accept polygraphs.

Happy to be of help. I registered to try and answer this question. Hopefully I can be clearer this time.

Statistically, the null hypothesis is ‘nothing to see here.’ As standard statistical procedure, polygraphs are presumed to work; the burden of proof is on HA. We assume the coin is fair until we are shown otherwise. We can’t prove something is going on, so we assume nothing is.

If the polygraph is accurate 80% of the time, results as bad as your sample’s or worse will occur about 29% of the time. Since that is entirely plausible (29% is far from rare), your sample does not disprove the accuracy of polygraphs. And swapping the hypotheses after the fact, making ‘unreliable’ the null, would be invalid, so it isn’t fair to turn the test around and see if these results fit an unreliable polygraph. We are therefore stuck with these results.

Failing to reject the null hypothesis is a conclusion, but it is really no conclusion at all.
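The 29% figure can also be checked by brute force: assume the polygraph really is 80% accurate, simulate many batches of 99 tests, and count how often a batch comes out at 77 or fewer correct. This Monte Carlo sketch is my own illustration, not from the thread.

```python
import random

random.seed(42)
n, p0, observed = 99, 0.80, 77   # values from the problem statement
trials = 20_000

# Count simulated batches at least as extreme as the observed one.
as_extreme = 0
for _ in range(trials):
    correct = sum(random.random() < p0 for _ in range(n))
    if correct <= observed:
        as_extreme += 1

print(as_extreme / trials)
```

The simulated tail probability comes out a little above the .29 normal-approximation p-value (closer to .33), because the exact binomial distribution is discrete and the normal approximation ignores the continuity correction at this sample size; either way, the event is common, so the sample is unsurprising under the null.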

Wow, thank you both. Amazingly enough, I am actually seeing the reasoning now.
jgatlin - I’m honored that you registered just to help me out. Thank you!

Hmmmm… it sounds to me like the problem is poorly worded. Even if the statistical test tells you that polygraphs appear to be accurate less than 80% of the time, in order to be able to go from that information to a conclusion about what should be done (i.e., should polygraph results be used as evidence), there must be a criterion – a level of accuracy that has been declared “the minimum level acceptable for evidence” – and that’s not something that statistics can tell you; it’s a judgment call. Now, from the way you stated the problem, it may be that 80% is that criterion, but that’s not made explicit.

As suggested by the responders above, a high p-value should be interpreted as “I dunno”, not as “the alternative is false”. This study tells us nothing about whether the accuracy is above or below 80%.

I see this logic flaw come up all the time in medical papers. An investigator will say, for example, that the drug was found to be effective in disease type A (p=0.023) but not in disease type B (p=0.154), and therefore there is a difference between type A and type B. In fact, the difference could easily be just random noise.

The general posture is to treat the null as a baseline, and to consider evidence of the alternative against the null.

The simplest form here is: an 80% correct rate versus a <80% correct rate. That is, the null guesses an 80% success rate, and the alternative says 80% is too large a guess.

Under the alternative, we expect a success rate that is markedly less than 80%. Under the null, we expect the observed success rate to be close to 80%.

The test, therefore, rejects when the observed success rate is markedly smaller than 80%. The p-value summarizes the extent of the smallness:

Pr{ worse result | p = .80 }
= Pr{ Z < (p-hat - .80) / sqrt(.8 x .2 / 99) }
= Pr{ Z < ((77/99) - .80) / sqrt(.8 x .2 / 99) }
approx. 29%

The sample does not appear to present significant evidence against the null hypothesis of the form specified by the alternative.

By the by, the use of the term “accept” is potentially misleading - the test summarizes evidence of the form of the alternative against the null hypothesis. “Fail to reject the null” is more accurate.

A more informative approach involves confidence intervals, whether one- or two-sided.
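As a sketch of that confidence-interval idea (my own illustration; the simple Wald interval is just one common choice), here is a two-sided 95% interval for the true correct rate, using the problem's numbers:

```python
from math import sqrt
from statistics import NormalDist

n, x = 99, 77                        # values from the problem statement
p_hat = x / n
z95 = NormalDist().inv_cdf(0.975)    # about 1.96 for a two-sided 95% CI

# Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
margin = z95 * sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - margin, p_hat + margin

print(round(lo, 3), round(hi, 3))
```

The interval comes out to roughly (.70, .86). It contains .80, which is consistent with failing to reject the null, but it also shows how wide the uncertainty is: the data are compatible with anything from a fairly unreliable test to a fairly reliable one.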