I am a pure mathematician teaching an applied math class. The book the class uses teaches probability using set theory. So the way it shows Bayesian probability is by splitting a sample space into disjoint subsets S1 and S2 that cover the entire space. Thus
p(S1 given A) =
[p(S1) x p(A given S1)] / {[p(S1) x p(A given S1)] + [p(S2) x p(A given S2)]}

I understand how the formula is derived, but conceptually I’m at a loss to explain why we need to account for the sample spaces other than S1. Why do we need those spaces if I only care about S1 given A?

Cast it in terms of cause and effect. S1 and S2 are the possible causes of A. In that case, given that A has happened, you want to know the probability that S1 happened first. In order to do so, you need to consider the probability that A was caused by S2.

Alternatively, cast it as not involving S2 directly. The primary formula of interest, let us say, is P(S1 | A) = P(S1 & A)/P(A) = [P(A | S1) * P(S1)]/P(A), which does not “involve” anything other than S1 and A. If you have immediate knowledge of what P(A) is, great. Otherwise, you have to analyze that in turn, and it is at this point, you can explain to the student, that S2 comes in: since S1 and S2 are disjoint and cover the space, P(A) = P(A | S1) * P(S1) + P(A | S2) * P(S2), so S2 is just as relevant to P(A) as S1 is. That is, one only considers S2 in order to help one establish the value of P(A).
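
If it helps to let students play with numbers, here is a minimal Python sketch of that idea. The function name and the sample probabilities are my own, made up for illustration; the only thing taken from the discussion above is that S2 enters solely through P(A).

```python
def posterior_s1(p_s1, p_a_given_s1, p_a_given_s2):
    """Return P(S1 | A), assuming S1 and S2 are disjoint and cover the space."""
    p_s2 = 1.0 - p_s1  # follows from S1 and S2 partitioning the space
    # Law of total probability: this is the only place S2 is used.
    p_a = p_a_given_s1 * p_s1 + p_a_given_s2 * p_s2
    if p_a == 0:
        raise ValueError("P(A) is zero, so conditioning on A is undefined")
    return p_a_given_s1 * p_s1 / p_a


# Hypothetical numbers: everything about S1 is held fixed, only P(A | S2) changes.
a_rare_under_s2 = posterior_s1(0.5, 0.8, 0.2)    # about 0.8
a_common_under_s2 = posterior_s1(0.5, 0.8, 0.8)  # about 0.5
print(a_rare_under_s2, a_common_under_s2)
```

Nothing about S1 changed between the two calls, yet the answer did. If A is just as likely under S2 as under S1, seeing A tells you nothing, and the posterior falls back to the prior P(S1).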

But probably the best thing to do is actually to introduce some examples…

Next you’ll be wanting some examples, won’t you? The classic one is “There is a test for a disease that has an X% probability of a false positive (it will show the subject has the disease when he actually does not), and a Y% probability of a false negative (it will show the subject does not have the disease when he actually does). If the test gives positive results for Z% of the population, what percentage of the population actually does have the disease?” That forces you to consider both S1 and S2.
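
For what it's worth, that test example can be worked numerically. The sketch below is only an illustration: the function names and the particular rates are assumptions of mine, not from any textbook. It takes S1 = "has the disease", S2 = "does not", A = "tests positive", and solves Z = p(1 - Y) + (1 - p)X for the prevalence p.

```python
def prevalence(false_pos, false_neg, positive_rate):
    """Solve positive_rate = p*(1 - false_neg) + (1 - p)*false_pos for p = P(disease)."""
    denom = 1.0 - false_pos - false_neg
    if denom == 0:
        raise ValueError("test is uninformative: positives are equally likely either way")
    p = (positive_rate - false_pos) / denom
    if not 0.0 <= p <= 1.0:
        raise ValueError("the three rates are inconsistent with each other")
    return p


def p_disease_given_positive(false_pos, false_neg, positive_rate):
    """Bayes: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)."""
    p = prevalence(false_pos, false_neg, positive_rate)
    return (1.0 - false_neg) * p / positive_rate


# Hypothetical rates: X = 5% false positives, Y = 10% false negatives, Z = 13.5% positives.
print(prevalence(0.05, 0.10, 0.135))                # about 0.10
print(p_disease_given_positive(0.05, 0.10, 0.135))  # about 0.67
```

With these made-up rates, a third of the positive results come from people without the disease, which is exactly the S2 contribution to P(A) that the student wants to ignore.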

More generally, a Bayesian approach makes you include everything relevant you know in an analysis, not just what you can derive from a sample of it.