This is a problem I just made up off the top of my head. Just to eliminate any ambiguity, the point of the problem is the trouble that discrete vs continuous probability can cause: specifically, when you want a distribution over a continuous space, but the actual selections are bound to act a lot like a discrete one.

So let’s say we’re making a model of the predicted response of a person when you ask them for a number between 1 and 10. Now, a lot of people might pick a culturally lucky number like 7, a few will pick 5 or 2 just because, but a lot of math geeks will pick pi, the square root of 2, or e, just to be cute, so we’re dealing with a continuous space. Yet, basically nobody will pick 1.2937542268907651423, or 8/7, even though both are valid selections.

So we’ve run into a bit of a snag. The chances of somebody picking 1, or 7, or pi are relatively high. Definitely non-zero if you were to run a poll. Not **around** 1 or 7 or pi, not in the range between 1 and 1.00000001. Exactly 1. The chance of somebody picking 1.00000001 is basically nothing (as you’d expect with a continuous space).

Yet, we’re dealing with a continuous distribution, so the chance of anybody picking any single number, whether it be 1 or 2.379225677778923495821345, should be the same: zero. You can only talk about ranges. Yet, I can almost guarantee that if you polled a large enough number of people, you would not see this reflected.
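To make the "you can only talk about ranges" point concrete, here’s a quick sketch. It assumes a plain uniform distribution on [1, 10], which is purely my stand-in for "some continuous distribution" and not a claim about how people actually answer. Floats are technically discrete, so the chance of an exact hit isn’t literally zero here, just absurdly small.

```python
import random

# Assumption: a plain uniform density on [1, 10] stands in for
# "a continuous distribution". Nothing here is fitted to real data.
rng = random.Random(0)  # fixed seed so the run is repeatable
N = 100_000
draws = [rng.uniform(1.0, 10.0) for _ in range(N)]

# Count draws that are *exactly* 7 versus draws in a range around 7.
exact_sevens = sum(1 for x in draws if x == 7.0)
near_sevens = sum(1 for x in draws if 6.5 <= x <= 7.5)

print("exactly 7:", exact_sevens)
print("within 0.5 of 7:", near_sevens, "out of", N)
```

The range around 7 should catch roughly a ninth of the draws, while "exactly 7" catches essentially none, which is the opposite of what I’d expect from a real poll.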

Now, you could say that this reflects the **belief** people have that the question is only asking about integers, or whatever, but the reasons for picking those numbers don’t matter – you’re just trying to make a prediction about what people will pick (let’s pretend we’re doing some good ol’ Bayesian inference, if it matters to you). Some arbitrary non-whole number is a valid response, and if you poll enough people I’m certain some asshole will pick one just to be funny (hell, I’ve done it before). So you have a problem: a continuous distribution that is almost certainly going to act a hell of a lot like a discrete distribution for some numbers, but like a continuous one for others.
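Here’s a rough simulation of the kind of poll I have in mind, just to show the behavior. Everything in it is made up for illustration: the `ATOMS` table, its weights, the 10% of "funny" respondents, and the `sample_response` helper are all hypothetical, not data from any actual poll. Most respondents pick one of a handful of exact favourites; the rest pick an arbitrary real number.

```python
import math
import random
from collections import Counter

# Hypothetical favourite answers with made-up probabilities (sum to 0.90).
ATOMS = {
    7.0: 0.30,
    5.0: 0.15,
    2.0: 0.10,
    1.0: 0.10,
    10.0: 0.10,
    math.pi: 0.08,
    math.e: 0.04,
    math.sqrt(2): 0.03,
}
# Whatever is left over (about 0.10) goes to "arbitrary real number".
P_CONTINUOUS = 1.0 - sum(ATOMS.values())

def sample_response(rng):
    """Draw one answer: an exact favourite, or a uniform real in [1, 10]."""
    u = rng.random()
    cumulative = 0.0
    for value, weight in ATOMS.items():
        cumulative += weight
        if u < cumulative:
            return value
    # Fell past every favourite: this respondent is being funny.
    return rng.uniform(1.0, 10.0)

rng = random.Random(0)  # fixed seed so the run is repeatable
responses = [sample_response(rng) for _ in range(10_000)]
counts = Counter(responses)

# Answers given exactly once are (almost surely) the arbitrary reals.
one_offs = sum(1 for c in counts.values() if c == 1)

print("most common answers:", counts.most_common(3))
print("answers given exactly once:", one_offs)
```

The exact favourites pile up with big counts, while the arbitrary reals show up once each and never repeat. That’s the discrete-for-some-numbers, continuous-for-others behavior I’m describing.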

So is there a way to deal with this? I assume there has to be; I’m sure I’m not the first person to notice this problem (if it’s even a problem). Yet, the only answer I’m coming up with is to treat it exactly like a discrete distribution and throw out rare one-off responses like 5.678955463, which I don’t like. Maybe project the responses into some feature space where the common responses fit some standard distribution? I’m not sure how you’d come up with a function for that, though.