This is a problem I just made up off the top of my head. Just to eliminate any ambiguity, the point of the problem is dealing with the issues that discrete vs continuous probability can present: specifically, what happens when you want a distribution over a continuous space, but the actual selections are bound to act a lot like draws from a discrete one.
So let’s say we’re making a model of the predicted response of a person when you ask them for a number between 1 and 10. Now, a lot of people might pick a culturally lucky number like 7, a few will pick 5 or 2 just because, but a lot of math geeks will pick pi, the square root of 2, or e, just to be cute, so we’re dealing with a continuous space. Yet basically nobody will pick 1.2937542268907651423, or 8/7, even though both are valid selections.
So we’ve run into a bit of a snag. The chances of somebody picking 1, or 7, or pi are relatively high. Definitely non-zero if you were to run a poll. Not around 1 or 7 or pi, not in the range between 1 and 1.00000001. Exactly 1. Meanwhile, the chance of somebody picking 1.00000001 is basically nothing (as you’d expect with a continuous space).
Yet we’re dealing with a continuous distribution, so the chance of anybody picking any single number, whether it be 1 or 2.379225677778923495821345, should be the same: zero. You can only talk about ranges. Still, I can almost guarantee that if you polled a large enough number of people, you would not see this reflected in the responses.
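To make the "you can only talk about ranges" point concrete, here is a minimal Python sketch. It assumes, purely for illustration, that responses are uniform on [1, 10]; the function name uniform_interval_prob is made up for this example. The probability of a range shrinks with its width and hits zero for a single point.

```python
# Assumption for illustration only: responses are uniform on [1, 10].
def uniform_interval_prob(a, b, low=1.0, high=10.0):
    """Return P(a <= X <= b) for X uniform on [low, high]."""
    a, b = max(a, low), min(b, high)      # clip the range to the support
    return max(b - a, 0.0) / (high - low)

# Shrink a range that starts at 7 until it is the single point 7.
for width in (1.0, 0.01, 1e-6, 0.0):
    print(width, uniform_interval_prob(7.0, 7.0 + width))
```

With a width of 0.0 the range is exactly the point 7, and the probability is exactly zero, which is the behaviour a real poll would contradict.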
Now, you could say that this reflects the belief people have that the question is only asking about integers, or whatever, but the reasons for picking those numbers don’t matter – you’re just trying to make a prediction about what people will pick (let’s pretend we’re doing some good ol’ Bayesian inference, if it matters to you). Some arbitrary non-whole number is a valid response, and if you poll enough people I’m certain some asshole will pick one just to be funny (hell, I’ve done it before). So you have a problem: a continuous distribution that is almost certainly going to act a hell of a lot like a discrete distribution for some numbers, but like a continuous one for others.
So is there a way to deal with this? I assume there has to be; I’m sure I’m not the first person to notice this problem (if it’s even a problem). Yet the only answer I’m coming up with is to treat it exactly like a discrete distribution and throw out statistically insignificant responses like 5.678955463, which I don’t like. Maybe project the responses into some feature space where the common responses fit some standard distribution? I’m not sure how you’d come up with a function for that, though.
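For what it’s worth, here is a rough Python sketch of one way to write the situation down without throwing anything out: put point masses on the popular answers and spread whatever probability is left uniformly over [1, 10]. This is only an illustration under stated assumptions. The weights in POINT_MASSES are made up (not taken from any poll), the uniform leftover is an arbitrary choice, and the names prob_exact, prob_interval and sample are hypothetical.

```python
import math
import random

# Hypothetical weights, invented for illustration; not fitted to any data.
POINT_MASSES = {
    7.0: 0.35, 5.0: 0.15, 2.0: 0.10, 1.0: 0.10,
    math.pi: 0.10, math.e: 0.05, math.sqrt(2): 0.05,
}
# Whatever is left over goes to the continuous (uniform) part.
CONTINUOUS_WEIGHT = 1.0 - sum(POINT_MASSES.values())

def prob_exact(x):
    """P(X == x): non-zero only at the point masses."""
    return POINT_MASSES.get(x, 0.0)

def prob_interval(a, b, low=1.0, high=10.0):
    """P(a <= X <= b): point masses inside [a, b] plus the uniform part."""
    atoms = sum(w for v, w in POINT_MASSES.items() if a <= v <= b)
    lo, hi = max(a, low), min(b, high)
    cont = CONTINUOUS_WEIGHT * max(hi - lo, 0.0) / (high - low)
    return atoms + cont

def sample(rng=random):
    """Draw one response: a popular number, or an arbitrary one in [1, 10]."""
    u = rng.random()
    cumulative = 0.0
    for value, weight in POINT_MASSES.items():
        cumulative += weight
        if u < cumulative:
            return value
    return rng.uniform(1.0, 10.0)

print(prob_exact(7.0), prob_exact(1.00000001), prob_interval(1.0, 10.0))
```

In this sketch exactly 7 has non-zero probability, exactly 1.00000001 has probability zero, and a response like 5.678955463 can still turn up through the uniform part, so nothing has to be discarded.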