Analyzing a probability problem

I have what seems like a simple problem to state, but it’s beyond what I know of probability to figure out how to analyze it. So I’m asking for some help. I’m not looking for answers here, maybe just some direction as to what type of problem this is or how to look at it (of course I won’t object to any or all solutions).

It’s essentially a ‘waiting until success’ problem with the addition that the probability of success varies in some known fashion. I’ll state it in the form of an abstract example:

Every day Jack drives his car to the train station and rides the train to work. Let’s say there’s no parking lot, so Jack must park on a nearby street. He drives down the street searching for a parking spot and will park at the first parking spot he finds. Assume that the probability of any existing parking spot is p(x), where x is the distance from the station. Because other people take the train, p(x) changes (likely increases) the further Jack gets from the station. What is the expected distance that Jack will drive if p(x) is the same every morning?

Here’s a few cases I’m thinking of, and terminology:

I can set up a discrete statement - if, for example the probability of a spot on each ‘block’ i is p[sub]i[/sub] and the prob. of no spot on a block is q[sub]i[/sub]. If we further imagine that at some point the prob. of a spot = 1 (or define this event to be that Jack gives up or runs out of gas), we can generate a probability distribution P(x) for parking as:

P(1) = p[sub]1[/sub]
P(2) = q[sub]1[/sub]p[sub]2[/sub]
P(3) = q[sub]1[/sub]q[sub]2[/sub]p[sub]3[/sub]




                n-1 
P(n) = p[sub]n[/sub] *     | |   (q[sub]i[/sub])
                | |
               i = 1


(that’s a big pi there)
My problem is in finding the expectation for a long series easily and in generalizing to a continuous p(x) (say, linear).

I can see that I could work on this discretely as a multinomial distribution (with each ‘block’ having its own variable), but I’m hoping there’s something simpler given a simple function for p(x). Note I’m only interested in expectations really, not in finding the distribution fn.

P(n) is a decreasing function of n. This is contrary to your assumption that P increases as Jack gets further from the train station.

Oh and how did you get the subscripts to work?

I’m not sure if I’m interpreting you right, but couldn’t you do something like integrate xp(x) to find the expected value?

Oh, and kesagiri, the subscripts:

p[sub]1[/sub]

is done by typing

p{sub}1{/sub},

(replace the { and } with [ and ]). {sup} works for superscripts, too.

kesagiri :
It’s not very clear now that I look at it, but p(x) and P(x) represent two different things.

p(x) is the probability of an open spot at any location x. It increases as x increases (and is assumed to reach 1 at some point). For further discussion, I’ll define p(x) to be strictly linearly increasing from zero until some cutoff C, at which point p(x) equals 1.

P(x) is the actual probability distribution function of Jack’s ‘driving distance’. It does, indeed, decrease with distance. Maybe it should be called ‘Pdf(x)’.

In other words, there’s more likely to be an open spot further from the station (that’s p(x)), but the first open spot (the one Jack parks in) is more likely to be closer to the station (that’s P(x)/Pdf(x)).

Also, if you’re writing a lot of subscripts, it helps to cut and paste [su[sub][/sub]b]i[/[sub][/sub]sub] and fill it in as needed. (and if you ever want to know how someone did something, try hitting ‘quote’ and you should get their exact text, then just don’t post. That’s how I learned the above trick for typing actual brackets.)
cabbage, that sort of makes sense, but I’m not quite getting it yet – I’ll think about or if you could elaborate, I might see it.

panama[sub]jack[/sub]

Okay I suddenly realize what you were saying cabbage, and it probably goes to the confusion over P(x) and p(x).

The expectation will be the integral of x*Pdf(x), but my problem is finding Pdf(x) given only p(x) (which is not a distribution, since it’s integral over infinity is not 1.)

So maybe it’s not as much probability as working with product series.

Lets stick with the discrete case for the moment. If the probability of finding a parking space inside one block is p[sub]1[/sub] and the second block p[sub]2[/sub] then isn’t the probability of finding a parking space inside the first two blocks actually

p[sub]1[/sub]+p[sub]2[/sub]

and not

p[sub]1[/sub]*(1-p[sub]2[/sub])?

Given that this is correct the the probability mass function is

p[sub]1[/sub]+p[sub]2[/sub]+…p[sub]2[/sub],

no?

Given this I am not sure how you came up with the multiplicative p.m.f. you did.

That is shouldn’t the probability be higher for Jack to find parking in blocks 2 and 3 than 1 and 2? The way you have it set up I think the answer would be the opposite. Your p.m.f. is decreasing with each block thus the

P(1) + P(2) > P(2) + P(3).

You know not being able to edit your own post really really sucks.

Previously I wrote.
Lets stick with the discrete case for the moment. If the probability of finding a parking space inside one block is p[sub]1[/sub] and the second block p[sub]2[/sub] then isn’t the probability of finding a parking space inside the first two blocks actually

p[sub]1[/sub] + p[sub]2[/sub]

and not

p[sub]1[/sub] + p[sub]1[/sub]*(1-p[sub]2[/sub])? (my error was here)

Given that this is correct the the probability mass function is

p[sub]1[/sub]+p[sub]2[/sub]+…p[sub]2[/sub],

no?

Given this I am not sure how you came up with the multiplicative p.m.f. you did.

That is shouldn’t the probability be higher for Jack to find parking in blocks 2 and 3 than 1 and 2? The way you have it set up I think the answer would be the opposite. Your p.m.f. is decreasing with each block thus the

P(1) + P(2) > P(2) + P(3).

Why can’t we edit our own posts?

Seems dumb to me…or is there something special I have to do? A secret code…special handshake?
This:
p[sub]1[/sub]+p[sub]2[/sub]+…p[sub]2[/sub],

should read as
p[sub]1[/sub]+p[sub]2[/sub]+…p[sub]n[/sub],

kesagiri, the topic of post-editing has come up many times in ATMB. Suffice to say for now, that there’s good reason for it. Meanwhile, if you really need to fix something, you can e-mail manhattan or an administrator and ask them to edit it for you.

The probability can’t just be added… For instance, suppose that there’s a 75% chance of finding a spot on block 1, and also a 75% for finding one on block 2. Adding would then give you a 150% chance of finding one in the first two blocks, whereas in actuality, there’s still a chance (1 in 16) that you won’t.

Take the natural log of the probabilities. Now instead of multiplying the probabilities, you’re adding them, and you can use integration to generalize to the continuous case. Once you’re done, raise e to the answer, and voila!

Chronos,

If in the first block the probability is say p[sub]1[/sub]=10% and in the second p[sub]2[/sub]=20%, then given the p.m.f the probability of finding a parking space in either the first or second block is what?

18%

30%

Or something else?

I think the problem is not well specified.

Further for the discrete case the p.m.f is defined as follows:

f(x) = p(x[sub]i[/sub]) if x=x[sub]i[/sub]

and zero otherwise.

Where f is the p.m.f. Hence, I think, the definition above is not accurate.

Further think of this. P(n) is being defined as the p.m.f. using the multiplicative definition above. Thus, the probability of find a parking space in the nth block is lower than finding a parking spot in the n-1 block, no?

By the way, in the case of discrete probabilities the sum of the probabilities for the individual events do have to sum to one. Supose there are only two events, then the probability of each happening cannot be .75.

I think you need to redefine your problem.

Ok, I’ll do it.

Let p(x) = the probability of the closest space being in block x.

x can either be all integers from 1 to infinity or be confined to a range of numbers. (e.g. there is always a space within 10 blocks) The sum of all values p(x) must equal 1. The average distance driven will be 1p(1) + 2p(2) + 3p(3) + 4p(4) + … + max*p(max). Where max is the highest allowable value of x or possibly infinity.

Example:

Let p(1) = .1 (There is a 10% chance that the closest space will be in block 1)

Let p(2) = .2 (There is a 20% chance that the closest space will be in exactly the second block. This is not the probability that there will be an empty space in the first two blocks)

Let p(3) = .5 (You get the picture)

Let p(4) = .3 (There is always a space within 4 blocks)

Average distance driven will be:

1 * .1 + 2 * .2 + 3 * .5 + 4 * .3

or 3.2 blocks.

Actually, it’s 28%. More on this below, but
it’s (p[sub]1[/sub] + q[sub]1[/sub]p[sub]2[/sub] = 10% + 18%)

I’ll note that herein when I say “found” I mean “found & parked in”.

It appears that you’re having some of the same problems I did when first looking at this problem (so that even if it seems simple to state, it can be difficult to work on).
The major thing I can see is that you’ve confused the probability distribution of actual found parking spots Pdf(x) with the function of existing parking spots p(x). You’re also bringing in the cumulative distribution function (herafter c.d.f) which is muddying things up as well.

I’ll make an attempt to make it clearer. First let me say that the p.d.f is not ‘defined’ in the sense of given, it is the thing I’m trying to calculate (thanks to cabbage’s addition, the expectation follows once you know the pdf). What is in fact given is the likelihood of a parking spot existing at any given point along the street. This is p(x).

Then there is a random variable which represents the actual distance driven to a parking spot (or alternately, the first parking spot along the street). It is the distribution of this variable that is being sought … so on Day One Jack might drive 1 block, Day Two he’ll have to drive 3, Day Three maybe less or more, etc.

p(x) lets you know how likely an open spot is to be on each block. So it can take on any value from 0 to 1 for any block (although I’ve now defined it more specifically, it could be defined in any way you like). If you wanted to compute p(x) statistically, you’d go up and down the entire length of the street every day (without parking) and measure the ratio of filled to potential spots on each block.

Then, in the discrete case, the probability of the event that a spot is found on the first block is p[sub]1[/sub]. The probability of the event that a spot is found on the second block is (prob. that an open spot does not exist on the first block * prob. that an open spot does exist on the second block = q[sub]1[/sub]*p[sub]2[/sub]).

So a spot is only taken on a later block if there are no open spots on any previous blocks, which is why the event requires the failure of all events up to that point to occur.

Now, a bit on the p.d.f vs. c.d.f : The probability density function (or distribution) or a random variable represents the probability that that random variable equals a certain value. Over its entire range, it will sum to 1, meaning that all possible values that it can take on are covered. For this pdf(x) as described in my OP, it means the probability that a spot is found at exactly x.

The c.d.f. is what you’re talking about when you ask for the probability of found spots being within a certain range 0 to x (or in some number of blocks). It is a running sum of the probabilities of actual found spots up to that point, and approaches 1 asymptotically.

So in your question, the c.d.f. is used which is
Pdf(1) + Pdf(2) and is the likelihood of a spot found in the first two blocks.
While the c.d.f. has its uses, I’m not sure if the expectation can readily be found from it, or if it’s any easier to calculate than the p.d.f.

So, therefore, since I’m trying to find the distribution to get to the expectation, the c.d.f. is ignored to focus on what will give me an answer.

Thank you kindly, Lance, but you’ve redefined my problem in such a way as to take as a given what I’m now trying to solve for.

It is indeed a much easier problem.

But it’s not what I was asking. You’ve redefined it so that the p(x) is now the distribution function of found parking spaces (since the closest spot represents the found spot). I don’t take that as given, I take something else as given and want to find the distribution from that. That is the problem, and I’m staying with it.

I think cabbage was basically right. Let’s define terms clearly. Let

f(x) = the probability of an empty space existing at distance x from the starting point (i.e. from the train station). The probability that there is no space at this distance is 1-f(x).

int[sub]0->x[/sub][1-f(x)]dx = the probability of having reached distance x without having encountered an empty spot. (“int” is my typewritten integral sign symbol)

p(x) = the probability that Jack encounters his FIRST empty spot at distance x.

Clearly, p(x) is the probability of finding a spot at x times the probability that we HAVEN’T found a spot at some closer distance. So

p(x)= f(x)*(int[sub]0->x[/sub][1-f(x)]dx).

So now we weight each distance x with the probability of finding the first empty spot at that particular distance, and integrate to get the average. This is the expected distance Jack has to drive from the station to find an empty parking spot.

<x> = int[sub]0->infinity[/sub][x*p(x)]dx

where p(x) is given above.

Panamjack,

Thanks for the more lengthy explanation of the problem.

Okay, first. Your p.d.f. is sufficient for you to calculate the expectation.

In the discrete case the expectation is calculated as follows.

P(1) + 2P(2) + 3P(3) + … + n*P(n)

For the case where after n blocks there is always a space.

For the inifinite case let n go to infinity.

For the continuous case you’d need to provide a continuous version of the discrete p.d.f. you have in your opening post. Then simply integrate x*f(x) where f(x) is the p.d.f.

Here is an example for the discrete case

0.2 0.2 1 0.20
0.4 0.32 2 0.64
0.6 0.288 3 0.86
0.8 0.1536 4 0.61
1 0.0384 5 0.19
2.51

The mean is 2.51 (the sum of the fourth column). The first column are the p[sub]i[/sub]. The second column is the probability of driving 1, 2, etc. blocks to find a parking space. The third is the number of blocks. The fourth colum is n*P(n).

Whoops, sorry pj, you posted while I was typing. Does what I wrote answer your question? I’m not sure now, having read your post.

Yuck! I was wondering if that would come through and I guess the answer is no. For the example the columns are

0.2=====0.2=====1=====0.20
0.4=====0.32====2=====0.64
0.6=====0.288===3=====0.86
0.8=====0.1536==4=====0.61
1.0=====0.0384==5=====0.19
======================2.51

Where the === represent spaces between the columns.

I hope I get this right…

Isn’t this actually 1-int[sub]0->x[/sub][f(x)]dx?

int[sub]0->x[/sub][f(x)]dx will give us the probability of finding a parking space between zero and x no?

That is F(x) = int[sub]0->x[/sub][f(x)]dx

So 1-F(x) is the probabiltity of not finding a parking space between 0 and x.

Okay, I think things are going better now.

APB9999, thank you, you’ve got it in a much clearer form, although there is just one error in the math. The probability of not finding a spot is not the sum (or integral) but the product of all previous values. So if you just replace the integral with a product sign, you get a clear statement of the problem and I think that works well.

kesagiri, the problem then boils down to finding exactly what that continuous p.d.f. is. I was just having trouble generalizing from the discrete case, since (until The Ryan’s hint) I couldn’t get an integral out of it like I wanted.

BTW, the board software strips out extra spaces. you can force them to stay using [co[sub][/sub]de] [/co[sub][/sub]de] around it but I don’t know how to make it not double-spaced.



 this preserves         spaces
 so that               columns
 line                  up
 but    it's           double-spaced.