What are the odds? [math probability related to Uber ratings]

As an Uber driver, my passengers can rate me from 1-5, 5 being highest. In more than a thousand rated rides, 83 percent have given me a 5, and the rest have rated me 4 or lower.

Uber calculates my average by my last 500 rides, so each new rating I get knocks out one old rating. My current rating is 4.78. My score has been increasing over the last 3 months, but I don’t have data to say anything more precise.

At this point I figure my score won’t rise unless my new rating is a 5 AND the old rating that gets knocked out is 4 or less.

Assuming I continue at 83 percent 5s:

  1. What are the odds that my next new score will increase my average?
  2. Can it be determined how many new ratings it will take to reach an average of 4.87?

Unless my rudimentary math skills have failed me, 4.78/5 is greater than 83/100, so your current 1-5 rating is higher than the percentage (83%), meaning your rating should start dropping. If you’re consistently getting 83% 5’s, it should eventually drop to 4.15. You’d have to get 95.6% 5’s in order to raise your rating to 4.87.

I may be missing something…but…if your score is between 4 and 5, then any 5 will increase your score, and any 4 will decrease it. That’s how running averages work.

If a 4 or a 5 are equally likely, then the odds are 50/50.

Experimentally, if a 5 has happened 83 per cent of the time, and that has become the expected distribution, then there’s an 83 per cent chance the next score will be a five and thus increase your score by a small margin.

Either you’re making this too complicated, or I’m missing some detail.

To me, the complicating thing is the oldest score dropping out of the calculation, which I don’t know how to account for. Also, I can’t see the individual scores, so I don’t know what I just scored or what the score dropping away is.

The only certainty is that my score will only increase if the new score is a 5 and the old score is lower than that.

No, your score should also increase if the next rating is a 4 and the oldest one is lower than that, and likewise for 3 and 2.

Assuming 83% gave you 5, and the remaining gave you 4 (let’s assume no 3 or lower scores), your long-time average rating would be 0.83 * 5 + 0.17 * 4 = 4.83. Since your current score is less, I would expect your score to increase steadily to 4.83 and level off there. Unless there are fluctuations, your score wouldn’t reach 4.87 (and even if it did, it would eventually fall to 4.83).
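As a quick check of that long-run figure, here is a minimal Python sketch of the expected rating, assuming (as above) that every rating below 5 is a 4. The function name `long_run_average` is just made up for illustration:

```python
def long_run_average(p_five, other_rating=4):
    """Expected rating when a fraction p_five of rides are 5s and every
    other ride gets other_rating (assumed to be 4, as in the post)."""
    return p_five * 5 + (1 - p_five) * other_rating

print(round(long_run_average(0.83), 2))  # 4.83
```

Because this steady state is 4.83, a fixed 83% rate can't hold the rating at 4.87 for long; under the same assumptions, `long_run_average(0.87)` is what it would take.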


See if this makes sense.


P(5 stars OR <5 stars) = P(5) + P(<5 stars) - P(5 AND <5 stars). 

The rightmost term is 0 because you can’t get both 5 stars and <5 stars on your next trip.

Both sides of the equation equal 1.00, or 100%.

With 500 rides and 17% <5 stars, that works out to a weighted average of 3.705882 since


Weighted Ave of < 5 stars = (500*4.78-.83*500*5)/(.17*500)
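The same back-calculation in Python, as a sketch. It assumes the 83% rate applies to the current 500-ride window (which the thread later questions), and `mean_of_non_fives` is a hypothetical helper name:

```python
def mean_of_non_fives(avg, p_five, window=500):
    """Average of the ratings below 5, backed out of the overall average.
    Assumes p_five is the share of 5s inside the current window."""
    total_points = window * avg            # 500 * 4.78 = 2,390
    five_points = p_five * window * 5      # 0.83 * 500 * 5 = 2,075
    n_non_fives = (1 - p_five) * window    # 0.17 * 500 = 85 ratings
    return (total_points - five_points) / n_non_fives

print(round(mean_of_non_fives(4.78, 0.83), 6))  # 3.705882
```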

This is the weighted average of all of the <5 stars (1’s, 2’s, 3’s, and 4’s).

This 3.705882 weighted avg is why Ethilrist is correct. The probability that your next score will increase your average is the same as the probability of getting a 4 or 5 on your next ride, which is 83% for a 5 and some unknown probability for a 4 conditioned on knocking out a 1, 2, or 3. A score of <4 will decrease your score and a 5 star ride will increase your score, and a 4 can hurt or help depending on what it knocks out.

So the probability that you’re looking for is P(5) + P(4 | knocking out a 1, 2, 3). P(5) is 83%. We can get to this point:


3.705882 =
4 * P(4) + (Wt Avg of <4) * P(<4) = 
4 * P(4 | knocking out a 1, 2, 3) + 4 * P(4 | knocking out a 4, 5) + (Wt Avg of <4) * P(<4)


But that’s as far as we can go because we don’t have any of the probabilities above.

Best you can say under these assumptions is that the probability for 1) is larger than 83%.

With respect to 2), assuming that the 83% is fixed forever, Teach-me-not is correct - the maximum you can get is 4.83. The goal of 4.87 is not probable unless you get a run of 5 stars with probability >> 83%.

PS: SDMB statisticians, check my work.

Having thought more about this, the 83 percent is a red herring. It says nothing about my most recent 500 scores.

My average score is 4.78. The total possible points are 2500. So my point total ranges from 2388 to 2392, depending on whether my average was rounded up or down.

I have a minimum of 388 5-point scores, which would mean the remaining 112 scores would all be 4-pointers. That’s 77.6 percent of my scores being 5 points.

I have a maximum of 473 5-point scores if the remaining 27 scores are all 1-pointers. That’s 95 percent of my scores being 5 points.
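Those bounds can be reproduced with a short Python sketch. It assumes Uber rounds the displayed average to the nearest hundredth (nobody in the thread knows this for sure), and the helper names are hypothetical. Integer arithmetic avoids floating-point surprises at the rounding edges:

```python
def point_range(displayed_avg, window=500):
    """Whole-number point totals whose mean rounds to displayed_avg.
    Assumes ordinary rounding to the nearest hundredth."""
    hundredths = round(displayed_avg * 100)          # e.g. 4.78 -> 478
    # The true mean lies within +/- 0.005 of the display; scale by the window.
    lo = -((-(2 * hundredths - 1) * window) // 200)  # ceiling division
    hi = ((2 * hundredths + 1) * window) // 200      # floor division
    return lo, hi

def five_star_bounds(lo_total, hi_total, window=500):
    """Fewest 5s (every other rating a 4) and most 5s (every other a 1)."""
    min_fives = max(0, lo_total - 4 * window)  # 5n + 4(window - n) >= lo_total
    max_fives = (hi_total - window) // 4       # 5n + 1(window - n) <= hi_total
    return min_fives, max_fives

lo, hi = point_range(4.78)
print(lo, hi, five_star_bounds(lo, hi))  # 2388 2392 (388, 473)
```

Passing 4.87 instead gives 2433 to 2437 points and 433 to 484 five-star ratings.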

My goal is to get to 4.87 average. What percentage of my future scores must be 5 points to continue increasing my average, and at that percentage how many more scores will it take me to get there?

Your ratings would indicate that, ignoring your ratings of 5, there is an even spread of 1s, 2s, 3s and 4s in your other scores. This seems to be pretty typical in online evaluations. Your average will fluctuate each time by -0.008 to +0.008, according to the difference between the score that drops off and the new score added on.
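A sketch of where the ±0.008 comes from, assuming Uber uses a plain unweighted mean over 500 rides (the hypothetical `change_per_swap` just divides the difference by the window size):

```python
WINDOW = 500

def change_per_swap(new_rating, dropped_rating, window=WINDOW):
    """Change in a plain 500-ride mean when one rating enters and the
    oldest one leaves."""
    return (new_rating - dropped_rating) / window

print(change_per_swap(5, 1))  # 0.008  (best case)
print(change_per_swap(1, 5))  # -0.008 (worst case)
print(change_per_swap(5, 4))  # 0.002
```

A 5 replacing a 4 moves the underlying mean by only 0.002, so several such swaps may be needed before the rounded figure changes.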

According to Uber the average driver score is about 4.7 so you are fine.

Let me restate the question, as I keep misstating it. If all future scores are 5 or 4, what percentage of 5s do I need to keep increasing my score and reach 4.87?

The reason I ask is that Uber sends out 4 metrics that compare me to the “top partners” in my market. In 3 of the 4, I am well ahead of their scores. This is the only one where I’m behind. May as well be the best.

Yes, you are quite right. It’s a summary statistic that masks the underlying distribution of stars and when they happened. Unfortunately it’s the best we have with the information at hand.
[ul]
[li]**CASE A:** It’s possible that someone with your exact score could have started with ~100 rides with 1’s and 2’s and then worked their way up to all 5’s for the last 83% of rides. [/li]
[li]**CASE B:** Conversely, someone with your exact score could have started with ~400 perfect 5 rides and then become complacent and most recently had a string of 1’s and 2’s.[/li][/ul]

With the information that we have, we can’t distinguish between any of these cases.

This is correct and puts bounds on the worst and best cases, but I think don’t ask’s point is that it’s more likely that your scores <5 are a mixture of all star ratings over time. Sometimes online ratings are uniformly distributed across 1, 2, 3, 4 and sometimes they taper with P(4) > P(3) > P(2) > P(1).

Unless we have some sort of explicit set of probabilities for the distribution of star points and when they happened (Case A or B or something in between), I don’t think we can solve this without making some additional assumptions.

[ul]
[li]For Case A and P(5) being close to 100%, getting to 4.87 would happen relatively quickly with a string of 5’s. Those 5’s would replace the old 1’s and 2’s.[/li]
[li]For Case B and P(5) being close to 100%, it would require a very long string of 5’s because the old 5’s would get dropped first, and then the moving average has to work through the recent 1’s and 2’s, until those get dropped and replaced by 5’s.[/li][/ul]
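To make the difference between Case A and Case B concrete, here is a Python sketch that feeds nothing but 5s into a 500-ride window. The two windows are hypothetical: each holds 472 fives, 26 ones and 2 twos (2,390 points, exactly 4.78), only in a different order. It also assumes the displayed average is rounded to the nearest hundredth, so a total of 2,432.5 or more shows as 4.87:

```python
from collections import deque

WINDOW = 500
TARGET_TOTAL_X2 = 4865  # 2 * 2432.5: totals at or above this display as 4.87

def rides_to_target(window, new_rating=5, max_rides=10_000):
    """Feed the same new rating into a full 500-ride window (listed oldest
    first) and return how many rides until the display reaches 4.87."""
    win = deque(window, maxlen=WINDOW)
    assert len(win) == WINDOW, "expects a full 500-ride window"
    total = sum(win)
    for ride in range(1, max_rides + 1):
        total += new_rating - win[0]  # the oldest rating drops out...
        win.append(new_rating)        # ...and deque(maxlen) evicts it here
        if 2 * total >= TARGET_TOTAL_X2:
            return ride
    return None

# Hypothetical windows with the same 4.78 average, in two different orders.
low = [1] * 26 + [2] * 2
case_a = low + [5] * 472  # low scores are the oldest
case_b = [5] * 472 + low  # low scores are the newest

print(rides_to_target(case_a), rides_to_target(case_b))  # 11 483
```

Same average, same perfect future, yet the ride count differs by more than 470, because the moving window only cares about what falls off the old end.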

I think you’re looking for an explicit solution to an equation similar to this:


500 rides * 4.87 = 500 * (5 * P(5) + 4 * P(4) + Weighted Ave of <4 stars * P(<4))

You’d like to specify that P(<4) = 0 and you want to solve for 500 * P(5) = number of 5 star rides.

In other words:


500 * 4.87 = 2500 * P(5) + 2000 * P(4)
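Solving that equation exactly, as a small Python sketch under the same P(<4) = 0 assumption (`required_five_share` is a hypothetical name). Since P(4) = 1 - P(5), the window size cancels and the target is simply 4 + P(5):

```python
from fractions import Fraction

def required_five_share(target_avg):
    """Solve window*target = 5*window*P(5) + 4*window*P(4), P(4) = 1 - P(5).
    The window cancels, leaving target = 4 + P(5)."""
    return Fraction(str(target_avg)) - 4

share = required_five_share(4.87)
print(share, share * 500)  # 87/100 435
```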

Unfortunately that takes us back to not knowing


P(4 | knocking out a 1, 2, 3) or P(4 | knocking out a 4, 5)

or


P(5 | knocking out a <5) or P(5 | knocking out a 5)

For something like Case A, knocking out a 1, 2, 3 would be a relatively large probability, and in Case B, knocking out a 1, 2, 3 would be a relatively low probability.

If you want to stipulate any of these probabilities, like P(5) = 99% and that “all of my low scores happened in the first 150 rides out of 500”, then I think you may be able to get closer to an expected number of rides. In other words, with more information, you could derive an estimated number of rides +/- some variability around the estimate. But it appears to me like there are more unknowns than equations, so to my eyes there is no explicit solution without making more assumptions.

The 83% isn’t a red herring. It tells you that you can expect 415 of your last 500 scores to be 5s. They total 2,075 points (415 * 5.00). Your total points are 2,390 (500 * 4.78). So the other 85 scores are “robbing” you of the 110 points that would give you an average score of 5.00.

Let’s say 80% of the non-5s are 4s. The 68 4s will add another 272 to the total, making it 2,347. Say about 80% of the remaining 17 are 3s; then 13 of them will add 39 more, and the remaining 4 are malicious 1s.

That, however, is all you can improve: make the 17 1s, 2s and 3s into 4s and 5s, although they are spread far apart and it will take a while. And realize that even if all your scores are 4s and 5s, you need 435 out of every 500 scores to be 5s to average 4.87.

Seems like a guaranteed source of misery. You can only drop 65 points in any run of 500 fares without falling below 4.87. So a handful of bad scores requires you to make a lot of 4s into 5s.
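The 65-point budget can be written as a tiny calculator. This sketch assumes a plain 500-ride mean and a target total of 2,435 (exactly 4.87, ignoring rounding); each rating "costs" 5 minus its stars, and `fours_affordable` is a hypothetical helper:

```python
WINDOW = 500
TARGET_TOTAL = 2435                 # 500 * 4.87
BUDGET = 5 * WINDOW - TARGET_TOTAL  # points you may lose below a perfect 2,500

def fours_affordable(ones=0, twos=0, threes=0):
    """How many 4-star ratings still fit in the budget after the given
    lower ratings; 0 means the target is already out of reach."""
    spent = 4 * ones + 3 * twos + 2 * threes  # each rating costs 5 - stars
    return max(0, BUDGET - spent)

print(BUDGET)                    # 65
print(fours_affordable())        # 65
print(fours_affordable(ones=5))  # 45
```

For example, five 1-star ratings use up 20 of the 65 points, leaving room for only 45 4-star ratings in the same window.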

Quick question – does Uber say that they weigh the last 500 reviews evenly? If it were me, I would want to use a moving average that effectively weighs the most recent reviews more heavily. That way, if you are generally a good driver but one day you fall off the wagon and start picking up passengers drunk, your rating starts to drop precipitously. If Uber uses a moving average, any new rating less than five may move your score down more sharply than an old 4 star rating dropping off the range will increase it.

Uber’s user agreement just says that it’s based on “an average” of 500 rides. https://www.uber.com/legal/deactivation-policy/us/

It doesn’t say whether that average is an arithmetic mean, a weighted average, a moving average, or something else. It’s probably an arithmetic mean but I’m not certain.

Neither am I.

But it would be very bad for these ratings to change too quickly. As far as Uber is concerned, 4 stars is a failing grade, even on a scale of 5. There are a lot of documented cases of drivers being eliminated when their rating drops to 4.6 or lower. If riders think 3 is average and 4 is above average, and they all rate you at 4, you end up getting tossed. It’s kind of crazy that on a 5 point scale, 4.87 is outstanding and 4.60 is failure.

Comments:

  • Any 5 will increase your underlying score but may not be enough to bump you from 4.78 to 4.79 — the Uber software presumably rounds to the nearest 100th.
  • In OP you write “4 or less.” It’s probably the “or less” that keeps you from having 4.83 now.

Now, you’ve omitted the “or less.” With this change, if your 5-rate is 87% you should gradually approach 4.87 exactly. (If you want to get there significantly earlier than after 500 more rides, you’ll need a higher rate — 96% should get you to 4.87 after 250 rides, and your rating will then continue to rise to 4.96.)
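A sketch of those expected trajectories in Python. It assumes only 4s and 5s, that the current window is 78% 5s (what an exact 4.78 implies under that assumption) with the 5s spread evenly, and that each new rating is a 5 with a fixed probability; it tracks averages only, not ride-to-ride randomness. The function name is hypothetical:

```python
WINDOW = 500

def expected_average(rides, new_rate, start_rate=0.78):
    """Expected 500-ride average after `rides` new ratings, assuming only
    4s and 5s, an evenly mixed starting window with start_rate 5s, and new
    ratings that are 5s with probability new_rate."""
    n = min(rides, WINDOW)  # how many slots of the window are new ratings
    fives = start_rate * (WINDOW - n) + new_rate * n
    return 4 + fives / WINDOW

for rides in (0, 100, 250, 500):
    print(rides, round(expected_average(rides, 0.96), 3))
```

With these assumptions, `expected_average(250, 0.96)` comes out at 4.87 and `expected_average(500, 0.96)` at 4.96, matching the figures above; at 0.87 the expected value only reaches 4.87 after a full 500 rides.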

~ ~ ~ ~ ~ ~ ~ ~
[Off-topic]
The averaging technique described is called moving-window average. In my algorithms I use the (superior :stuck_out_tongue: ) exponentially weighted average:
AVG[sub]new[/sub] = X + 0.995 * (AVG[sub]old[/sub] - X)
Here 0.995 is an arbitrary parameter (similar to the 500 parameter in the moving window average). X is the character of the present event — X=4 or 5 in your example.
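For comparison, a Python sketch of both schemes side by side. The 0.995 decay is the arbitrary value from the post, and the class and function names are made up for illustration; nothing here reflects what Uber actually computes:

```python
from collections import deque

DECAY = 0.995  # the arbitrary parameter from the post

def ewma_update(avg_old, x, decay=DECAY):
    """Exponentially weighted average: AVG_new = X + decay * (AVG_old - X)."""
    return x + decay * (avg_old - x)

class MovingWindowAverage:
    """Plain arithmetic mean of the most recent `size` ratings."""

    def __init__(self, ratings, size=500):
        self.window = deque(ratings, maxlen=size)
        self.total = sum(self.window)

    def add(self, x):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # this rating is about to be evicted
        self.window.append(x)
        self.total += x
        return self.total / len(self.window)

# Same starting point (4.78), one new 5-star rating under each scheme.
print(round(ewma_update(4.78, 5), 4))
print(MovingWindowAverage([4] * 110 + [5] * 390).add(5))
```

Starting from 4.78, one 5-star rating moves the exponentially weighted average to about 4.7811 and the 500-ride mean (whose oldest rating here is a 4) to 4.782. Note that the EWMA only needs the previous average, while the window class has to store all 500 ratings. With a decay of 0.995, a rating 200 events old still carries roughly 0.37 of the weight of the newest one, rather than full weight followed by none.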

The exponentially weighted average is superior to moving-window average for two reasons:

  • No need to remember the last 500 events in order. As you see, the new average is determined just from the present average and the present event.
  • Instead of the step function where the long-past X[sub]-500[/sub] is given just as much weight as today’s X[sub]-1[/sub] event, but then suddenly loses all weight, the weights of past events gradually decrease as the past recedes.

Despite its inferiorities, the moving-window average is in much more common use in computer algorithms. :eek:

To achieve a 4.87 average in 500 scores, I have to score a total of at least 2433 points out of 2500, and at most 2437 points, again depending on whether the average is rounded up or down.

This gives me a minimum of 433 5-star ratings + 67 4-star ratings, and a maximum of 484 5-star ratings + 16 1-star ratings.

I’m inclined to believe that most of the ratings I have that aren’t 5-stars are 4 stars, which suggests my number of current 5-star ratings is close to the calculated minimum of 388.

So my best estimate is that I have to increase my 5-star scores from 388 to 433, an increase of 45 5-star scores. And virtually all my other scores must be 4 stars.
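Since the answer depends on which ratings fall off the old end, one way to get a ballpark ride count is a Monte Carlo simulation. This Python sketch assumes a window of 390 fives and 110 fours (exactly 4.78) in random order, new ratings that are 5s with a fixed probability and 4s otherwise, and rounding to the nearest hundredth; all names are hypothetical, and the output is an estimate under those assumptions, not a prediction:

```python
import random
import statistics
from collections import deque

def simulate_rides_to_target(p_five, n_fives=390, n_fours=110,
                             max_rides=5000, rng=None):
    """One simulated future: the ride on which the displayed average first
    reaches 4.87, or None if it never does within max_rides."""
    rng = rng or random.Random()
    ratings = [5] * n_fives + [4] * n_fours
    rng.shuffle(ratings)                   # assumption: old 4s are scattered
    window = deque(ratings, maxlen=len(ratings))
    size, total = len(window), sum(window)
    for ride in range(1, max_rides + 1):
        new = 5 if rng.random() < p_five else 4  # assumption: nothing below 4
        total += new - window[0]                 # oldest rating drops out
        window.append(new)                       # maxlen evicts window[0]
        if 1000 * total >= 4865 * size:          # mean >= 4.865 shows as 4.87
            return ride
    return None

def estimate(p_five, trials=1000, seed=1):
    """Share of simulated futures reaching 4.87, and their median ride count."""
    rng = random.Random(seed)
    hits = [r for r in (simulate_rides_to_target(p_five, rng=rng)
                        for _ in range(trials)) if r is not None]
    share = len(hits) / trials
    return share, (statistics.median(hits) if hits else None)

for p in (0.87, 0.90, 0.96):
    print(p, estimate(p))
```

Each result is the share of simulated futures that reached 4.87 within 5,000 rides and the median ride count among those that did. Changing `n_fives` and `n_fours` (for example to 388 and 112, the minimum estimated earlier) or the seed shows how sensitive the estimate is.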

After a disappointing (but not egregious) message exchange with my bank representative, I rated the encounter 7 on a scale of 10. I then got a phone call from a supervisor who spent 20 minutes addressing my concerns.

I wonder if they only make such calls for a 6 or 7 score. If I’d ranked the encounter 1 or 2 they’d assume I was an asshole and ignore me? :rolleyes:

Really, I have no idea what point you’re trying to make with this comment. What I was attempting to say is that what amount to retention/firing decisions are based on scores by passengers who have no education as to what their score really means in the process. It would be a reasonable assumption to make that grading someone as 4 on a 5 point scale would be an intent to rate them as above average. But instead, the consequence of many passengers making that assumption is the driver’s dismissal.

This illustrates the problem with “surveys” done by businesses for customer satisfaction. In the car dealership industry, it is well-known that any score less than 10/10 or “perfect” is a failing score. So, when you get a survey about your car purchasing experience or your most recent repair, anything less than all top scores is considered a failure on the part of the sales or service department. It’s utterly ridiculous that a 4/5 or 8/10 is considered to be a bad rating. So, if I waited a little longer than I thought I should have when I got my oil changed, and gave one 8 within a litter of 10s, the service department might not get bonuses. Or if my Uber driver takes a different route than I would have and I give a 4 instead of a 5, that means failure? Stupid systems where honest appraisals are viewed as negative.

Sorry, I was composing my next post while you posted this. What I’m trying to figure out here is how many rides it will take me to go from my current average to my goal. I don’t know how to come up with a reasonable number for this.