Nowadays, it’s become routine for all of us to compare google/amazon/tripadvisor/whatever reviews before making a purchase or choosing a restaurant/hotel.
Usually, these reviews are on a scale from one star to five, with the average of all reviews determining the score. However, it is well-known that you should also check the overall number of reviews submitted, as it might be safer to opt for a product with an average of “only” 4.7 but 200 reviews over another with a 5-star score but only one review.
That’s when it gets tricky. If I compare two products with scores of, say, 4.9 (156 reviews) and 4.7 (387 reviews), how can I decide which one is the best, or at least the best-rated? If I simply multiply the averages by the number of reviews, I get a number that gives me an idea (the bigger, the better), but which is also completely out of proportion with the original grading system. Is there a mathematical formula that can bring this unwieldy number to something of the same order of magnitude as the 1-5 star system, so that I can manipulate it with values that are easier to grasp?
IANAMathematician, so bear with me, but it is my understanding that in statistics the standard deviation, being the square root of the variance, is used partly for that reason: it brings the values back to the same order of magnitude as the original data. Is this correct? And if yes, could I use something similar for internet reviews?
It’s also not just the number of reviews and the average, it’s also the distribution. I’d be more inclined to choose a lower rated place if it has an unnatural split of 5-star and 1-star ratings that looks like the result of malicious review-bombing.
It’s not an easy analysis. It’s as much gut feeling as objectivity.
The core idea of a statistical test is that you are hoping to compare the two sets of scores as a whole, not just the average scores but the shape of the distribution of the scores. In doing this you are testing the proposition that the two underlying scores are the same, with a stated chance that the difference you see is just luck.
The problem you have is that you don’t have actual knowledge of the properties of the products being scored. By which I mean that you don’t know what score either would get as an intrinsic property of the product, nor the shape of the distribution they would intrinsically attract. What you have are two sets of estimates of these parameters. A very low number of scores from which estimates of the average and spread are created leads to a weakening of the applicability of the test. In principle, if you only have one score, the estimate of the spread isn’t even defined. So you are quite limited in what can be done. But this is a well understood question.
However, once you have, say, more than 100 reviews, you should have a quite reasonable estimate, and a very standard test can be applied.
Standard deviation is the usual estimate of the spread. But it presupposes the nature of the intrinsic distribution. With reviews this is a big problem when you get a pile of bad reviews and good reviews, and the distribution is bimodal. Then you have a whole new set of issues to manage.
There’s another problematic issue in Amazon reviews that could lend itself to mathematical analysis, based on additional data that could be gathered automatically. It came to light when my daughter bought me a pocket telescope for my birthday.
The telescope was terrible! Views were blurry, hazy, and distorted.
Curious, I looked its reviews up. It had five 5 star reviews and one 1 star review. So, for each 5 star review, I looked up the reviewer’s other reviews. In each case, the reviewer had dozens or hundreds of other reviews, all of them 5 star, and for products that fit no obvious pattern except that I didn’t know any of the manufacturers. There’d be fingernail polish, socket wrenches, book repair tape, mineral supplements, dog collars, all manner of unrelated products, reviewed as 5 star by each reviewer.
The reviewer who gave 1 star had maybe 10 or so other reviews, with a range of stars, and most of the items were somewhat recreational.
At least some of this analysis was quantitative, so I can imagine an automated shenanigans detector.
By the way, the problem of taking averages and numbers of reviews into account together is addressed by Student’s t-statistic, though this does assume a Gaussian distribution.
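If you did have the raw ratings for both products (the review sites don’t actually expose them, so the two lists below are made up for illustration), a rough sketch of that comparison in Python could use scipy. Note I’ve used Welch’s variant of the t-test, which drops the equal-variance requirement but still leans on the Gaussian assumption mentioned above:

```python
from scipy import stats

# Hypothetical raw star ratings for two competing products.
product_a = [5, 5, 4, 5, 5, 4, 5, 3, 5, 5]
product_b = [5, 4, 4, 5, 3, 5, 4, 4, 5, 4, 2, 5, 4]

# equal_var=False selects Welch's t-test, which doesn't assume the
# two groups have the same variance.
t_stat, p_value = stats.ttest_ind(product_a, product_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# A large p-value means the difference in average stars could easily be
# luck; a small one suggests the means really do differ.
```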
It does, and it ensures that the standard deviation is in the same units (centimeters or hours or “stars” or whatever) as the original data and the mean.
There are statistical methods for comparing two different data sets. But if one of them is more trustworthy than the other due to things like being more likely to get flooded with fake reviews, or having different sorts of people with different criteria doing the ratings, I don’t see how you could establish that purely with mathematics.
My take on reviews is they are pure Garbage In, so any attempt at analysis will be pure Garbage Out. I certainly agree with the legit statistical issues brought up by @Francis_Vaughan and the cheating issues mentioned by many people but especially @Napier.
But wait, there’s more!
Reviews are what we call a “self-selected poll”. Even if we could magically strip out all the cheaters, the roster of who makes a review is not a random cross-section of the actual customer experience. As a general matter, most people are much more inclined to give bad reviews than good. So if they buy a crappy product they are motivated by anger / disappointment to write a bad review. But when they buy a good product, they don’t bother to review it. But the size of this effect is not quantifiable.
Putting aside the question of fake or irrelevant reviews for a second, assuming you had two trustworthy and comparable datasets but one with a higher # of reviews, I think the OP’s original question is still meaningful: How do you combine both the star rating and # of reviews into a unified rating that takes both into account, sort of like a “weighted average”?
For the actual underlying problem, you really want to read a selection of the bad reviews and the good reviews. Google, at least, automatically gives you a selection of “typical” reviews of each type, as well as an AI summary of them. If the typical bad review is based on things irrelevant to you, or because the product fails to break the laws of physics, or something, that means the product is probably good. If the bad reviews are “doesn’t actually work”, though, that’s different.
Or, you can get a situation like I’ve seen on reviews for Indian restaurants: The reviews are all either “I’d never eaten Indian food before, but this was pretty good”, or “Of all of the many Indian restaurants I’ve eaten at, this was the worst”.
Which immediately raises the question of what algorithm they use to select that sample. I’m not suggesting malign intent on their part, but whatever logic they use may be the sort of thinking that we and the OP can use too.
Speaking to my personal habits …
On the rare occasions I’m comparison shopping by reviews, my approach is to first read only the bad ones. If those people are evidently idiots, or are bitching about e.g. customer service issues, not the product, I can throw them out of my informal mental sample space.
Then I turn to the good reviews and try to throw out the hagiography. Usually about then I give up on reviews altogether and make my decision solely on the stated features of the product versus the price.
Well, they’ve been doing it for a lot longer than GPT-style LLM AIs have been around, so it’s nothing that complicated. I think that it mostly just looks for keywords and their synonyms in the reviews. They might also have a “Was this review helpful?” checkbox, which is simple enough to interpret.
It is even worse than that; there are known biases and inequities in reviews. For one, reviewers have differing expectations about the quality and functionality of a product, and often someone will leave a poor review just because the product didn’t meet their specific needs, even though it is obvious from the description that they were in error. Another is that people generally aren’t inspired to post middling reviews; on most products, the majority of reviews are a combination of 1, 4, and 5 stars, resulting in the noted bimodal distribution. And sometimes people will leave a poor review for an otherwise excellent product because they were unhappy with the speed of delivery or customer service (which drives me nuts, because that shouldn’t be part of the product review if it is being provided by an independent distributor). So Amazon reviews, and in general most online customer reviews, are an inconsistent and unreliable body of data on which to perform statistics, especially if the number of reviews is only a few dozen or less. This is even before you get to the issues of intentional or malicious manipulation of the review system.
Or dumb-ass reviews.
“Great item but 1 star since it arrived late.”
“1 star. I wanted the blue one but accidently ordered the green one and they didn’t send me the blue one.”
“Haven’t received it yet, but my friend that has a similar item from a different company and he didn’t like his. One star”
I’ve seen ones where it seems like the reviewer refuses to ever give a five star review.
“Greatest item I’ve ever bought. Does everything it promises. Cured my wife’s stage three cancer. Four stars.”
Really? I thought there was an entire industry devoted to fabricating positive reviews. For that reason, I skip past the five star reviews and go straight to the lower-star reviews that weren’t bought and paid for to see how appropriate they are. If they hint at singular bad experiences that don’t really reflect on the product, or someone who is clearly just an idiot (for example, someone who doesn’t understand that noise-cancelling headphones won’t block out the sound of someone crumpling paper next to them, or someone who complains about speed of delivery), then I take that as a good sign. But if they’re valid complaints about the quality of construction, for example, then I take heed.
There are also cultural factors. I can’t speak to how true it is, but I’ve heard that in Japan, you want to shoot for about 3.5-star restaurant reviews. The 5-star places are review-bombed by tourists with no taste. But for the locals, 3 or 4 stars means the place is excellent.
I guess there is the problem that if it is a tourist trap with 3.5 stars, it’s probably awful.
Many of you have raised excellent points which show how intractable the underlying complications can get. Thanks for drawing my attention to these issues which I hadn’t really considered.
To rephrase, my question is “is there a mathematical formula that could help make the scores more legible, assuming - again - that the reviews are unbiased, thus leaving aside malice, stupidity and misunderstanding?”
For such a mathematical formula, you would first need some model for what you would expect ratings to be for a random product. For instance, you might assume that a random product would be equally likely to actually be 1, 2, 3, 4, or 5 stars.
One simple way you could deal with the uncertainty, then, would be to add some number of average “phantom ratings” to every product. I’m unsure of the proper number of “phantom ratings” to add, but let’s suppose, for simplicity, that it’s 1.
In this case, a product with only one rating, at 5 stars, would effectively have two, the 5 and a “phantom 3”, for an expected true rating of 4. On the product with a thousand ratings at an average of 4.5, however, the phantom 3 would have almost no effect on the true rating, and so with the phantom rating, it’d still be 4.499 or so. Thus, the product with the 4.5 is probably better.
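In code, assuming the average star rating and review count are all you can scrape from the page, that phantom-rating adjustment is just a couple of lines. The number of phantom ratings k and the 3-star prior are the knobs you’d tune; the values below just reproduce the example above:

```python
def adjusted_rating(avg_stars, n_reviews, k=1, prior=3.0):
    """Shrink the observed average toward `prior` by adding k phantom ratings."""
    return (avg_stars * n_reviews + prior * k) / (n_reviews + k)

print(adjusted_rating(5.0, 1))      # single 5-star review  -> 4.0
print(adjusted_rating(4.5, 1000))   # 4.5 over 1000 reviews -> ~4.4985
```

The nice thing is that the result stays on the familiar 1-to-5 scale, which is exactly what the OP was asking for.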
I don’t know if it’s true in Japan, either, but I do that kind of thing here in the US. Usually, I find my sweet spot is about 4.2 ±0.1 stars for restaurant reviews, particularly ethnic ones. Once you dip into the 3.5s, though, it can get a bit dicey.
In regards to 1 star vs 5 star reviews, I just look for the overall shape of the distribution. When I see a pretty typical declining progression from 5 to 1, that’s a pretty normal shape and I trust the score for the most part. If there are two big spikes at 5 and 1, then I’m wary of the score, though it takes reading some of the reviews to see if it’s being manipulated in either direction. What may be overlooked is how many 2s there are. Review bombers are going to go straight for the 1, but if I see a blip in the 2 ratings in addition to the 1s, I tend to assume the ratings are probably honest ones.
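For what it’s worth, here is a rough, entirely hypothetical sketch of that shape check in Python. The thresholds are arbitrary and just encode the “two spikes, hollow middle, blip at 2” heuristic described above, not any established test:

```python
def looks_review_bombed(counts, extreme_share=0.75):
    """counts: dict mapping star value (1..5) to number of reviews.
    Flags a 'two spikes at 1 and 5, hollow middle' shape."""
    total = sum(counts.values()) or 1
    extremes = (counts.get(1, 0) + counts.get(5, 0)) / total
    return extremes > extreme_share

def ones_look_organic(counts):
    """Crude check: genuine dissatisfaction usually spills into some 2-star reviews."""
    return counts.get(2, 0) >= 0.2 * counts.get(1, 0)

sample = {5: 120, 4: 10, 3: 5, 2: 3, 1: 90}   # hypothetical star histogram
print(looks_review_bombed(sample))   # True  (92% of ratings are 1s or 5s)
print(ones_look_organic(sample))     # False (almost no 2s backing up the 1s)
```

It wouldn’t replace reading the reviews, but it’s the kind of thing an automated shenanigans detector could start from.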