That person doesn’t know what they’re talking about. The mode will also underestimate or overestimate the result for most people, and the same goes for the median, the mean, or any other single measure.
I’m also unclear on why you’d use the mode for a continuous variable. If one person reports 2 lbs of loss and another reports 2.01 lbs, those are different values and don’t group together in a mode calculation.
Sounds to me like someone didn’t hit their goal and just needs something to complain about.
You’re right, I don’t know what I’m talking about. Which is why I’m asking the question, so I can learn more. Thanks for living up to your username, though. That was very helpful.
Plenty of similar studies use the mode. I don’t recall a weight-loss study doing so, but some probably do. As mentioned above, neither is all that useful; the measure selected is generally the one that makes the best point, and that point may or may not be misleading in itself. No single number is going to describe the set of data points well.
True, and there are probably more complex statistical tools to try to account for outliers.
But when I’m trying to decide on an intervention for weight loss, whether for myself or when suggesting things to other people, what I want to know is how much weight most people lost using that intervention. Is that unusual?
As a practical matter, the statistical tools using the mean are a lot more widely taught and understood. (FWIW I’m the resident “guy who knows stats” in a bio lab, and I had to go look up whether there was a good way to construct a confidence interval for a median.) Tests for differences between means, construction of confidence intervals of the mean, etc. are all taught in intro stats classes, and easily understood by physicians.
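For anyone curious, the standard trick I found was the bootstrap. Here’s a minimal sketch in Python, with entirely made-up lognormal “weight loss” numbers standing in for real data; this is the textbook percentile-bootstrap method, not necessarily what any particular study used:

```python
# Percentile-bootstrap 95% CI for a median.
# `weight_loss` is invented data for illustration only.
import numpy as np

rng = np.random.default_rng(0)
weight_loss = rng.lognormal(mean=1.5, sigma=0.5, size=200)  # made-up lbs lost

n_boot = 10_000
medians = np.array([
    np.median(rng.choice(weight_loss, size=weight_loss.size, replace=True))
    for _ in range(n_boot)
])
lo, hi = np.percentile(medians, [2.5, 97.5])
print(f"median = {np.median(weight_loss):.2f} lb, 95% CI ~ ({lo:.2f}, {hi:.2f})")
```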
And while the mean can be misleading whenever there are distributions skewed towards one extreme, that can be handled with appropriate data transformations and statistical techniques. Those techniques might have been used in the studies used by the meta-analyses, but the precise details will be lost from study to meta-analysis to abstract to press release…
Part of the problem is a tendency to always assume that any distribution is going to be a Gaussian, for which mean, median and mode are the same. A lot of people have no idea what to do when they have a skewed distribution, or even when measuring something which is going to be skewed by its very nature (think impurities, where the ideal value within the acceptable range is one of that range’s limits). Pet peeve of mine with regard to both medical practice and process-improvement techniques.
ETA: heck, a family of questions that keeps cropping up and shows how little people understand skewed distributions is all those about life expectancy before modern medicine. “It’s skewed by the number of infant deaths, stillbirths and women who died the first time they had a baby”… nowadays the biggest cause of death is simply being alive long enough, but not so long ago birth itself was pretty nasty.
The geometric mean (i.e. the back-transformed mean of the log-transformed data) might actually be better than the plain mean, since weight is one of those things I would expect to be distributed lognormally.
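A quick sketch of what that looks like in practice, with invented lognormal “weights” (scipy’s gmean does the same back-transformation for you):

```python
# Geometric mean = exp(mean(log(x))), shown two ways on made-up data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
weights = rng.lognormal(mean=5.1, sigma=0.2, size=500)  # invented, ~165 lb center

gm_manual = np.exp(np.mean(np.log(weights)))  # back-transformed mean of logs
gm_scipy = stats.gmean(weights)               # identical result via scipy
print(gm_manual, gm_scipy, np.mean(weights))  # geometric mean < arithmetic mean
```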
Weight is continuous rather than discrete, but that doesn’t in itself make it impossible to use the mode. (You could convert a continuous variable into a discrete one by sorting it into bins, e.g. rounding off each person’s weight to the nearest pound). The bigger issue is, I think, that most ‘basic’ parametric statistical tests that the typical researcher would know are based on the mean. I guess you could use nonparametric approaches to compare modes, though.
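Something like this, say (the bin width and the data are both made up; a real analysis would have to justify the binning):

```python
# Mode of a continuous variable after rounding to the nearest pound.
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
loss = rng.normal(loc=12.0, scale=4.0, size=300)  # hypothetical lbs lost

binned = np.round(loss).astype(int)  # nearest-pound bins
mode_value, mode_count = Counter(binned).most_common(1)[0]
print(f"modal loss ~ {mode_value} lb ({mode_count} people)")
```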
Honestly, for this purpose an interval covering most individuals (e.g. “95% of people lost between 10% and 20% of their body weight”, which is strictly a percentile interval over people rather than a confidence interval for a parameter) is probably more useful than either the mean or the mode.
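That kind of interval is trivial to compute from raw data, for what it’s worth (the numbers below are invented):

```python
# Percentile interval over individuals: the range covering 95% of subjects.
import numpy as np

rng = np.random.default_rng(3)
pct_lost = rng.normal(loc=15.0, scale=3.0, size=400)  # hypothetical % lost

lo, hi = np.percentile(pct_lost, [2.5, 97.5])
print(f"95% of people lost between {lo:.1f}% and {hi:.1f}% of body weight")
```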
a) Lost an amount of weight they themselves felt made an impact on their quality of life.
and
b) Still experienced this improvement in quality of life after, say, 2 years or so.
I’m guessing the baseline is 5%, same as for giving up cigarettes or alcohol (number stolen from Penn & Teller’s “Bullshit!”; feel free to insert better numbers if you have them on hand), so anything above that would be interesting.
I am not an expert but I have processed a bunch of data in my time…
All the summary statistics (mean, mode, median, etc.) are basically shorthand ways of describing the distribution of the data, and the conventional interpretations of those statistics assume a Gaussian distribution. A given person may not know what they are talking about, but a mean is a perfectly valid calculation regardless of the distribution, and in the majority of cases the summary statistics provide a good approximation of the data set. Certainly it is what people expect. It does depend on the distribution, though, and as the number of data points increases, all distributions of continuous numbers tend toward a Gaussian. For small data sets, that assumption can easily break down.
Really, the proper result to report for a data set is the distribution itself; everything else is an approximation. People don’t like going to the trouble of presenting a picture when a number is expected, but the distribution is always going to provide the best description of the data. Present it alongside the summary statistics, and the proper interpretation of those statistics becomes easy.
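Even a bare histogram with the summary statistics marked on it gets you most of the way there; a sketch with made-up, right-skewed data:

```python
# Show the distribution itself, with mean and median marked for interpretation.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
loss = rng.lognormal(mean=2.3, sigma=0.5, size=500)  # invented, right-skewed lbs

plt.hist(loss, bins=30)
plt.axvline(np.mean(loss), linestyle="--", label=f"mean = {np.mean(loss):.1f}")
plt.axvline(np.median(loss), linestyle=":", label=f"median = {np.median(loss):.1f}")
plt.xlabel("weight lost (lb)")
plt.ylabel("people")
plt.legend()
plt.show()
```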
If and only if the data follow a normal distribution is the mean the best way to describe them. If the data don’t follow a normal distribution, investigators usually report the median and the IQR (interquartile range). They don’t usually report the mode in human studies of body weight because it doesn’t make sense: body weight is a continuous variable, so essentially no two subjects share an exact value.
The mode is completely useless. If two people lost 50kg, and two other people lost 1kg and 2kg, then the mode is 50kg. It’s worthless. What you want is the median.
Are you referring to the central limit theorem? Apologies if you’re not. If you are, that’s not quite what it says. The central limit theorem is about the sampling distribution, not the actual ‘true’ distribution of the data in reality; it makes a statement about the distribution of your estimates across repeated samples, not about the distribution of the variable itself. It tells you that with a large enough sample, the sampling distribution of the mean will resemble a normal and your estimate of the mean will converge on the true mean. This holds in most cases (not all), even in many cases where the actual distribution of the data is not normal.
I didn’t get it the first time they explained it in stats class either: I don’t think this one is often explained very well.
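It’s one of those things that clicks much faster with a simulation than with a proof. A quick sketch, using an invented skewed population:

```python
# The raw data stay skewed regardless of n, but the distribution of
# *sample means* across repeated samples is close to normal.
import numpy as np

rng = np.random.default_rng(5)
population = rng.exponential(scale=2.0, size=100_000)  # skewed "true" data

sample_means = np.array([
    rng.choice(population, size=50).mean() for _ in range(5_000)
])
print("population mean vs median:", population.mean(), np.median(population))
print("sample means 2.5/50/97.5 pct:", np.percentile(sample_means, [2.5, 50, 97.5]))
# The population has mean > median (skew); the sample means sit almost
# symmetrically around ~2.0.
```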
Plus, not ALL distributions of continuous numbers tend toward a Gaussian.
I already gave an example in my previous post, and it’s a very common one in chemistry, pharma and the food industry: impurities, or other things we could call “contamination” (there’s a quick simulation of it after the list below).
Acceptable values: 0 to n.
Most common value (mode): if your process is any good, 0.
Mean: somewhat above 0, due to the occasional positive result (that is, a result above 0). The median can also sit at 0 if more than half the runs come back clean.
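A toy simulation of that shape (all numbers invented) makes the gap between the measures obvious:

```python
# Zero-inflated "impurity" measurements: mode 0, mean pulled above 0.
import numpy as np

rng = np.random.default_rng(6)
n = 1_000
detected = rng.random(n) < 0.3  # assume 30% of batches show any impurity
impurity = np.where(detected, rng.exponential(scale=0.5, size=n), 0.0)

print("mode:", 0.0)                    # by construction: most runs are clean
print("mean:", impurity.mean())        # ~0.15, dragged up by occasional hits
print("median:", np.median(impurity))  # 0 here, since >50% of runs are clean
```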
I think the discussion above conclusively proves that reducing a large set of data points to a single number is a stupid idea. Just give us the original data points.
I’ve read a fair share of research and don’t recall modes ever being presented as the primary measure. Is there a discipline or subdiscipline where this is common?