Algorithms that have more (putatively) independent outputs than there are independent inputs

The question was inspired by a body fat scale, but could apply to other systems too. I’ve tried reading about degrees of freedom in statistics, but I don’t really understand it and don’t know how it applies to this question.

As I understand it, my body fat scale uses bioelectrical impedance analysis to measure only two presumably independent variables: body impedance and body weight. (When I set it up, I had to enter other information like height and age, but those hardly change from day to day.) The scale outputs four values: fat weight, protein weight, water weight, and bone weight. I think these four values should be independent because, at least in theory, each one can vary while the others remain constant.

I imagine they probably measured the weight and impedance of a bunch of volunteers and also subjected them to more accurate body composition tests (hydrostatic weighing, air displacement plethysmography, dual-energy X-ray absorptiometry, etc). Then they used regression analysis to develop an algorithm that estimates the user’s fat weight, protein weight, bone weight, and water weight.
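To make that concrete, here’s a toy version of what I imagine the calibration looks like; everything in it (the variable ranges, the coefficients, the noise) is invented for illustration, not anything I know about how real scales are built:

```python
import numpy as np

# Invented calibration data: one row per volunteer.
# Columns: impedance (ohms), weight (kg), height (cm), age (yr), intercept.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(400, 700, n),   # impedance
    rng.uniform(50, 110, n),    # weight
    rng.uniform(150, 200, n),   # height
    rng.uniform(18, 80, n),     # age
    np.ones(n),                 # intercept
])

# "Ground truth" from the accurate reference method (DEXA etc.),
# faked here as a linear function of the inputs plus noise.
true_coefs = rng.normal(size=(5, 4))
Y = X @ true_coefs + rng.normal(scale=0.5, size=(n, 4))

# One least-squares fit per output: fat, protein, water, bone.
coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)

def body_composition(impedance, weight, height, age):
    """What the scale would ship: a fixed formula, not a measurement."""
    x = np.array([impedance, weight, height, age, 1.0])
    fat, protein, water, bone = x @ coefs
    return fat, protein, water, bone
```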

I have been using the scale for years but never put much stock in small variations in these outputs. I recognize and accept its limitations and realize it’s probably the best I can hope for from an inexpensive device.

So I don’t really object to it, I just wonder about the mathematics of it. Am I right to think the outputs (e.g., calculated fat weight) lack the independence that the underlying true values (e.g., actual fat weight) should have? Is there a way to quantify how inaccurate the values are likely to be (the width of the error bars)? And is there a word for such an algorithm (I was thinking it might be something like “over-analysis”)?

ETA: I guess you can’t do regression analysis without correlation, and if two variables are correlated, they’re not really independent. So I honestly don’t know how many independent variables are involved in either the set of inputs or the set of outputs. It’s been so many years since I was last in a mathematics classroom that just thinking about this stuff makes my brains leak out my ears.

The multiple outputs can’t have more meaningful information than the inputs.

That said, there are perfectly reasonable algorithms that give more useful outputs than they take inputs. A random vector generator could have three scalar outputs given one seed input. A unit converter could give you length in many different units when given a length in one specific unit.
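Both of those are trivial to write down; here’s a quick sketch (the functions themselves are just illustrations, not anything standard):

```python
import random

def random_vector(seed):
    """Three scalar outputs from a single seed input."""
    r = random.Random(seed)
    return r.random(), r.random(), r.random()

def lengths(meters):
    """Many readouts from one length -- but no new information."""
    return {"m": meters, "cm": meters * 100,
            "in": meters / 0.0254, "ft": meters / 0.3048}
```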

You might look up what overdetermined and underdetermined mean in this context. You are on to something; they can’t create new and significant information out of thin air.

The issue is with the thing you mentioned but aren’t exactly considering.

The inputs are NOT just your impedance and weight. They are your impedance, your weight, your height, your age, whatever else you glossed over during the OP setup, AND all the knowledge of human physiology embodied in those statistics you mentioned.

So here’s what you really have:

  1. Fat weight = function of (height, age, human statistics, impedance, weight).

  2. Protein weight = function of (height, age, human statistics, impedance, weight).

  3. Water weight = function of (height, age, human statistics, impedance, weight).

  4. Bone weight = function of (height, age, human statistics, impedance, weight).

There’s nothing underspecified about any of that. Or at least there doesn’t need to be from a mathematical perspective. To be sure, the scale could be crap, their statistics laughable, etc., which would result in GIGO (garbage in, garbage out) errors.
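In code, that list collapses to something like this; the coefficient table is an invented placeholder standing in for the real population statistics, but it shows where the “extra” inputs live:

```python
# Invented placeholder coefficients: this table IS the "human
# statistics" input, baked in at the factory rather than typed in.
COEFS = {
    "fat":     (0.010, 0.25, -0.05,  0.02, 3.0),
    "protein": (0.002, 0.15,  0.01,  0.00, 1.0),
    "water":   (0.008, 0.50,  0.02, -0.01, 2.0),
    "bone":    (0.001, 0.04,  0.01,  0.00, 0.5),
}

def estimate(impedance, weight, height, age):
    """Four outputs, each a fixed function of the same five inputs."""
    x = (impedance, weight, height, age, 1.0)
    return {name: sum(c * v for c, v in zip(cs, x))
            for name, cs in COEFS.items()}
```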

But if you’re thinking “two values in, four values out” violates some inherent constraint of data science, the answer to that is “Nope.”

What it does mean (assuming the relationship isn’t some sort of weird fractal thing) is that there are some combinations of outputs that never occur. Basically, your set of possible outputs will be a two-dimensional surface in a four-dimensional space.
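You can check that collapse numerically: push a pile of (impedance, weight) pairs through any four-output map and look at the dimension of the resulting point cloud. A toy linear version, with a made-up map:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 2))            # made-up linear map: 2 inputs -> 4 outputs

inputs = rng.uniform(size=(1000, 2))   # 1000 (impedance, weight) pairs
outputs = inputs @ A.T                 # 1000 points in 4-D space

# The 4-D cloud spans only a 2-D subspace:
print(np.linalg.matrix_rank(outputs))  # -> 2
```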

This sort of thing is quite common in scientific data analysis. Basically, you have to assume some sort of relationship or pattern (at least an approximate one) among your outputs, and then hope that your assumptions are accurate.

I think that was my major stumbling-block. I assumed that if a parameter doesn’t change, then it doesn’t matter. I should have known better from my time as a physics student, lo these many years gone by, when I used parameters like gravitational acceleration and spring constants all the time. They matter even if they don’t change.

Thanks all. I think my understanding of the issue is much improved now.