Algorithms that have more (putatively) independent outputs than there are independent inputs

The question was inspired by a body fat scale, but could apply to other systems too. I’ve tried reading about degrees of freedom in statistics, but I don’t really understand it and don’t know how it applies to this question.

As I understand it, my body fat scale uses bioelectrical impedance analysis to measure only two presumably independent variables: body impedance and body weight. (When I set it up, I had to enter other information like height and age, but those hardly change from day to day.) The scale outputs four values: fat weight, protein weight, water weight, and bone weight. I think these four values should be independent because, at least in theory, each one can vary while the others remain constant.

I imagine they probably measured the weight and impedance of a bunch of volunteers and also subjected them to more accurate body composition tests (hydrostatic weighing, air displacement plethysmography, dual-energy X-ray absorptiometry, etc). Then they used regression analysis to develop an algorithm that estimates the user’s fat weight, protein weight, bone weight, and water weight.
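To make that concrete, here’s a toy version of what I imagine the calibration looks like; everything in it (the variable ranges, the coefficients, the noise) is invented for illustration, not anything I know about how real scales are built:

```python
import numpy as np

# Invented calibration data: one row per volunteer.
# Columns: impedance (ohms), weight (kg), height (cm), age (yr), intercept.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(400, 700, n),   # impedance
    rng.uniform(50, 110, n),    # weight
    rng.uniform(150, 200, n),   # height
    rng.uniform(18, 80, n),     # age
    np.ones(n),                 # intercept
])

# "Ground truth" from the accurate reference method (DEXA etc.),
# faked here as a linear function of the inputs plus noise.
true_coefs = rng.normal(size=(5, 4))
Y = X @ true_coefs + rng.normal(scale=0.5, size=(n, 4))

# One least-squares fit per output: fat, protein, water, bone.
coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)

def body_composition(impedance, weight, height, age):
    """What the scale would ship: a fixed formula, not a measurement."""
    x = np.array([impedance, weight, height, age, 1.0])
    fat, protein, water, bone = x @ coefs
    return fat, protein, water, bone
```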

I have been using the scale for years but never put much stock in small variations in these outputs. I recognize and accept its limitations and realize it’s probably the best I can hope for from an inexpensive device.

So I don’t really object to it, I just wonder about the mathematics of it. Am I right to think the outputs (e.g., calculated fat weight) lack the independence that the underlying true values (e.g., actual fat weight) should have? Is there a way to quantify how inaccurate the values are likely to be (the width of the error bars)? And is there a word for such an algorithm (I was thinking it might be something like “over-analysis”)?

ETA: I guess you can’t do regression analysis without correlation, and if two variables are correlated, they’re not really independent. So I honestly don’t know how many independent variables are involved in either the set of inputs or the set of outputs. It’s been so many years since I was last in a mathematics classroom that just thinking about this stuff makes my brains leak out my ears.

The multiple outputs can’t have more meaningful information than the inputs.

That said, there are perfectly reasonable algorithms that give more useful outputs than they take inputs. A random vector generator could have three scalar outputs given one seed input. A unit converter could give you length in many different units when given a length in one specific unit.
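Both of those are trivial to write down; here’s a quick sketch (the functions themselves are just illustrations, not anything standard):

```python
import random

def random_vector(seed):
    """Three scalar outputs from a single seed input."""
    r = random.Random(seed)
    return r.random(), r.random(), r.random()

def lengths(meters):
    """Many readouts from one length -- but no new information."""
    return {"m": meters, "cm": meters * 100,
            "in": meters / 0.0254, "ft": meters / 0.3048}
```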

You might look up what overdetermined and underdetermined mean in this context. You are on to something; they can’t create new and significant information out of thin air.

The issue is with the thing you mentioned but aren’t exactly considering.

The inputs are NOT just your impedance and weight. They are your impedance, your weight, your height, your age, whatever else you glossed over during the OP setup, AND all the knowledge of human physiology embodied in those statistics you mentioned.

So here’s what you really have:

  1. Fat weight = function of (height, age, human statistics, impedance, weight).

  2. Protein weight = function of (height, age, human statistics, impedance, weight).

  3. Water weight = function of (height, age, human statistics, impedance, weight).

  4. Bone weight = function of (height, age, human statistics, impedance, weight).

There’s nothing underspecified about any of that. Or at least there doesn’t need to be from a mathematical perspective. To be sure, the scale could be crap, their statistics laughable, etc., which would result in GIGO (garbage in, garbage out) errors.
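In code, that list collapses to something like this; the coefficient table is an invented placeholder standing in for the real population statistics, but it shows where the “extra” inputs live:

```python
# Invented placeholder coefficients: this table IS the "human
# statistics" input, baked in at the factory rather than typed in.
COEFS = {
    "fat":     (0.010, 0.25, -0.05,  0.02, 3.0),
    "protein": (0.002, 0.15,  0.01,  0.00, 1.0),
    "water":   (0.008, 0.50,  0.02, -0.01, 2.0),
    "bone":    (0.001, 0.04,  0.01,  0.00, 0.5),
}

def estimate(impedance, weight, height, age):
    """Four outputs, each a fixed function of the same five inputs."""
    x = (impedance, weight, height, age, 1.0)
    return {name: sum(c * v for c, v in zip(cs, x))
            for name, cs in COEFS.items()}
```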

But if you’re thinking “two values in, four values out” violates some inherent constraint of data science, the answer to that is “Nope.”

What it does mean (assuming the relationship isn’t some sort of weird fractal thing) is that there are some combinations of outputs that never occur. Basically, your set of possible outputs will be a two-dimensional surface in a four-dimensional space.
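You can check that collapse numerically: push a pile of (impedance, weight) pairs through any four-output map and look at the dimension of the resulting point cloud. A toy linear version, with a made-up map:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 2))            # made-up linear map: 2 inputs -> 4 outputs

inputs = rng.uniform(size=(1000, 2))   # 1000 (impedance, weight) pairs
outputs = inputs @ A.T                 # 1000 points in 4-D space

# The 4-D cloud spans only a 2-D subspace:
print(np.linalg.matrix_rank(outputs))  # -> 2
```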

This sort of thing is quite common in scientific data analysis. Basically, you have to assume some sort of relationship or pattern (at least an approximate one) among your outputs, and then hope that your assumptions are accurate.

I think that was my major stumbling-block. I assumed that if a parameter doesn’t change, then it doesn’t matter. I should have known better from my time as a physics student, lo these many years gone by, when I used parameters like gravitational acceleration and spring constants all the time. They matter even if they don’t change.

Thanks all. I think my understanding of the issue is much improved now.