Statistics: is there a kind of regression that does this?

There’s a kind of regression I think should exist but I don’t know what it’s called. Or perhaps there’s actually just a way of accomplishing this with a general linear model, and I simply don’t see how. Or maybe what I’m picturing wouldn’t work.

Imagine a big population of things that have a variety of properties. But imagine further that they tend to derive from a small population of certain prototypical things with these same properties in certain values. I want to try to understand each member of the big population as a mix of the prototypes in the small population.

Here’s an example. Suppose the things are people with skills, and the skills are taught in university courses. But we don’t think of the people as having had this course, that course, et cetera. We think of them as having a degree in this field or that field. Well, all the people with a degree in a certain field, say electrical engineering, will tend to have had many similar courses, but not exactly. Still, you’ll find they tend to have a skill set different from people with a degree in nursing.

What I want to do is, first, apply the skill measurement tests to all the electrical engineers, and all the nurses, and all the other graduates on a field by field basis, so I have an average skill profile for each field. Then, I want to look at each person’s skill results, and say this person is best fit by adding together this much electrical engineer, this much nurse, and so forth.

Or, another example, I have baked goods made of flour, sugar, egg, cinnamon, salt, yeast, et cetera. I want to analyze all these and average together all the instances of donut, and all of the cake, and all of the bread, and all of the crackers, and these become my prototypes. Then when I am confronted with an instance of cookie, I can say it’s midway between cake and crackers.

Does this sound like a special kind of regression that has a name? Or is all of this just muddled somehow, or obviously a common kind of regression?


Thanks, Sage, but that identifies groups. I want to attribute an individual’s blend of properties to contributions from each group.

Look at faces. There’s eye separation, size of eyebrows, prominence of chin, etc etc. I want to look at my Smith ancestors and my Jones ancestors, and wind up attributing 63% of my face to the Smiths and 37% to the Joneses. That’s the method I want.

I think cluster analysis would sort my family into a cluster of Smiths and a cluster of Joneses, and would most likely put me with the Smiths.

This may help:

Perhaps you have not asked a statistical question? If you average some points to produce a well-defined Smith point, and some other points to get a Jones point, and so on, you just end up with a well-defined set of points. Then any point in their convex hull can be written as a convex combination of the extreme points: a weighted sum with nonnegative coefficients that add up to 100%.
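For the two-prototype face example, that combination can be found by projecting the individual onto the line segment between the Smith point and the Jones point. Here is a minimal sketch of that idea; the feature values and the name `blend_weight` are made up for illustration, and the "me" point is constructed to lie exactly on the segment.

```python
def blend_weight(x, a, b):
    """Return w in [0, 1] so that w*a + (1 - w)*b is the point on segment ab closest to x."""
    d = [ai - bi for ai, bi in zip(a, b)]                      # direction from b to a
    num = sum((xi - bi) * di for xi, bi, di in zip(x, b, d))   # (x - b) . (a - b)
    den = sum(di * di for di in d)                             # |a - b|^2
    if den == 0:
        raise ValueError("the two prototypes coincide")
    return min(1.0, max(0.0, num / den))                       # clip to stay on the segment

# Hypothetical measurements: eye separation, eyebrow size, chin prominence.
smith = [6.2, 1.1, 3.4]
jones = [5.8, 1.6, 2.7]
me = [0.63 * s + 0.37 * j for s, j in zip(smith, jones)]

w = blend_weight(me, smith, jones)
print(f"Smith {w:.0%}, Jones {1 - w:.0%}")
```

With more than two prototypes the same idea becomes a least-squares fit with the weights constrained to be nonnegative and to sum to one, which needs a constrained solver rather than this closed-form projection.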

How about this: Naive Bayes classifier - Wikipedia

If I’m reading it right, this method assigns each object a probability that it belongs to a cluster, and produces a set of “typical values” for each cluster.

Pearson’s principal component analysis may be relevant if you are analysing a data set. This is related to factor analysis.

An ordinary linear regression should meet your needs. You perform the regression separately for each target. Simply submit N k-tuples to your ordinary regressor, where N is the number of properties and k-1 is the number of prototypes. For example, if sugar is one of the properties, the corresponding k-tuple would be
(Donut_sugar, Cake_sugar, Cracker_sugar, …, Target_sugar)
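As a rough sketch of that setup, the snippet below takes one tuple per property, with the prototype values first and the target's value last, and fits the weights by least squares through the normal equations. The ingredient numbers are invented, the helper names `solve` and `least_squares` are hypothetical, and the fit has no intercept term, which is an assumption on my part.

```python
def solve(m, v):
    """Solve the square system m w = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):                 # back substitution
        s = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - s) / aug[i][i]
    return w

def least_squares(rows, target):
    """rows[i] holds the prototype values for property i; target[i] is the item's value."""
    k = len(rows[0])
    ata = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    aty = [sum(r[a] * t for r, t in zip(rows, target)) for a in range(k)]
    return solve(ata, aty)

# One tuple per property: (donut, cake, cracker, target). Values are made up,
# and the target cookie is built as exactly half cake plus half cracker.
tuples = [
    (40.0, 35.0, 80.0, 57.5),   # flour
    (20.0, 30.0, 2.0, 16.0),    # sugar
    (10.0, 20.0, 0.0, 10.0),    # egg
    (1.0, 1.0, 3.0, 2.0),       # salt
    (3.0, 0.0, 1.0, 0.5),       # yeast
]
rows = [t[:-1] for t in tuples]
target = [t[-1] for t in tuples]
weights = least_squares(rows, target)
print(dict(zip(["donut", "cake", "cracker"], (round(x, 3) for x in weights))))
```

Note that an unconstrained fit like this can return negative weights, or weights that do not add up to 100%, when the target is not an exact blend of the prototypes.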

Principal component analysis, as some suggested, is the way to go if you want to construct the best (most informative) prototypes from the data, rather than constraining them initially to your choices ({Donut, Cake, Cracker, …}).
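To give a feel for what "constructing prototypes from the data" means, here is a small sketch that finds only the first principal component, using power iteration on the sample covariance matrix. The function name and the toy data are assumptions for illustration; a real analysis would normally use a library routine and look at several components.

```python
import math

def first_principal_component(points, iters=200):
    """Return (mean, unit direction of greatest variance) via power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Start from a fixed vector; this fails if it happens to be orthogonal
    # to the leading direction, which a robust implementation would guard against.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return mean, v

# Toy data (made up): two properties where the second is always twice the first.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
mean, direction = first_principal_component(points)
print(mean, direction)
```

The direction found this way is a data-driven axis, not one of the named prototypes, so the components may not correspond to anything as recognizable as "cake" or "cracker".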

In either case, you may want to assign weights to the different properties, e.g. if salt is a more important property than sugar.

Mixture models.
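A minimal sketch of the soft assignment a mixture model produces, assuming the component means are already known and every component is a spherical Gaussian with the same spread. The function name, `sigma`, and the sample points are all hypothetical.

```python
import math

def responsibilities(x, means, sigma=1.0, priors=None):
    """Posterior probability of each component for point x, assuming equal-variance
    spherical Gaussians (the shared normalizing constant cancels out)."""
    k = len(means)
    priors = priors or [1.0 / k] * k
    dens = []
    for m, p in zip(means, priors):
        sq = sum((xi - mi) ** 2 for xi, mi in zip(x, m))   # squared distance to the mean
        dens.append(p * math.exp(-sq / (2 * sigma ** 2)))
    total = sum(dens)   # can underflow to 0 for points very far from every mean
    return [d / total for d in dens]

# Made-up component means for two families in a two-feature space.
smith = [0.0, 0.0]
jones = [2.0, 0.0]

near_smith = responsibilities([0.5, 0.0], [smith, jones])
midway = responsibilities([1.0, 0.0], [smith, jones])
print(near_smith, midway)
```

These numbers are probabilities of belonging to each component, which is not quite the same thing as saying the point is a blend made of that much of each prototype.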