Statistics: is there a kind of regression that does this?

Napier · August 30, 2017, 7:37pm

There’s a kind of regression I think should exist but I don’t know what it’s called. Or perhaps there’s actually just a way of accomplishing this with a general linear model, and I simply don’t see how. Or maybe what I’m picturing wouldn’t work.

Imagine a big population of things that have a variety of properties. But imagine further that they tend to derive from a small population of certain prototypical things with these same properties in certain values. I want to try to understand each member of the big population as a mix of the prototypes in the small population.

Here’s an example. Suppose the things are people with skills, and the skills are taught in university courses. But we don’t think of the people as having had this course, that course, et cetera. We think of them as having a degree in this field or that field. Well, all the people with a degree in a certain field, say electrical engineering, will tend to have had many similar courses, but not exactly. Still, you’ll find they tend to have a skill set different from people with a degree in nursing.

What I want to do is, first, apply the skill measurement tests to all the electrical engineers, and all the nurses, and all the other graduates on a field by field basis, so I have an average skill profile for each field. Then, I want to look at each person’s skill results, and say this person is best fit by adding together this much electrical engineer, this much nurse, and so forth.

Or, another example, I have baked goods made of flour, sugar, egg, cinnamon, salt, yeast, et cetera. I want to analyze all these and average together all the instances of donut, and all of the cake, and all of the bread, and all of the crackers, and these become my prototypes. Then when I am confronted with an instance of cookie, I can say it’s midway between cake and crackers.

Does this sound like a special kind of regression that has a name? Or is all of this just muddled somehow, or obviously a common kind of regression?

Thanks!

Sage_Rat · August 30, 2017, 7:51pm

Napier · August 30, 2017, 8:52pm

Thanks, Sage, but that identifies groups. I want to attribute an individual’s blend of properties to contributions from each group.

Look at faces. There’s eye separation, size of eyebrows, prominence of chin, etc etc. I want to look at my Smith ancestors and my Jones ancestors, and wind up attributing 63% of my face to the Smiths and 37% to the Joneses. That’s the method I want.

I think cluster analysis would sort my family into a cluster of Smiths and a cluster of Joneses, and perhaps more probably put me with the Smiths.

Iridescent_Orb · August 30, 2017, 9:47pm

This may help:

DPRK · August 30, 2017, 10:00pm

Perhaps you have not asked a statistical question? If you average some points to produce a well-defined Smith point, and some other points to get a Jones point, and so on, you just end up with a well-defined set of points. Then any point in their convex hull can be represented as a convex combination of the extreme points, that is, with nonnegative coefficients summing to 100%.
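To make the convex-hull idea concrete, here is a minimal sketch that finds nonnegative weights summing to 1 by projected gradient descent on the squared error. This is one possible approach, not something proposed in the thread. The function names (`project_to_simplex`, `blend_weights`) and the Smith and Jones feature values are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def blend_weights(prototypes, target, n_iter=5000):
    """Least-squares weights constrained to be nonnegative and sum to 1.

    prototypes: array of shape (n_properties, n_prototypes)
    target:     array of shape (n_properties,)
    """
    n = prototypes.shape[1]
    w = np.full(n, 1.0 / n)                   # start from an equal blend
    # Step size 1/L, where L is the squared largest singular value
    step = 1.0 / np.linalg.norm(prototypes, 2) ** 2
    for _ in range(n_iter):
        grad = prototypes.T @ (prototypes @ w - target)
        w = project_to_simplex(w - step * grad)
    return w

# Hypothetical averaged face measurements (eye separation, eyebrow size, chin)
smith = np.array([6.2, 1.0, 3.5])
jones = np.array([5.0, 2.0, 2.5])
me = 0.63 * smith + 0.37 * jones              # constructed to lie inside the hull

w = blend_weights(np.column_stack([smith, jones]), me)
print(dict(zip(["Smith", "Jones"], np.round(w, 3))))
```

If the individual lies outside the convex hull, the same routine returns the weights of the closest point inside the hull, so the fit is approximate rather than exact.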

Andy_L · August 31, 2017, 1:46am

How about a naive Bayes classifier (see the Wikipedia article)?

If I’m reading it right, this method assigns each object a probability that it belongs to a cluster, and produces a set of “typical values” for each cluster.

DPRK · August 31, 2017, 3:21am

Pearson’s principal component analysis may be relevant if you are analysing a data set. This is related to factor analysis.

septimus · August 31, 2017, 8:35am

An ordinary linear regression should meet your needs. You perform the regression separately for each target. Simply submit N k-tuples to your ordinary regressor, where N is the number of properties and k-1 is the number of prototypes. For example, if sugar is one of the properties, the corresponding k-tuple would be
(Donut_sugar, Cake_sugar, Cracker_sugar, …, Target_sugar)
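A minimal sketch of that regression with NumPy, assuming no intercept term and made-up ingredient fractions. Each row is one property (one k-tuple minus the target value) and each column is one prototype. All numbers and names here are hypothetical.

```python
import numpy as np

properties = ["flour", "sugar", "egg", "salt"]
prototypes = ["donut", "cake", "cracker"]

# Rows: properties. Columns: prototype averages (hypothetical values).
P = np.array([
    [0.50, 0.40, 0.80],   # flour
    [0.25, 0.35, 0.02],   # sugar
    [0.15, 0.20, 0.03],   # egg
    [0.10, 0.05, 0.15],   # salt
])

# Hypothetical target: a cookie built as an even blend of cake and cracker
target = 0.5 * P[:, 1] + 0.5 * P[:, 2]

# Ordinary least squares: find weights minimizing ||P @ weights - target||
weights, residuals, rank, _ = np.linalg.lstsq(P, target, rcond=None)

for name, wt in zip(prototypes, weights):
    print(f"{name}: {wt:.3f}")
```

Because this target was constructed as an exact blend, the fit recovers weights close to 0 for donut and 0.5 each for cake and cracker. With real data, unconstrained least squares can return negative weights or weights that do not sum to 1.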

Principal component analysis, as some suggested, is the way to go if you want to construct the best (most informative) prototypes from the data, rather than constraining them initially to your choices ({Donut, Cake, Cracker, …}).
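For comparison, a rough sketch of principal component analysis done directly with an SVD in NumPy. The data are synthetic (two hidden factors plus a little noise), and every name and number is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 items, 5 measured properties, driven by 2 hidden factors
hidden = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 5))
X = hidden @ loadings + 0.01 * rng.normal(size=(200, 5))

# Center each property, then take the SVD of the centered data
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of total variance captured by each component
explained = s**2 / np.sum(s**2)

# Keep the two leading directions and express every item in terms of them
components = Vt[:2]               # shape (2, 5): data-derived "prototypes"
scores = Xc @ components.T        # shape (200, 2): each item's coordinates

print(np.round(explained, 4))
```

Unlike blend weights, the component scores can be negative and need not sum to anything in particular, so they describe position along data-derived axes rather than shares of named groups.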

In either case, you may want to assign weights to the different properties, e.g. if salt is a more important property than sugar.

Snarky_Kong · August 31, 2017, 11:49am

Mixture models.
