Rating likes and dislikes - math question

solkoe · February 2, 2009, 10:21pm

I have twenty images which I have randomly numbered 1-20. They are random images that might correspond to a person’s likes and dislikes, such as a horse, an iPod, a person sleeping, food, etc.
I would like to develop a system whereby a person could choose 5 of their likes and 5 of their dislikes and compare those to other individuals to see which pair of people have the most similarities. Think of it as a dating survey.
Could I come up with a single number to express their likes and a single number for their dislikes that could be used for comparison?

Chronos · February 2, 2009, 10:27pm

There are plenty of ways you could do so, but they wouldn’t be particularly useful, unless you were only interested in one aspect of personality. A better method would be to store the full set of likes and dislikes for each person, and then boil it down to a single number when you’re comparing two people. Such a number, for instance, might be the number of items they agree on minus the number they take the opposite view on. For even more detail, you might let the users rate the various objects, instead of just approve-disapprove: a person might, for instance, very much like horses, but only mildly like the iPod, and mildly dislike sleep.
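
A minimal sketch of the "agree minus opposite" number, assuming each person's picks are stored as a mapping from item to +1 (like) or -1 (dislike). The function name, the storage format, and the sample people are illustrative assumptions, not something specified in the thread.

```python
def agreement_score(a: dict[str, int], b: dict[str, int]) -> int:
    """Agreements minus disagreements over the items both people picked.

    Each dict maps an item name to +1 (like) or -1 (dislike);
    items a person did not pick are simply absent and are ignored.
    """
    score = 0
    for item in a.keys() & b.keys():
        score += 1 if a[item] == b[item] else -1
    return score


# Hypothetical sample data.
alice = {"horse": 1, "ipod": 1, "sleep": -1}
bob = {"horse": 1, "ipod": -1, "food": 1}

# horse agrees (+1), ipod is opposite (-1), the rest are not shared.
print(agreement_score(alice, bob))
```

The number is computed per pair of people, which is the point of the post: nothing is lost by collapsing each person to a single value up front.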

nivlac · February 2, 2009, 10:30pm

What’s wrong with just using the number of exact matches? 5 = perfect match in likes (or dislikes), 0 = no matches in likes (or dislikes). Since you’re offering random images, there’s no such thing as “closeness” in the images.

solkoe · February 2, 2009, 10:36pm

Could I allow them to string the images from 1 to 20, where 1 is most liked and 20 is most disliked? If I have each image numbered and each box numbered, could I subtract the image number from the box number and add up the differences to get a single number?

Santo_Rugger · February 2, 2009, 11:03pm

I think all that would do would be to show how close to “average” each person is; it wouldn’t help you match people. It especially wouldn’t work on the tail ends: two people might be completely different, yet one liked 1, 3, 5, 7, and 9, while the other liked the even numbers. Wouldn’t their score be identical?
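
One way to see the problem, under one reading of the proposal: assume every person places all 20 images into 20 numbered boxes and the differences are added with their signs. Then the total is the sum of the box numbers minus the sum of the image numbers, which is 210 - 210 = 0 for every possible ordering. The function name below is made up for this sketch.

```python
import random


def signed_offset_sum(ranking: list[int]) -> int:
    """ranking[i] is the image number placed in box i + 1."""
    return sum(box - image for box, image in enumerate(ranking, start=1))


images = list(range(1, 21))
for _ in range(3):
    ranking = random.sample(images, k=20)  # a random full ordering
    print(signed_offset_sum(ranking))      # 0 for every ordering
```

Using absolute differences avoids the cancellation, but the result then depends on the arbitrary numbering of the images rather than on how two people compare with each other.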

scm1001 · February 3, 2009, 3:54am

Of course it is easy to come up with a number, though comparing numbers is less straightforward than you would like. E.g., take just six pictures with two likes and two dislikes. A dislike is 0, a neutral is 1 and a like is 2. So 102201 would mean neutral on picture 1, dislike picture 2, etc. It is trivial for a computer to compare any two numbers to come up with a similarity match, though a human would have trouble at first glance comparing 1002021102020111102 with 2011002021111000111022.
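
A short sketch of how such a digit string could be built, assuming the likes and dislikes are kept as sets of picture numbers. The function and parameter names are assumptions made for illustration.

```python
def encode(likes: set[int], dislikes: set[int], n_pictures: int) -> str:
    """Build the digit string: 2 = like, 0 = dislike, 1 = neutral."""
    digits = []
    for picture in range(1, n_pictures + 1):
        if picture in likes:
            digits.append("2")
        elif picture in dislikes:
            digits.append("0")
        else:
            digits.append("1")
    return "".join(digits)


# Reproduces the six-picture example: likes 3 and 4, dislikes 2 and 5.
print(encode(likes={3, 4}, dislikes={2, 5}, n_pictures=6))  # 102201
```

The string is best treated as a list of per-picture codes rather than as one integer, since subtracting two such integers says little about how many pictures two people agree on.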

Jragon · February 3, 2009, 4:06am

What I’d do is store each one in a “slot” and compare them with that slot and each adjacent slot (or maybe two on each side, depending). Then add one to an arbitrary “compatibility” value if they’re similar, or two if they’re in the same place.

For example (I’m going to use shorthand here):

compatAB = 0 (the compatibility rating between A and B, starts at 0 for obvious reasons)

personA[3] = horse
personB[4] = horse
4 - 3 = 1 (within range)
∴ compatAB + 1

personA[2] = iPod
personB[2] = iPod
2 - 2 = 0 (same spot)
∴ compatAB + 2
(so up to here compatAB is 3)

personA[1] = food
personB[20] = food
20 - 1 = 19 (on opposite sides of the chart!)
∴ compatAB - 1 (or two if you want things that extreme)

personA[4] = computers
personB[7] = computers
7 - 4 = 3 (meh, not a disagreement, but not exactly a perfect match)
∴ compatAB +/- 0

The good thing about this method is that it allows you to easily expand or contract the windows in which things add or subtract values, making it really easy to adapt if it’s not working well. This method, in hindsight, is pretty much an elaborated version of what Chronos suggested. If you really want to get out there you could group “related” things too, i.e. computers and tinkering with gadgets might be related, but I can’t really elaborate unless you tell us how exact you want it.
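
The shorthand above could be sketched roughly as follows. The post does not say how far apart two slots must be to count as "opposite sides of the chart", so the `far` threshold of 15 is an assumption, as are the function and parameter names.

```python
def slot_compat(rank_a: dict[str, int], rank_b: dict[str, int],
                near: int = 1, far: int = 15) -> int:
    """Compatibility from slot positions (1 = most liked, 20 = most disliked).

    +2 for the same slot, +1 if within `near` slots,
    -1 if at least `far` slots apart, 0 otherwise.
    """
    score = 0
    for item in rank_a.keys() & rank_b.keys():
        gap = abs(rank_a[item] - rank_b[item])
        if gap == 0:
            score += 2
        elif gap <= near:
            score += 1
        elif gap >= far:
            score -= 1
    return score


person_a = {"horse": 3, "iPod": 2, "food": 1, "computers": 4}
person_b = {"horse": 4, "iPod": 2, "food": 20, "computers": 7}

# horse +1, iPod +2, food -1, computers 0
print(slot_compat(person_a, person_b))  # 2
```

Widening or narrowing the windows is then just a matter of changing `near` and `far`.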

Jragon · February 3, 2009, 4:15am

If you want to consolidate this to a single number: for neutral vs. anything, you wouldn’t add anything; like vs. like, add 1; like vs. dislike, subtract 1; dislike vs. dislike, add 1; and maybe neutral vs. neutral, add… one half? This is a nice method too and easily converts into a single number. In fact, a computer would probably use this method to do its “trivial comparison” anyway, and then filter compatibility and incompatibility based on the range the numbers are in. There is no need to ever show the human the weird string of like/dislike fields.
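
A sketch of these scoring rules applied to scm1001's digit strings (0 = dislike, 1 = neutral, 2 = like). The half point for neutral vs. neutral follows the tentative suggestion in the post; the second sample string is made up for illustration.

```python
def pair_score(code_a: str, code_b: str) -> float:
    """Score two like/neutral/dislike strings picture by picture."""
    if len(code_a) != len(code_b):
        raise ValueError("both codes must cover the same pictures")
    score = 0.0
    for x, y in zip(code_a, code_b):
        if x == "1" and y == "1":
            score += 0.5      # neutral vs. neutral
        elif x == "1" or y == "1":
            continue          # neutral vs. like or dislike adds nothing
        elif x == y:
            score += 1        # like vs. like, or dislike vs. dislike
        else:
            score -= 1        # like vs. dislike
    return score


# Three agreements (+3) and one shared neutral (+0.5).
print(pair_score("102201", "201201"))  # 3.5
```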

Sage_Rat · February 3, 2009, 4:38am

To compare the closeness of colors, like RGB, people generally consider this to be a 3D coordinate representation. I.e. instead of having X, Y, and Z axes you have R, G, and B. Then you simply use the distance between any two points to determine their closeness.

I believe that expanding the number of dimensions preserves the rule for calculating the distance between points:

2D
dist = sqrt( x_diff^2 + y_diff^2 )

3D
dist = sqrt( x_diff^2 + y_diff^2 + z_diff^2 )

4D
dist = sqrt( x_diff^2 + y_diff^2 + z_diff^2 + w_diff^2 )

etc.

Note that I say “I believe”. I’m not absolutely sure that the method stays the same.

But so yeah, you’ll have to divide your single number back up into the 20 positional coordinates and run some math over it, but it will be a single identifier.
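
The rule does carry over unchanged: the Euclidean distance in any number of dimensions is the square root of the sum of the squared coordinate differences. A minimal sketch, with the standard library's `math.dist` (Python 3.8+) used as a cross-check; the 20-coordinate sample vectors are made up.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


print(euclidean((0, 0), (3, 4)))  # 5.0, the familiar 2D case

# Two hypothetical 20-coordinate rating vectors.
p = [1, -1, 0, 0, 1] * 4
q = [1, 1, 0, -1, 0] * 4
print(math.isclose(euclidean(p, q), math.dist(p, q)))  # True
```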

Jragon · February 3, 2009, 4:57am

Yeah, it’s something like that. But the problem is it gets really clunky (and hard to conceptualize without some logical reduction) when you get fields larger than say, six elements to compare.

I suppose it’s probably the best method if you have time to set it up (well, it admittedly wouldn’t be that hard with a couple nested for loops or recursion or something, but let’s not go there, this seems to actually be a math/logic question, not one of programming), but with something this big and depending on how exact this needs to be it may be unnecessary.

ultrafilter · February 3, 2009, 5:08am

If all you want to do is measure the distance between users, the Hamming distance seems like the way to go.
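
The Hamming distance between two equal-length sequences is the number of positions at which they differ. A minimal sketch over scm1001's digit strings; the second string is a made-up example.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# The codes differ at pictures 1 and 3 only.
print(hamming_distance("102201", "201201"))  # 2
```

Note that this counts like vs. neutral exactly the same as like vs. dislike, so it is coarser than a scheme that penalizes opposite views more heavily.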

zut · February 3, 2009, 12:54pm

It sounds to me like you want to assign a particular number to each person (27 or 151 or whatever), and subtract two numbers to determine similarities. For example, a person with “27” would be more similar to a person with “20” than to a person with “40” (the differences being 7 and 13, respectively). Is that correct?

If that’s the case, I don’t believe there’s any way to accomplish that. With 20 random pictures, “liking” or “disliking” any picture should be independent of liking or disliking any other picture, so you really have a 20-dimensional vector, which you can’t represent as a scalar.

You could represent the vector as a string of numbers like scm1001 suggests, but it’s still a vector, and treating it like a scalar with simple subtraction won’t work.

Now, assuming I’ve correctly interpreted your question, why do you want to come up with a single number to express likes and dislikes? Why would storing the “likes” and “dislikes” in vector format and doing vector subtraction (or whatever) be less desirable?

LSLGuy · February 3, 2009, 2:01pm

What the most recent posters said. Vector distance between points in 20-space is the only meaningful way to do it.

To make the points in 20 space have much predictive value, you really need to have the people rate each picture on a scale from, say, -5 (strong dislike) to 0 to +5 (strong like). Many people have a hard time with negative numbers, so you might get better results with a rating scale of 1-10, which you normalize to [-5, +5] before doing any calcs.

You might also want to use a non-linear distance function, e.g. instead of simply using compatibility score = squareroot(sum of square(dimensional distance N)), use squareroot(sum of square(weightingfunction(dimensional distance N))).

The weighting function could either apply extra weight to larger differences, or if you wanted to be really powerful, attempt to scale the relative importance of the various pictures. E.g. for a dating questionnaire, people who react very differently to a picture of a baby are probably less compatible than people who react equally differently to a picture of an iPod.
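
A sketch of the two ideas together: ratings on a 1-10 scale are mapped linearly onto [-5, +5], and each picture's difference is multiplied by a per-picture weight before squaring. The weights here (baby counts three times as much as iPod) are invented for illustration, as are the function names.

```python
import math


def normalize(rating: int) -> float:
    """Map a 1..10 rating linearly onto -5..+5."""
    return (rating - 1) * 10 / 9 - 5


def weighted_distance(a, b, weights):
    """sqrt(sum((w_i * d_i)^2)), where d_i is the normalized rating difference."""
    return math.sqrt(sum(
        (w * (normalize(x) - normalize(y))) ** 2
        for x, y, w in zip(a, b, weights)
    ))


weights = [3.0, 1.0]  # hypothetical importance of [baby, iPod]

# Same size of disagreement, but on different pictures.
print(weighted_distance([10, 5], [1, 5], weights))  # disagree about the baby
print(weighted_distance([5, 10], [5, 1], weights))  # disagree about the iPod
```

One side effect of a 1-10 scale is that no whole-number rating maps exactly to 0, so there is no truly neutral answer; an odd-length scale would avoid that.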

If you don’t have experience in creating surveys, note that how you choose the pictures and the weights and any surrounding verbiage will almost totally determine the results. The survey takers are almost superfluous. Hence the surveys taken by politicians & talk show hosts showing that 9 out of 10 people agree with them.

If you are actually trying to gather useful data, not just engage in polemics, then you need to spend a lot of time & effort & expertise getting the biases out of your process (or more accurately, understanding what the biases are and applying accurate corrections).

Quercus · February 3, 2009, 2:35pm

It seems the simplest way of doing this is to just compare the first person with the second.
Now go through the pictures; for each picture they both like or both dislike, add one. (So you have a number from 0 to 10 for how close those people are). If you wanted, you could also subtract one for each picture that one person likes and the other dislikes, giving a number from -10 to 10.

Now compare the first person with the rest of the people, one at a time. Then the second person with everyone else, etc.

At the end, you’ll have a number for each pair of people. You can find the two most similar (might be ties), or the most similar to any given person, or whatever you want.
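
The pair-by-pair procedure might look like this, using `itertools.combinations` to visit every pair once. The names and picks are made up, and each person has fewer than five likes and dislikes only to keep the example short.

```python
from itertools import combinations


def similarity(a: dict, b: dict) -> int:
    """One point for each picture both like or both dislike."""
    return len(a["likes"] & b["likes"]) + len(a["dislikes"] & b["dislikes"])


# Hypothetical survey results (picture numbers).
people = {
    "Ann": {"likes": {1, 2, 3}, "dislikes": {4, 5}},
    "Ben": {"likes": {1, 2, 6}, "dislikes": {4, 7}},
    "Cy": {"likes": {8, 9, 10}, "dislikes": {1, 2}},
}

# One score per unordered pair of people.
scores = {
    (p, q): similarity(people[p], people[q])
    for p, q in combinations(people, 2)
}

best = max(scores, key=scores.get)
print(best, scores[best])  # ('Ann', 'Ben') 3
```

With ties, `max` returns only the first best pair it encounters, so a real run would want to collect every pair that shares the top score.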

Quercus · February 3, 2009, 2:53pm

Or, if you wanted, when you’re going through the pictures, you could also subtract one for each picture that one person likes and the other dislikes, giving a number from -10 to 10 [assuming that the only data you have is ‘like/dislike/neutral’, this gives you the same ranking order that the 10-dimensional distance method does, but it’s much simpler].
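
The claim about identical rankings can be checked directly. Assume each person is a vector over all 20 pictures with +1 for a like, -1 for a dislike and 0 otherwise, and that everyone picks exactly 5 likes and 5 dislikes. Then each vector has squared length 10, so the squared distance between two people is 20 minus twice the agree-minus-disagree score, and a higher score always means a smaller distance. The helper names below are made up for this sketch.

```python
import random


def random_person(n: int = 20, k: int = 5) -> list[int]:
    """A random vector with exactly k likes (+1) and k dislikes (-1)."""
    picks = random.sample(range(n), 2 * k)
    vector = [0] * n
    for i in picks[:k]:
        vector[i] = 1
    for i in picks[k:]:
        vector[i] = -1
    return vector


def score(a, b):
    """Agreements minus opposite views (the dot product)."""
    return sum(x * y for x, y in zip(a, b))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


for _ in range(1000):
    a, b = random_person(), random_person()
    assert squared_distance(a, b) == 20 - 2 * score(a, b)
print("identity held for every sampled pair")
```

The identity relies on everyone choosing the same number of likes and dislikes; if people could pick different numbers, the two methods could rank pairs differently.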

Sage_Rat · February 3, 2009, 3:15pm

You should be able to do an equation like that in a database query. It will have to process all of the fields, but it will be simple to write.

Datamining might reveal similarities of data which would allow one to do a faster–though probably less accurate–comparison. But that will require a decent dataset first.

Chronos · February 3, 2009, 6:57pm

There are many different rules one can use, each of which will produce a space with slightly different properties. Such a rule is called a “metric”.

puddleglum · February 3, 2009, 8:50pm

You could produce a number but it would have very little meaning. The numbers on a Likert scale do not have any set meaning relative to each other. A person who rates a picture as a 2 does not like it twice as much as a picture they rated 1. They liked the first picture more, but that is all that can be said. How much more or less the values on a Likert scale represent is unknowable. If you had each person rank the images according to how much they liked them, the numbers would be more comparable. Maybe doing matches based on a small Likert scale could yield more useful data, but in general math with those kinds of numbers is meaningless.

solkoe · February 3, 2009, 11:02pm

Thanks everyone.
This was just a fun activity for my students. A “getting to know you” exercise. I thought it might be fun to compare the students.
I guess I will stick to +1 for similar likes and -1 for similar dislikes. There does not seem to be a way to assign a single number for comparison.

ultrafilter · February 3, 2009, 11:51pm

The Hamming distance that I mentioned in post #11 is the standard measurement for what you’re trying to do. Use it.
