Algorithm for last.fm?

this is a site-specific question so i dunno how much anyone here can help, but i’ve asked it there and no one has an answer. i hope there’s enough statistical analysis involved to interest you people.

i’m trying to find out if anyone knows the algorithm or whatever system last.fm uses to determine their musical compatibility between you and a peer.

it gives you a bar meter that is variously colored based on the number of bands you and your friend have in common, then lists a few of said bands under the compatibility meter.

however, as far as i can tell, you cannot go through and see a *comprehensive* list of exactly which bands match.

if anyone knows how you might do that, that’d be cool. again, i asked the last.fm forum and got no answer. more than 100 views, no reply.

as for the algorithm used, i’d like to know if it’s based on 1. the overall total number of matching bands, 2. the overall percentage of matching bands vs non-matches, or 3. the total number of listens to each matching band, with popularity factored into the equation.
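To make the three candidates concrete, here is a minimal Python sketch of each one. None of this is known to be what last.fm actually does; the function names, the min-based weighting in the third measure, and the play counts are all hypothetical, chosen only to show how differently the three ideas can score the same pair of listeners.

```python
def shared_count(a, b):
    """Candidate 1: raw number of bands both users have played."""
    return len(a.keys() & b.keys())


def shared_fraction(a, b):
    """Candidate 2: shared bands as a fraction of all bands either user played."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def weighted_overlap(a, b):
    """Candidate 3 (one possible weighting, an assumption): the share of all
    listens that fall on shared bands, counting the smaller play count of
    each shared band once per user."""
    total = sum(a.values()) + sum(b.values())
    shared = sum(min(a[band], b[band]) for band in a.keys() & b.keys())
    return 2 * shared / total if total else 0.0


# Made-up play counts: band label -> number of listens.
me = {"band_a": 10, "band_b": 5, "band_c": 1}
friend = {"band_a": 8, "band_b": 2, "band_d": 40, "band_e": 30}

print(shared_count(me, friend))      # bands in common
print(shared_fraction(me, friend))   # fraction of the combined band list
print(weighted_overlap(me, friend))  # listen-weighted overlap
```

Note that the second and third measures both drop when one person simply listens to far more bands than the other, even if every band the smaller library contains is shared.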

let me give you a few examples. i have several friends whose musical taste i know. one has a nearly 100% match rate, even tho i listen to a significant amount of non-metal bands and he is totally metal.
now, i just started scrobbling my library, and i only got a portion of my library and listen-count up from my ipod before something malfunctioned and i lost all my data (sidenote: i’m currently having to rebuild my library because of this). the point is, my top ten bands are “based on most listens” and are all non-metal.

yet my metal-friend is my top match at about 99% “bar meter.” i wish i knew exactly the number of bands we sync on, but again, i have no idea how to tell what specific bands match. it just gives you that lame bar.

i have another friend who i know from real life is more honestly musically compatible, because she listens to the same mix of metal and nonmetal as me.

she, however, has a much lower rating, at “high” with about a 70% bar. the problem with her is she listens to just GOBS more music than i do, in general. the genres are all the same as mine, and we have what i would suspect is a large number of matching bands overall, but percentage wise, she simply listens to significantly more bands than i do (her overall band count is 3800, mine is currently 800).

finally, a third friend has about a 90% bar rating for “super high” but she doesn’t listen to ANY metal at all, whereas i, again, listen to a LOT of metal, though not exclusively.

so my estimate with her would be a high rate among the non-metal bands but a 0% sync on many hundreds of metal bands. yet she’s still a “super high” match, the highest possible rating.

so as i have illustrated, however they determine this is perplexing.

do any of you know much about such algorithms?

btw, about my second example, the friend who i am “most probably in real life musically compatible with”: she was at first “highly” compatible, then i added more of my library and she went up to “very highly,” then i added even more of my library and she went back down to only “highly.”

this is perplexing, since example 3 has a calculated “higher” compatibility yet i know for a fact she listens to way less of the same stuff as me.

i think no one knows this.

i wonder if it’s similar to how genius works.

http://www.technologyreview.com/blog/mimssbits/25267/

I don’t know what algorithm last.fm uses, and I suspect they may consider it a trade secret, but as someone who works with semantic similarity algorithms professionally, I can tell you how it is probably done on a general level: They create a large matrix where the columns represent the users and the rows represent the bands. Each cell is populated by the number of times the particular user listened to a track by the particular band. To find the musical compatibility between any two users, they compute the cosine of the angle between the two users’ column vectors. In general a cosine lies between -1 and 1, but because listen counts are never negative, here it falls between 0 and 1, which can be interpreted as a scale from no overlap at all to total compatibility. There are lots of possible variations to this approach, such as reducing the dimensionality of the matrix using singular value decomposition; this makes it easier to compare users who don’t listen to a lot of the same bands. The same vector-space approach can be used to determine the similarity of any two bands (by computing the cosine of rows instead of columns), or the compatibility of users and bands (by comparing rows to columns).
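As a rough illustration of the cosine step described above, here is a minimal Python sketch. It is only a guess at the general technique, not last.fm's actual code: each user is a sparse vector stored as a dict of play counts, and the band labels, user names, and counts are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse play-count vectors (dicts)."""
    # Only bands present in both vectors contribute to the dot product.
    dot = sum(count * b[band] for band, count in a.items() if band in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no listens matches nobody
    return dot / (norm_a * norm_b)


# Hypothetical play counts: band label -> number of listens.
me = {"metal_1": 120, "indie_1": 300, "indie_2": 150}
metal_friend = {"metal_1": 500, "metal_2": 400}
mixed_friend = {"metal_1": 100, "indie_1": 280, "indie_2": 160, "indie_3": 50}

print(cosine_similarity(me, metal_friend))
print(cosine_similarity(me, mixed_friend))
```

With these made-up numbers the mixed-taste friend scores far higher than the metal-only friend. The measure is dominated by whichever bands have the largest play counts, and it ignores overall listening volume: multiplying every count in one vector by the same factor leaves the result unchanged.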

If you are interested in learning more, read up on the vector-space model of information retrieval. If you have a basic programming, mathematical, or computer science background it should be very easy to grasp.

this kind of stuff fascinates me. the article on how itunes’ genius works is equally interesting.