How does Shazam work?

*Shazam* is a software service that listens to a few seconds of a song and then tells you the title, artist, and album, and of course gives you the option to purchase it. But how the heck does it work?

I’m guessing there is some sort of Fast Fourier Transform involved, but I don’t really know. Their website gives nothing away. Anyone have the straight dope? Thanks!..TRM

Their website is pretty tight, and the company is privately held, so there isn’t a requirement for public disclosure of other useful information. However, the terms and conditions contain the key pointer: the patents. U.S. Patent Nos. 6,990,453, 7,346,512, 7,627,477, 7,739,062.

6,990,453 is “System and methods for recognizing sound and music signals in high noise and distortion.”

And so it goes.

They are not going to tell you which of the methods are used, and how much of any method is used in the production system. But you can get a pretty good idea. Have fun reading.

Some info can be found here and here. As I understand it, they do a short time Fourier transform and pick some peaks as signatures.

Thanks, fellas! I knew the Dope would sort it out…TRM

What the heck is a short time Fourier transform, and where is the relevant XKCD?

ETA

Wow. I was kidding, but: Fourier

Obviously, it works using the wisdom of Solomon, the strength of Hercules, the stamina of Atlas, the power of Zeus, the courage of Achilles, and the speed of Mercury.

Honestly, I don’t think it takes a computer program to figure out whether you’re playing Queen or not.

Isis what you did there.

A wizard did it.

Or Gomer Pyle.

See the mouseover text in this one
The big F in this one means Fourier transform

Not one, but three XKCD links? Awesome.
ETA: I still don’t understand what it is. The wiki page is beyond my abilities to follow. It looks like a graph of sound, but how is it different from what appears on the spectrum analyzer of an equalizer (or is it the same thing?).

It is comparable to a spectrum analyzer, at least in theory. There are actually a number of different variations, but the basic idea is transforming a time-domain function (like the sound-wave of a song) into a frequency-domain function.

Another way of thinking of it is breaking a complex repetitive signal into its constituent frequencies.

A constant A note would transform into a spike at 440 Hz. Two notes would have two spikes. More complicated waveforms would produce more complicated transforms.
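If it helps to see the "spike at 440 Hz" idea in numbers, here is a minimal sketch in plain Python. It uses a deliberately slow, textbook DFT so there is nothing hidden. The sample rate, the clip length, and the second tone at 660 Hz are my own choices for illustration (picked so both tones land exactly on a frequency bin); none of it comes from Shazam.

```python
import cmath
import math

SAMPLE_RATE = 4000   # samples per second (assumed for this toy example)
N = 400              # 0.1 s of audio, so each DFT bin is 10 Hz wide


def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns magnitudes for bins below the Nyquist limit."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def strongest_frequencies(samples, sample_rate, count):
    """Return the `count` frequencies (in Hz) with the largest magnitudes."""
    mags = dft_magnitudes(samples)
    bin_hz = sample_rate / len(samples)
    top = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * bin_hz for k in top)


def tone(freqs):
    """Sum of pure sine waves at the given frequencies."""
    return [
        sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f in freqs)
        for t in range(N)
    ]


one_note = strongest_frequencies(tone([440]), SAMPLE_RATE, 1)
two_notes = strongest_frequencies(tone([440, 660]), SAMPLE_RATE, 2)
print(one_note)    # one spike, at the A
print(two_notes)   # two spikes, one per tone
```

A real instrument playing an A would also show smaller spikes at the harmonics (880 Hz, 1320 Hz, and so on), which is part of what makes "more complicated waveforms" produce more complicated transforms.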

Shazam appears to take a time-slice of the song (say, a second or less), do an FFT, and isolate the peaks (basically the points in time where a dominant frequency exists). They map these points into an “identifying code” of sorts, and attempt to find a song that has these codes at the right relative times… if that makes any sense at all. If they get enough matches in a short enough time interval, they declare a hit.
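To make the "codes at the right relative times" part concrete, here is a toy sketch of the general idea. To be clear, this is my guess at the shape of the thing based on the description above, not Shazam's actual algorithm. The frame size, the one-peak-per-frame rule, the pairing of neighbouring peaks into a code, and the two made-up "songs" are all assumptions for illustration.

```python
import cmath
import math
from collections import Counter, defaultdict

RATE = 2000    # samples per second (toy value)
FRAME = 100    # samples per time-slice, so bins are 20 Hz wide


def peak_bin(frame):
    """Index of the strongest non-DC frequency bin in one time-slice."""
    n = len(frame)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k


def codes(samples):
    """Pair each slice's peak with the next one: ((f1, f2), slice_index)."""
    peaks = [peak_bin(samples[i:i + FRAME])
             for i in range(0, len(samples) - FRAME + 1, FRAME)]
    return [((peaks[i], peaks[i + 1]), i) for i in range(len(peaks) - 1)]


def synth(notes):
    """Hypothetical 'song': one pure tone per time-slice."""
    out = []
    for f in notes:
        out.extend(math.sin(2 * math.pi * f * t / RATE) for t in range(FRAME))
    return out


def build_index(songs):
    """Map each code to every (song name, slice index) where it occurs."""
    index = defaultdict(list)
    for name, samples in songs.items():
        for code, t in codes(samples):
            index[code].append((name, t))
    return index


def match(clip, index):
    """Vote for (song, time offset); consistent offsets pile up on a true hit."""
    votes = Counter()
    for code, t_clip in codes(clip):
        for name, t_song in index.get(code, []):
            votes[(name, t_song - t_clip)] += 1
    return votes.most_common(1)[0] if votes else None


notes_a = [200, 300, 400, 300, 500, 600, 400, 200]
notes_b = [600, 500, 200, 400, 300, 200, 500, 300]
index = build_index({"song_a": synth(notes_a), "song_b": synth(notes_b)})

clip = synth(notes_a[3:7])      # a short excerpt from the middle of song_a
result = match(clip, index)
print(result)                   # best (song, offset) and its vote count
```

The point of voting on the offset rather than just counting matching codes is that a random song might share a code or two by accident, but only the right song will have them all lined up at the same time shift.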

One of the articles linked above has a very nice description of the process, as well as some useful images.

NOTE: When talking about taking the Fourier transform of a sampled, finite-time function (like a time-sample of a song), they are actually using a Discrete Fourier Transform (DFT). Strictly speaking, the Fast Fourier Transform (FFT) is an algorithm for computing the DFT efficiently, but these days the two terms are basically used interchangeably.
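For anyone curious why people treat the two terms as the same thing: the FFT gives the same numbers as the DFT, just with far less work. Here is a small sketch comparing a textbook DFT with a recursive radix-2 FFT (the simplest variant, which needs a power-of-two length). The 16-sample test signal is made up for the example.

```python
import cmath
import math


def dft(x):
    """Direct DFT from the definition: O(N^2) multiplications."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # transform of the even-indexed samples
    odd = fft(x[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Made-up test signal: two sinusoids over 16 samples.
signal = [
    math.sin(2 * math.pi * 3 * t / 16) + 0.5 * math.cos(2 * math.pi * 5 * t / 16)
    for t in range(16)
]

# Largest disagreement between the two methods (floating-point noise only).
max_diff = max(abs(a - b) for a, b in zip(dft(signal), fft(signal)))
print(max_diff)
```

The direct version does work proportional to N squared, while the FFT does work proportional to N log N, which is why it is the version that actually gets used on audio.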

Your spectrum analyzer likely uses FFT chips to analyze the input.

Thanks.

Or Selena for grace, Hippolyta for strength, Athena for skill, Zephyrus for fleetness (and flight), Aphrodite for beauty, and Minerva for wisdom.

If you’re a girl.