*Shazam* is a software service that listens to a few seconds of a song and then tells you the title, artist, and album, and of course gives you the option to purchase it. But how the heck does it work?
I’m guessing there is some sort of Fast Fourier Transform involved, but I don’t really know. Their website gives nothing away. Anyone have the straight dope? Thanks!..TRM
Their website is pretty tight, and the company is privately held, so there isn’t a requirement for public disclosure of other useful information. However, the terms and conditions contain the key pointer: the patents, U.S. Patent Nos. 6,990,453, 7,346,512, 7,627,477, and 7,739,062.
6,990,453 is System and methods for recognizing sound and music signals in high noise and distortion.
And so it goes.
They are not going to tell you which of the methods are used, and how much of any method is used in the production system. But you can get a pretty good idea. Have fun reading.
Obviously, it works using the wisdom of Solomon, the strength of Hercules, the stamina of Atlas, the power of Zeus, the courage of Achilles, and the speed of Mercury.
Not one, but three XKCD links? Awesome.
ETA: I still don’t understand what it is. The wiki page is beyond my abilities to follow. It looks like a graph of sound, but how is it different from what appears on the spectrum analyzer of an equalizer (or is it the same thing?).
It is comparable to a spectrum analyzer, at least in theory. There are actually a number of different variations, but the basic idea is transforming a time-domain function (like the sound-wave of a song) into a frequency-domain function.
Another way of thinking of it is breaking a complex repetitive signal into its constituent frequencies.
A constant A note would transform into a spike at 440 Hz. Two notes would have two spikes. More complicated waveforms would produce more complicated transforms.
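To make the "spike at 440 Hz" idea concrete, here is a minimal sketch in plain Python. The sample rate (8000 Hz), the 0.1-second window, and the helper names (`dft_magnitudes`, `tone`, `spike_frequencies`) are all assumptions made up for illustration, and it uses a slow textbook DFT rather than a real FFT library so that it runs with nothing installed.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N/2, computed the slow O(N^2) way."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags

def tone(freqs_hz, sample_rate, n):
    """n samples of pure sine waves at the given frequencies, added together."""
    return [sum(math.sin(2 * math.pi * f * t / sample_rate) for f in freqs_hz)
            for t in range(n)]

def spike_frequencies(samples, sample_rate, threshold=0.5):
    """Frequencies whose magnitude exceeds threshold * the largest magnitude."""
    mags = dft_magnitudes(samples)
    biggest = max(mags)
    bin_hz = sample_rate / len(samples)  # width of one frequency bin
    return [k * bin_hz for k, m in enumerate(mags) if m > threshold * biggest]

SAMPLE_RATE = 8000  # assumed sample rate in Hz
N = 800             # 0.1 s of audio, which gives 10 Hz wide bins

one_note = spike_frequencies(tone([440], SAMPLE_RATE, N), SAMPLE_RATE)
two_notes = spike_frequencies(tone([440, 660], SAMPLE_RATE, N), SAMPLE_RATE)
print(one_note)   # [440.0]
print(two_notes)  # [440.0, 660.0]
```

The frequencies here were picked to land exactly on a bin. A real recording would smear each note across neighboring bins, so the spikes would be fatter, but the picture is the same.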
Shazam appears to take a time-slice of the song (say, a second or less) and do an FFT and isolate the peaks (basically the points in time where a dominant frequency exists). They map these points into an “identifying code” of sorts, and attempt to find a song that has these codes in the right relative times… if that makes any sense at all. If they get enough matches in a short enough time interval they declare a hit.
One of the articles linked above has a very nice description of the process, as well as some useful images.
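The "codes in the right relative times" part can be sketched in a few lines of Python. To be clear, this is a toy under stated assumptions, not Shazam's production method: the peaks are hand-made `(frame, frequency)` pairs rather than real FFT output, the idea of building each code from a pair of nearby peaks is my assumption about how such a code could be formed, and every name here (`fingerprints`, `build_index`, `best_match`, `fan_out`, `min_votes`) is hypothetical.

```python
from collections import Counter, defaultdict

def fingerprints(peaks, fan_out=3):
    """Turn (time, freq) peaks into (code, anchor_time) pairs.

    Each code is (freq of one peak, freq of a later peak, time gap),
    so it does not depend on where in the song the clip starts.
    """
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            out.append(((f1, f2, t2 - t1), t1))
    return out

def build_index(songs):
    """Map every code to the (song name, time) places where it occurs."""
    index = defaultdict(list)
    for name, peaks in songs.items():
        for code, t in fingerprints(peaks):
            index[code].append((name, t))
    return index

def best_match(index, clip_peaks, min_votes=3):
    """Vote for (song, time offset); many votes at one offset means a hit."""
    votes = Counter()
    for code, t_clip in fingerprints(clip_peaks):
        for name, t_song in index.get(code, []):
            votes[(name, t_song - t_clip)] += 1
    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return (name, offset) if count >= min_votes else None

# Made-up peak data: (frame number, dominant frequency in Hz).
songs = {
    "song_a": [(0, 440), (1, 523), (2, 659), (3, 440),
               (4, 784), (5, 523), (6, 880), (7, 659)],
    "song_b": [(0, 330), (1, 392), (2, 494), (3, 587),
               (4, 330), (5, 698), (6, 392), (7, 494)],
}
index = build_index(songs)

# A clip "recorded" from frame 3 of song_a, with its own clock starting at 0.
clip = [(0, 440), (1, 784), (2, 523), (3, 880), (4, 659)]
print(best_match(index, clip))  # ('song_a', 3)
```

The point of voting on the offset is that a wrong song may share a code or two by accident, but only the right song shares lots of codes that all agree on where the clip starts.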
NOTE: When talking about taking the Fourier transform of a sampled, finite-time function (like a time-sample of a song), they are actually using a Discrete Fourier Transform (DFT). Strictly speaking, the Fast Fourier Transform (FFT) is just a fast algorithm for computing the DFT, which is why the two terms are basically used interchangeably these days.
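For anyone curious why the two names get mixed up, here is a small sketch showing that they give the same answer. `dft` is the definition written out directly, and `fft` is a textbook recursive radix-2 version (it assumes the input length is a power of two). The function names and the sample signal are mine, purely for illustration.

```python
import cmath

def dft(x):
    """Discrete Fourier Transform straight from the definition, O(N^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 FFT, O(N log N); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

signal = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, -0.25]
slow = dft(signal)
fast = fft(signal)

# Largest difference between the two results; only rounding error remains.
max_diff = max(abs(a - b) for a, b in zip(slow, fast))
print(max_diff < 1e-9)  # True
```

Same numbers out, just far fewer multiplications for the FFT once the input gets long.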
Your spectrum analyzer likely uses FFT chips to analyze the input.