Active noise cancelation: Canceling songs by IDing them automatically?

There are a bunch of noise-canceling headphones out there. I think they work by listening to ambient noise with a built-in mic and then playing the inverse waveform of that noise in real time, timed so that the two cancel each other out. It works best on repetitive noises (refrigerator hum, power stations, etc.) rather than unpredictable things like human speech.
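A toy sketch of that principle (Python; the “noise” here is just a made-up hum):

```python
import numpy as np

# Toy illustration: a synthetic "fridge hum" (50 Hz plus a harmonic); the
# anti-noise is simply the same waveform with its sign flipped.
fs = 48_000                      # sample rate, Hz
t = np.arange(0, 1.0, 1 / fs)    # one second of audio
noise = 0.5 * np.sin(2 * np.pi * 50 * t) + 0.2 * np.sin(2 * np.pi * 100 * t)

anti_noise = -noise              # perfect inverse, perfectly time-aligned
residual = noise + anti_noise    # what would reach the ear

print(np.max(np.abs(residual)))  # 0.0 -- complete cancellation, in the ideal case
```

The whole question is how far short of that ideal case the real world falls.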

However, could broadcast music be an exception? Would it be possible to combine the now-ubiquitous song-ID tech (where your phone can hear a few seconds of a song and tell you what it is) with noise cancelation, so that as soon as the headphones identify a song, they play the inverse of that song, time-synced, and cancel it out? Has that been tried?

That wouldn’t really work in the real world. By the time the music reaches the ear, the waveform has almost nothing to do with the actual recording anymore, mostly because of chaotic room acoustics - reflections, reverb, etc. Algorithms like Shazam just look for a few very recognizable details - an acoustic fingerprint - which is why they work in noisy environments.

For this to work, the song would have to be playing at exactly the rate the noise canceller expects, to a very high degree of accuracy. By the end of the song, say after 5 minutes, the canceller and the music source would still have to be in sync to within a quarter period of a sound wave or better - something like 50 microseconds. Most music sources don’t control playback speed with anywhere near that accuracy, so the canceller couldn’t stay matched to the source unless it were constantly listening and adjusting its playback speed. I don’t know whether that’s even technologically feasible - it’s certainly different from what Shazam does in just identifying a song.
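Back-of-the-envelope (assuming a 5 kHz component, whose quarter period is the 50 microseconds mentioned above, and a 5-minute song):

```python
# How tightly the playback rate would have to be controlled so that the
# drift over a whole song stays under a quarter period of a 5 kHz component.
freq = 5_000                     # Hz
tolerance = (1 / freq) / 4       # quarter period: 50 microseconds
song_length = 5 * 60             # seconds

rate_accuracy = tolerance / song_length
print(f"{tolerance * 1e6:.0f} us tolerance -> {rate_accuracy * 1e6:.2f} ppm")
# -> 50 us tolerance -> 0.17 ppm; ordinary consumer clocks drift far more than that.
```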

Syncing isn’t really the biggest issue here. For this to work, the sound wave has to match exactly, at every frequency up to 20 kHz. That’s not going to happen even in an anechoic chamber with the mic 2 inches from a (perfect) speaker, much less in a real environment with an imperfect sound source and a mobile listener.
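To put rough numbers on “at every frequency up to 20 kHz”, here’s how much spatial slack a quarter wavelength leaves you at a few frequencies (speed of sound ~343 m/s):

```python
# How far the geometry can be off before a component's phase error reaches
# a quarter wavelength (90 degrees), at a few frequencies.
c = 343.0                                # speed of sound, m/s
for f in (100, 1_000, 5_000, 20_000):    # Hz
    wavelength = c / f                   # metres
    print(f"{f:>6} Hz: wavelength {wavelength * 100:6.1f} cm, "
          f"quarter wavelength {wavelength * 250:6.1f} mm")
# 100 Hz leaves ~86 cm of slack; 20 kHz leaves ~4 mm.
```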

I don’t understand this… aren’t recordings (CDs, streaming) pretty similar from one playback to the next? If there is a small mismatch of a few tenths of a percent each time (is there?), couldn’t the system speed the song up or slow it down incrementally, monitoring the real-time waveforms, until the noise is minimized? (Aside: is that sort of how AEDs work, syncing by monitoring? Not sure.)

It doesn’t have to be perfect. The technology already works well enough for noise that it doesn’t even know about beforehand (airplane cabins, machinery hum, etc.) just by tinkering with it in real time. With the additional advantage of having a matched recording of the song, it would already know the song’s spectral signature and would just have to adjust to the room conditions - as it already does.

To be clear (in case it wasn’t before), I was thinking that the headphones would stream/download the song itself as soon as it’s IDed, then invert it and tinker with it to make it match better and better. Sure, if you move around the room a bit or the speakers are particularly bad, it wouldn’t be a perfect cancelation… but could it still be effective enough to be worthwhile? Even halving the amplitude would be a big deal, no? Or am I misunderstanding how destructive interference works?

The main reason noise canceling works as well as it does is that the mic is as close to the speaker (the headphone driver) as possible. All that’s happening is that the actual sound is played back with reversed polarity. Sound in a room is really chaotic, and like I said, the actual waveform at a given point in a room is not really going to match the recording. Knowing the spectral signature of a song isn’t enough; the amplitude and phase of every frequency component have to match exactly, or cancellation doesn’t really work.
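To put a number on “match exactly”: for two equal-amplitude copies of a tone, one inverted, a phase error of φ leaves a residual of 2·sin(φ/2) times the original (from sin(ωt) - sin(ωt + φ) = -2·sin(φ/2)·cos(ωt + φ/2)). Roughly:

```python
import numpy as np

# Residual left after "cancelling" a tone with an inverted copy that is off
# by a phase error phi, equal amplitudes assumed: residual = 2*sin(phi/2)
# times the original. 60 degrees is break-even; beyond that it gets louder.
for phi_deg in (5, 15, 30, 60, 90, 180):
    residual = 2 * np.sin(np.radians(phi_deg) / 2)
    print(f"phase error {phi_deg:>3} deg -> residual x{residual:.2f} "
          f"({20 * np.log10(residual):+.1f} dB)")
# 5 deg: x0.09 (-21 dB)    30 deg: x0.52 (-5.7 dB)
# 90 deg: x1.41 (+3.0 dB)  180 deg: x2.00 (+6.0 dB)
```

So even halving the amplitude needs roughly 30 degrees of phase accuracy - at every frequency, all the time.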

To give a real-world example: when you’re recording a choir singing along to a playback and you don’t have headphones for everybody, you can play the playback over a speaker, record the choir, and afterwards record the whole thing again without the choir, then mix that second recording back in with reversed phase. Even though the acoustics should be exactly the same (nothing moves, same speaker, exact same sound conditions), the cancellation is not perfect even in this case.

It seems like just moving your head around would cause enough distortion, due to the Doppler effect, that this wouldn’t work.
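A rough number for that (just the shift from moving your head at a casual sort of speed):

```python
# Rough Doppler estimate: a head moving at ~0.3 m/s relative to the source.
c = 343.0      # speed of sound, m/s
v = 0.3        # head speed, m/s
f = 1_000      # tone frequency, Hz

shift = f * v / c           # apparent frequency shift, about 0.9 Hz (~0.1%)
print(f"shift {shift:.2f} Hz -> phase slips a full cycle every {1 / shift:.1f} s")
```

A tenth of a percent sounds tiny, but it means the relative phase of a 1 kHz component runs through a full cycle in about a second - so the cancellation would swing between helping and doubling the sound as you move.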

This is where the wheels fall off. “Tinker with it” requires that you work out what effect the entire outside environment has had on the sound. That is supercomputer territory. If you think in spectral space, you need not only the frequency of each component but also its phase. At 1 kHz the wavelength is roughly a foot. If you are only accurate to 6 inches you will double the sound level, not halve it, so you need to be accurate to 3" as a bare minimum. At 2 kHz your accuracy requirement doubles, and you are not even out of the range of ordinary telephone-quality sound. Worse, you are in the diffuse sound field: you won’t know a priori what the phase or delay (excess phase) is without considerable processing. Real time is as close to impossible as you might wish, no matter how much processing power you have.
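Here’s that arithmetic in code (treating the positional error as a pure phase error, equal amplitudes assumed):

```python
import numpy as np

# The 1 kHz example above: positional error -> phase error -> change in level.
# Break-even is about a sixth of a wavelength (~2.2 inches at 1 kHz); at
# 2 kHz the same numbers apply to half the distances.
c_in = 343.0 / 0.0254              # speed of sound in inches per second
wavelength = c_in / 1_000          # ~13.5 inches at 1 kHz
for error in (1, 2, 3, 6):         # positional error, inches
    phi = 2 * np.pi * error / wavelength
    print(f'off by {error}": {20 * np.log10(2 * np.sin(phi / 2)):+.1f} dB')
# off by 1": -6.7 dB    off by 3": +2.2 dB (already louder)
# off by 6": +5.9 dB (essentially doubled)
```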

However, the opposite argument might be reasonable. Can knowledge of the song being played help a conventional sound cancelling headphone work better? The answer here may be yes. You don’t need knowledge of the precise sound wave; rather, you can help the sound cancelling by providing it with the general spectrum of sounds it should be trying to cancel. A set of tunable filters that precondition the sound being used as the cancelling signal could be fed by a system streaming the song and pre-computing the filter parameters. Here temporal accuracy would not need to be especially close. The processing demands would be pretty big (unless you could pre-compute the song’s signature and stream that).

That could be a curiously interesting niche market for Shazam: recognise a song and then provide the dynamic filter settings needed for the noise cancelling system to adapt. But what this isn’t is using the song itself as the cancelling signal - the signal still comes from the microphones in the headphones. The processed signal simply helps the existing noise cancelling system work better. There could have been a patent in this. This is now prior art. :smiley:
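A very rough sketch of what that preconditioning could look like, just to make the idea concrete (Python/numpy; every name and the gain law are invented for illustration - this is not how any real ANC firmware or Shazam service works):

```python
import numpy as np

# Hypothetical sketch: precompute the song's per-band energy envelope offline
# (the part a song-ID service could stream), then use it to weight the
# headphone mic signal, band by band, before it feeds the ordinary
# mic-driven ANC inversion stage. All names here are made up.

FRAME = 1024         # analysis frame length, samples
N_BANDS = 32         # coarse frequency bands

def band_envelope(song, frame=FRAME, n_bands=N_BANDS):
    """Per-frame, per-band magnitude of the known song (computed offline)."""
    n_frames = len(song) // frame
    env = np.zeros((n_frames, n_bands))
    for i in range(n_frames):
        spectrum = np.abs(np.fft.rfft(song[i * frame:(i + 1) * frame]))
        # collapse the FFT bins into a small number of coarse bands
        env[i] = [band.mean() for band in np.array_split(spectrum, n_bands)]
    return env

def precondition(mic_frame, song_bands):
    """Emphasize the mic signal in the bands where the song has energy."""
    spectrum = np.fft.rfft(mic_frame)
    weights = song_bands / (song_bands.max() + 1e-12)      # 0..1 per band
    gains = np.concatenate(
        [np.full(len(band), 0.25 + 0.75 * w)               # ad-hoc gain law
         for band, w in zip(np.array_split(spectrum, N_BANDS), weights)])
    return np.fft.irfft(spectrum * gains, n=len(mic_frame))

# The preconditioned frame then feeds the existing, mic-driven ANC loop.
```

Only a rough temporal alignment - knowing which few-second stretch of the song is playing, which the ID step already gives you - would be needed to pick the right row of the envelope, nothing like the sample-accurate sync discussed above.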

But why download it, if you don’t want to hear it?

That’s how my ad-blocking software works – as soon as it identifies something as an ad based on its database of ads/spam, it deletes it from my browser, so fast that I never see it at all. Isn’t that what you want to happen with ‘unwanted’ songs? Or do I not understand what your goal is here?