Digital audio drivers: why isn't this already done?

I’m a computer engineer.
I read some articles describing how to build simple headphone amplifiers, such as this article. I also read articles describing various methods for the actual transducers that produce the sound: magnetic, electrostatic, and the most common method, the “dynamic driver”, where a coil moves back and forth. It’s basically a form of motor.

Anyways, you have an objectively correct metric. Please don’t respond to this post if you think objective measurement of sound is impossible. It’s very simple: you can take the Fourier transform of the digital audio input signal to get its magnitude at every frequency. For correct reproduction, the resulting signal contains only the frequencies present in this digital source signal (well, the side frequencies aren’t in the audible range), and the ratio of signal magnitudes is the same as in the source (the overall magnitude can vary, obviously, as someone turns a control up and down). That’s that.
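To make that metric concrete, here is a minimal Python sketch. It assumes both signals are equal-length lists of samples and uses a naive DFT from the standard library. The function names (`dft_magnitudes`, `spectral_shape_error`) and the test tones are made up for illustration, and the sketch compares magnitudes only, so it says nothing about phase.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 up to (not including) Nyquist."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags

def spectral_shape_error(source, measured):
    """Largest per-bin difference after scaling both spectra to unit peak.

    Scaling removes the overall volume, so only the ratios between
    frequencies (and any extra frequencies) contribute to the error.
    """
    src = dft_magnitudes(source)
    out = dft_magnitudes(measured)
    src_peak = max(src)
    out_peak = max(out)
    return max(abs(s / src_peak - o / out_peak) for s, o in zip(src, out))

# Hypothetical test tones: two sine components at bins 4 and 10.
n = 64
source = [math.sin(2 * math.pi * 4 * i / n) + 0.5 * math.sin(2 * math.pi * 10 * i / n)
          for i in range(n)]
louder = [3.0 * x for x in source]  # same shape, volume turned up
coloured = [math.sin(2 * math.pi * 4 * i / n) + 0.25 * math.sin(2 * math.pi * 10 * i / n)
            for i in range(n)]      # upper component reproduced too quietly

print(spectral_shape_error(source, louder))    # essentially zero
print(spectral_shape_error(source, coloured))  # clearly non-zero
```

Turning the volume up leaves the error at essentially zero, while a transducer that under-reproduces one frequency shows up directly in the number.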

Well, there’s all these limitations. Different transducers have different responses at different frequencies. Moving transducers need power to start but then when oscillating there’s only so much damping. The amplifiers introduce frequency specific phase shifts.

So the solution to this is also obvious. You need a good, reliable sensor that tells you what the transducer is actually doing. Some kind of element embedded in the transducer itself, such as a magnet embedded in the transducer and a Hall effect sensor array, etc.

You Fourier transform this sensor signal and compare the actual ratios between frequencies to the setpoint. You compare the overall signal magnitude to the setpoint. Then, you have a learning algorithm that adjusts the input signal for every single control point across the frequency range - you essentially apply a transform to the input signal specific to each and every frequency. You fix phase lag by phase advancing or phase retarding by frequency. You can fix most limitations of your analog amplifiers and transducers this way, and thus use cheaper components and get the same audio quality. Transducers have state dependent responses - if it’s already oscillating at a specific frequency, it behaves differently than if it isn’t, so your algorithm has to be aware of that. Probably just a ~50 channel PID solution or something.
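A rough sketch of the per-frequency correction idea, in Python. It assumes a deliberately oversimplified transducer: each band has a fixed, independent gain error, and the sensor reads the level of a unit test tone exactly. The function name, band responses, and update rate are all invented for illustration. The state-dependent behaviour mentioned above is not modelled.

```python
def adapt_band_gains(response, steps=200, rate=0.5):
    """Nudge one correction gain per band until each band measures 1.0.

    response[b] is the (hypothetical, unknown to the controller) gain of
    the transducer in band b. Each step adds a fraction of the remaining
    error to that band's correction gain, like the integral term of a
    controller.
    """
    gains = [1.0] * len(response)
    for _ in range(steps):
        for band, r in enumerate(response):
            measured = r * gains[band]        # sensor reading for a unit-level tone
            gains[band] += rate * (1.0 - measured)
    return gains

# Made-up transducer: weak in the lowest band, too hot in the highest.
response = [0.5, 0.8, 1.0, 1.3, 1.5]
gains = adapt_band_gains(response)
corrected = [r * g for r, g in zip(response, gains)]
print(gains)      # each gain approaches 1 / response
print(corrected)  # each band approaches 1.0
```

This particular update only settles when `rate * response` stays below 2 in every band; a real controller would need a more careful stability argument.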

This is totally doable. A system on a chip with sufficient processing power that costs just a few dollars could do it. Why isn’t one of these inside each and every speaker and headphone sold today, with integrated sensors? Inputs would only be digital so all this correction would be done right inside the device.

The ultimate goal is to replicate the same sounds you would have heard if you were standing near the sound source, or to hear whatever the audio designer intended when he created the input signal using extremely expensive reference equipment that basically does what I am saying.

I’m probably totally mistaken about what I think you’re describing, but if the sensor is measuring the frequency spectrum of the transducer oscillations, and is using this measurement to correct the transducer’s operation, how would that not result in just an infinite feedback loop?

Each timestep, the control loop creates feedback that adjusts the transform applied to the input signal. The feedback is not the output signal; think of it like a huge equalizer board with 50 or 100 knobs being adjusted 10,000 times a second. As a control problem, this may be more difficult than it appears if there are cross interactions, where adjusting the gain at 1 kHz changes how the transducer responds to a 2 kHz, 4 kHz, or 8 kHz signal at the same time. Nevertheless, your controller can factor that in; it just has to become more complex and perform more calculations. Ultimately, you have an input, you have an output, and you can measure the difference between the intended and the actual output. You can keep making the control algorithms ever more complex and iteratively improve upon them over a period of years, inching closer and closer to ideal. Use neural nets and other tricks if the math becomes too unwieldy, etc.
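The cross-interaction worry can be illustrated with a toy Python model. It assumes the interaction is linear and weak, described by a made-up `coupling` matrix where turning the knob for one band leaks a little into its neighbours. None of the numbers come from a real transducer.

```python
def settle_coupled_gains(coupling, steps=200, rate=0.5):
    """coupling[i][j]: how much the knob for band j leaks into the
    measured level of band i (hypothetical linear model)."""
    bands = len(coupling)
    gains = [1.0] * bands
    for _ in range(steps):
        # Read every band first, then move all the knobs at once.
        measured = [sum(coupling[i][j] * gains[j] for j in range(bands))
                    for i in range(bands)]
        gains = [g + rate * (1.0 - m) for g, m in zip(gains, measured)]
    return gains

# Three bands, each leaking 20% into its neighbour.
coupling = [
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]
gains = settle_coupled_gains(coupling)
measured = [sum(coupling[i][j] * gains[j] for j in range(3)) for i in range(3)]
print(gains)     # the middle knob ends up lower than the outer two
print(measured)  # every band settles at 1.0
```

With weak coupling the naive knob-by-knob update still settles. With strong coupling it can diverge, which is where the controller would have to model the interactions explicitly.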

Still, the parts that are definite are:

  1. You need a very accurate sensor applied to the transducer directly (so there is no outside noise affecting the results)
  2. You need high performance digital chips that are also cheap to do the control algorithm, such as ARM systems on a chip.
  3. The hardware is transducer specific. All speakers would have the digital controller, the DAC, the analog amplifiers directly attached and part of the transducer.
  4. The input signal needs to be lossless digital.

It was done.

It sounds a lot like you are talking about a servo controlled transducer. I have a Velodyne subwoofer from the ’90s that uses it and it is great!

because it adds a lot of cost and complexity to solve the wrong problem. in most cases the biggest problem with the listening performance is the room. it’d be stepping over dollars to pick up pennies.

Neat.

I’d been wondering about the same thing as Habeed. Though I don’t see a learning system as being necessary, nor doing anything in the frequency domain. Instead, it seems that all you need is a tuned PID controller. The system is basically a mass on a spring. The feedback could either be a high-speed pressure transducer, or even (though this gets more complicated) a simple measure of cone displacement. If you can replicate the input signal exactly, then there’s no need to see if there’s any frequency dependence.

There’s more than enough control bandwidth available (the electronics can do megahertz) to handle up to audio frequencies, so that’s not an issue.
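Here is a minimal Python sketch of the mass-on-a-spring idea with a PID loop on cone displacement. Every number in it (mass, stiffness, damping, the three gains, the time step) is invented for illustration rather than taken from a real driver, and it only checks a simple step in the target position instead of actual audio.

```python
def simulate_pid_cone(setpoint=0.001, duration=0.2, dt=1e-5):
    """Drive a mass-spring-damper 'cone' to a target displacement with PID."""
    mass, stiffness, damping = 0.01, 1000.0, 1.0  # kg, N/m, N*s/m (made up)
    kp, ki, kd = 5000.0, 5.0e5, 5.0               # hand-picked gains (assumed)
    position = 0.0
    velocity = 0.0
    integral = 0.0
    previous_position = 0.0
    for _ in range(round(duration / dt)):
        error = setpoint - position
        integral += error * dt
        # Derivative taken on the measurement, so a sudden setpoint
        # change does not produce a huge force spike.
        measured_velocity = (position - previous_position) / dt
        previous_position = position
        force = kp * error + ki * integral - kd * measured_velocity
        acceleration = (force - stiffness * position - damping * velocity) / mass
        velocity += acceleration * dt
        position += velocity * dt
    return position

final = simulate_pid_cone()
print(final)  # settles very close to the 1 mm setpoint
```

Holding a step is the easy case. Tracking a full audio-band signal with small error would need much higher loop gain, and that is where the tuning gets harder.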

but that’s still solving the wrong problems. yes, speakers aren’t linear and do have phase shifts through their passbands. amplifiers operated below overdrive are pretty linear so the OP’s assertion that amps have phase shifts is nonsense. But neither of these is the real problem. if you’re listening to your audio system in a small room*, the room itself has a far, far more significant role in sound quality than your equipment. this is the same kind of nonsense which makes people claim speaker cables have a significant effect on the sound of their system, when these idiots don’t realize that simply moving their head one centimeter to the side has an even greater effect than any piece of wire can.

  • if the room is in your house, it’s a small room.

in the case of the Velodyne, I think it’s just a tiny accelerometer on the coil or cone linked to a comparator circuit which subtracts the accelerometer signal from the input signal and feeds back the difference into the coil. The result is ideally that the cone is only moved by the input and not from room reflections and whatnot. Works surprisingly well. At least as well as your input source, of course. But I suppose this isn’t a digital driver as the OP was talking about.
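The subtract-and-feed-back arrangement described above is ordinary negative feedback. As a hedged illustration in Python, assume the sensor reading is directly comparable to the input (a real accelerometer measures acceleration, not displacement, which is ignored here) and treat the driver as a plain gain that varies with operating conditions. The function name and all values are hypothetical.

```python
def closed_loop_gain(driver_gain, amplifier_gain):
    """Overall gain when the sensed output is subtracted from the input
    and the difference is amplified into the driver (unity feedback)."""
    forward = amplifier_gain * driver_gain
    return forward / (1.0 + forward)

# A made-up driver whose own gain swings between 0.7 and 1.3.
weak = closed_loop_gain(0.7, 100.0)
strong = closed_loop_gain(1.3, 100.0)
print(weak, strong)  # about 0.986 and 0.992
```

In this toy model a 60% swing in the driver's own gain shrinks to well under 1% at the output, which is the whole appeal of the servo approach.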

Past a certain point, sure. That doesn’t mean you don’t want your equipment working as well as possible, though.

This kind of thing might actually make the biggest difference at the low end. Imagine you have some really shitty driver with all kinds of nonlinearities, and some kind of totally untuned enclosure. $5 in equipment could turn that into something that’s basically perfect. Contrast to spending hundreds on improvements to materials, etc. to improve the passive input response.

No, that’s totally different. Claims that fancy speaker cable works better than lamp cord are just outright pseudoscience. Physically, it behaves exactly the same way. A closed-loop control system actually has a measurable difference in behavior.

Elegant–I like it. Not perfect, since an accelerometer isn’t going to be as good as a linear displacement measurement, but pretty good. Basically a simple PD controller instead of PID.

I like it as well. My thought about frequency dependence would let your control loop get ahead of the errors - there would need to be a digital buffer of future audio signals. Wouldn’t work for realtime sources like a video game, but music and movies and tv could be buffered, although the TV/music/video player would need to cooperate.

Or, even if you don’t buffer, you know the current state of the driver if you have been measuring it very precisely (by having it pass a magnet or something), and you know the current voltages and currents to the coils driving it. You know the current input signal. You can look up in memory what happened last time this state was present and correct for errors before they are detectable.
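One way to picture the "look up what happened last time" idea is a table keyed by a coarsely quantized driver state. This Python sketch is only that: the class name `ErrorMemory`, the choice of state (position and drive level), and the resolution are all assumptions made for illustration.

```python
class ErrorMemory:
    """Remember the error last seen in each (position, drive) state."""

    def __init__(self, resolution=0.05):
        self.resolution = resolution
        self.table = {}

    def _key(self, position, drive):
        # Quantize so that nearby states share one table entry.
        return (round(position / self.resolution),
                round(drive / self.resolution))

    def remember(self, position, drive, error):
        self.table[self._key(position, drive)] = error

    def predict(self, position, drive):
        # States never seen before get no correction.
        return self.table.get(self._key(position, drive), 0.0)

memory = ErrorMemory()
memory.remember(position=0.50, drive=0.20, error=-0.03)

# A slightly different state falls into the same cell, so the
# controller can pre-correct before the error shows up again.
print(memory.predict(0.51, 0.21))  # the stored error
print(memory.predict(0.90, 0.90))  # unseen state, no correction
```

A real version would need far more state (recent history, temperature, and so on) and some way to blend neighbouring entries, so treat this as the shape of the idea only.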

I was actually thinking of the specific case of headphones, where you remove the room from the equation. Also, regarding phase lag - I am reading it right from the Texas Instruments worked example, where there’s a small amount of lag in their gold standard example circuit.

How about a Bluetooth microphone that the listener wears around their neck, with the OP’s algorithm used on that signal to adjust the room’s speakers? Of course, you’re screwed if the mic goes even slightly out of whack.

That is just not true. I hope you will agree that no reproduction system will ever be 100% accurate, so the goal is to be as close as possible. However, in human perception of sound (similarly with vision, or smell, or taste…) not all aspects of a sound, especially in music, are equally important. A ‘better’ reproduction will involve compromising some less important aspects in order to improve more important ones. To complicate things more, the ‘optimal’ compromises will not be the same for different listeners or pieces of music. Measuring objectively how faithfully a given waveform matches another may be possible. Measuring objectively how that translates to user experience is not. Two waveforms may be measured as being equally close to the original, but a listener may not agree. You are essentially trying to objectively measure a subjective experience.

I think you’re way overestimating what it costs to make a decent speaker. that $5 would be better spent improving the speaker itself.

plus you run into the problem that it’s only really practical for low frequency drivers like woofers and subwoofers; once you get into midrange and high-frequency drivers the transducer you use will have as much or more mass than the speaker’s diaphragm, and you now essentially have a completely different speaker.

further, Habeed’s proposal only works for excursion-related non-linearities; where the distortion of the speaker is due to either poor motor design or pushing the speaker past the mechanical/magnetic excursion limits. it cannot do anything about cone breakup modes, unless you were to litter the cone surface with accelerometers and associated wiring. Which would again make me say you’d be better served spending the money on improving the loudspeaker driver itself. The OP’s suggestion just sounds like tech wankery for its own sake.

I’m talking about the mindset that chases some notion of “perfection” in the equipment when everything around it is hilariously imperfect.