I was wondering: Why does most digital audio use a 44.1 kHz sampling rate? Is it because 44.1 kHz is roughly twice the bandwidth of human hearing, which is sufficient to reconstruct the signal per the Nyquist-Shannon sampling theorem?
Also, if I understand the sampling theorem correctly, what good are higher sampling rates like 96 kHz for DVD and 192 kHz for the new-generation formats? Is it just overkill, or do they impart some benefit?
Higher sampling rates reduce the noise and distortion introduced by the low-pass filters. In the real world, a filter can’t simply cut off the signal above a certain frequency and leave everything below it untouched. Instead, the filter rolls off along a ramp, with the steepness of the ramp determined by the type of filter. So by increasing the sampling rate to something way, way above audible frequencies, you can use simpler, gentler filters and avoid the distortion.
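To put rough numbers on the filter point, here is a minimal sketch, not a real filter design. It assumes the audio band ends at 20 kHz and that you want about 96 dB of attenuation by the Nyquist frequency (half the sampling rate); both figures are my own illustrative assumptions, and it pretends the rolloff is a straight line in dB per octave.

```python
import math

PASSBAND_EDGE_HZ = 20_000   # assumption: upper edge of the audio band
STOPBAND_ATTEN_DB = 96      # assumption: attenuation wanted at the Nyquist frequency


def transition_octaves(fs_hz, passband_hz=PASSBAND_EDGE_HZ):
    """Width of the gap between the passband edge and Nyquist, in octaves."""
    nyquist_hz = fs_hz / 2
    return math.log2(nyquist_hz / passband_hz)


def required_slope_db_per_octave(fs_hz, atten_db=STOPBAND_ATTEN_DB,
                                 passband_hz=PASSBAND_EDGE_HZ):
    """Average rolloff needed to reach atten_db by the Nyquist frequency."""
    return atten_db / transition_octaves(fs_hz, passband_hz)


for fs in (44_100, 96_000, 192_000):
    print(f"{fs} Hz: {transition_octaves(fs):.2f} octaves of room, "
          f"about {required_slope_db_per_octave(fs):.0f} dB/octave needed")
```

Under these assumptions, 44.1 kHz leaves only about 0.14 octaves for the ramp, so the filter has to be extremely steep, while 96 kHz leaves about 1.26 octaves. For comparison, a first-order analog filter rolls off at roughly 6 dB per octave.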
That said, I think the real-world sonic benefits of sampling rates above 44.1 kHz are very difficult, if not impossible, to hear. My understanding is that 24-bit/96 kHz has been used for a long time in studios to avoid contaminating the track as it goes through multiple mixes. Keep the noise floor and filter distortion far away from where the finished track will be when it’s downmixed to CD, and you make them inaudible. But that doesn’t mean that 96 kHz in home equipment will provide any real sonic benefit.
The big advantage of SACD and DVD-A, the two most common high-resolution formats, is that they are multi-channel. When people hear SACD and DVD-A the first time, they’re often just blown away by the startling jump in the quality of the sound. The main reason for this isn’t the increased bit depth or sampling frequency, but the multi-channel mix and the fact that most of these discs are produced with extreme care from high-quality masters, whereas the CD version of the songs may date from an old master that just isn’t very good.
The other advantage of those formats is that they are uncompressed. Uncompressed in two ways. First, the format itself has more dynamic range: 24-bit gives about 144 dB, whereas 16-bit gives about 96 dB. In the real world, you can get sounds in that range, so the higher bit depth allows you to reconstruct the original dynamic range exactly. Second, CDs often have their dynamic range artificially compressed to make them more radio friendly - the average loudness goes up, which gives the music some ‘punch’ when played against other random music. Casual listeners are also often in environments with a lot of background noise (offices, in the car, outside, etc.), where music with full dynamic range would have its quietest details buried in the noise floor. So they compress it. But for an audiophile with a quiet listening room, an uncompressed track is MUCH preferred.
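For anyone wondering where the 96 dB and 144 dB figures come from, here is a minimal sketch. It assumes the usual textbook definition, the ratio between full scale and one quantization step, which works out to roughly 6 dB per bit. That is a theoretical ceiling; real hardware typically falls short of it.

```python
import math


def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
# 16-bit: 96.3 dB
# 24-bit: 144.5 dB
```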
There’s a lot of debate about this in audio and recording circles. Many credible people think that even though 44.1 kHz is “legally” twice the range of human hearing, there are unheard but psychoacoustically “felt” frequencies that have an effect on the listener and that are lost at 44.1 kHz.
Well I don’t know much about perception, but I do know that speech data in certain types of segments, most specifically clicks and ejectives, occurs in the 40k-50k range.
I’d ask a hardware guru. It might be that the figure is a multiple of, or otherwise related to, the capabilities of certain standardized chips that are commonly used in consumer electronics. Certain ones are used very widely as workhorses.
There are all kinds of signals at 20 kHz and 50 and 100 and beyond. The existence of signals in the physical world isn’t directly relevant; only their perception is. Individual humans have some differing frequency above which perception of pure tones rolls off rapidly. 20 kHz is commonly cited. It’s a subtle physiological question, I think, whether frequencies above your own rolloff frequency matter to you. They certainly aren’t very important, and there are many other much bigger bars to fidelity.
Sampling has another oddity. A frequency somewhat higher than half the sampling frequency (the Nyquist frequency) is aliased as a frequency that same distance below it. So, there are various schemes to help make sure this does not create artificial signals. This, and the limited sharpness of rolloffs of real analog and digital filters, are some of the incentives to use higher sampling frequencies. But we are certainly talking about very refined listening. Maybe this is 1000 times less important than whether you do your listening in a car, or whether you can hear the wind outdoors when you have your headphones on in the den.
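To make the aliasing concrete, here is a minimal sketch that assumes ideal sampling with no anti-aliasing filter in front. In that case a pure tone shows up at its distance from the nearest whole multiple of the sampling rate. The function names are just mine for illustration.

```python
import math


def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure tone at f_hz after ideal sampling at fs_hz."""
    # Distance from f_hz to the nearest integer multiple of the sampling rate.
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))


def sample_cosine(f_hz, fs_hz, n_samples):
    """Samples of cos(2*pi*f*t) taken at t = n / fs."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(n_samples)]


fs = 44100
print(alias_frequency(25000, fs))  # 19100
print(alias_frequency(45100, fs))  # 1000

# The sampled 25 kHz tone is indistinguishable from a sampled 19.1 kHz tone.
high = sample_cosine(25000, fs, 64)
low = sample_cosine(19100, fs, 64)
print(max(abs(a - b) for a, b in zip(high, low)) < 1e-9)  # True
```

So an inaudible 25 kHz component that reaches a 44.1 kHz converter unfiltered lands at 19.1 kHz, inside the nominal audio band, which is why the filter has to remove it before sampling.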
For music this is correct, but for speech perception the production is vitally important. Some theorists hold that there are aspects of perception rooted in production, and the speech system in general doesn’t like unnecessary stuff; it’s possible to produce clicks and the like without the high-range effects, which would suggest that their presence has some kind of utility.
Thanks Sam Stone, Napier, and SmackFu for your informative posts. Is it fair to say then that higher sampling rates have utility insofar as they may help reduce signal distortions caused by the limitations of hardware?
About what VCO3 and Omi no Kami have brought up: Have there been any studies to examine whether these “inaudible” frequencies contribute to sound perception? Seems like a pretty simple experiment to set up.