So I was watching a digital audio primer (http://www.xiph.org/video/vid1.shtml) and it stated a few things that don't make sense but are beyond the scope of the video.
First of all, can somebody explain in layman's terms why a given sampling rate can accurately capture any frequency up to half that sampling rate?
Second, how/why do sound frequencies beyond the sampling rate produce aliasing?
Finally, at one point in the video (12:04) they show a graph of frequencies with amplitude on the Y axis, and frequency on the X axis. Does this graph represent a single moment in time, or something else?
I looked at the Wikipedia article on Nyquist's theorem and I think it's completely incomprehensible for someone who doesn't already understand the topic. Yikes! So I did some more poking around, and I found this document. It explains the concept pretty well and has some nice drawings that I hope will make it clear to you. (warning - pdf)
The above document explains aliasing pretty well too, I think. It uses the wagon wheel analogy. If you take video of a turning wagon wheel and the frame rate of your video exactly matches the wheel's rotation rate, then every time a frame is captured the wheel is in exactly the same place, so when you play back the video the wheel appears not to move at all. If the wheel turns slightly slower than the frame rate, it appears to move backwards.
It's the same thing when sampling a signal. If the signal's frequency exactly matches the sampling frequency, then every time you sample it, the AC wave will be at exactly the same point in its cycle and your samples won't look like they're changing. If the signal you are sampling is just slightly under the sampling frequency, then you'll sample it a little earlier in its cycle each time, and it will look like a slowly changing, low-frequency signal, which isn't at all what it actually is.
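If you want to see that in numbers, here's a minimal numpy sketch (the 1 kHz sampling rate and 999 Hz tone are just convenient round numbers I picked):

```python
import numpy as np

fs = 1000                         # assumed sampling rate, Hz
t = np.arange(fs) / fs            # one second of sample instants

# A 999 Hz tone sampled at 1000 Hz lands a little earlier in its cycle
# at every sample, so the samples trace out a slow 1 Hz wave instead.
high = np.sin(2 * np.pi * 999 * t)
alias = np.sin(2 * np.pi * 1 * t)

# The 999 Hz samples are exactly a 1 Hz sine, flipped in sign:
print(np.max(np.abs(high + alias)))   # ~0, apart from floating-point rounding
```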
If the signal's frequency content is constant, then it will form a steady graph like that. If the signal is varying, then the amplitudes of the various frequencies will change over time as well.
A typical stereo spectrum analyzer will show you how that graph changes with time for something typical like music:
On preview I see that the pdf document I linked to was ninja’d. Oh well.
Thanks for the great answers. That paper did explain everything pretty clearly. I feel kind of stupid for forgetting that each frequency cycle has both a high and low component. :smack:
And don’t feel bad about being ninja’ed, Engineer. Good things take time.
Here’s a little challenge for folks who say they understand the theory. I originally thought this little thought experiment challenged the theory but had to figure out for myself that it didn’t. Several pundits couldn’t explain why it’s fallacious; they just kept pointing to the dogma, showing that they may understand how to apply the theory, but they didn’t really understand the theory. Here goes.
If I take a signal at the Nyquist limit, amplitude modulate it at 1 Hz, and sample that, I get the same results as sampling a signal that's 1 Hz less than the Nyquist limit. How can this be?
A carrier at the Nyquist frequency ω, amplitude modulated at 1 Hz, expands via the product-to-sum identity into two half-amplitude components at (ω − 1) and (ω + 1). The first term is below Nyquist, and can be correctly sampled. The other term is above Nyquist, and aliases back down to (ω − 1). So both terms have the same sampled frequency.
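You can check this numerically. Here's a short numpy sketch (48 kHz and a 1 Hz modulator are just the numbers from the riddle; I use cosines because a sine exactly at Nyquist samples to all zeros):

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs            # one second of samples at 48 kHz
nyq = fs / 2                      # 24 kHz, right at the Nyquist limit

# Amplitude modulate a Nyquist-rate carrier by 1 Hz...
am = np.cos(2 * np.pi * 1 * t) * np.cos(2 * np.pi * nyq * t)
# ...and compare with a plain tone 1 Hz below Nyquist.
tone = np.cos(2 * np.pi * (nyq - 1) * t)

# The product is half-amplitude tones at 23999 Hz and 24001 Hz;
# the 24001 Hz half aliases back onto 23999 Hz, so the two sequences match.
print(np.max(np.abs(am - tone)))      # ~0, apart from floating-point rounding
```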
You cannot determine the frequency spectrum at a single moment in time, you have to integrate over some finite time window. In principle, when you talk about the frequency spectrum of a signal, you are talking about averaging over an infinite amount of time. In practice, you mean a time much longer than the reciprocal of the desired frequency resolution. Think of this as a time/frequency uncertainty principle.
It is, of course, useful to measure the time variation of the frequency of signals. For example “voice prints” display an estimate of the frequency spectrum as a function of time. To do this rigorously, you need to use “wavelet analysis.” Any such measurement is imperfect and you will have to think carefully about how you want to trade off time resolution and frequency resolution, since their product cannot be less than one.
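As a rough illustration of that trade-off, here's a small numpy sketch (the 8 kHz sampling rate, the 1 Hz tone spacing, and the crude peak-counting heuristic are all my own arbitrary choices): two tones 1 Hz apart only show up as separate peaks once the analysis window is long compared to one second.

```python
import numpy as np

fs = 8000
f1, f2 = 440.0, 441.0             # two tones just 1 Hz apart

def count_peaks(window_seconds):
    """Count spectral peaks at least half as tall as the biggest one."""
    t = np.arange(0, window_seconds, 1 / fs)
    x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:]) & (mag[1:-1] > 0.5 * mag.max())
    return int(np.sum(is_peak))

print(count_peaks(0.1))   # 1: a 0.1 s window gives ~10 Hz resolution, the tones merge into one peak
print(count_peaks(4.0))   # 2: a 4 s window gives ~0.25 Hz resolution and separates them
```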
I teach audio engineering at a media college, for a recording and production degree program.
The above posts pretty much nail it from the theory perspective. The real-world result is that as long as you're sampling at 44.1 kHz or above, you can be sure you're creating a reasonably accurate file vis-à-vis the range of audible human hearing. That said, most pros naturally operate at 48 kHz or higher.
I prefer to sample music that will end up at 16/44.1 at an SR of 88.2 kHz. The math when downsampling is simpler and less aliasing occurs as a result. Film and broadcast audio is more often tracked at 24/96 and downsampled to 24/48 the same way.
Of course, downsampling has its own hairy (and not necessarily agreed-upon) issues, specifically with dither (low-level noise added when reducing bit depth, which prevents quantization distortion by providing an artificial noise floor) and with jitter (deviation of the sample clock from perfect regularity, for example when slaving converters to a master word clock).
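Since 88.2 kHz to 44.1 kHz is an exact factor of two, the conversion is just "low-pass filter, then keep every other sample." Here's a minimal sketch of that using scipy (the tone frequencies are arbitrary choices of mine, and I'm assuming scipy.signal.decimate is available):

```python
import numpy as np
from scipy.signal import decimate

fs_in = 88200                 # tracking rate from the post
fs_out = fs_in // 2           # 44100, an exact integer factor of two

t = np.arange(fs_in) / fs_in  # one second of samples
# A 10 kHz tone plus an (inaudible) 30 kHz component that would fold
# down to 44100 - 30000 = 14100 Hz if we just threw away every other sample.
x = np.sin(2 * np.pi * 10000 * t) + 0.5 * np.sin(2 * np.pi * 30000 * t)

naive = x[::2]                                          # no anti-alias filter
proper = decimate(x, 2, ftype='fir', zero_phase=True)   # filter first, then keep every other sample

def level_at(y, fs, f):
    """Spectrum magnitude at frequency f (bins are 1 Hz wide here)."""
    mag = np.abs(np.fft.rfft(y)) / len(y)
    return mag[int(round(f * len(y) / fs))]

print(level_at(naive, fs_out, 14100))    # ~0.25: the 30 kHz tone has folded down into the audio band
print(level_at(proper, fs_out, 14100))   # orders of magnitude smaller: the filter removed it first
```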
I should add that higher sample rates are desired for two reasons:
Oversampling captures a higher frequency range of inaudible harmonic content which, while not perceived as pitch, still has a real and impactful relationship with lower-order harmonics. In particular, the relative presence or absence of odd vs. even harmonics of the audible spectrum is reported to have significant subliminal effects on the perceived clarity/dimensionality/phase coherence of the recorded audio.
Best practice dictates we capture audio at the highest practical bit depth and sample rate available to us given our processing power/write speeds/available rates to preserve the fidelity of extended frequency response and dynamic range. We maintain all audio at this rate through final mixing and all downsampling and bit depth conversion happens once, in the mastering process.
This is to both maintain fidelity and to minimize the effects of aliasing, noise floor and headroom distortion that can occur through multiple SR and BD conversions and lossy compression schemes.
Typically a professional production has several rendering stages through the production cycle, so a combination of effective gain staging and best practice re: SR/BD yields the most dependable results and outputs a track that degrades gracefully when consumers start compressing it into mp3’s using shit compression tools in a huge range of crappy resolutions.
Something that the video glosses over is the distinction between a sampled signal and a digital one. While it doesn't say anything technically wrong, it is a bit misleading, and it would have been nice had it expanded on this a bit more. It's important to realize that once you have a digital (quantized) signal, you cannot reproduce the original signal exactly, even in a theoretical sense.
The sampling theorem only covers time-sampled signals; the amplitude of the signal must still be allowed to match the original signal exactly. Now the chart he showed did match, but only because it was chosen that way. You can imagine that if the analog amplitude increased just a bit (draw the curve a bit larger) then the top peak of the original would not coincide with any of the digital lines. Then you could not recover the analog signal from the (sampled) digital one, at least not by using the sampling theorem.
In a practical sense, the noise in an analog process can easily be just as bad as most of the problems quantization has, and sufficient bit depth for audio makes quantization pretty much a non-issue for the sounds people are able, or want, to hear.
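To put a number on it, here's a tiny numpy sketch of straight quantization (the bit depth, test tone, and full-scale range of ±1 are arbitrary choices of mine): the exact analog value is gone, but the per-sample error is bounded by half a step.

```python
import numpy as np

bits = 16
step = 2.0 / 2 ** bits             # quantisation step for a full-scale range of -1..+1

t = np.arange(480) / 48000.0       # 10 ms of samples at 48 kHz
x = 0.8 * np.sin(2 * np.pi * 997 * t)      # arbitrary test tone

xq = np.round(x / step) * step     # each sample snaps to the nearest grid value

err = xq - x
print(np.max(np.abs(err)) <= step / 2)       # True: the error never exceeds half a step
print(20 * np.log10(np.max(np.abs(err))))    # about -96 dB relative to full scale, for 16 bits
```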
Bingo! The simple answer is that amplitude modulation introduces higher frequencies.
Evidently I didn't explain myself clearly. The theory says I can sample and recover a 24 kHz tone, sampling at 48 kHz. Naively, I figured if I just raised and lowered the amplitude of that tone, I wouldn't be adding any higher frequencies: just changing the amplitude of the one already there. When I do that, it looks just like a lower frequency tone would, in the sampled data. This seemed to contradict the theory, and several pundits on the internet who acted like they knew the theory couldn't explain why it wasn't a contradiction. They just kept pointing to the dogma. Nobody ever pointed out that amplitude modulation introduces higher frequency components, or did the math. Fortunately, I eventually figured out for myself that this must be what's happening. I didn't think I'd really found a flaw, I just wanted to understand what my mistake was.
It is actually a bit more subtle than this. The presence of noise is actually required as part of the quantisation process. It is usually called dither when quantising, but it is an exact counterpart to noise in an analog channel. Dither needs to be added to the signal during quantisation with an amplitude of half the least significant bit. If this is not done you get strange, bad, and audible distortion products. If it is done you get a distortion-free quantised signal that behaves exactly like an analog signal with the same noise floor.
Recording with a 24 bit depth may mean that the noise floor of the signal is never going to be as low as the LSB, so you can actually get away with no added dither, but for 16 bit recordings it is critical. The final mastering for CD should take a 24 bit master and dither the signal during truncation - for exactly the same reasons.
The use of shaped dither (i.e. dither that does not have a white spectral signature) allows for steering the noise into those parts of the audible spectrum where it is least offensive and, very interestingly, provides resolution below the LSB at frequencies the ear is more sensitive to, easily getting 6 dB more SNR. Done right, a 16/44.1 master can exceed the resolution of the ear in all respects, but it is marginal, and it isn't hard to mess up. Sadly, starting with a 16/44.1 recording is not going to get you there, as there is exactly no wiggle room. Which again is why 24/96 is so ubiquitous.
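Here's a quick numpy sketch of the effect (my own illustrative numbers: 16 bits, a 441 Hz tone only about 1.5 LSB tall, and TPDF dither, one common choice): without dither the tiny tone quantises to a stair-step with obvious odd harmonics; with dither those harmonics dissolve into a benign noise floor.

```python
import numpy as np

rng = np.random.default_rng(1)

fs = 44100
step = 2.0 / 2 ** 16                           # 16-bit quantisation step
t = np.arange(fs) / fs                         # one second
x = 1.5 * step * np.sin(2 * np.pi * 441 * t)   # a tone only ~1.5 LSB tall

# Plain rounding to the 16-bit grid...
plain = np.round(x / step)
# ...versus rounding with TPDF dither (sum of two uniforms, +/-1 LSB peak) added first.
dither = rng.uniform(-0.5, 0.5, len(x)) + rng.uniform(-0.5, 0.5, len(x))
dithered = np.round(x / step + dither)

def level_db(q, f):
    """Level of the spectrum at frequency f, in dB re full scale (1 Hz bins)."""
    mag = np.abs(np.fft.rfft(q * step)) / len(q)
    return 20 * np.log10(mag[int(f)] + 1e-20)

# Without dither, the tiny tone becomes a stair-step with obvious odd harmonics;
# with dither, that harmonic sits down at the (benign) noise floor instead.
print(level_db(plain, 3 * 441))      # a clear 3rd-harmonic spike
print(level_db(dithered, 3 * 441))   # tens of dB lower, lost in the noise
```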
The beautiful thing is that everything is covered by Shannon's theorem. The Nyquist rate is a result of the theorem. The presence of noise in the channel, and the limitations on the information rate, are identical in the digital and analog domains. Moving a signal from one domain to the other makes no difference; the information content remains the same.
Something related that’s coming up a bit more in signal processing circles these days is that the Nyquist frequency is fundamentally a pessimistic limit.
The Nyquist rate is the rate at which you must sample to be able to guarantee perfect reconstruction.
But of course, perfect reconstruction of some signals may be possible below the Nyquist rate; it may just require foreknowledge about the signal, such as sparsity in some representative transform domain (Fourier, wavelet, or something else). So you can get perfect reconstruction at sub-Nyquist rates, but it's not always guaranteed, and it's usually not as simple as the elegant sinc-interpolation reconstruction you get from the Fourier picture.
This is the idea behind compressive sampling (and the related compressive sensing), which has already seen application in things like MRI, where taking fewer measurements translates into shorter scan times, and in faster preview photos on cameras.
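For the flavour of it, here's a minimal numpy sketch of the sparse-recovery idea using orthogonal matching pursuit over a Fourier dictionary (the grid size, sample count, and greedy solver are all my own illustrative choices, not how any particular scanner or camera does it):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 512          # length of the underlying uniform grid
K = 3            # number of active frequency bins (sparsity)
M = 60           # number of random time samples, well under N

# Build a K-sparse spectrum and the corresponding time-domain signal.
true_bins = rng.choice(N // 2, size=K, replace=False)
s_true = np.zeros(N, dtype=complex)
s_true[true_bins] = rng.uniform(1.0, 2.0, size=K)
x = np.fft.ifft(s_true) * np.sqrt(N)

# Take M samples at random time positions (far fewer than the N-point grid).
t = np.sort(rng.choice(N, size=M, replace=False))
y = x[t]

# Dictionary: sampled complex exponentials, one column per frequency bin.
A = np.exp(2j * np.pi * np.outer(t, np.arange(N)) / N) / np.sqrt(N)

# Orthogonal Matching Pursuit: greedily pick the bin most correlated with the
# residual, then re-fit all picked bins by least squares.
support = []
residual = y.copy()
for _ in range(K):
    corr = np.abs(A.conj().T @ residual)
    corr[support] = 0
    support.append(int(np.argmax(corr)))
    coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    residual = y - A[:, support] @ coef

print("true bins:     ", sorted(true_bins.tolist()))
print("recovered bins:", sorted(support))   # typically identical with this many samples
```

With only 60 of the 512 time samples, the three active frequency bins still come back, which is the kind of sub-Nyquist recovery being described: fewer samples than Nyquist demands, traded against the prior knowledge that the spectrum is sparse.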