I’ve always wondered something about the way audio is digitized into a pulse-code-modulation signal, such as on a CD. If the audio is sampled at, say, 44 kHz, I understand why 22 kHz would be the highest frequency it can reproduce. You’ve got 22,000 ups and 22,000 downs, so it’s up, down, up, down all the way. I understand how it would reproduce 100 Hz, and 101 Hz should work well enough as well.
What about 21,199 Hz? It’s up-down-up-down, like a 22 kHz tone, so many times, and then what, a pause? I don’t understand how it would work and make any sort of 21,199 Hz sound. For any frequency close to the limit, it seems like you would just have a 22 kHz tone with a glitch a certain number of times a second. I would think you would have to get some distance from the limit before it would work well.
How does it work?
PS: I know about the whole Nyquist frequency thing, and it seems pretty obvious why you can’t go above it. I haven’t seen anything explaining how it works below that frequency, though; everything seems to take for granted that everything below it is perfect, when it seems obvious to me that you can paint a mental picture of trying to make a 9 Hz wave with 20 samples and it doesn’t fit right.
I see the problem you’re having with the picture. An A/D converter takes samples of the original waveform many thousands of times every second (44,100 for a CD), and the D/A converter reproduces, as best it can, a new waveform from those samples.
Think of a compressed slinky against a pinstripe background. There are 44,100 stripes for every 1 foot of background, and the 1 foot of background represents 1 second of time. The compressed slinky also has a length of 1 foot. Let’s pretend there are 22,050 “rings” in the slinky; that would be the maximum frequency possible, one “ring” for every two pinstripes. Now let’s stretch the slinky so that it is slightly longer than 1 foot and a few “rings” fall outside the 1 foot of background. Your wavelength has increased, and the pinstripes now intersect a different part of each ring of your slinky. Marking every intersection point would allow you to retrace an OK representation of the slinky’s position.
Digital Audio - better when pushed down stairs.
ETA: I also wanted to say that certain D/A and A/D converters are better than others at reproducing the waveform.
Pulse-code modulation is not just up-down-up or on-off (1-bit) - you have as many bits of resolution as your A/D converter supports (4, 8, 16, 18, 24 and 32 are common), giving, for 8-bit resolution, -127 to +127. Your samples don’t just hit the peaks - on your 21,999 Hz sine wave the sampling will start near the peaks, drift away from them as time goes by, and then drift back. But frequencies near the Nyquist frequency are not good candidates for sampling.
E.g., looking at a few samples (made-up values):
1 127 peak of sine wave
2 -126 just before sine wave minimum
3 +125 further before sine wave maximum
4 -124 and so on
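If you want real numbers rather than made-up ones, here is a rough sketch. It assumes a cosine that starts exactly on its peak, a 44,100 Hz sample rate, and 8-bit scaling to -127..+127; the variable names are my own. The real drift is slower than in the made-up values above, but the pattern is the same: the samples alternate in sign while their size slowly shrinks toward zero and then grows back.

```python
import numpy as np

fs = 44_100          # CD sample rate in Hz
f0 = 21_999          # test tone in Hz, just under the 22,050 Hz Nyquist limit
full_scale = 127     # assumed 8-bit range of -127..+127, as in the post

n = np.arange(fs)    # one second of sample indices
samples = np.round(full_scale * np.cos(2 * np.pi * f0 * n / fs)).astype(int)

# The first few samples alternate in sign and sit very close to the peaks.
print("first samples:", samples[:6])

# The sample sizes swell and fade; that pattern repeats every fs / (fs - 2*f0) samples.
envelope_period = fs / (fs - 2 * f0)
print("envelope repeats every", round(envelope_period, 1), "samples")

# Index where the samples first land (almost) on a zero crossing.
first_null = int(np.argmin(np.abs(samples[:300])))
print("smallest sample in the first 300 is at index", first_null)
```

The fading in and out is only in the raw sample values. It is not a glitch in the tone itself, because the reconstruction filter in the D/A converter fills in the waveform between the samples.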
Of course, real sound is rarely a pure sine wave, so it is more complex. And when you do hit the Nyquist limit, you have turned whatever waveform you started with into something that is pretty close to a square wave (i.e. severely distorted or aliased). The 22 kHz limit for CDs was chosen because it is above the upper limit of human hearing (roughly 20 kHz), and the higher-frequency distortion introduced by sampling is barely (if at all) audible. The actual filtered cutoff is about 20 kHz, so that frequencies close to the Nyquist frequency are not sampled. Also, random digital noise (dither) is deliberately introduced to cover regular sampling artifacts. Most professional digital audio recording, though, is made at 48 kHz or (more commonly now) 96 kHz to capture more high-frequency content from the signal, and is then downsampled for CD after postprocessing and mixing.
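To go back to the "9 Hz wave with 20 samples" picture from the question, here is a minimal sketch of how the samples can still pin down the tone. It uses textbook sinc (Whittaker-Shannon) interpolation as a stand-in for the reconstruction filter. That is my assumption for illustration, not a description of how any particular D/A converter is built, and the names are my own. Because the sum is cut off at a finite number of samples, the match is close rather than exact.

```python
import numpy as np

fs = 20.0   # samples per second (the "20 samples" from the question)
f0 = 9.0    # tone in Hz, below the 10 Hz Nyquist limit for this rate

# Sample indices centred on t = 0; a long run keeps the truncation error small.
n = np.arange(-500, 501)
samples = np.sin(2 * np.pi * f0 * n / fs)

# Fine time grid near the middle of the sampled span.
t = np.linspace(-1.0, 1.0, 401)

# Each output point is a weighted sum of all samples, with weights sinc(fs*t - n).
# np.sinc(x) is sin(pi*x) / (pi*x).
kernel = np.sinc(fs * t[:, None] - n[None, :])
recon = kernel @ samples

truth = np.sin(2 * np.pi * f0 * t)
max_err = np.max(np.abs(recon - truth))
print(f"max reconstruction error near t = 0: {max_err:.4f}")
```

Plotting `samples` on its own shows the ragged picture that "doesn't fit right", while `recon` should trace the smooth 9 Hz sine between those same points.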