There are lots of issues with digital reproduction that can affect the sound. One is the quality of the filtering. As noted above, before you digitize the signal you want to filter out all frequencies above the Nyquist limit. But filters are not perfect. They don’t cut off frequencies at a sharp edge, but roll off on a slope, and they can introduce distortion of their own.
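To see why anything above the Nyquist limit has to be filtered out first, here is a rough sketch of where an unfiltered tone ends up after sampling. The function name `alias_frequency` is just something I made up for illustration, and it assumes ideal sampling with no filter at all.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a sampled tone appears, assuming ideal sampling and no filter.

    Anything above the Nyquist limit (sample_rate_hz / 2) folds back below it.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


# A 25 kHz tone is inaudible, but sampled at 44.1 kHz it lands inside the audible band.
print(alias_frequency(25_000, 44_100))
# A 10 kHz tone is below the Nyquist limit and comes through unchanged.
print(alias_frequency(10_000, 44_100))
```

So whatever the filter lets through above 22.05 kHz doesn't just disappear, it shows up as a spurious tone lower down.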
Then there’s the quality of the digital-to-analog converters and the other electronics in the analog stage. And if the clock driving the DAC isn’t stable, you can have problems with jitter, where samples come out at slightly the wrong times.
Finally, digital distortion is very ‘harsh’. I had an el-cheapo first-generation CD player that was horrible. Sibilants would crack and hiss.
And even though humans can’t hear above 20 kHz, they can hear lower-order artifacts of higher frequencies. For example, a single square wave is made up of many, many sine waves: the fundamental plus its odd harmonics. If you filter out the high frequencies, the square wave will not be truly square any more.
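The square wave point can be sketched with the textbook Fourier series: an ideal square wave is the sum of its odd harmonics, each at 1/k of the fundamental's amplitude. The helper names and the 20 kHz brick-wall cutoff below are my own assumptions for illustration, not how any particular player filters.

```python
import math


def odd_harmonics_below(fundamental_hz, cutoff_hz):
    """Odd harmonic numbers k whose frequency k * fundamental_hz is at or below the cutoff."""
    return list(range(1, int(cutoff_hz // fundamental_hz) + 1, 2))


def band_limited_square(t, fundamental_hz, cutoff_hz=20_000):
    """Value at time t (seconds) of a unit square wave with all harmonics above cutoff_hz removed."""
    return (4 / math.pi) * sum(
        math.sin(2 * math.pi * k * fundamental_hz * t) / k
        for k in odd_harmonics_below(fundamental_hz, cutoff_hz)
    )


# A 1 kHz square wave keeps harmonics 1, 3, ..., 19 and still looks fairly square.
print(odd_harmonics_below(1_000, 20_000))
# An 8 kHz square wave keeps only its fundamental: what comes out is a plain sine wave.
print(odd_harmonics_below(8_000, 20_000))
```

Whether the missing harmonics are audible is a separate question, since they were above 20 kHz to begin with.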
This is all in theory. The real question is how much of it is actually audible. I used to think “not much”, until I heard the difference between a regular CD and SACD and DVD-A. There is a noticeable improvement with the newer formats, even though all are digital. The difference may be primarily the difference between 16-bit and 24-bit samples. Or maybe it’s the sampling rate. Or a little of both. But clearly, there’s more to it than just saying, “As long as you sample at 44.1 kHz you’ll perfectly reproduce anything people can hear”. A lot of professional audio gear uses 24-bit, 96 kHz sampling. For one thing, the higher rate makes it easier to build accurate filters.
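For a feel of what the extra bits buy you, here is a back-of-the-envelope sketch using the standard formula for an ideal quantizer driven by a full-scale sine wave (about 6.02 dB per bit plus 1.76 dB). Real converters fall short of this, so treat the numbers as theoretical ceilings; the function name is mine.

```python
import math


def ideal_dynamic_range_db(bits):
    """Theoretical signal-to-quantization-noise ratio of an ideal converter, full-scale sine input."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


for bits in (16, 24):
    print(bits, "bits:", round(ideal_dynamic_range_db(bits), 1), "dB")
```

That works out to roughly 98 dB for 16 bits and roughly 146 dB for 24 bits, on paper.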
Now I’m dredging up some old theory, so I could be a bit off base here, but I seem to recall that the problem with filtering is that ideally you want to pass everything up to the Nyquist limit, and cut off everything after it. Ideally, you’d like a perfect vertical slope. In reality, it’s very hard to make a good filter that has a steep cutoff slope. So there will always be some frequencies that sneak in, which can cause distortion when the waveform is reconstructed. But if you sample at 96 kHz, the Nyquist limit moves up to 48 kHz, so you can start your filter rolloff at 20 kHz, and as long as it has fully cut off by 48 kHz, you can reconstruct the waveform accurately.
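Here is a rough sketch of how much room the filter gets at each sampling rate, measured in octaves between a 20 kHz passband edge and the Nyquist limit. The 96 dB attenuation target is purely my assumption (roughly the range of 16-bit audio), and the function names are made up for this example.

```python
import math


def transition_octaves(sample_rate_hz, passband_edge_hz=20_000):
    """Width, in octaves, of the band between the passband edge and the Nyquist limit."""
    nyquist_hz = sample_rate_hz / 2
    return math.log2(nyquist_hz / passband_edge_hz)


def required_slope_db_per_octave(sample_rate_hz, attenuation_db=96.0, passband_edge_hz=20_000):
    """Average rolloff needed to reach attenuation_db by the Nyquist limit (assumed target)."""
    return attenuation_db / transition_octaves(sample_rate_hz, passband_edge_hz)


for fs in (44_100, 96_000):
    print(fs, "Hz:",
          round(transition_octaves(fs), 2), "octaves,",
          round(required_slope_db_per_octave(fs)), "dB/octave")
```

At 44.1 kHz the filter has only about 0.14 of an octave to do its work, versus about 1.26 octaves at 96 kHz, so the required slope is something like nine times gentler at the higher rate.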
Does that sound right? My electronics theory is almost 20 years old now.