Certification has nothing to do with need. For something that’s essential for safety (like communication equipment), it must be proven to be safe, reliable, have sufficient redundancy, etc. For something non-essential (like a science experiment), it just needs to be safe (i.e. not endanger people or the station under any circumstances).
There are several factors.
The Apollo lunar mission transmissions came from up to a quarter million miles away, and receiving them at full bandwidth required a 210-ft diameter steerable dish (see the cars for scale): https://honeysucklecreek.net/other_stations/goldstone/Bill_Wood_Goldstone/GDS_booklet/dss14.jpg
For most Earth/space transmissions they used “unified S-band” at around 2.1 GHz, with the voice embedded alongside telemetry. The S-band signal was received at the big dishes, the voice was then filtered out of it and injected onto the NASA terrestrial network. With analog voice radio, each link in the chain typically degrades quality a bit.
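As a rough illustration of that degradation (a generic model, not anything NASA-specific), each analog hop contributes its own noise, so the signal-to-noise ratio can only fall along the chain. A minimal Python sketch with made-up hop names and noise levels:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000                              # sample rate, Hz
t = np.arange(fs) / fs                 # one second of audio
signal = np.sin(2 * np.pi * 440 * t)   # stand-in for a voice signal

def snr_db(clean, noisy):
    """Signal-to-noise ratio of `noisy` relative to the clean original."""
    noise = noisy - clean
    return 10 * np.log10(np.mean(clean**2) / np.mean(noise**2))

# Each analog hop adds its own independent noise floor.
hop_noise_rms = 0.02
audio = signal.copy()
for hop in ["spacecraft -> dish", "dish -> network", "network -> console"]:
    audio = audio + rng.normal(0, hop_noise_rms, audio.shape)
    print(f"after {hop}: SNR = {snr_db(signal, audio):.1f} dB")
```

Running this prints a lower SNR after every hop, which is the basic reason analog relay chains sound worse the longer they get.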
Since you don’t need more than about 3 kHz to clearly convey the human voice, voice transmissions during the Apollo era were filtered to about 3 kHz of audio bandwidth. This enabled better clarity for a given amount of RF power.
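You can approximate that band-limiting on any recording with a ~3 kHz low-pass filter. A hedged sketch using SciPy; the cutoff and filter order here are arbitrary illustrative choices, not Apollo specifications:

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 44100        # input sample rate, Hz
cutoff = 3000     # ~3 kHz "communications quality" voice bandwidth

# 8th-order Butterworth low-pass, as second-order sections for stability
sos = butter(8, cutoff, btype="low", fs=fs, output="sos")

def bandlimit(audio):
    """Restrict audio to roughly a 3 kHz voice band."""
    return sosfilt(sos, audio)

# Example: white noise in, band-limited noise out
audio = np.random.default_rng(1).normal(size=fs)
filtered = bandlimit(audio)
```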
Despite this, the overall audio quality wasn’t that bad. Listen to the Apollo 15 astronauts’ voices from a quarter million miles away while landing on the Moon in 1971. They are apparently using VOX (voice-operated transmit), so some of the syllables are clipped. This is normal. https://youtu.be/XvKg68DcTZA
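The clipping is inherent to how a VOX works: the transmitter keys up only after the audio envelope crosses a threshold, so the quiet onset of each utterance is lost. A toy Python model (the threshold and hang time are illustrative values, not flight hardware parameters):

```python
import numpy as np

def vox_gate(audio, fs, threshold=0.05, hang_ms=250):
    """Toy VOX: pass audio only while the signal exceeds a threshold,
    holding the channel open for hang_ms afterwards. The quiet start of
    each utterance is lost while the gate is still closed, which is why
    syllables sound clipped."""
    hang = int(fs * hang_ms / 1000)
    out = np.zeros_like(audio)
    open_until = -1
    for i, x in enumerate(audio):
        if abs(x) > threshold:
            open_until = i + hang
        if i <= open_until:
            out[i] = x
    return out
```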
In the shuttle/ISS era the audio during live transmissions didn’t sound much better. During the later shuttle era the TDRSS satellites were available, so in theory this could have improved quality, but to me it sounded about the same.
There is nothing inherent about audio recording from space that makes it sound bad. Listen to this tour of the ISS starting at about 00:25, where the audio was apparently captured on the on-camera mic of the camcorder. Despite the background noise, it sounds quite good when the camera is close to the astronaut: https://youtu.be/QvTmdIhYnes?t=25
This indicates that when live audio is transmitted, the lower quality is due to the modulation and encoding system. There is generally limited intelligibility benefit in exceeding about 3 kHz of audio bandwidth. A 6-8 kHz audio bandwidth might sound aesthetically better, but for a given amount of RF power and system noise, it’s not necessarily any better at conveying dialog.
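The underlying trade is straightforward: for a fixed signal power, thermal noise power grows linearly with receiver bandwidth (N = kTB), so widening the audio channel from 3 kHz to 8 kHz costs roughly 4.3 dB of SNR. A quick back-of-the-envelope check:

```python
import math

# For a fixed received signal power, noise power is proportional to
# receiver bandwidth (N = k*T*B), so a wider audio channel directly
# costs signal-to-noise ratio.
def snr_penalty_db(b_wide_hz, b_narrow_hz):
    return 10 * math.log10(b_wide_hz / b_narrow_hz)

print(snr_penalty_db(8000, 3000))   # ~4.3 dB lost going from 3 kHz to 8 kHz
```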
With any “safety of life” application involving a deployed system there is always the changeover problem. E.g., aviation still uses AM radio for voice because changing that to SSB, FM, or digital would require maintaining the existing system while installing new radios and antennas in every aircraft and ground station on Earth, then somehow switching over or running them in parallel. Additional RF spectrum would also be needed.
We have lots of technology today and the ability to send digitally-encoded voice, but that doesn’t always result in better quality. Modern cell phones use digital voice codecs and have 200 times the computation power of a Cray-1 supercomputer, but audio quality is often worse than an old analog cell phone or even a shortwave radio.
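Part of that is by design rather than lack of compute: narrowband telephone codecs such as AMR-NB sample at 8 kHz, so the Nyquist limit caps the deliverable audio bandwidth at 4 kHz (roughly 300-3400 Hz in practice), much like the old analog links:

```python
# Narrowband telephone codecs (e.g. AMR-NB) sample at 8 kHz, so by the
# Nyquist limit nothing above 4 kHz survives, no matter how much DSP
# horsepower the handset has.
sample_rate_hz = 8000
nyquist_hz = sample_rate_hz / 2
print(f"max representable audio frequency: {nyquist_hz:.0f} Hz")
```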
Random browsing turned up the following document, which is instructive about how these things work:
https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19720022495.pdf (Multi-EVA communications system analysis, Final Report, June 30, 1972).
This 225-page report, prepared by RCA for NASA, lays out and evaluates in comprehensive detail the functional and technical requirements of NASA’s proposed system. The conclusion is that it would basically work, but would not be that great, and would be too expensive to implement.
The appendix with all the specifications is interesting: everything from voice dynamic range, frequency response, and noise requirements, to temperature/vacuum/acceleration/humidity/vibration/etc tolerances and testing procedures.
There is much more to it than simply downloading an MP3 codec from the Internet.