Why does telephone audio sound so bad?

I understand that conventional telephony transmits only a limited acoustic frequency range, about 300-3000 Hz. OTOH, Skype and some other digital services transmit a much wider range of frequencies.

However, even accounting for this, the audio quality of conventional telephony sounds rather crappy. It’s similar to the sound a singer produces when he cups his hands around the microphone and his mouth. Why is this? Surely there’s more to it than just limited frequency range. If I dial down the bass and treble on my stereo, the result is still qualitatively different from what I get if I pipe that sound through a telephone.

Miniaturization of the speaker. To get broad-range response, a speaker has to be physically large, in order to have the slowly vibrating surfaces required for low tones. Hence the size of woofers and boom boxes.

Telephones also transmit at a lower bit rate and bit resolution than voice communications programs.

For example, CD audio is sampled 44,100 times per second, with each of those samples being 16 bits. Telephones sample 8,000 times per second at a bit depth of 8 bits per sample. And 16 bits is not merely double 8 bits - every extra bit doubles the number of states/information you can store, so 16 bits gives 256 times as many levels as 8. By comparison, if you think about a digital image, a 16-bit-per-pixel image can have a total of 65,536 colors, and a 24-bit image can have 16,777,216 colors.

In this comparison, the number of colors is essentially the number of different sound levels that each of those little samples can take. If the format doesn’t have enough bits to store the complexity of the sound accurately, it simplifies the sound down to a smaller number of levels, which will change it.

The number of times it samples per second (8000 vs 44k for CDs) also creates a limitation - it has less temporal information about the sound, and so bigger gaps have to be filled in, which can alter the quality of the sound.
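To make the bit-depth point concrete, here’s a small Python sketch (the 440 Hz test tone and the function name are just illustrative choices): it quantizes a sine wave to a given number of bits and measures how much quantization noise that adds.

```python
import math

def quantization_snr_db(bits, n=8000, freq=440.0, rate=8000.0):
    """Quantize one second of a full-scale sine to `bits` bits and
    return the signal-to-quantization-noise ratio in dB."""
    levels = 2 ** bits
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * freq * i / rate)
        # snap x to the nearest of `levels` uniform steps in [-1, 1]
        q = round((x + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1
        signal_power += x * x
        noise_power += (x - q) ** 2
    return 10 * math.log10(signal_power / noise_power)

print(quantization_snr_db(8))   # 8-bit, telephone-style
print(quantization_snr_db(16))  # 16-bit, CD-style
```

The two results should differ by roughly 48 dB, matching the rule of thumb of about 6 dB of signal-to-noise ratio per bit.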

You can experiment with this by using different voice recorder programs using different codecs at different bitrates. The lower ones will tend to sound more phone-like in terms of quality, not as natural.

Which codec is being used is also a factor. Codecs are purpose-made, and the phone ones are probably limited in a way that prioritizes the most useful vocal frequencies above all else. That will color the tone in the same way that extreme equalizer settings on a stereo can make a sound unnatural, by dropping out ranges of frequencies you expect to be there.

Both of the above. Limited data played on an even more limited speaker.

Presumably there was a time when conventional telephone was entirely analog. To my ear, the sound quality hasn’t particularly changed since I was a little kid in the 1970s. When did they go digital?

But again, it’s not just about frequency response. And in fact, small speakers and microphones can sound pretty good. If I use my cell phone to call my wife on a landline, the sound quality is crummy. But if I use my cell phone to call my wife on her cell phone (we’re both on AT&T), the sound quality is comparable to Skype on desktop PCs - IOW, the small speakers and microphones of our cell phones are definitely not an impediment to overall sound quality, or at least not the impediment that’s responsible for the aspect of sound quality I’m trying to describe.

AT&T and other carriers now implement VoLTE (Voice over LTE), which sends the voice data digitally over the much larger LTE pipe. A larger data pipe means a higher sampling rate and frequency range. That’s probably why things sound better from cell to cell.

This is close, but contains a classic misunderstanding of the nature of sampled sound. An 8 kHz sample rate is equivalent to having a bandwidth of 4 kHz. There is no “filling in” of the gaps between the samples in any sense other than that a 4 kHz bandwidth filter is applied to the signal. This filter must be present on both the transmitting and receiving ends. On the transmitting end it is usually called the anti-alias filter, and on the receiving end it is usually called the reconstruction filter. But no matter - the signal chain is symmetric from end to end. It has exactly the same properties as a bandwidth-limited analog transmission chain.
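To illustrate why the filter is mandatory rather than optional, here’s a small Python sketch (the tone frequencies are arbitrary): a 5 kHz tone sampled at 8 kHz produces exactly the same samples as a phase-inverted 3 kHz tone, so without an anti-alias filter the receiver has no way to tell them apart.

```python
import math

RATE = 8000  # telephone sample rate, in Hz

def tone(freq, n=64, rate=RATE):
    """Samples of a sine wave at `freq` Hz, taken `rate` times a second."""
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

# 5 kHz is above the 4 kHz Nyquist limit of an 8 kHz system. Its
# samples are identical to those of a phase-inverted 3 kHz tone:
above_nyquist = tone(5000)
folded_alias = [-x for x in tone(3000)]
print(max(abs(a - b) for a, b in zip(above_nyquist, folded_alias)))
```

The printed difference is zero apart from floating-point rounding: once the samples are taken, the information needed to distinguish the two tones is gone, which is exactly why the anti-alias filter has to act before sampling.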

The point about codecs is important. Once digitised, the data undergoes compression - compression that drives the sound through a channel with much less data bandwidth than the raw 8 kHz/8-bit stream. The use of perceptual codecs allows much more to be squeezed into the data channels. Crafting the codecs to maintain intelligibility in the face of significant data compression is quite a trick. Removing dynamic range in higher frequencies can drop bandwidth needs dramatically.
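Even the plain 8-bit telephone samples already use a simple perceptual trick: G.711 companding stores samples on a roughly logarithmic scale, so quiet sounds get proportionally finer resolution than loud ones. Here’s a minimal Python sketch of the continuous μ-law curve (the real codec quantizes this curve into 8-bit codewords, which is omitted here):

```python
import math

MU = 255  # mu-law parameter used by North American G.711

def mu_compress(x):
    """Map a linear sample in [-1, 1] onto the companded scale."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# Quiet sounds land proportionally higher on the scale, so the 8
# available bits are spread more evenly across perceived loudness:
for x in (0.01, 0.1, 1.0):
    print(x, round(mu_compress(x), 3))
```

Round-tripping a sample through `mu_compress` and `mu_expand` recovers it exactly; the loss in the real codec comes from rounding the companded value to 8 bits.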

There is now a high resolution audio standard for calls, which most modern mobile phones support. Sadly it seems that the lack of cooperation between carriers means that we are going to be waiting for rather a while before calls other than those that stay within the one carrier are able to make use of it. The difference in clarity is astounding when you get it. But calls via Skype, iChat, and the like get you that level of clarity right now, and without the call costs.

If landline and cell phones actually conveyed 3000 Hz audio bandwidth with low noise and good latency, they would sound great. Unfortunately in real world usage, they often do not.

Landline phones are often terminated at each end with a cordless handset of varying age and quality. Those can often degrade the signal – in addition to quality issues from the carrier. So the first step in troubleshooting poor phone audio quality is using a wired handset – on both ends of the connection.

Even if you have a wired handset, much terrestrial phone service in the U.S. is over a PSTN circuit-switched digital network (not VOIP) to your subdivision or premises, where it becomes analog for the final run. Comcast phone service is technically VOIP but it runs over a separate dedicated network, not the internet, so from a user perspective it has the look and feel of analog telephone service.

So the landline signal starts as analog, is converted to digital, passes through a public switched digital network or private digital network, then back to analog, then travels over aging copper wires possibly thousands of feet to your house. After that it enters your house and travels over more aging copper wires which were possibly installed by a hurried technician many decades ago, finally reaching your handset. There are many links in the chain where something could degrade the signal, usually on the analog segments.

With cellular it is even worse. Cell phone audio quality is often worse than the very first telephone call by Alexander Graham Bell: “Mr. Watson–come here–I want to see you.” That was in 1876.

Today the smartphone in your hand may have 20 times the computational capacity of a Cray-1 supercomputer, and it uses Software Defined Radio technology to encode and decode the audio: Software-defined radio - Wikipedia

Considering this, it is a sad state of affairs that we have actually gone backwards in audio quality from the late 19th century.

There is hope this might improve in the future: Why Mobile Voice Quality Still Stinks—and How to Fix It - IEEE Spectrum

I miss the days of talking to someone on the phone, stretching the handset cable into the bathroom for privacy, and being able to hear their every breath, every nuance of their voice.

Back in the analog days of radio, reporters would record an interview on tape, then remove the mouthpiece from the phone handset and use a cable with alligator clips to physically connect the tape recorder to the phone. In the studio the phone was hardwired into the console, bypassing the earpiece.

The resulting quality was surprisingly good, and both ends had much more control over the volume. So, simply improving the in and out made a substantial difference.

However, there was nothing you could do about the 300-3000Hz frequency range, so things like inflection and sibilants were always muffled.

It’s important to differentiate between why it sounds bad today and why it sounded bad in the past. Today it’s all digital, but at limited bandwidths. And actually 8 kHz audio can sound pretty good, except that it lacks a high end. The trouble is that the conversions between analog and 8 kHz, or between higher-bitrate digital and 8 kHz, are often quite crappy, so you often get either additional filtering or not enough filtering, and in the latter case you get aliasing sounds.
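A quick Python sketch of the “not enough filtering” failure mode (frequencies chosen purely for illustration): if you convert a 48 kHz stream to 8 kHz by just keeping every sixth sample, any content above 4 kHz folds straight back into the audible band.

```python
import math

HIGH_RATE = 48000  # e.g. a modern digital audio source
LOW_RATE = 8000    # telephone rate; 48000 / 8000 = keep every 6th sample

def tone(freq, rate, n):
    """Samples of a sine wave at `freq` Hz, taken `rate` times a second."""
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

# A 6 kHz component is fine at 48 kHz but sits above the 4 kHz Nyquist
# limit at 8 kHz. Dropping samples without an anti-alias filter folds
# it down: the kept samples match a phase-inverted 2 kHz tone exactly.
naive = tone(6000, HIGH_RATE, 600)[::6]
folded = [-x for x in tone(2000, LOW_RATE, 100)]
print(max(abs(a - b) for a, b in zip(naive, folded)))
```

A proper sample-rate converter low-passes the signal to below 4 kHz before decimating, so that 6 kHz component would simply disappear instead of reappearing as a spurious 2 kHz tone.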

I believe that for old phones the limitations were in the lines and the loading coils that were used to condition the long lines (believe it or not, you would get an actual wire that crossed half a continent for long-distance phone calls), as well as the carbon microphones that were used in phones until around the 1980s, because those don’t need any electronic amplification.

Interesting sidebar.

Yes - Going For the One album

Wakeman plays the pipe organ at St. Martin’s church in Vevey, which was simultaneously recorded through high fidelity telephone lines while the rest of the band played in the studio in Montreux. Wakeman described the experience as “absolute magic.”

I get phone calls all day long from all over the United States and the quality of the sound varies wildly from call to call. Most cell phones sound terrible. Every now and then I get a call that is beautifully clear and sounds good, but crappy to non-understandable is the norm.

Funny coincidence: I just now had a one-hour phone call over Skype, audio only, and noticed the same thing.

Actually, real telephone audio is quite good - Ma Bell optimized their whole system to carry that 300Hz-3kHz band with admirable clarity and minimal distortion. Even the first analog cell phones did a pretty good job. It was the digitization of voice channels (PCM for digital cell systems) that started the breakdown that’s led to lesser and lesser quality.

At the time digital cell was taking over, I worked in the telephony industry, and my boss was obsessed with testing frequency response and distortion on the various systems. The digital phones, even the good ones, were noticeably worse than the others, especially up against ‘real copper.’

Most people probably make most calls on digital systems these days and have no idea that phone calls didn’t used to sound like, well, phone calls unless you tried to play music over them.

You don’t remotely need 8 kHz audio for good quality. 3 kHz can sound great, as any ham radio operator can demonstrate using a high-quality analog SSB signal. It can sound like the other party is in the room with you.

Likewise, digital public-service fire/police radios using APCO-25 and ham radios using digital voice on D-STAR have an audio bandwidth of about 3 kHz, and except for a trace of robotic processing, sound quite good – better than many cell phone signals.

APCO-25: Project 25 - Wikipedia

The problem is not audio bandwidth or the digital conversion. It is the cell carriers using extreme compression and various other shortcuts to shoehorn more voice channels into a given number of RF slots and cell towers.

This is widely known and has been discussed in many places.

To clarify for everyone: discussion of 8 kHz isn’t referring to the bandwidth, it is referring to the sample rate. Worldwide, the digital phone systems run with an 8 kHz clock. The packet rate in ATM is 8 kHz. This was chosen in the days when voice was king, and essentially the entire digital phone network runs at the sample rate needed to get you a 4 kHz (theoretical maximum) bandwidth.

The advent of VOIP has brought about a move away from 8 kHz-is-king thinking, and now it is possible to allow much wider bandwidths. The downside is that the interconnects between carriers need to support voice switching over IP in order to carry the VOIP-based calls. Almost all the voice infrastructure is still ATM-based, and so, although you can get a high-bandwidth call between two endpoints that are with the same carrier, we are waiting for the infrastructure to catch up to allow this to become ubiquitous, and so most calls remain awful. Part of the problem is that there is no incentive for the carriers to fix the infrastructure.

Missed the edit window:

In principle they could probably move to AAL 3/4 or 5 and run IP over ATM, but the system probably doesn’t even support that. Much of it is probably quite old, and you need both ends to agree on how the interconnection is done. The point about ATM is that the quality of service is iron-clad. IP is much, much looser. Old-school telco engineers have trouble with this. In order to upgrade the voice networks you are looking at very serious money. And all it will do is get you a better-quality call. Yet Skype and the rest are already cutting the telcos’ lunches (old Oz expression) by providing the high-quality calls for free, and there is simply no money in it.

Back in the analog days of radio, radio stations would use “tie lines” between studios and, on some occasions, to outside broadcasts. “Tie lines” were ordinary telephone lines that were wired up at the telephone exchange (or junction) instead of going through the switches. In Aus and the UK the service was widely used to connect factory locations, or university locations, or office locations, or military locations (dunno about USA, where account billing was different)… Because tie lines didn’t go through the bandwidth filters, you could easily get 6-8K bandwidth, instead of 3K. Because they didn’t go through the switches, there was much less random switching noise.

For even better sound quality, they could use two pairs, modulate and frequency-shift the high frequencies down to telephone frequencies, and get 6 kHz (switched) or 12-16 kHz (tied).

The above post by iljitsch clearly stated “8 kHz audio”.

However you raise a good point since there are several different bandwidths involved:

(1) Audio bandwidth of the transmitted or received signal. Typically about 3 kHz (a range from 300 Hz to about 3,000 Hz or 3,500 Hz will produce a good, clear signal).

(2) Sampling rate at which the audio signal is digitized. In general a 2x sampling rate can recover all the original content, per Nyquist. If the desired audio bandwidth is about 3 kHz, it would typically be sampled at 6 or 8 kHz.

(3) Vocoder output bit rate. The above sample rate is encoded by a digital speech encoder (vocoder) which compresses the speech to a lower output bit rate. In the case of the public service APCO-25 Phase 2 system, I think the encoded voice rate is about 2,400 bits per second. Vocoders can be implemented in hardware or software.

(4) RF bandwidth occupied in the transmission medium. The above vocoder output bit rate is modulated and encoded by the RF transmission system, whether that is over the air, cable, etc. The amount of occupied bandwidth on the transmission media per input bit rate is called “spectral efficiency”, usually expressed as bits per second per Hz: Spectral efficiency - Wikipedia
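The arithmetic connecting those stages is straightforward. This Python sketch uses the classic 64 kbit/s digital voice channel plus illustrative numbers for the vocoder rate and RF channel width (the 12.5 kHz figure is just a hypothetical narrowband channel, not a specification):

```python
# Rough bit-rate arithmetic for the four stages above. The 64 kbit/s
# channel is the classic digital voice circuit; the vocoder and RF
# numbers are illustrative.

SAMPLE_RATE = 8000       # samples per second (2x a ~4 kHz audio band)
BITS_PER_SAMPLE = 8      # companded PCM, G.711-style

pcm_rate = SAMPLE_RATE * BITS_PER_SAMPLE
print(pcm_rate)                        # 64000 bits/s

vocoder_rate = 2400                    # low-rate vocoder output, bits/s
print(pcm_rate / vocoder_rate)         # ~26.7x compression

rf_bandwidth_hz = 12500                # hypothetical narrowband channel
print(vocoder_rate / rf_bandwidth_hz)  # 0.192 (bits/s)/Hz spectral efficiency
```

The last figure is the spectral efficiency described in (4): how many bits per second fit into each hertz of RF bandwidth.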

The bottom line is that today we often experience landline or cell phone audio quality which is even worse than that of hyper-compressed APCO-25 Phase 2 public-service radios.

Here is a good example of an APCO-25 radio using only (I think) 2,400 bits per second vocoder output. It has a robotic quality but is still better than the audio on many cell phone conversations: Uniden BCD436HP Performance - P25 Phase-II (TDMA) on 700 MHz - YouTube

It is a sad state of affairs when the voice telecom infrastructure can’t even match this (poor) example. The audio in many digital telephone conversations is probably worse than the first phone system set up by Alexander Graham Bell. At least Bell’s assistant Mr. Watson could understand what he said, which is sometimes not the case today.