So digital cell phones use some encoding techniques that are highly optimized for speech, but make everything else sound horrible. If you’ve ever had to listen to hold music over a cellphone, you know what I’m talking about.
Does this optimization cause problems for some languages? Or do different regions use different encoding systems?
A bit of a WAG, but I'd say that most languages have the pauses and built-in redundancy that the codecs are optimized for. Music, which in most cases has constant background sound, has different characteristics that the codecs don't handle well.
I know some languages use pitch and tone to modify meaning, and if the codec doesn't carry the right pitch or inflection through, it could make for some very odd conversations.
Modern voice codecs are designed for, and tested against, many different languages before they're put into use. This paper has background info on how speech codecs work.
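To give a flavor of what that paper covers: phone codecs are built around linear prediction, modeling the vocal tract as a filter and only transmitting the filter settings plus a simple excitation (GSM's RPE-LTP and the later AMR codecs all work this way). Here's a toy Python sketch of my own (not code from any real codec) showing why a vowel-like signal fits that model much better than a music-like one:

```python
# Toy illustration of linear prediction, the core idea behind phone
# speech codecs like GSM's RPE-LTP and AMR. My own sketch, not code
# from any real codec.
import numpy as np

def lpc_coeffs(x, order):
    """Fit an order-p linear predictor by solving the autocorrelation
    normal equations (what Levinson-Durbin solves more efficiently)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def prediction_gain_db(x, order=10):
    """How much smaller the prediction error is than the signal itself.
    High gain = the speech model fits; low gain = it doesn't."""
    a = lpc_coeffs(x, order)
    pred = np.zeros_like(x)
    for k, ak in enumerate(a, start=1):
        pred[k:] += ak * x[:-k]
    err = x - pred
    return 10 * np.log10(np.sum(x**2) / np.sum(err**2))

fs = 8000  # classic telephone sample rate
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
# Vowel-ish: one pitch plus harmonics, roughly what LPC expects,
# with a tiny noise floor for numerical conditioning.
vowel = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 6))
vowel += 0.01 * rng.standard_normal(fs)
# Music-ish: a C major chord of unrelated tones plus broadband noise.
music = (np.sin(2 * np.pi * 262 * t) + np.sin(2 * np.pi * 330 * t)
         + np.sin(2 * np.pi * 392 * t) + 0.3 * rng.standard_normal(fs))

print(f"prediction gain, vowel-like: {prediction_gain_db(vowel):5.1f} dB")
print(f"prediction gain, music-like: {prediction_gain_db(music):5.1f} dB")
```

The vowel comes out tens of dB more predictable than the chord-plus-noise signal, which is roughly why hold music sounds so bad through a speech codec.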
Figure 5 on this page compares different speech codecs across several human languages and rates their quality with a PESQ score, where 5 is excellent and 1 is bad (page 8 of this more detailed PDF about PESQ). PESQ is basically a machine-computed statistical estimate that correlates closely with listeners' subjective ratings of speech fidelity, known as their "mean opinion score" (MOS).
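Also for the curious: the raw PESQ number and the MOS scale aren't quite the same thing. ITU-T P.862.1 publishes an S-shaped formula for converting raw PESQ output onto the MOS scale; a quick sketch, with the constants copied from the standard as best I can tell:

```python
# ITU-T P.862.1 mapping from raw PESQ (about -0.5 to 4.5) to a
# listening-quality MOS on the familiar 1-5 scale. Constants are
# from the published standard, as best I can tell.
import math

def pesq_to_mos_lqo(pesq_raw):
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * pesq_raw + 4.6607))

for raw in (1.0, 2.0, 3.0, 4.0, 4.5):
    print(f"raw PESQ {raw:.1f} -> MOS-LQO {pesq_to_mos_lqo(raw):.2f}")
```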
In that chart, many of the codecs seem to have more difficulty with French than with other languages, while Spanish seems to give them the least trouble. Overall, though, listeners rated all the languages across all the codecs between "Fair/Good" and "Good/Excellent" quality. I would assume codecs that don't rate at least that high simply never make it to market.
As an aside, I'd guess that those MOS and PESQ scores are based on language samples of reasonable duration (i.e., a phrase or sentence). It would be really interesting to measure the codecs on a phoneme-by-phoneme basis, because so much can be said (or missed) in one syllable.
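If anyone wants to poke at that themselves: there's a `pesq` package on PyPI wrapping the ITU reference implementation, and something like the sketch below could score a recording chunk by chunk. The file names are made up, and note that PESQ was only validated on utterance-length samples, so phoneme-scale scores would be strictly off-label:

```python
# Rough sketch: score a codec-degraded recording against the clean
# original in short chunks, using the `pesq` package from PyPI.
# File names are hypothetical; inputs should be mono at 8 or 16 kHz.
import numpy as np
from scipy.io import wavfile
from pesq import pesq

CHUNK_SECONDS = 1.0  # phoneme-scale windows would be even shorter

fs, ref = wavfile.read("clean.wav")
_, deg = wavfile.read("degraded.wav")
ref = ref.astype(np.float32)
deg = deg.astype(np.float32)

chunk = int(fs * CHUNK_SECONDS)
mode = "nb" if fs == 8000 else "wb"
for i in range(0, min(len(ref), len(deg)) - chunk + 1, chunk):
    try:
        score = pesq(fs, ref[i:i + chunk], deg[i:i + chunk], mode)
        print(f"{i / fs:6.2f}s-{(i + chunk) / fs:6.2f}s  PESQ {score:.2f}")
    except Exception as e:  # e.g. chunks with no detected speech
        print(f"{i / fs:6.2f}s-{(i + chunk) / fs:6.2f}s  skipped ({e})")
```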
ETA: I’m not an audio engineer. Just Googled that, so if anything is wrong, feel free to correct it.
Simple answer - legacy GSM used vocoders that are “tuned” towards English and German. Legacy CDMA tends to perform better across all languages.
Voice over LTE isn't really standardized enough yet to comment on.
The specific example I was given was that GSM doesn’t perform very well for many Asian languages.
That was the assumption, but he wasn't sure. I actually had to go through a few people before I found anyone with decent information, and even that came mostly from reading up on the history of the technology. The effect apparently isn't as obvious in practice, even though it's there.
Thanks for the info, all. I guess it really isn't my cellphone that makes Indian-accented English so hard to understand. Those two languages just can NOT play nice together.