I’m told that telephones transmit a pretty small frequency range of human speech. Whereas humans can hear from about 50 Hz to about 19,000 Hz, the bandwidth of a telephone circuit is about 50 to 7000 Hz. So how do we distinguish voices? Incidentally, does VOIP transmit more frequencies than regular phone circuits?
Your ear synthesizes clipped frequencies. When an object resonates at a series of harmonic frequencies with the fundamental missing, it will also vibrate with the missing fundamental. I could explain more, but I’m not sure how much of an answer you want.
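To make the missing-fundamental idea concrete, here is a minimal sketch in plain Python. It only shows a property of the signal, not how the ear works: a tone built from the 2nd, 3rd and 4th harmonics of 200 Hz contains no energy at 200 Hz, yet the waveform still repeats every 1/200 of a second. The 8000 Hz sample rate, the 200 Hz fundamental and the choice of harmonics are all assumptions picked for illustration.

```python
import math

FS = 8000       # sample rate in Hz (assumed for illustration)
F0 = 200        # the fundamental that is deliberately left out
N = 800         # analysis window: exactly 20 periods of F0
MAX_LAG = 60    # longest lag searched, in samples

def tone(n):
    # Harmonics 2, 3 and 4 of F0 only; nothing is generated at F0 itself.
    return sum(math.sin(2 * math.pi * k * F0 * n / FS) for k in (2, 3, 4))

x = [tone(n) for n in range(N + MAX_LAG)]

def dft_magnitude(freq_hz):
    # Magnitude of a single DFT bin computed over the first N samples.
    re = sum(x[n] * math.cos(2 * math.pi * freq_hz * n / FS) for n in range(N))
    im = sum(x[n] * math.sin(2 * math.pi * freq_hz * n / FS) for n in range(N))
    return math.hypot(re, im)

def autocorr(lag):
    # How well the signal matches a copy of itself shifted by `lag` samples.
    return sum(x[n] * x[n + lag] for n in range(N))

# The lag with the strongest self-similarity gives the repetition period.
best_lag = max(range(5, MAX_LAG + 1), key=autocorr)
implied_pitch_hz = FS / best_lag

print("energy at 200 Hz:", dft_magnitude(200))
print("energy at 400 Hz:", dft_magnitude(400))
print("repetition rate :", implied_pitch_hz, "Hz")
```

The spectrum has essentially nothing at 200 Hz, but the repetition rate found by the autocorrelation is still the 200 Hz fundamental, which is the cue a listener can use to hear the pitch.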
Landline phones really only transmit up to about 3.5 kHz, not 7 kHz. VOIP is not one standard. There are many codecs available, and one could transmit at whatever fidelity one desires. In practice, however, they transmit up to 3.5 kHz because they need to connect with regular landline phones. Many cell phones have codecs that can transmit higher frequencies, enabling cellphone-to-cellphone calls to have better fidelity than landline-to-landline calls. I am told that carriers in Korea are rolling out these sorts of features as a way of differentiating their service from other carriers.
Remember also that even though we can hear sounds up to around 20,000 Hz, the range of human speech does not go anywhere near that high.
For a male voice, most of the audio signal is in the 1 kHz to 3 kHz range. Female voices are slightly higher, about 3 to 5 kHz (these numbers are from memory, so don’t beat me up too bad if they are a little off; they should be close). The point is that if you filter off anything outside of these frequency ranges, you really don’t lose much of the voice signal, so it’s very easy for your brain to tell one voice from another.
The human brain is also the best pattern-matching piece of equipment in the known universe. Its ability to match patterns (which includes voices, pictures, faces, etc.) is far better than even the best supercomputer can manage. Even with a lot of information missing, it can still match up one voice to another based on very subtle cues in the voice signal. It doesn’t need anything anywhere near a full reproduction of the signal in order to make a match.
The wireline telephone network transmits the analog waveform of your voice, band-limited to about 200-3200 Hz. Many modern communications devices, such as digital cell phones, do not transmit the analog waveform of your voice. Instead, they use a model of the human vocal tract and transmit the model’s parameters at a periodic rate. In general, the more parameters that are used, and the more often they are updated, the higher the voice quality. Going in the other direction, which is often necessary for low bit-rate channels, can result in poor voice quality. The result can sound like a robot or Donald Duck, and it may be difficult or impossible for the listener to identify the caller by the unique qualities of his or her voice. This problem is often aggravated if the caller is a woman, due to higher pitch.
With VOIP, assuming a reliable connection, voice quality is going to be determined by the quality and sophistication of the codec, and the bit rate. Simple codecs can deliver good quality if the bit rate is high enough. Complex codecs can deliver good quality at substantially lower bit rates than those used by simple codecs. This is very important for digital cellular phone systems, since lowering the bit rate for each call means that more calls can be carried in a very expensive radio channel. One problem with the complex codecs is that while they may work well with the human voice, they may work poorly, or not at all, with things like fax machines and computer modems. In those cases, simple codecs and high bit rates are more effective.
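The capacity argument is easy to put into numbers. The sketch below uses two figures I am confident of: G.711, the simple landline-style codec, is 8000 samples per second at 8 bits each, and the original GSM full-rate codec runs at 13 kbit/s. The 256 kbit/s channel is a made-up figure for illustration, and the calculation ignores all signalling, framing and error-correction overhead, so treat the call counts as a rough comparison only.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    # Uncompressed PCM: every sample is sent as-is.
    return sample_rate_hz * bits_per_sample

def calls_per_channel(channel_bps, codec_bps):
    # Whole calls that fit in the channel, ignoring all overhead (assumption).
    return channel_bps // codec_bps

G711_BPS = pcm_bit_rate(8000, 8)   # simple codec: 64,000 bit/s
GSM_FR_BPS = 13_000                # more complex codec: GSM full rate
CHANNEL_BPS = 256_000              # hypothetical channel capacity

simple_calls = calls_per_channel(CHANNEL_BPS, G711_BPS)
complex_calls = calls_per_channel(CHANNEL_BPS, GSM_FR_BPS)

print("G.711 calls per channel:", simple_calls)
print("GSM-FR calls per channel:", complex_calls)
```

With these assumed numbers the same channel carries 4 simple-codec calls or 19 complex-codec calls, which is why the lower bit rate matters so much on an expensive radio link.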
Do you have a cite for this? I ask because none of the cell phone codecs that I am familiar with do anything of the sort. There is no model of the vocal tract.
See Linear predictive coding - Wikipedia for an example.
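For a concrete picture of what LPC computes, here is a minimal sketch of the textbook autocorrelation method with the Levinson-Durbin recursion, in plain Python. It is not taken from any real codec. It fits an all-pole filter to a frame of samples, and that filter is what is usually described as a rough stand-in for the resonances of the vocal tract. The test signal is a synthetic two-pole resonance with coefficients I picked for illustration (1.3 and -0.8), so we can check that the fit recovers them.

```python
def autocorrelation(x, max_lag):
    # r[k] = sum of x[n] * x[n + k] over the frame.
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    # Returns (a, err) where x[n] is predicted as sum(a[i-1] * x[n-i]).
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                     # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)              # remaining prediction error
    return a[1:], err

# Synthetic "frame": impulse response of a known two-pole resonator.
# TRUE_A is an arbitrary stable choice, not data from real speech.
TRUE_A = (1.3, -0.8)
frame = [1.0, TRUE_A[0]]
for n in range(2, 400):
    frame.append(TRUE_A[0] * frame[n - 1] + TRUE_A[1] * frame[n - 2])

r = autocorrelation(frame, 2)
coeffs, residual_energy = levinson_durbin(r, 2)
print("estimated coefficients:", coeffs)
```

The point for this thread is the size of the result: 400 samples are summarized by two numbers. A coder built on this idea sends a small set of such parameters per frame, plus some description of the excitation, instead of the waveform itself.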
I disagree with you that LPC is really modeling the human vocal tract in any meaningful way. But I see where you are coming from.