The speed of sound varies by about 10:1 depending on the gas: hydrogen at 1300 m/s, helium at 973 m/s, air at 331 m/s, down to the favourite, sulfur hexafluoride, at 133 m/s.
But as we will see, density is the important part.
Hydrogen 0.09 g/l, helium 0.166 g/l, air 1.3 g/l, sulfur hexafluoride 6.16 g/l, perfluorobutane 11.2 g/l.
Speed of sound might have a slight direct effect, but isn’t the dominant effect on the vocal tract. We mostly don’t contain sound waves bouncing off things. We contain Helmholtz resonators, and these are defined by the compressibility of the gas and the mass of gas present, so the volumetric density of gas.
The speed of sound is in principle defined by the compressibility and volumetric density, V = \sqrt{K/\rho} (K being the bulk modulus, i.e. the stiffness of the gas, and \rho its density), so it appears in discussions - but that is confusing, as here it isn’t representing the propagation speed of sound. It appears because it depends on the same underlying parameters that define the Helmholtz resonances.
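As a rough sanity check on that formula, here is a minimal Python sketch that rebuilds the quoted speeds from the densities listed earlier. It assumes the stiffness is the adiabatic bulk modulus K = gamma * P at one atmosphere. The gamma values are ordinary textbook figures plugged in for illustration, not something taken from the numbers above, and the function name is made up.

```python
import math

ATMOSPHERE_PA = 101_325.0  # assumed ambient pressure

# Per gas: density in kg/m^3 (numerically the same as g/l),
# an assumed heat-capacity ratio gamma, and the quoted speed of sound in m/s.
GASES = {
    "hydrogen": (0.09, 1.41, 1300.0),
    "helium": (0.166, 1.66, 973.0),
    "air": (1.3, 1.40, 331.0),
    "sulfur hexafluoride": (6.16, 1.09, 133.0),
}


def speed_of_sound(density, gamma, pressure=ATMOSPHERE_PA):
    """V = sqrt(K / rho), taking K = gamma * pressure (adiabatic ideal gas)."""
    bulk_modulus = gamma * pressure
    return math.sqrt(bulk_modulus / density)


for name, (density, gamma, quoted) in GASES.items():
    estimate = speed_of_sound(density, gamma)
    print(f"{name}: estimated {estimate:.0f} m/s, quoted {quoted:.0f} m/s")
```

The estimates land within a few percent of the quoted figures. The leftover gap is presumably down to rounding and to the densities and speeds not all being quoted at the same temperature.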
A well-known Helmholtz resonator is a bottle: blow over the neck and it makes a tone. The frequency is, perhaps counterintuitively, not defined by the height of the bottle. It is a function of the volume of the bottle, the height of the neck and the diameter of the neck. And the gas. The gas in the main volume of the bottle is simply providing a spring (so compressibility), and the mass of air in the neck is providing a mass that resonates with the spring (so density). Take a second bottle with a neck half the height but the same volume, and the resonant frequency is noticeably higher (about 1.4 times, ignoring end effects). (A simple open hole operates as a neck due to end effects. A round hole in a thin wall appears as a neck about 1.7 times the hole radius long. The same sort of end correction needs to be added to any actual neck as well.) Flute and ocarina are other Helmholtz resonator examples.
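The spring-and-mass picture of the bottle can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the bottle dimensions are made up for illustration, the bulk modulus is the adiabatic value for air at one atmosphere, and end corrections are left out to keep it short, so the absolute frequency is indicative at best.

```python
import math


def helmholtz_frequency(bulk_modulus, density, cavity_volume, neck_area, neck_length):
    """Resonant frequency in Hz of a gas spring (cavity) loaded by a gas mass (neck)."""
    # Spring: stiffness of the gas in the cavity, as felt by the plug of gas in the neck.
    stiffness = bulk_modulus * neck_area**2 / cavity_volume
    # Mass: the plug of gas sitting in the neck (no end correction in this sketch).
    mass = density * neck_area * neck_length
    return math.sqrt(stiffness / mass) / (2 * math.pi)


# Hypothetical bottle: 0.75 litre body, neck 18 mm across and 80 mm long.
CAVITY_VOLUME = 0.75e-3             # m^3
NECK_AREA = math.pi * 0.009**2      # m^2
NECK_LENGTH = 0.08                  # m
BULK_MODULUS_AIR = 1.4 * 101_325.0  # Pa, assumed adiabatic value for air
DENSITY_AIR = 1.3                   # kg/m^3, i.e. 1.3 g/l
DENSITY_SF6 = 6.16                  # kg/m^3, i.e. 6.16 g/l

f_full = helmholtz_frequency(BULK_MODULUS_AIR, DENSITY_AIR,
                             CAVITY_VOLUME, NECK_AREA, NECK_LENGTH)
# Same bottle volume, neck half the height.
f_short = helmholtz_frequency(BULK_MODULUS_AIR, DENSITY_AIR,
                              CAVITY_VOLUME, NECK_AREA, NECK_LENGTH / 2)
# Same bottle, denser gas; keeping the stiffness unchanged is a simplification.
f_dense = helmholtz_frequency(BULK_MODULUS_AIR, DENSITY_SF6,
                              CAVITY_VOLUME, NECK_AREA, NECK_LENGTH)

print(f"full neck: {f_full:.1f} Hz")
print(f"half neck: {f_short:.1f} Hz (ratio {f_short / f_full:.3f})")
print(f"dense gas: {f_dense:.1f} Hz (ratio {f_dense / f_full:.3f})")
```

Halving the neck halves the moving mass, so the frequency goes up by the square root of 2. Swapping in a denser gas at the same stiffness pulls it down by the square root of the density ratio. Note that the height of the bottle never enters the calculation.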
Similarly, our vocal tract is highly dependent upon the diameter and length of its components, and thus the mass of air in those components, and the volume of gas in those components. These define the frequencies they tend to select. Take whistling - the frequency depends on the size of the lip opening and the volume of air in the mouth behind it.
The volume of air in the pharynx acts as a spring against the mass of air in the passages leading to it, but it will also act as a mass in its own right. So it gets complicated. The same goes for the volume of the mouth and its effective cross-sectional area.
As we speak or sing we modify the cross-sectional area of these passages, and this modification changes the mass of air in them, which changes the resonances present. The very harmonic-rich output from the vocal cords is passed through what is effectively a dynamically modified band-pass filter, which provides the final sound.
The nasal volume is also there acting as a spring, but we can’t modify its volume on demand. Get a bad cold, and the nasal passages close up, leading to a change in diameter of passages, which leads to increased frequency of resonance and the characteristic nasal sound. Holding one’s nose has the same effect - the closing of the nose removes an effective moving mass (the air in the nose proper) from the resonant system, thus increasing the resonant frequency.
TL;DR:
Gas properties affect the voice. But it isn’t the speed of sound directly that is the determinant. It is the volumetric density of the gas and the compressibility of the gas.
The speed of sound may appear in Helmholtz frequency calculations, but that is an artefact: it just turns out to be a convenient quantity that combines density and compressibility.
Breathing sulfur hexafluoride and perfluorobutane will do exactly what one might hope for.