Effects of network delays

I recently watched a TV show (The Voice, a good show) whose premise is that new singers audition for a singing contract. Due to the quarantine, everyone is staying home, including each member of the orchestra. In what appears to me to be a feat of network engineering, everyone logged in from home (I believe they said they had 21 live feeds) and the orchestra accompanied the singers in real time.

I was impressed. It seems to me that network jitter would have disrupted the orchestra. While a single feed from one orchestra would closely follow the singer, the idea that each separate instrument could stay in sync seems impressive. My friend tells me that millisecond jitter wouldn’t be noticeable. While I agree in general, I would think that two instruments, say two violins, could easily get out of sync and create noticeable interference. Is network jitter, as opposed to network lag, a noticeable effect when trying to combine different signals? Or do the differences average out over short time scales?

Are we sure the protocol used did not incorporate a small buffer to compensate for jitter? Maybe the mix sounded in sync because it was in fact synced up.
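That is exactly what a jitter buffer does: it trades a small, fixed amount of extra latency for smooth playout. A toy sketch of the idea (all numbers here are hypothetical, chosen just to illustrate the mechanism):

```python
import random

random.seed(42)

PLAYOUT_DELAY_MS = 30  # every packet is played this long after it was sent

# Simulate audio packets sent every 10 ms, each arriving with random jitter:
# ~15 ms mean network latency, plus or minus 5 ms of jitter.
packets = []
for seq in range(20):
    send_time = seq * 10
    arrival = send_time + 15 + random.uniform(-5, 5)
    packets.append((seq, send_time, arrival))

# With the buffer, each packet is played at send_time + PLAYOUT_DELAY_MS.
# As long as that moment is after the packet's arrival, the jitter is
# invisible in the output; only packets arriving later than that would glitch.
late = [seq for seq, send, arrive in packets if arrive > send + PLAYOUT_DELAY_MS]
print("packets that would glitch:", late)  # empty list -> jitter fully absorbed
```

The cost is that everyone hears everyone else 30 ms late, which is fine for a mix sent to viewers but is exactly the delay that makes playing *together* over a network hard.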

Most likely, it wasn’t actually a live feed. They recorded them all separately and then synced them all up together.

Did they have headphones/earbuds on? They were probably all listening to prerecorded music instead of each other.

The speed of light imposes an unavoidable delay, about 1 ms per 186 miles, so if the participants are 2,000 miles apart, there will be at least 11 ms of delay. That’s probably getting pretty close to noticeable in a music performance.

But more significant are the codec delays. Any efficient codec, audio or video, needs some history to refer to: an encoded video stream may say things like “this part of the image is the same as the previous frame, with these changes.” If you’re trying to keep the stream within X ms of real time, that can vastly inflate the size of the encoded stream, depending on how close you want to be. And keeping A/V sync means you need to keep the audio and video delays very close to each other. Published tables of typical audio codec latencies bear this out: AC3 is 40 ms, MP3 is at least 100 ms, etc.

So in answer to the OP: yeah, that seems like they accomplished something close to impossible. Are you sure it was live, and not, for example, a situation where the accompaniment was prerecorded and sent to the contestants ahead of time?
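Those back-of-the-envelope numbers are easy to check. A quick sketch using only the figures quoted above (real paths through fiber and routers would be slower still):

```python
SPEED_OF_LIGHT_MILES_PER_MS = 186  # ~186,000 miles/s in vacuum

def propagation_delay_ms(miles):
    """Minimum one-way delay imposed by the speed of light alone."""
    return miles / SPEED_OF_LIGHT_MILES_PER_MS

# 2,000 miles apart: just under 11 ms before any codec or routing delay.
print(round(propagation_delay_ms(2000), 1))

# Stack the quoted codec latencies on top of the propagation delay:
for codec, codec_ms in [("AC3", 40), ("MP3", 100)]:
    total = propagation_delay_ms(2000) + codec_ms
    print(f"{codec}: ~{total:.0f} ms one way")
```

Even the best case in that list is well past the few milliseconds of slop musicians can comfortably play through.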

Note that even “live” TV incorporates a broadcast delay sufficiently long to handle software issues.

It’s not outside the realm of possibility to have several separate streams synced by some kind of master mixing software at the studio. Each performer would have to be given some kind of tempo indicator, however, to keep them roughly in time with each other. Left to their own devices, or listening to a combined stream, they would drift in interesting ways.

What seems less likely is that each of the 21 performers heard the others and could adjust their own performance in near real time. It’s not impossible, but it’s more likely they played their own parts independently and rehearsed enough to be given performance notes on how best to blend with the others during the actual performance.

Note this is a problem similar to what you see in stadium concerts. There’s often a noticeable delay between what comes out of the speakers and what is performed. That’s why the performers often have a separate direct feed in their earphones. I once saw an unfortunate girl at a football game extend the national anthem out to 3 minutes (with at least 8 key changes) because she got tripped up hearing the delay of her own performance over the stadium speakers and tried to slow down to match.

Reminds me of the time I was on a field trip to a video recording studio, and they let us make a “news show”: The kid operating the teleprompter and I had different notions of where the reader should be on the screen, with the result that I sped up to try to match the teleprompter, and the teleprompter sped up to try to match me, and I ended up sounding like “Teddy Ruxpin hooked up to a car battery”.

It is entirely possible that they recorded the music in advance. The show said it was “live,” but who knows what that really meant. Hearing that the codecs impose a delay is interesting; I didn’t realize that. But it would be a fixed delay, and I imagine the musicians would compensate for it. After all, sound travels at about 1.1 ft/ms, so in a large orchestra there is probably a measurable but constant delay from one side of the pit to the other. Musicians would be used to such a phenomenon.
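For comparison, the in-air delay across an orchestra can be sketched from that 1.1 ft/ms figure (the 60-foot width below is just an illustrative guess, not a measurement):

```python
SPEED_OF_SOUND_FT_PER_MS = 1.1  # roughly 1,100 ft/s in air at room temperature

def acoustic_delay_ms(feet):
    """Time for sound to travel a given distance through air."""
    return feet / SPEED_OF_SOUND_FT_PER_MS

# Musicians 60 feet apart hear each other ~55 ms late -- in the same
# ballpark as the codec latencies discussed upthread.
print(round(acoustic_delay_ms(60)))
```

The difference is that an in-pit delay is constant and rehearsed around, whereas network jitter varies from moment to moment, which is the part that seems hard to compensate for by ear.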