How do they isolate voices during audio surveillance?

I’ve been reading about Khashoggi’s murder and some of the audio that’s been reported.

I assume Turkey has a truck outside the Saudi embassy building that listens for conversations? Probably a parabolic or a laser microphone.

How would they isolate dozens of people talking throughout the building?

We don’t know how Turkey made this recording.

However, the theory that there is a van outside with a very sensitive microphone recording dozens or more conversations through myriad walls and rooms is not possible.

I understand that someone within the Saudi embassy made the mistake of connecting to the wifi network labeled “Turkish Surveillance Van #301,” which permitted unfettered access to all of the systems within the embassy.

How else would they get this audio?

Intelligence agencies regularly sweep embassy buildings for electronic bugs.

I’m not sure how advanced laser microphones have become. They typically rely on rooms with windows.

I thought I’d read that Khashoggi was using his phone to stream audio to his wife, but I can’t seem to find that in recent articles. Perhaps it was just a cover story for the fact that the Turks have listening devices in the consulate. Or, perhaps, the recording was made by a Saudi working for the Turks.

I seem to remember something about Khashoggi activating some sort of audio recording through his Apple Watch, which was connected to his iPhone, which his fiancee was holding for him outside of the embassy.

So his fiancee heard his murder as it occurred?

Gosh. That’s horrible.

If you have recordings from multiple sources (i.e., microphones in different directions from the signal of interest) you can use signal processing techniques such as cross-correlation to accentuate sounds that have a specific phase relationship (due to their different distances from the various microphones) over those that have different phase relationships. A primitive form of this processing is key to the plot of the movie The Conversation.
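To make the cross-correlation idea concrete, here is a rough sketch in Python with NumPy. Everything in it is made up for illustration: the three "microphones", the sample delays, and the use of white noise as a stand-in for both the voice of interest and the background chatter. It finds the lag at which each channel best lines up with the first one, shifts the channels into alignment, and averages them, so the sound with the matching delays adds up while the sound with different delays partly averages out.

```python
import numpy as np

def estimate_delay(ref, other):
    """Lag (in samples) by which `other` trails `ref`, from the cross-correlation peak."""
    corr = np.correlate(other, ref, mode="full")
    # Index 0 of the "full" output corresponds to lag -(len(ref) - 1).
    return int(np.argmax(corr)) - (len(ref) - 1)

def align_and_average(channels):
    """Shift every channel to line up with the first one, then average them."""
    ref = channels[0]
    aligned = [ref]
    for ch in channels[1:]:
        lag = estimate_delay(ref, ch)
        # Circular shift is a simplification that is fine for this toy data.
        aligned.append(np.roll(ch, -lag))
    return np.mean(aligned, axis=0)

# Synthetic data (all values are assumptions for the demo).
rng = np.random.default_rng(0)
n = 4000
voice = rng.standard_normal(n)     # stand-in for the signal of interest
chatter = rng.standard_normal(n)   # stand-in for an unrelated, quieter source

voice_delays = [0, 7, 12]          # arrival delay of the voice at each mic
chatter_delays = [0, 3, 20]        # the chatter arrives with different delays

mics = [
    np.roll(voice, dv) + 0.6 * np.roll(chatter, dc)
    for dv, dc in zip(voice_delays, chatter_delays)
]

enhanced = align_and_average(mics)

def residual_power(x):
    """Mean squared difference between a signal and the clean voice."""
    return float(np.mean((x - voice) ** 2))

print("estimated lags:", [estimate_delay(mics[0], m) for m in mics[1:]])
print("residual power, single mic:", residual_power(mics[0]))
print("residual power, aligned average:", residual_power(enhanced))
```

Real speech is not white noise, rooms add echoes, and three channels only buy a modest improvement, so treat this as the principle rather than anything resembling an actual surveillance setup.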

If what I heard was accurate, I hope that the audio was just being recorded and she wasn’t listening in real time.

I’m no spy, but I figure other intelligence agencies regularly try to get electronic bugs into places that can’t be detected by regular sweeps.

Then they say “we were outside in a van” so no one knows they have these really cool nondetectable bugs.

Even if that was used, it isn’t going to pick up the conversations in the cafeteria if you’re aiming the laser at the ambassador’s office.

Interestingly, in the article linked in the OP, not only do several nonnative speakers refer to the audio segments as “footage,” but the article itself does as well.

I’ve always seen these referred to as “clips.”

Both words go back to analog days, but AFAIK, “footage” dates back to film. “Clips” was used back when I was involved in the pro audio business.

The latter is now used for both audio and video but I haven’t seen the former used for only audio.

Exactly what a spy would say.

It’s been reported that the murder was broadcast on Skype back to certain people in Saudi. That could have been eavesdropped on.


Khashoggi took the precaution of activating his Apple Watch. I guess he was concerned about this embassy appointment.

He really needed a couple witnesses to go with him. Prominent Westerners that couldn’t just disappear without awkward questions.

But, the hit team would have eventually snatched him off the street or from his home.

Even if you were listening to dozens of conversations coming from the Saudi Embassy, would it have been that difficult to tell which one was Khashoggi?

“Damn, this printer is jammed again!” (not him)
“Who drank the last of the coffee and didn’t make more!?” (probably not)
“Did you get the new cover sheet for the TPS reports?” (nope)
“I’m suffocating … Take this bag off my head, I’m claustrophobic!” (that’s him!)

The question is about how you separate out individual voices if your recording has a single track with multiple parallel conversations. The OP apparently envisions that surveillance today can just point some fancy equipment at a building and record everything being spoken in there, and wonders how they can separate out the individual conversations.

The Saudi team themselves may have had devices that were compromised.
Computers or mobile devices have everything needed to bug a room.
Software exploits could be used to open them right up.
In any case, of course, the particular technique used, as with all sources and methods, would be a closely held secret.