How does Zoom (video-conferencing software) insert virtual backgrounds so well?

If you’ve been using Zoom for remote video-conferencing, then you may know that it allows you to insert virtual backgrounds behind you. This is great when working from home and you want to hide your unmade bed or empty beer bottles and fast food wrappers. The interesting thing is that Zoom is able to do this without having a green-screen behind you. There is an option to tell it “I have a green screen behind me,” which I assume enables very high-quality boundary trimming of the video subject, as one would expect during a weather forecast on the evening news - but it’s really amazing how well Zoom does without a green-screen, even with a very cluttered background.

How is Zoom so good at identifying what is background and what is the person in the foreground?

If you saw the results of my attempts to use that feature, you wouldn’t ask the question. I plan on putting up a greenscreen before I attempt it again.

Probably what it’s attempting to do, though, is just identify and pick out faces, and assume that anything that’s not a face is background.

I tried it and the background bled into my face …

I would be surprised if it isn’t some sort of black box neural network algorithm trained on lots of video samples. If so, it’s difficult to say exactly what it uses to make the determination, but I expect it’s finding a blob in the center of the screen that moves at least a little and makes a qualified guess at what its edges are.

Possibly relevant to my failure is the fact that my face isn’t centered on the screen. My camera is built into my monitor, and aiming the camera directly at my face would require tilting the screen forward to an uncomfortable degree. So maybe it just looks where it expects the boundary of a face to be, and finds the sharpest edge it can, closest to that expected location.

It’s not just faces, though. It’s getting my entire upper body correctly identified, in spite of a wide range of postures/gestures.

Centering isn’t an issue for me. Even with my face well off to one side, Zoom gets it right.

What about the rest of your body and your hair? I’ve never used it, but I presume you’re not a disembodied head floating around.

If I had to take a WAG, having no experience with it at all, I’d say it may identify faces, since that’s a technology we already have, but to take it a step further, maybe it watches for things moving and assumes that’s part of the foreground. It wouldn’t surprise me if it even had (or eventually gained) the ability to pick out ‘normal’ human movements. This way, it would leave things like the TV or the cat or the trees blowing around outside the window behind the artificial background. Maybe the moving things have to be within some distance (X, Y or Z) of the face.
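To make that guess concrete, here is a minimal sketch of the idea in Python with NumPy: flag pixels that changed between two frames, then keep only the ones close to a known face position. This is purely an illustration of the speculation above, not how Zoom is known to work. The function names, the threshold, the distance limit, and the assumption that some face detector has already supplied a face center are all hypothetical.

```python
import numpy as np

def motion_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose brightness changed by more than `threshold` between frames."""
    # Cast to a signed type first so the subtraction cannot wrap around in uint8.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

def near_face(mask, face_center, max_distance):
    """Keep only flagged pixels within `max_distance` pixels of a (row, col) face center."""
    rows, cols = np.indices(mask.shape)
    dist = np.hypot(rows - face_center[0], cols - face_center[1])
    return mask & (dist <= max_distance)

# Tiny synthetic grayscale frames standing in for real video.
prev_frame = np.zeros((6, 8), dtype=np.uint8)
curr_frame = prev_frame.copy()
curr_frame[2, 3] = 200  # movement close to the face (say, a hand)
curr_frame[5, 7] = 200  # movement far away (say, the cat)

moving = motion_mask(prev_frame, curr_frame)
# The face center is assumed to come from a separate face detector.
foreground = near_face(moving, face_center=(2, 2), max_distance=2.0)
```

Here both pixels count as moving, but only the one next to the face survives as foreground. A real system would also have to cope with a person sitting perfectly still, which pure motion detection cannot handle.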

I’d be curious to see what happens, when using the background, if something behind you moves, be it a pet, someone walking behind you, or the TV.

More of an aside (because I would guess most webcams don’t use this yet), but here is a great explanation of how modern DSLRs and at least the better phone cameras detect focus (which can be used to tell foreground from background). The best modern sensors have that phase detection at every single pixel.

Well, OK, faces and upper bodies. My point is that it might have been programmed to recognize shapes that resemble a face, or a face plus shoulders, in the center or off to one side, but not programmed to recognize the top half of a face, stuck at the bottom of the screen.

I’m not familiar with Zoom, but from my experience that sort of feature would work best with a well-lit subject and a background far away or as uncomplicated as possible. If you’re in a dimly lit room sitting with your back against a bookshelf, it’s not going to work well. If you are well lit in front of a white wall it should work about as well as possible.

It probably uses a similar algorithm to current phones that add a bokeh effect. If the phone knows how to blur the background from your body, it does not seem like a big stretch to simply replace it entirely.
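Once something has produced a per-pixel "this is the person" mask, swapping the background really is the easy part. A minimal sketch in Python with NumPy, assuming the mask already exists (where it comes from is the hard problem this thread is about); the function name and the toy images are made up for illustration:

```python
import numpy as np

def replace_background(frame, background, mask):
    """Composite `frame` over `background` using a per-pixel foreground mask.

    frame, background: uint8 arrays of shape (H, W, 3)
    mask: array of shape (H, W); 1 (or True) means person, 0 means background.
          Fractional values between 0 and 1 give soft edges.
    """
    alpha = mask.astype(np.float32)[..., np.newaxis]  # (H, W, 1), broadcasts over color
    blended = alpha * frame + (1.0 - alpha) * background
    return blended.astype(np.uint8)

# Toy example: a dark "camera" frame, a bright virtual background,
# and a mask that says only the top-left pixel belongs to the person.
frame = np.full((2, 2, 3), 10, dtype=np.uint8)
background = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[1, 0],
                 [0, 0]])

output = replace_background(frame, background, mask)
```

A blur (bokeh) effect would be the same blend with a blurred copy of the frame passed in as `background`, which is why the two features are plausibly related.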

Zoom and Teams aren’t able to easily distinguish multiple faces. A second or third person often gets incorporated into the background image. So something in the algorithms is looking for one face, or perhaps trying to separate foreground from background so that only people at about the same distance from the lens are counted equally (also an observation that helps with multiple faces).

Of course, it’s also possible that it works well for Machine Elf but not for me because whatever algorithm it uses is computationally expensive, and the relevant hardware on my machine just isn’t quite fast enough to do it properly.

What OS are you running, Chronos? From what I’ve experienced Zooming from home, it works pretty well on Windows or Mac, but pretty poorly on Linux. I’ve got a pretty powerful Linux laptop, but the face detection is pretty awful. Same with my colleagues.

The computer I’m Zooming from is running Windows 7. I’ve also tried it on my Mac, where it didn’t work well at all, but I think that’s mostly because that machine doesn’t have enough memory.

It doesn’t ask you to take a shot of the room without you in frame?
(because that would be a really easy way to do it without green screen - the foreground is anything that’s different from the reference image)
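For what it’s worth, the reference-image approach would look roughly like this in Python with NumPy. This is a sketch of that idea only, with a hypothetical function name and an arbitrary threshold; nothing in this thread suggests Zoom actually does it this way.

```python
import numpy as np

def foreground_from_reference(reference, frame, threshold=30):
    """Flag pixels that differ noticeably from an empty-room reference shot.

    reference, frame: uint8 color images of shape (H, W, 3)
    Returns a boolean (H, W) mask, True where any channel differs by more than `threshold`.
    """
    # Signed type avoids uint8 wrap-around when subtracting.
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16))
    return diff.max(axis=-1) > threshold

# Empty room: uniform grey. Then a 2x2 "person" appears in the middle.
reference = np.full((4, 4, 3), 100, dtype=np.uint8)
frame = reference.copy()
frame[1:3, 1:3] = (220, 180, 150)

mask = foreground_from_reference(reference, frame)
```

The catch is that it only works while the camera and the lighting stay exactly as they were in the reference shot. Nudge the laptop or let a cloud pass over the window and large parts of the room start counting as foreground.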

Is it a waste to download this if you don’t know anyone else who has this program?

If you send an invite link to a meeting and the recipient doesn’t have the program they will be prompted to download it.

Whether or not that stops it from being a waste depends on what kind of people you are planning to invite to Zoom meetings.

A waste of what? There’s a free version, so not of money. Time? Depends on what else you’re spending your time doing.

But I will say that I originally downloaded it because a group of my friends was using it for gaming, but shortly after that, I had occasion twice to use it for completely unrelated business reasons, and in both cases, my prior experience was helpful with some of the technical aspects. With how many people nowadays are working from home and doing all sorts of things online, it wouldn’t be surprising if, after downloading it, you have use for it after all.

A waste of hard drive, RAM, etc… Like I said, I don’t know anyone I would even chat with. Now if you said there are places to join or to meet people for conversations, cool.

Has Zoom solved its security problems yet?