From "The Speckerman Recurrence":
Jimmy: Okay, here it is. I have this great money-making idea. I just need a gear head to get it to the finish line.
Sheldon: Technically, Howard’s the gear head. Leonard’s just a dime store laser jockey.
Leonard: What’s the idea?
Jimmy: This is just between us, right?
Leonard: Right.
Jimmy: Okay. What do you think about a pair of glasses that makes any movie you want into 3D?
Raj: That sounds amazing. First movie I’m watching, Annie.
Howard: How exactly would these glasses work?
Jimmy: How the hell should I know? That’s why I need a nerd.
OK, first off, Jimmy’s an idiot without any appreciation or understanding of how 3D works. Secondly, I think 3D is overrated anyway. Give me a decent high-res 2D image any day. Part of the charm of traditional photography is the use of depth of field, and 3D seems to screw that up and make my head hurt.
But… what if… you could build an AI that could “see” a 2D image the same way a human sees it? Figuring out what’s background and what’s foreground (at 24 frames per second, mind you) and then splitting the image into a stereo pair with the appropriate parallax added in. In theory, why not?
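To make that concrete, here’s a minimal sketch of the easy half, assuming the AI has already produced a per-pixel depth map (the hard part I’m hand-waving). Given depth, synthesizing the stereo pair is just shifting pixels horizontally by a depth-dependent disparity; the function name and the `max_disparity` knob are my own made-up choices:

```python
import numpy as np

def synthesize_stereo(image, depth, max_disparity=16):
    """Fake a stereo pair from one frame plus a depth map
    (depth-image-based rendering). image: (H, W, 3) uint8;
    depth: (H, W) floats in [0, 1], with 1.0 = nearest to camera."""
    h, w = depth.shape
    disparity = (depth * max_disparity).astype(int)  # near pixels shift most
    left = np.zeros_like(image)
    right = np.zeros_like(image)
    cols = np.arange(w)
    for y in range(h):
        # Shift each row's pixels in opposite directions for the two eyes.
        left[y, np.clip(cols - disparity[y], 0, w - 1)] = image[y]
        right[y, np.clip(cols + disparity[y], 0, w - 1)] = image[y]
    return left, right
```

Note the warp leaves black holes wherever a near object uncovers background the camera never recorded; you’d have to fill those in somehow.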
Silly idea, I think, and a little useless, because by the time we have that kind of processing power, entertainment will have moved on to something else, the recent "death" of 3D TV notwithstanding.
Right, but Cameron didn’t have to do it in real time. It’s not a completely outrageous idea. Just replace the human editing component with a sufficiently advanced AI and enough processing power to do it in real time. I would have thought Howard would have at least entertained the concept.
Yeah, I hate laugh tracks, but it is a “live” audience, more or less. I would be happier if it were dialed down a few decibels. Still, it’s one of the very few current sitcoms capable of making me laugh out loud.
You could totally do it, if it wasn’t for the fact that it’s impossible.
There’s no way to tell the difference between a turning human head, a moving car in the background, and a mountain far off in the distance, even with parallax scrolling. Plus most shots are done with a static camera, which makes it even harder. It just can’t happen.
Many modern 3D TVs have a 2D-to-3D upconversion. I have never seen it, nor do I know how it works, but judging by reviews on Amazon it doesn’t look like complete crap.
That is where the AI comes in. If a human can do it, a sufficiently capable machine can do it. Sift out that moving car in the background vs. a turning head in the foreground, just like a human does. I’m not saying it would be easy. That’s why we need a nerd.
But for what practical purpose? Personally I find 3D either boring or distracting and sometimes both at the same time.
I think it might be doable in principle. An AI with object recognition capabilities could tell the difference between a car and a mountain and know one should be in the middle distance (and estimate its actual distance from its screen size and a knowledge of the typical size of cars) and the other in the far distance. AI object recognition, especially in 2D images, is a difficult problem, and certainly has not been fully solved, but it is not nowhere, either, and might well get good enough for this sort of job.
As standingwave says, though, it would not really be worth it.
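Still, the “estimate its actual distance from its screen size” step, at least, is cheap: it’s just the pinhole-camera relation, distance ≈ real size × focal length / size in pixels. A toy version, where the focal length and the car’s on-screen size are made-up illustrative numbers:

```python
def estimate_distance(real_size_m, size_px, focal_length_px):
    """Pinhole-camera estimate of how far away an object of known
    real-world size is, given how many pixels it spans on screen."""
    return real_size_m * focal_length_px / size_px

# A car is roughly 4.5 m long; if it spans 150 px in a frame shot at an
# (assumed) focal length of 1000 px, it's about 30 m away.
print(estimate_distance(4.5, 150, 1000))  # 30.0
```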
My TV does it. It seems to be “humans foreground, everything else background”. Looks OK most of the time, but occasionally it does something egregiously and humorously wrong.
I’d bet it’s delaying the picture by a few frames to do the processing - I’ll test that out tonight.
It doesn’t really make things truly 3D, but that was the claim of people in the 18th and early 19th centuries using something called the zograscope, or Perspective Machine. I wrote an article about it last year, having seen one in the Boston Museum of Fine Arts and another at the Peabody Essex Museum in Salem. I researched it, and finally built my own. These devices were special viewing cabinets (or at least frameworks holding the optics in position) that let you view a large engraving or painting of a scene through a very large lens, with the picture placed at the focal point.

Of course, the information needed for true 3D wasn’t there; it was a view from a single viewpoint. But I think the odd appearance of it helped trick people into thinking they were seeing 3D (there have been psychology articles trying to explain the phenomenon). A big part of it is the distortion the lens introduces, which makes the scene appear as if it’s on the surface of a large globe that seems to rotate slightly as you shift your viewpoint. The image IS 3D to that extent: your eyes, looking from slightly different angles, see the scene at different depths, so it really DOES look like it’s on the surface of a sphere, just not as if the elements of the scene are arranged in what we would think of as a true 3D interpretation of the scene.
It’s actually easier to do a half-acceptable job than you might think. Video compression already computes motion vectors for a scene, so all you have to do is pull out the motion vectors and treat the things with high motion as foreground.
I had a friend hack up a prototype when we were in school, and the results were not half bad. They worked best with long panning shots of a static scene. Of course, it’s the last 10% that kills you. It was fun as a tech demo, but even a single inconsistency every minute or so would take you out of a movie, and getting to 99.9% is the difficult part.
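For anyone who wants to play with the idea: you can fake the motion-vector trick without touching the compressed stream by computing dense optical flow instead (OpenCV’s Farneback routine here) and calling anything that moves faster than some threshold “foreground”. This is a sketch of the approach, not our old prototype, and the threshold is an arbitrary tuning knob:

```python
import cv2
import numpy as np

def motion_foreground_mask(prev_frame, frame, threshold=2.0):
    """Label fast-moving pixels as foreground. Dense optical flow stands
    in for the compression motion vectors; threshold is in pixels/frame."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    speed = np.linalg.norm(flow, axis=2)  # per-pixel motion magnitude
    return speed > threshold              # True where "foreground"
```

It fails in exactly the ways you’d expect: a locked-off shot of slow action gives you almost no signal, which is part of why the tracking/panning shots, where nearer objects genuinely cross more pixels per frame, worked best.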
It’s so easy that YouTube has it as an option, although you have to expressly allow it when you upload. It works pretty well in the stuff I’ve seen it in (some talk shows, and, well, the seedier side of YouTube).
I think all future digital cameras should record a depth map track, which could be used to convert footage to 3D with ease. It would also make a lot of VFX technology easier, as there’d be no need for a greenscreen anymore.
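The keying step really would become almost trivial. A sketch, assuming the camera delivered a per-pixel depth map in metres aligned to each frame (the cutoff value is an arbitrary example):

```python
import numpy as np

def depth_key(frame, depth_m, background, cutoff_m=2.0):
    """Composite the subject over a new background by keying on recorded
    depth instead of a greenscreen. depth_m: (H, W) metres from camera."""
    mask = depth_m < cutoff_m       # True where the foreground subject is
    out = background.copy()
    out[mask] = frame[mask]
    return out
```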
Surely IRL the only function of actual 3D, using our eyes when we’re out and about, is that when we move, a slightly different part of what we’re looking at is revealed.
That and an estimation of distance of whatever we’re looking at.
We don’t tend to see things springing out of the background as though they’re coming into our face.
So while it’s an interesting visual effect, it’s not actually realistic, and unless we’re moving around instead of sitting in one place in front of the screen, it’s not really serving any realistic function.
I’ve had a TV for almost a year now that can do this. It’s kind of funny that some of you aren’t aware of it.
Yes, you can convert 2D to 3D in real time, and the results aren’t even that bad. I watched an entire movie on Blu-ray that was in 2D (the second Sherlock Holmes movie with RDJ). The effect was pretty good. Not as good as a movie that’s actually filmed in 3D (like Avatar), but about as good as those that are later “converted” to 3D via post-processing (though with some mistakes, as opposed to none).
I have even played video games using the 2D-to-3D setting (like Minecraft and Assassin’s Creed), and it’s kind of cool. The effect is far from perfect, but I mean… imagine what we’ll have in the year 2020?
The easiest thing to do is to close or block one eye (while wearing the glasses).
If you want to be able to see with both eyes, you can pop the left lens out of one pair and substitute it for the right-eye lens of the pair you’re wearing (or vice versa)*. That will let each eye see the same image, converting it to 2D. It also ought to eliminate the strain some people feel trying to reconcile the two stereoscopic images.
You can’t just watch the screen without glasses, because then you’ll see both images simultaneously, which looks fuzzy and weird.
*Assuming that you can do so, and that the glasses aren’t built to prevent it. Be sure to keep the same surface facing forward: the newer circularly polarized systems like RealD 3D won’t work if you reverse the lenses.
A camera-recorded depth map would certainly help tremendously, but it wouldn’t be a complete solution to the problem. You’d still have portions of the scene that were hidden to the camera but revealed to the offset eye, and you’d have to somehow interpolate what should go in those hidden portions.
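In practice that interpolation is usually some flavour of inpainting: guess the missing pixels from their surroundings. A sketch using OpenCV’s Telea inpainting on the holes a depth-based warp leaves behind; note it invents plausible texture rather than recovering real information:

```python
import cv2
import numpy as np

def fill_disocclusions(warped_view):
    """Fill the gaps left where the offset eye can 'see' scene areas the
    camera never recorded. Holes are assumed to be all-zero pixels in an
    8-bit BGR image produced by the warp."""
    hole_mask = np.all(warped_view == 0, axis=2).astype(np.uint8)
    return cv2.inpaint(warped_view, hole_mask, 3, cv2.INPAINT_TELEA)
```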