I’ve tried looking it up, but all the explanations are a bit too layman for me, only explaining in broad terms what “CGI” is. I’m not a graphics expert, but I’ve done work in both Pixar’s RenderMan and GLSL, and I’m curious about how they superimpose things on live-action film. A fully 3D movie is fairly straightforward, since it’s just modelling the entire scene; adding King Kong or whatever to live footage seems more complex.
Neglecting chroma key work, my guess is that the bulk of the work goes into hand-defining depth, material, and surface-normal information at each pixel of the input frame from the camera, as well as the position and color of the various light sources. Then they can use displacement shaders to perturb the frame “as if” it were a 3D scene, and do shadow and lighting passes as usual. But that’s just a broad guess, and there may be techniques I’m not aware of here. Some of this depth info could presumably be filled in programmatically if you have information from, e.g., a stereo camera.
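To make the guess concrete, here’s roughly the kind of per-pixel lighting pass I have in mind, as a minimal numpy sketch. Everything in it (the depth/normal buffers, the light, the fact that I treat the plate as albedo) is invented for illustration, not a claim about how studios actually do it:

```python
# Sketch: if a filmed frame had per-pixel normals (and depth), you could run an
# ordinary deferred-style lighting pass over it to add a new light's contribution.
import numpy as np

H, W = 480, 640

# Stand-ins for data that would really come from hand-built geometry,
# a stereo rig, or a depth camera.
plate   = np.random.rand(H, W, 3).astype(np.float32)   # filmed frame, treated as albedo
normals = np.zeros((H, W, 3), dtype=np.float32)
normals[..., 2] = 1.0                                   # pretend every surface faces the camera

# A hypothetical extra CG light to add to the plate.
light_dir   = np.array([0.3, 0.5, 0.8], dtype=np.float32)
light_dir  /= np.linalg.norm(light_dir)
light_color = np.array([1.0, 0.9, 0.7], dtype=np.float32)

# Lambertian term per pixel: max(N . L, 0)
ndotl = np.clip(np.einsum('ijk,k->ij', normals, light_dir), 0.0, 1.0)

# Add the new light's contribution on top of the existing frame.
relit = np.clip(plate + plate * ndotl[..., None] * light_color, 0.0, 1.0)
```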
It’s easy to do the basics - do CGI which mimics the camera elevation, angle, and lighting, plus ground contours, rough walls and scenery, etc. The undefined areas of the CGI are transparent/greenscreen, so when you superimpose this on top of the live movie, it appears to be additional action shot from the same camera. Presumably there’s a feature for imposing darkness for things like shadows (where the CGI shadow falls on the transparent ground, darken the real-world ground by the same amount).
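In compositing terms that’s just an alpha “over” plus a shadow pass that only darkens the plate. A toy version, with placeholder images standing in for the live frame and the CG render:

```python
# Sketch: composite a CG element (with alpha; undefined areas transparent)
# over the live plate, after darkening the plate with the CG shadow pass.
import numpy as np

H, W = 480, 640
plate = np.random.rand(H, W, 3).astype(np.float32)        # live-action frame

cg_rgb   = np.zeros((H, W, 3), dtype=np.float32)           # CG render, black where empty
cg_alpha = np.zeros((H, W, 1), dtype=np.float32)           # 0 = "greenscreen"/transparent
cg_rgb[100:200, 100:200] = (0.6, 0.5, 0.4)                 # pretend this is the dinosaur
cg_alpha[100:200, 100:200] = 1.0

shadow = np.ones((H, W, 1), dtype=np.float32)              # 1 = unshadowed
shadow[200:260, 100:200] = 0.5                             # CG says: darken the ground 50% here

# Darken the real-world ground where the CG shadow falls, then lay the CG on top.
comp = plate * shadow
comp = cg_rgb * cg_alpha + comp * (1.0 - cg_alpha)
```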
For elementary superimposition, i.e. an actor’s arm in front of a CGI dinosaur, it would be trivial (compared to the overall effort) to create a CGI arm that is green-screen-coloured and move it using the real-world frame as a guide, so that when superimposed, the real-world arm appears in front of the CGI element.
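The “green arm” is what compositors call a holdout matte: it punches an arm-shaped hole in the CG so the plate shows through. A rough sketch, with a hand-placed rectangle standing in for the animated arm matte:

```python
# Sketch: knock the CG out wherever the holdout matte is set, so the real arm
# in the plate reads as being in front of the CG element.
import numpy as np

H, W = 480, 640
plate    = np.random.rand(H, W, 3).astype(np.float32)
cg_rgb   = np.zeros((H, W, 3), dtype=np.float32)
cg_alpha = np.zeros((H, W, 1), dtype=np.float32)
cg_rgb[100:300, 200:400] = (0.6, 0.5, 0.4)                 # the CG dinosaur
cg_alpha[100:300, 200:400] = 1.0

holdout = np.zeros((H, W, 1), dtype=np.float32)            # animated to follow the real arm
holdout[150:250, 250:300] = 1.0                            # 1 = the arm covers this pixel

cg_alpha = cg_alpha * (1.0 - holdout)                      # punch the arm-shaped hole in the CG
comp = cg_rgb * cg_alpha + plate * (1.0 - cg_alpha)
```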
There’s also the trick of taking a picture of the person, creating a rough CGI 3D model of the person, and painting it with that picture. Then you can animate this model, have it fall with flailing arms, hit a ship propeller, etc. As long as it’s moving fast and/or is somewhat small onscreen, nobody will be able to critically examine it the way you could with, for example, Gollum squatting on a rock, barely moving, in full frame.
Of course, the difficulty with “getting it right” is how much effort and detail the director wants. Some directors are incredibly picky about details. For example, the shadow effect: you have a CGI dinosaur or robot, it has the same sunlight lighting as the live action, and it produces a shadow, so the live-action ground is darkened by the amount the CGI dictates (transparent darkening). However, as the shadow moves across piles of rubble or dead bodies on the ground, the shadowing won’t be simple and “flat”, because the ground is no longer flat. To compensate, the CGI guy has to model the rubble piles and body positions as extra scenery (green-screen coloured). Obviously, this is easier to do with preplanning - either scanning, or taking some form of measurements and photos while the live-action scene was being shot.
…or, in Hollywood tradition, some directors couldn’t care less and assume the audience will not notice severe green-screen screw-ups.
You’re asking quite a lot for a non-layman explanation. Can you narrow down your question a bit?
The first step is to figure out where your virtual camera goes. There are two basic ways of doing this: either you actually keep track of where your physical camera is and replicate that in your 3D renderer (requires expensive hardware), or you use image-based techniques to figure out the camera motion. Lots of programs support this latter technique–you need to identify only a few static points in the scene to determine the camera motion. After Effects does this easily, for instance.
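As a rough sketch of the image-based version: if you assume (for simplicity) that a few static markers have known, surveyed 3D positions, then the per-frame camera pose is just a PnP solve on the tracked 2D points. Real matchmoving also has to solve for those 3D positions, so this is a simplification, and all the numbers below are made up:

```python
# Sketch: recover a camera pose from 2D/3D correspondences of static scene points.
import numpy as np
import cv2

# Surveyed 3D positions of a few static markers on set (arbitrary units/origin).
object_points = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 1.0],
    [0.2, 0.8, 0.5],
], dtype=np.float64)

# Assumed pinhole intrinsics (focal length in pixels, principal point).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)                      # pretend the lens is undistorted

# Fake a "tracked" frame by projecting the markers through a known camera pose;
# in practice these 2D points would come from the 2D tracker.
true_rvec = np.array([0.1, -0.2, 0.05])
true_tvec = np.array([0.3, -0.1, 5.0])
image_points, _ = cv2.projectPoints(object_points, true_rvec, true_tvec, K, dist)

# Solve the camera pose for this frame from the correspondences.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
print(ok, rvec.ravel(), tvec.ravel())   # should land close to the "true" pose

# Repeat per frame and feed the resulting camera path into the renderer's virtual camera.
```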
As you mention, getting the lighting information right is important. Again, there are two basic techniques: you can either record it from the scene itself (using a 360-degree camera, for instance), or you can estimate it based on the video. This stuff is hard to get right and the eye can pick up very subtle flaws.
As far as I know, there’s still no magic way of determining occluders from video. There are of course various techniques out there, like depth from stereo and depth cameras, but my understanding is that it’s still mostly a manual process of identifying edges (people, walls, pillars, etc.) and creating matte polygons from them. From there it’s relatively trivial to composite in the CG stuff. All programs can generate full depth images as part of the render, so on that end of things it’s easy. They can also generate transparency mattes.
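Once you have depth for both sides, the occlusion test itself really is trivial - a per-pixel comparison. A toy version, with placeholder depth values standing in for the hand-matted plate depth and the render’s depth channel:

```python
# Sketch: depth-based compositing - the CG wins only where it is closer than the plate.
import numpy as np

H, W = 480, 640
plate       = np.random.rand(H, W, 3).astype(np.float32)
plate_depth = np.full((H, W), 10.0, dtype=np.float32)      # metres; mostly background
plate_depth[:, 300:400] = 2.0                               # a pillar the artist matted out

cg_rgb   = np.zeros((H, W, 3), dtype=np.float32)
cg_depth = np.full((H, W), np.inf, dtype=np.float32)        # empty pixels are infinitely far
cg_rgb[100:380, 200:500] = (0.6, 0.5, 0.4)
cg_depth[100:380, 200:500] = 5.0                            # the CG creature sits at 5 m

cg_in_front = (cg_depth < plate_depth)[..., None]
comp = np.where(cg_in_front, cg_rgb, plate)
```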
The original Star Wars was done with film trickery in the days before CGI - matte paintings, masking and overlays, etc.
However, George Lucas’ main advance was to track camera positions and angles with a minicomputer (frame by frame); that allowed him to do coherent composite shots - fighters flying through trenches, explosions, etc. - without having to use static camera positions. The camera could follow a fighter model swooping down into the trench, and that element could be added to “enter the trench” model shots from behind, without having to worry that the angles or movement looked “wrong”, and without having to position the trench and multiple fighter models together in one camera exposure.
I suppose I’m more specifically curious about how things like transparency, lighting, shading/occlusion, and reflection work. I can put together relatively simple (if time-consuming/expensive) ways to do occlusion – even easier if you use the “green arm” idea md2000 mentioned.
The general problem I’m seeing is artificial lighting sources – CG fire, a CG glowing orb, a creature with glowing eyes – and shadows from CG spaceships, vehicles, or creatures projected onto filmed actors and terrain. Some of the modern effects look really good, and I’m not sure exactly how they do it, especially since it’s notoriously hard to model light on skin since skin is semi-translucent.
Like I said, the best I can think of is mocking up a low-fidelity version of the scene in a 3D animation program, adding your creature, and then (somehow) mixing shading information from the render and the original frame.
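One concrete way that mixing could work (I’m not claiming this is the standard pipeline) is to render the low-fi proxy scene twice, with and without the CG light, and multiply the plate by the per-pixel ratio of the two renders. A rough sketch with invented renders:

```python
# Sketch: ratio-based relighting of the plate using two proxy-scene renders.
import numpy as np

H, W = 480, 640
eps = 1e-4

plate = np.random.rand(H, W, 3).astype(np.float32)

# Proxy-scene renders (geometry roughly matching the set).
render_base = np.full((H, W, 3), 0.5, dtype=np.float32)    # existing lighting only
render_lit  = render_base.copy()
render_lit[200:400, 100:300] += 0.3                         # plus the CG fire / glowing orb

# Per-pixel gain: how much brighter the CG light makes each surface.
gain = render_lit / np.maximum(render_base, eps)

relit_plate = np.clip(plate * gain, 0.0, 1.0)
```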
Right; before the age of fast computers, that was the only way of doing things. Even without CG, sophisticated compositing requires a way of tracking/replicating camera motions.
It’s much easier these days; lots of videos on YouTube demonstrate that a high schooler with After Effects can get equally impressive results (in this particular respect). The rest of the pipeline is still difficult, though also getting easier. For instance, you can get pretty nice light maps with one of those reflective garden balls and a digital camera.
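For the garden-ball trick, the unwrap itself is simple. A minimal sketch that assumes a roughly orthographic photo of a perfect mirror ball cropped to its silhouette, with noise standing in for the photo:

```python
# Sketch: unwrap a mirror-ball photo into a lat-long environment map for image-based lighting.
import numpy as np

ball = np.random.rand(512, 512, 3).astype(np.float32)      # cropped mirror-ball photo
BH, BW = ball.shape[:2]
H, W = 256, 512                                             # output lat-long map

# World direction for each lat-long pixel (y up, +z toward the camera).
ys, xs = np.mgrid[0:H, 0:W]
theta = (ys + 0.5) / H * np.pi                              # 0 at the zenith
phi   = (xs + 0.5) / W * 2.0 * np.pi - np.pi
dirs = np.stack([np.sin(theta) * np.sin(phi),
                 np.cos(theta),
                 np.sin(theta) * np.cos(phi)], axis=-1)

# The ball normal that reflects the camera ray into that direction is the
# half-vector between the direction and the view axis (0, 0, 1).
n = dirs + np.array([0.0, 0.0, 1.0])
n /= np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-6)

# Under the orthographic assumption, that normal lands on the ball photo at (n.x, n.y).
px = ((n[..., 0] + 1.0) * 0.5 * (BW - 1)).astype(int)
py = ((1.0 - (n[..., 1] + 1.0) * 0.5) * (BH - 1)).astype(int)
latlong = ball[py, px]                                      # nearest-neighbour lookup
```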
For lighting, I don’t think it’s really done in quite the way you’re describing. You can of course go all the way to fully virtual actors, which requires high-fidelity 3D scans of the actors and sophisticated shaders for the skin, etc.
But to take video and add those lighting effects afterwards - not really. What’s done instead is to use artificial lights on set to simulate what the CG is going to look like. For instance, say you’re making a lightsaber. You would have the actors hold a real glowing stick with LEDs or something, so that if they hold it next to their face they get the natural green glow reflected off their skin. Then, when you add the CG effects, it looks like the CG is producing that same glow. Adding the reflection in later would be vastly more difficult.
Shadows are a hard one, though. You pretty much have to have a 3D model of your scene. Ideally you keep the shadows limited to flat planes so that they’re easier to track.
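If the ground really is flat, tracking four points of it is enough: a homography carries a shadow map rendered in the plane’s own coordinates into the frame, where it just darkens the plate. A toy sketch with invented coordinates and images:

```python
# Sketch: warp a ground-plane shadow map into the frame via a tracked homography.
import numpy as np
import cv2

H, W = 480, 640
plate = np.random.rand(H, W, 3).astype(np.float32)

# Shadow rendered top-down in the ground plane's own 256x256 coordinate system
# (1.0 = unshadowed, lower = darker).
shadow_plane = np.ones((256, 256), dtype=np.float32)
shadow_plane[60:160, 80:200] = 0.5

# Corners of that plane patch in plane coords, and where the tracker says they are in this frame.
src = np.float32([[0, 0], [255, 0], [255, 255], [0, 255]])
dst = np.float32([[120, 250], [560, 260], [630, 470], [40, 460]])

Hmat = cv2.getPerspectiveTransform(src, dst)
shadow_frame = cv2.warpPerspective(shadow_plane, Hmat, (W, H), borderValue=1.0)

comp = plate * shadow_frame[..., None]                      # darken the real ground only
```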
To add to the above: a huge amount of greenscreen and hand-animated mattes (rotoscoping) is used to put all the layers of live action, 3D, and matte-painted backgrounds together. There are cameras that can shoot depth information, but they’re not widely used because they slow down shot setup = more money on the shoot.
It’s far cheaper to pay people to rotoscope masks by hand when needed (and they’re needed a lot) than to pay the expensive actors / film crew etc. for extra days on set.
Various websites like FxGuide.com have behind-the-scenes breakdown articles on how VFX-heavy movie sequences and scenes were done.
The point was that most camera tricks and model work were done with static cameras - for simplicity. Most of 2001, probably the most sophisticated such movie before this, had simple fixed camera angles and straight linear dolly movement past models. George was AFAIK the first to attempt to move cameras during this sort of work, and exploit this to make compound shots. He also IIRC had computer-controlled camera movement, so he could reshoot a model sequence if necessary.
There were several movies (the original Parent Trap comes to mind) where one person played twins, and it was a clever double-exposure trick - expose one half of the film with a vertical element like doorframe in the middle, then expose the other half - it’s like a dance to make the other half match up.
I think the main point is that with a judicious mix of CGI and live, and some quick movement, the eye is often fooled into seeing what it wants.
It was The Starlost (a Canadian series based on an idea stolen from Harlan Ellison, or rather Cordwainer Bird) that was one of the first TV shows to do a lot of greenscreen technique to produce cheap sci-fi scenery - but they were very sloppy about matching camera angles, so sometimes characters’ feet appeared to slide as they walked across the superimposed scenery.