Cartoons: How Do They Do That?

How do they get the dialog (with emphasis) to match the lips and facial movements of the characters so perfectly? …today? …and, yesterday?

Well, the real answer is that they don’t. It’s largely an optical illusion, and if you spend a lot of time focusing on just the mouths of the characters, they don’t really track as perfectly as you think they do in most cases. In fact, one of the hallmarks of cheap animation is that they don’t even try to have the mouths do much more than open and close.

That said, at least in the old days, they would record the vocal track first and then draw the mouths according to the track. I have seen key sheets that reference what shape the mouth should be when it is matched to a particular sound. I don’t know how common those were, but they existed for a while at least.

And of course, with old hand-drawn cel animation the mouth would be on its own cel sheet, which makes it easier to change out rapidly without having to redraw everything.

Watch old Bullwinkle cartoons. When Bullwinkle is talking, only his lower jaw is animated, being on a separate cel from the rest of him, and the color usually doesn’t match the rest of his face.

That’s still how it’s done, at least for high-end western animation (Disney, Pixar, etc.). The entire movie is worked out in storyboards so they know the rough timing of all the scenes. Then they record dialog. Then they animate to that dialog.

Animators that have good wrists.

Good to know. My knowledge of how things are done ends at around 2003 and I didn’t want to make assumptions that things haven’t changed a lot in the last 10 years.

Clutch Cargo got the lips absolutely right. :)

They know how to do it because it’s what they do for a living.

Oh god, I’d forgotten those. They freaked me out even as a kid.

The big breakthrough was when they stopped trying to record before a live studio audience.

It depends where the animation is produced, and how much quality is desired. Traditionally in the US the artwork was done after the sound track was recorded in order to match lip movements to the sound, dancing to the music, etc. This is the way it’s done in live action films when the sound has to be recorded later. Animation produced elsewhere traditionally has been done with the animation first, and all sound including voices added later. One reason is that the largest markets for animation from other countries have been primarily overseas, so the animation will be dubbed into more than one language anyway. Over time a variety of animation techniques were developed in the US to save money, usually by eliminating the more costly phase of recording sound to match the action. Even when the voice is added following the animation, it would be done with very few takes, and the voice actors may not even see the artwork. Tom and Jerry introduced early cost-saving measures by creating one large background mural for the entire episode, and then eliminating dialogue altogether. Animation directors are acutely aware of the added cost of synchronized sound and, even in quality productions, will limit the amount of action that has to be precisely matched if money is tight.

Every show I’ve worked on has recorded the dialog first and then animated. In each character’s model sheets, there is one page devoted to mouth shapes for specific sounds - these model sheets are used as reference. In the unusual occasion that dialog has to be recorded or fixed after the fact, we would have clips of the scene for the actor to watch as he recorded so that he could match “mouth flaps.”
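To make the mouth-chart idea concrete, here is a minimal sketch of how a timed dialog track could be turned into a per-frame list of mouth shapes. Everything in it is an assumption for illustration: the `MOUTH_CHART` entries, the shape labels, the 24 fps rate, and the `exposure_sheet` helper are hypothetical, not taken from any real show's model sheets.

```python
# Hypothetical mouth chart: each sound maps to the label of a mouth drawing.
# Real charts differ per character and per studio; these entries are made up.
MOUTH_CHART = {
    "rest": "A",
    "m": "A", "b": "A", "p": "A",   # closed lips
    "ee": "B",
    "ah": "C",
    "oh": "D",
    "oo": "E",
    "f": "F", "v": "F",             # teeth on lower lip
}

FPS = 24  # assumed frame rate


def exposure_sheet(track, fps=FPS):
    """Convert a list of (start_seconds, sound) pairs, sorted by time,
    into a list of (frame_number, mouth_shape) changes.

    Consecutive sounds that share a mouth shape are merged, since the
    drawing does not need to change on those frames.
    """
    sheet = []
    for start, sound in track:
        frame = round(start * fps)
        # Unknown sounds fall back to the rest shape.
        shape = MOUTH_CHART.get(sound, MOUTH_CHART["rest"])
        if not sheet or sheet[-1][1] != shape:
            sheet.append((frame, shape))
    return sheet


if __name__ == "__main__":
    # A made-up reading of a short word, timed from the recorded track.
    track = [(0.0, "rest"), (0.25, "m"), (0.5, "ah"), (1.0, "oo")]
    for frame, shape in exposure_sheet(track):
        print(f"frame {frame:3d}: mouth {shape}")
```

The point of the sketch is only that the recording comes first: the timings drive which drawing lands on which frame, not the other way around.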

I worked with a very talented voice director who wrote dialog to dub a number of Japanese films into English and he took great pains to match the number of syllables and mouth flaps. It was never just a language translation.

Wait, Pixar and that do all the animation after the sound, or just the voice animation? It seems to me that for the most efficient pipeline you could animate the broad strokes while things were being recorded, and then add in the mouth movements and tweak the gesticulation timing once the sound comes in. This is less possible with hand-drawn animation (though modern tools would make it easier), but it seems odd to me that they hold off altogether on animation until the sound is finalized.

In CG animation they model in a selection of shapes that they “morph” to from the base shape of the face. Then while following the dialogue they keyframe in each shape for each syllable, in differing strengths and combinations.

I’ve done it a few times, it’s easy to get bogged down in fine detail, trying to get every shape for every syllable accurate and present, but in fact what I’ve come to learn is it’s often better to loosen up and skip a few shapes as long as the broad strokes are there. Most of the time the audience isn’t watching the mouth, they’re looking at the whole head and face, and the body too, and emotional beats trump accurately matched dialogue.
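The morph-and-keyframe approach described above can be sketched roughly like this. It is a toy 2D version under stated assumptions: the function names `blend` and `weights_at`, the shape names `open` and `wide`, and the linear interpolation between keys are all hypothetical choices for illustration, and real packages offer richer curves and many more vertices.

```python
def blend(base, targets, weights):
    """Offset each base vertex by the weighted sum of (target - base).

    base:    list of (x, y) vertices for the neutral face
    targets: dict of shape name -> list of (x, y) vertices, same length as base
    weights: dict of shape name -> strength (0.0 = off, 1.0 = full shape)
    """
    result = []
    for i, (bx, by) in enumerate(base):
        dx = sum(w * (targets[name][i][0] - bx) for name, w in weights.items())
        dy = sum(w * (targets[name][i][1] - by) for name, w in weights.items())
        result.append((bx + dx, by + dy))
    return result


def weights_at(frame, keyframes):
    """Linearly interpolate shape weights between keyframes.

    keyframes: list of (frame_number, {shape: weight}) with strictly
    increasing frame numbers. A shape missing from a key counts as 0.0.
    """
    if frame <= keyframes[0][0]:
        return dict(keyframes[0][1])
    if frame >= keyframes[-1][0]:
        return dict(keyframes[-1][1])
    for (f0, w0), (f1, w1) in zip(keyframes, keyframes[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            names = set(w0) | set(w1)
            return {n: (1 - t) * w0.get(n, 0.0) + t * w1.get(n, 0.0)
                    for n in names}


if __name__ == "__main__":
    # Two vertices standing in for the corners of a mouth.
    base = [(0.0, 0.0), (1.0, 0.0)]
    targets = {
        "open": [(0.0, -1.0), (1.0, -1.0)],   # jaw drops
        "wide": [(-0.5, 0.0), (1.5, 0.0)],    # corners spread
    }
    keys = [(0, {"open": 0.0}), (4, {"open": 1.0, "wide": 0.5})]
    for f in range(5):
        print(f, blend(base, targets, weights_at(f, keys)))
```

Skipping a shape, as suggested above, just means leaving out a key; the interpolation carries the mouth through from the previous key to the next one.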

IME, no animation was done before the vocal track. Often the character designers would even attend a few recording sessions to watch the facial movements and physical gestures, and to get a feel for the character.

That said, there is a TON of other work that happens in pre-production. Every single character gets a model pack with every possible view of the head, body, face, etc. Then add in the mouth chart, then the color palette, then even a “line-up” of all the characters so you can determine their height and proportion relative to each other. Every single prop has a model page with front, back and side views - such as a gun, a backpack, a car, any sort of detailed prop. You’ve also got background design and painting, then there’s storyboarding which is a huge process. We’d have the story boards at the recording session and then every sound of the final track (every um, ooh, grunt, whatever) is transcribed and added frame by frame to the story board. A story board for a 28 minute show could be 300 pages.

This basic framework holds true for traditional hand-drawn ink and paint animation, as well as stop-motion (like claymation) and computer rendered animation. At least it was this way when I was in the industry a long, long time ago! But I am not an artist - I worked mostly with the voice artists, scripts and storyboard/layout organization and overseeing.

For those youngsters who don’t get the joke, here’s some Clutch Cargo

In the videogame cutscenes I’ve worked on, here’s the typical sequence:

  1. Write the script
  2. Storyboard it
  3. Create a silent animatic from the storyboards
  4. Record all the dialog
  5. Create a sound animatic from the storyboards and dialog
  6. Create a rough guide animation from the animatic
  7. Do a round of dialog pick-ups
  8. Fully animate the whole thing

The thing is, recording dialog is FAST. With a professional voice actor, you can get everything you need in a day or two in the recording studio. A single hour of voice work can translate to days or weeks of work for the animators, so there’s not much value in trying to do things in parallel. Instead what you focus on is getting a great performance and minimizing pick-ups. A lot of times the actor’s delivery isn’t just used as a guide for how to animate mouth movement, but as a reference for body language, blocking, and so on.

Don’t say that to Roger Rabbit. :)

I’d heard on this board that’s why The Flintstones drew Fred and Barney with constant 5 o’clock shadow: so they could easily animate the mouths without worrying about color matching.

Whoa! That’s what young Butch is watching in Pulp Fiction when he gets the gold watch from Captain Koons (Christopher Walken). (link)