What's up with closed captioning?

…The gym I go to has a bank of TVs in front of the cardio machines. They are all on different channels, and the closed captioning is turned on.

The thing is, sometimes (especially on the news), some gibberish-like stuff flows through – either misspellings or words substituted for other words. (I can’t think of any specific examples, but it would be something like the word “historic” reading as “hip storage”.) There is not much room in my mind for the concept that a service for people who have hearing problems might be this sloppy. It actually kinda bugs me. I hate thinking about it, and I sometimes hate the fact that my mind is wired in such a way that this will bother me until I find out why these frequent, horrible mistakes continue.

Do voice-recognition programs type that stuff up now, or is there someone in a big hurry during live broadcasts who may not be listening very well? (The former is easier to believe than the latter… I can’t imagine misspelling and misrepresenting the spoken word so blatantly, no matter how much of a hurry I was in.) Anybody got the straight dope on this?

My Staff Report: How does closed captioning work?

This is why I love the Straight Dope.

Where else can I get answers to questions I’ve long had, but never would have thought or dared to ask?

Now if someone could please explain the size 4 pink panties that appeared in my glove compartment this weekend.

I think someone did a staff report on that once… I’ll see if I can find the link…

:smack: Should’ve checked the SD home page… but wow. I’m mesmerized by the answer… can’t believe this is actually done by extremely fast typists being fed one syllable at a time. That does explain a lot.

Okay, now I’m off to the gym to stare at the badly misrepresented speech once again.

Peace,

ggurl

It’s not typing – you’re hitting multiple keys at once, so depending on the word, you can produce a whole word in one or two strokes. There are all sorts of shortcuts and stuff to get the really big words down fast as well. It’s WAAAAY faster than typing once you get good at it.

This is a modern stenograph keyboard, used by court reporters and closed-caption transcribers. Good stenographers can typically transcribe at speeds of 200 WPM or more, whereas even a highly proficient typist on a regular keyboard tops out around 100 WPM, and 40-50 WPM is typical.

My sister does this for a living. She started several months ago, and only gets to do shows that are already pre-taped (because of her less-than-blinding typing speed). But because of the volume of the shows they close-caption, she doesn’t have as much time as you might think. Rewinding is frowned upon, although they can stop the show to catch up or to research the most likely spelling of an odd name or place.

Thanks for that info, Sean. I’d like to update my Staff Report for the sake of accuracy, so if it’s not too much trouble, could you have your sister drop me an email at her convenience with some more detailed information?

I can only ask her.

A court reporter turned captioner (who mostly works on pre-recorded TV programs and movies, but she’s apparently fast enough to do live captioning) explained how this works to me – it’s very interesting, and helped me understand where all the minor errors in captioned programs come from. Here is a site that explains the basics – the keyboard has keys for all the vowels and the more common consonants. Other letters and punctuation marks are produced by pressing more than one key at the same time (unfortunately I can’t find a reference that shows all these strokes). The words are transcribed one syllable at a time. The captioner strikes a vowel key and one or more consonant keys in a single stroke to produce a syllable. In court reporting, the syllable is printed on a paper tape; a captioner is more likely to have their keyboard connected to a computer. They have software that formats the text they enter, allows them to make corrections, and automatically expands their abbreviations.

Common words and phrases are assigned shorthand abbreviations, such as U for ‘you’. A captioner working on a particular type of program will quickly develop abbreviations for common phrases, as will a court reporter. For example, a court reporter might have an abbreviation for ‘Ladies and gentlemen of the jury’ or ‘Your Honor’, and someone captioning the news would have an abbreviation for ‘George W. Bush’. During live broadcasts in particular, you’ll notice that the captions are usually in all capitals, but some phrases – say ‘Prime Minister Tony Blair’ – might be in mixed case; that’s because they were abbreviations expanded automatically by the captioner’s software.
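The automatic expansion described above is essentially a dictionary lookup performed by the captioner’s software. Here’s a minimal sketch in Python – the briefs and their expansions are made up for illustration (real captioners define their own), but it shows why expanded phrases show up in mixed case amid all-caps captions:

```python
# Hypothetical captioner-defined briefs; real ones vary per captioner.
BRIEFS = {
    "U": "YOU",
    "GWB": "George W. Bush",            # invented brief for illustration
    "TB": "Prime Minister Tony Blair",  # invented brief for illustration
}

def expand(words):
    """Expand briefs from the dictionary; uppercase everything else,
    as live captions typically appear in all capitals."""
    return " ".join(BRIEFS.get(w, w.upper()) for w in words)

print(expand(["U", "HEARD", "GWB", "SPEAK"]))
# -> YOU HEARD George W. Bush SPEAK
```

Note how the expanded phrase keeps the mixed case stored in the dictionary, while ordinary words come out in capitals – exactly the effect described above.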

Other errors may arise because a syllable was entered incorrectly, but the most common error seems to be when the wrong abbreviation is entered by mistake. Errors are more common in live broadcasts such as the news. But I’ve noticed that some news broadcasts – local news in particular – are actually not captioned live, but are captioned before the broadcast. Captions are supposed to represent exactly what’s being said on-screen, but sometimes during a news broadcast you’ll see captions calling for filler (‘adlibs’ IIRC) to be inserted by the anchors, rather than the spontaneous comments the anchors actually make. This provides a great insight into how news broadcasts work, but unfortunately it doesn’t help people who can’t hear the comment. Keeping in mind that words are entered one syllable at a time and that common words and phrases are expanded automatically from abbreviations, it’s fairly easy to understand how the errors you see in captioned broadcasts arise.

There are other things besides dialog that captioners have to insert, such as sound effects like ‘[loud bang]’ or ‘[police siren]’, musical note symbols, and the chevrons (>>) that appear when the speaker changes. (News broadcasts use >>> for a new story, but this is falling out of use.) Modern closed captioning is also capable of displaying text in mixed case, in italics, and even in different colors. (Different colors are probably most often seen in the caption credits at the beginning or end of a program.)

An extensive FAQ on the technical details of closed captioning (not so much about how they’re entered) is here.

I’ve worked at two TV stations. For the local news, captioning almost always originates from the original news script, and is controlled by the Teleprompter operator. What the end user sees is essentially the same thing the news anchor sees, albeit stripped of extraneous information such as pronunciation cues and stage direction.

My first job was as a Teleprompter operator. When I started out, I would roll quickly through the text of news packages, thinking they were of no use to the anchor. I did this until I was gently admonished that I was robbing our hearing-impaired viewership of captions!

Q.E.D.: I’m not sure if the captioner I met has email access either.

theshroud: That’s probably one of the most interesting things I’ve discovered by watching captions. When the anchor makes a mistake, you can see what they were supposed to say, and you can see where the script calls for an ad lib comment. But the captions don’t include everything said in the broadcast, which would be a problem for someone who can’t hear. (OTOH they don’t miss anything really important, just stuff like ‘Nice weather we had today, Jim. Tell us it’s going to be like that all weekend.’)

Another thing: the captions don’t always appear at the same time as the words are spoken. Often you’ll get to see what someone is going to say just before they say it. This is interesting when a politician is making a speech – the captioner knows what they’re going to say ahead of time. But it absolutely ruins comedy if you can hear what’s being said. Jokes appear on screen just before they’re delivered, so, by the time you hear a punchline, you already know what it’s going to be.

Just curious, Q.E.D., this site mentions that 2 characters are “stored per frame.” Does that mean 60 chars/sec is the maximum text speed that can be transmitted? Or is that actually 2 characters per half-frame, since NTSC is interlaced, and that would be 120/sec?

Either seems fast enough. But aren’t other things transmitted sometimes along with the closed captioning in the vertical blanking interval? Or am I confusing that with audio sub-carriers?

I worked for these guys for the past two years, for what it’s worth.

First off, anything that was pre-recorded, we transcribed beforehand and formatted into files that could be “punched” one line at a time during the broadcast. For live news shows with a mix of banter and pre-recorded segments, we’d receive scripts for the segments, and then a team would work in tandem – one person punching the pre-formatted material, then switching off to a live captioner to take over for the banter and “live” segments.

The main reason that the vast majority of captioning is done beforehand isn’t just for accuracy, but for monetary concerns as well. Most live captioners, because of their court-reporter training and ability to use the stenography machines (as seen above), are paid 5 or 6 times what the “regular” captioners are. Ergo, it’s much cheaper to shift the bulk of the workload onto the non-“realtimers.”

Contrary to what the poster above said, rewinding, researching, and double-checking for accuracy was never frowned upon. In fact, as someone who types 90+ wpm, I was often chided and told to slow down and “take my time” with the material to be extra sure of accuracy. For shows with tons of continuity (like soap operas or evening dramas), we’d regularly spend at least 15-30 mins. double-checking for “historical” accuracy, name spellings, etc.

The reason you’ll often see “hip storage” instead of “historic” go out (for example) is that the realtime captioner is using a stenography machine (seen above) instead of a keyboard. These work phonetically – the user presses two or three buttons at a time, and the machine sends the strokes to software, which converts them into the “most probable” word. So the slightest bit of pressure, or one finger dropping slightly slower than the others, can lead to conversion mistakes, sometimes hilarious or even offensive – one time, “she hit” was turned into (you guessed it) “shit.”
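The mistroke-to-wrong-word cascade described above can be sketched as a toy dictionary lookup. The strokes and dictionary entries below are invented for illustration (real steno theories are far richer); the point is that the software matches chords against a dictionary, so one slipped key can select completely different entries:

```python
# Invented steno strokes mapped to words/phrases; not a real steno theory.
STROKE_DICT = {
    ("HEUS", "TOR", "EUBG"): "historic",
    ("HEUP",): "hip",
    ("STOR", "AEUPBLG"): "storage",
}

def translate(strokes):
    """Greedy longest-match against the dictionary – a crude stand-in
    for real steno software's 'most probable word' resolution."""
    words, i = [], 0
    while i < len(strokes):
        for n in range(len(strokes) - i, 0, -1):
            key = tuple(strokes[i:i + n])
            if key in STROKE_DICT:
                words.append(STROKE_DICT[key])
                i += n
                break
        else:
            words.append(strokes[i])  # untranslatable stroke passes through raw
            i += 1
    return " ".join(words)

print(translate(["HEUS", "TOR", "EUBG"]))      # -> historic
print(translate(["HEUP", "STOR", "AEUPBLG"]))  # -> hip storage
```

One finger landing differently on the first chord (HEUS becoming HEUP) means the intended three-stroke entry no longer matches, and the software falls back to other entries – producing “hip storage” instead of “historic.”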

If you just see gibberish on the captioning, it’s because the TV’s caption decoder is either outdated or set to the wrong protocol for the channel’s captions.

Signal strength can also have an effect on closed captioning. Back when I had rabbit ears on my TV, the captions would hardly work, but when I got cable, the captions worked just fine.

I think I found the answer to my data rate question here.

But I think that statement must be garbled. A field is one-half of an interlaced frame. If each field can contain 2 chars, and each frame contains 2 fields, and each frame is sent 30 times a second, that gives 2 * 2 * 30 = 120 per sec. So something is wrong with the mathematics as I see it. My guess is that each field only transmits one character and the true data rate is 60 chars/sec.

That link has additional interesting info about the nuts & bolts of closed captioning, if anyone cares.

Please forgive my hijack – if it is indeed a hijack, as Q.E.D. might want to use this for his update – but the data rate I seem so obsessed with is actually more complicated than I thought:

From this URL. So, if one field carries 480 bps, and each character is 7+1-bit ASCII (data + parity), then the throughput for one “data stream” is 60 chars/sec – that is, 2 chars per field per frame. But two “data streams” may be sent simultaneously without interference (diff languages, for example), which could effectively double the raw data rate.
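Putting those numbers together (using the nominal 30 frames/s rather than NTSC’s exact 29.97, and treating the “two data streams” as one per field, which is my reading of the spec quoted above):

```python
# Working through the line-21 caption data rate quoted above.
FRAMES_PER_SEC = 30       # nominal NTSC frame rate
FIELDS_PER_FRAME = 2      # interlaced: odd field + even field
CHARS_PER_FIELD_LINE = 2  # line 21 of a field carries 2 characters
BITS_PER_CHAR = 8         # 7-bit ASCII + 1 parity bit

# A given field's line 21 comes around once per frame, so per data stream:
chars_per_sec = CHARS_PER_FIELD_LINE * FRAMES_PER_SEC  # 2 * 30 = 60
bps_per_field = chars_per_sec * BITS_PER_CHAR          # 60 * 8 = 480

# Both fields can carry independent streams, doubling the raw total:
total_bps = bps_per_field * FIELDS_PER_FRAME           # 480 * 2 = 960

print(chars_per_sec, bps_per_field, total_bps)  # -> 60 480 960
```

This reconciles the two figures: 60 chars/sec for any one stream (matching the 480 bps per field), but 120 chars/sec of raw capacity across both fields of the frame.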