Inspired by the interesting thread on closed captioning, why are movie subtitles often filled with obvious, simple errors? I don’t mean dubious translations so much as grammatical and spelling mistakes and odd constructions that a native English speaker would never make. None of the reasons for closed caption mistakes holds–the filmmakers have more time and are making a product that is expected to last. So how come? And if you have film industry connections and know people looking for proofreaders, feel free to connect us. And having said that, no doubt my spatulate fingers have made some great mistakes in this message. :rolleyes:
I’m not sure what kind of bad subtitles you mean, exactly.
But I’ve also been wondering about this. There was a movie I watched a while back and, while 90% or more of the subtitles were excellent, there were some amazingly bizarre artefacts. For instance, every single time the word “the” was spoken, the subtitle would read “tjc” instead, and frequently the letter i would be replaced by a comma.
How does this happen?
I think some subtitles are auto-generated. When I had a part-time job at Durham College, one of the things we did was upload promotional videos to YouTube. I believe we had the option of auto-generating the transcript of the video, which could then be used as subtitles.
But we did a lot of that by hand because the auto-transcription was not particularly accurate at the time. That was about 4 years ago.
On the other hand I helped out on the English subtitles for the movie “Gerda Malaperis”, translating them out of the original Esperanto. No auto transcription involved. I can easily imagine a typo getting into the user dictionary for the spell-checker when doing that kind of translation.
When was the movie made? Were the subtitles a translation of a different language?
That is almost certainly a bitmapped subtitle that was OCRed and half-assedly spellchecked.
Great info!
I have worked in the industry, and even done some captioning, and there are various ways to get the job done.
In the case of a movie or a scripted TV show there is a script that can be synced to the show. Only very rarely do all the actors say exactly, precisely the words in the script.
There can be misspelled words in the script, because scriptwriters are not perfect and proofreading is apparently too expensive.
In a nonscripted show, there is often a transcript. Again, this is typed up either by a machine, which is very imperfect and does not know the difference between words and names, so you will get some strange constructions, or by a live person whose grasp of the language is also not perfect. The person said “rain,” or “rein,” or maybe “reign.” Or maybe the person said “binge-eating” and the machine/transcriber heard “bean-jetting.”
In the case of a live sports event or a news show, the captions can be, and used to be, done by a person typing very fast in order to keep up with the speech. In this case, sometimes you don’t know what the word is supposed to be until you hear the next few words and can put it in context. So you might get a weird sequence where the word is misheard, and the captioner knows within a couple more words it was the wrong word but can’t go back, because you can go only one way on those things. And if it’s a machine transcription the usual rules of machine transcription apply–they don’t know context ever and they don’t track different speakers very well.
Also, think of the inefficiencies of spell check.
Sure, the filmmakers have time to do it right. But maybe they just don’t want to spend the money. Maybe they think it’s good enough.
You want some strange subtitles, put on closed captioning in YouTube. I think this is all machine generated, and watching a few of these should set your mind at ease that the filmmakers and TV producers are taking the time to do it right, or at least better.
For really bizarre subtitles look at some renditions in Thailand NOT produced by the original studio. The creator obviously never understood the original English, e.g. writing the Thai equivalent of ‘write’ where ‘right’ was intended. Sometimes an English subtitle option is offered (for an English-language movie) where the English was obviously just translated back from the already very wrong Thai!
One detective film had subtitles where no attempt whatsoever was made to listen to the voice or understand the plot, just randomly flashing snippets that seemed detective-like (“Where were you last night?”).
I wonder if this is the only instance of a confused algorithm inventing the official name of a TV character?
I knew a subtitler back in the 70s, and even then there was some kind of fairly quick system that would impose keyboard-entered titles. She had to listen to the English or French on earphones, and type in Arabic, her native language. It took about 60-90 minutes to subtitle a 30-minute TV episode. She was a gifted linguist, and spoke English with fluency and nuance.
I never saw the machine in operation, so I will not speculate about the technology of it, although it used 3-inch videotape, B&W. But given that some days she had two hours of programming to subtitle, simple fatigue of underpaid workers might be part of the answer.
A lot of movies and TV shows are subtitled into hundreds of languages.
For the rest of us who don’t know which thread you’re referring to, I bet it might be this one: http://boards.straightdope.com/sdmb/showthread.php?t=845043
I have been watching closed captioning for some time, and for the most part it is well done. The best versions are very nice, with action indicated as well as dialogue (tires screeching) (a thud is heard off-camera) (sad music plays). They even indicate who is speaking if they are off to the side (Bond: That’s a Smith and Wesson and you’ve had your six). I also see the musical score named (Junk Diva’s “I Wanna Be Your Dog” continues).
The one TV show that does aggravate me is Anthony Bourdain’s show on CNN. The captions are delayed by about 5 seconds. If some dialogue is hard to understand, you have to wait until the captions catch up. Not only that, they pause and then burst. Often words are skipped when the paused captions start up again.
I don’t know if the CNN channel does this all the time, as this show is the only one I watch on it.
Dennis
If you get the choice between “English” and something such as “English (HHD)”, choose the second one: it will have that extra information. It is also much more likely to be post-processed (written after the movie was otherwise finished), so it matches actual speech etc. better than when the original script is used.
Spanish subtitling norm UNE 153010 (now used as well by BBC in some of their programs, and maybe, hopefully, spreading around) indicates that colors should be used to identify characters. That way if you have one of those scenes where a few people keep talking over each other, or the camera doesn’t show clearly who’s speaking, etc., it’s easy to keep track without needing space for names. If the subtitling format used makes it possible to change colors mid-line (most don’t; some don’t even allow colors), it’s even possible to show the verbal pileup more clearly by not changing lines for each speaker.
Something I’ve noted on television captioning; I’ll watch Have Gun-Will Travel on MeTV with closed captioning on due to a lifetime of noisy factory environments.
Some of the sound effects are captioned as well, but seem to be autocensored for television. I have seen on several occasions the sound of a “gun xxxxing”. It took a few seconds for me to figure out that it was a “gun cocking” but cock was considered unsuitable for the audience.
Parts Unknown, or at least the version that airs on Netflix, obviously uses a human who doesn’t give a crap. Normal words are spot on, but any ethnic foods or places are horribly mangled. One I remember is that meze was spelled “mezay” though that’s not the only example.
South Park on Hulu seems to be automated. The ends of sentences frequently get cut off. Also they sometimes get ahead of the video by about 30 seconds, but that’s probably just Hulu doing its usual sucking.
Yes, i (lowercase only) being read as a comma is a fairly common artifact of poor OCR’ing.
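For anyone curious what a "half-assed" cleanup pass over OCRed subtitles might look like, here is a minimal sketch in Python. It is only an illustration: the function name `fix_ocr` and the correction table are made up for this example, and the rule (a comma immediately followed by a lowercase letter is really an "i") is an assumption that happens to fit the artifacts described in this thread, not how any real subtitle tool is known to work.

```python
import re

# Hypothetical table of known OCR misreads (whole words only).
# "tjc" for "the" is the example reported earlier in the thread.
KNOWN_MISREADS = {"tjc": "the"}

def fix_ocr(line):
    """Apply two crude repairs to one line of OCRed subtitle text."""
    # 1. Replace known misread words, matching whole words only.
    for wrong, right in KNOWN_MISREADS.items():
        line = re.sub(r"\b" + re.escape(wrong) + r"\b", right, line)
    # 2. A real comma is normally followed by a space, so a comma that is
    #    immediately followed by a lowercase letter is assumed to be an "i".
    line = re.sub(r",(?=[a-z])", "i", line)
    return line

print(fix_ocr("tjc cat ,s s,tt,ng on tjc mat, purr,ng"))
```

Note the limits of the heuristic: an "i" at the end of a word (say "hi" read as "h,") looks exactly like a genuine comma and is left alone, and capitalized misreads such as "Tjc" are not covered by this table.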
I worked in the biz from the early 70’s to the late 80’s. I’m pretty familiar with that era’s technology. As far as I know there was never any 3-inch wide videotape format. Back in the day most broadcasters used 2-inch video tape aka Quad.
My guess is that they played back the 2" tape of the original program. Your friend listened to the audio from that playback and translated and typed what she heard. She was probably typing on a thing called a “Character Generator”, also called a “CG”, “Chyron” or “Vidifont”.
Your linguistically gifted friend’s efforts were recorded onto a second 2" video tape. Understand that the tape she was listening to and the tape her typing was being recorded on are both going at the same time. One is playing back, one is recording and both are synchronized to each other.
If she made a mistake in her translation they’d just stop both tapes, go back and fix the error and then restart both tapes from where they left off. Very quick and easy to do, so it’s almost real-time.
Then, they’d take the first 2-inch tape and the second and play them back simultaneously and in sync, making a composite recording onto a third 2-inch tape. That third tape would be what was shown on the air if the translation was needed.
Very crude but that was about the only way to do it in the 70’s.
I’m about half done watching **Chicago** on Amazon Prime right now, with the captions on, and the captions are censored even though the actors are using swear words. It’s bizarre.
Thanks, Mortimus… Sorry about my 3-for-2 typo.
In those days, in broadcasting, there were a lot of crude workarounds. Talk-show delay on the radio was accomplished with two tape recorders, and a continuous loop tape, that got recorded, then run six feet across the room to another recorder for playback. The operator could bleep by just potting down the playback eight seconds later.
I’ve seen something similar on Perry Mason captioning on MeTV as well. The frequently-used word “suspicious” comes out in captioning as “suxxxxious” because the auto-censor is seeing the forbidden four letters “spic” and not even checking that they’re in the middle of a word. :smack:
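To make the failure concrete, here is a rough Python sketch of what such a rudimentary auto-censor could be doing, next to a version that at least checks word boundaries. This is a guess at the behavior, not the actual software any channel uses; the blocklist and both function names are hypothetical.

```python
import re

# Hypothetical blocklist, using the two strings mentioned in this thread.
BLOCKLIST = ["spic", "cock"]

def naive_censor(text):
    """Replace every occurrence of a blocked string, even inside a longer word."""
    for bad in BLOCKLIST:
        pattern = re.compile(re.escape(bad), re.IGNORECASE)
        text = pattern.sub("x" * len(bad), text)
    return text

def whole_word_censor(text):
    """Replace a blocked string only when it stands alone as a whole word."""
    for bad in BLOCKLIST:
        pattern = re.compile(r"\b" + re.escape(bad) + r"\b", re.IGNORECASE)
        text = pattern.sub("x" * len(bad), text)
    return text

print(naive_censor("He looked suspicious"))       # He looked suxxxxious
print(naive_censor("gun cocking"))                # gun xxxxing
print(whole_word_censor("He looked suspicious"))  # He looked suspicious
print(whole_word_censor("gun cocking"))           # gun cocking
```

The naive version reproduces both "suxxxxious" and "gun xxxxing"; adding the `\b` word-boundary check is enough to leave those innocent words alone.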
I presume the nostalgia channels (MeTV, H&I, etc.) tend to use auto-captioning with a rudimentary censor because (1) the old shows predate closed-captioning, and (2) the format is relatively cheap. Nobody is going to mistake MeTV for one of the Big Four networks. I presume their newer shows – the post-TOS Star Treks, etc. – from the closed-captioning era are done properly from when the networks originally aired them.