Usually with pre-recorded dramas, comedies, films and documentaries you work from a script. Occasionally one isn’t available, but that’s really only with first broadcasts for major shows (where they just don’t send the script out to anyone) or things that weren’t scripted, like reality shows or home improvement shows, not older films.
Songs will usually not be included in the script, just the title of the song, so you look it up online; I guess in this case the person who originally posted the lyrics online misheard and thought hey, he was given gold when he was born, so jewelry makes sense. I’d expect someone being paid to do something, like the closed captioner, to check a bit more thoroughly than someone posting lyrics online for fun, so the problem was proofreading laziness leading to an error.
To answer a couple of other questions: Closed captioning has a technically-defined character limit for space because you don’t want the writing taking up the entire screen; in the UK it’s 37 characters per line, 3 lines total, with two lines by far preferred, and this includes any character added to make different colours show up for different speakers. There is also a reading speed limit for comprehension, because we hear faster than we can read; the limit depends on the broadcast channel, but there is always a limit.
These two together mean that sometimes you just have to change the words or they’d be too long to fit or be readable. Hence “we’re going to” is often changed to “we will” even though there is a slight difference in meaning.
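To make those two limits concrete, here is a rough sketch of the kind of check captioning software might run on each caption. The 37-character and 3-line figures are the UK ones mentioned above; the 180 words-per-minute ceiling and the function name check_caption are my own assumptions for illustration, since the real reading-speed limit depends on the channel.

```python
MAX_CHARS_PER_LINE = 37      # UK figure quoted above
MAX_LINES = 3                # two lines are preferred, three is the ceiling
MAX_WORDS_PER_MINUTE = 180   # assumed value; the real limit varies by channel


def check_caption(lines, duration_seconds, max_wpm=MAX_WORDS_PER_MINUTE):
    """Return a list of problems with one caption (empty list if it passes).

    `lines` is the caption text split into display lines. Colour control
    characters also count towards the line length in practice; this sketch
    assumes they are already included in each string.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")

    problems = []
    if len(lines) > MAX_LINES:
        problems.append("too many lines")
    for line in lines:
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(f"line too long: {line!r}")

    # Reading speed: words shown divided by time on screen, scaled to a minute.
    words = sum(len(line.split()) for line in lines)
    if words / duration_seconds * 60 > max_wpm:
        problems.append("reading speed too high")
    return problems
```

A caption that fails either check is the one that gets reworded, which is where swaps like “we’re going to” for “we will” come from.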
Dialogue, especially when multiple people are talking, is the most likely area where there are going to be these problems, so it’s true that dialogue is more likely to be truncated. Narration, on the other hand, is generally spoken at a speed that’s easy to fit in and is very unlikely to be truncated.
Medical dramas can often be quite hard to make subtitles/closed captions for. The medical terminology is important to get right, but those words are really long. That means everything else has to be truncated more.
Live captioning isn’t speech-to-text (in the UK, anyway) but involves the use of shortcuts on special software plus normal typing, kinda like what a courthouse stenographer used to do but without the possibility of going back to check you got it right. Speech-to-text software might work better one day but at the moment it’s not very good at dealing with ambient noise, different accents and multiple people talking at once.
Finally, closed captions don’t stay with the show/film, IME. Captioning a film requires different software to captioning for TV broadcast and different regions require different captioning software too - PAL and NTSC have different frame rates, for example, which means you can use the whole original PAL/NTSC script, but you have to change all the timings, which also means changing some of, or sometimes quite a lot of, the wording, to fit in with the timings. If you simply tried to use an NTSC script on a PAL broadcast it would go out of time within about three minutes.
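As a rough illustration of why the timings drift, here is a minimal sketch that rescales caption times when the same frames are played back at a different rate. It assumes the simplest case, a film shot at 24 frames per second and sped up to 25 for PAL; the function name retime and the (start, end, text) cue layout are made up for the example, and real conversions involve more than a single scale factor.

```python
def retime(cues, source_fps, target_fps):
    """Rescale cue times for a version that plays the same frames at another rate.

    `cues` is a list of (start_seconds, end_seconds, text) tuples.
    Assumes the only difference between versions is playback speed.
    """
    factor = source_fps / target_fps
    return [(start * factor, end * factor, text) for start, end, text in cues]


# A cue three minutes into the 24 fps version.
original = [(180.0, 182.5, "We will go now.")]
converted = retime(original, source_fps=24, target_fps=25)

# How far out of time the unconverted cue would be by this point, in seconds.
drift = original[0][0] - converted[0][0]
```

Under that assumption the cue is already several seconds late by the three-minute mark. Rescaling also shortens how long each caption stays on screen, which is why some of the wording has to change as well.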
TV broadcasts also require small changes to account for commercial breaks; DVDs don’t. Occasionally you’ll see where a station has paid too little for its subtitles and they overrun into the ad break and just keep on going as if the show were still on.
And if a DVD version is even a minute different to the TV version, then, unless that minute is only at the very end, everything after the difference has to be changed, because otherwise all the subtitles will be a minute out of sync. And all you’ll know is that it’s different, not exactly where, so you actually have to redo the whole thing.