Whenever I download a subtitle for a TV show or movie, there’s almost always an
added space after the first digit in any number sequence (e.g. “1 986”, “V1 2”).
I could understand this as some sort of compromise among the different ways languages separate thousands and decimals, but it appears even when no separator is appropriate.
What’s going on here? I can understand many of the oddities that occur with optical character recognition, but not this one.
Just a guess, but if you are talking about OCR’d text:
Many fonts set the numeric digits to a fixed width (so that columns of numbers line up). This means that a narrow digit like 1 takes up just as much horizontal space as a wide one like 8, leaving extra blank space on either side of it. That often confuses OCR programs into thinking there is a space character before or after a digit like 1.
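To make the guess concrete, here is a toy sketch of how a gap-based space detector could trip over fixed-width digits. Everything in it is an assumption made up for illustration (the cell width, the ink widths, the threshold, and the function names); real OCR engines are far more sophisticated than this.

```python
# Toy model: every digit sits in a cell of the same width (tabular figures),
# but the visible ink of "1" is much narrower than that of other digits.
CELL = 10            # assumed advance width shared by all digits
INK = {"1": 4}       # assumed ink width of the narrow digit
DEFAULT_INK = 8      # assumed ink width of every other digit
SPACE_THRESHOLD = 3  # assumed gap above which the toy OCR reports a space


def ink_boxes(digits):
    """Return (char, left, right) ink extents, with the ink centered in its cell."""
    boxes = []
    for i, ch in enumerate(digits):
        width = INK.get(ch, DEFAULT_INK)
        left = i * CELL + (CELL - width) / 2
        boxes.append((ch, left, left + width))
    return boxes


def naive_ocr(digits):
    """Insert a space wherever the gap between neighboring ink boxes is too wide."""
    out = []
    prev_right = None
    for ch, left, right in ink_boxes(digits):
        if prev_right is not None and left - prev_right > SPACE_THRESHOLD:
            out.append(" ")
        out.append(ch)
        prev_right = right
    return "".join(out)


print(naive_ocr("1986"))  # 1 986
print(naive_ocr("986"))   # 986
```

With these made-up numbers the gap after the 1 is twice the gap between the wider digits, which is enough to push it over the threshold.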
Some operating systems do not care for commas or other punctuation marks at all and some assign specific purposes to periods or other punctuation marks.
That’s the most likely explanation, though I’m puzzled why a media publisher would use a fixed-width font in subtitles (there are very rarely more than 2 lines showing at a time, so there are no columns to line up).
No, I’m listening to the English audio and they definitely said “vee twelve”. Also, it was in an episode of Top Gear, so V12 is very likely, while a mention of version 1.2 of something is almost inconceivable.
This happens when people rip the subtitles from DVDs and convert them to a text-based format like SRT. The DVD subtitles are really just a series of images, so OCR is needed to convert them to text; this often results in the errors you’re describing.
There is an excellent program called Subtitle Edit that will clean up these kinds of errors.
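If you only need to fix this one error in an SRT file and would rather script it, something like the following rough sketch might do. This is not how Subtitle Edit works; the function name and the regex are my own assumptions, and it only targets a stray space after a leading "1", so it could wrongly join text that really does contain "1" followed by a separate number.

```python
import re

# Assumed pattern: a "1" that does not follow another digit, then one space,
# then another digit. The lookahead keeps the following digit out of the match.
STRAY_SPACE = re.compile(r"(?<!\d)1 (?=\d)")


def fix_digit_gaps(srt_text):
    """Remove a spurious space after a leading 1 in subtitle text lines."""
    fixed = []
    for line in srt_text.splitlines():
        if "-->" in line:
            # Timing lines are left untouched.
            fixed.append(line)
        else:
            fixed.append(STRAY_SPACE.sub("1", line))
    return "\n".join(fixed)


sample = "1\n00:00:01,000 --> 00:00:04,000\nBuilt in 1 986 with a V1 2."
print(fix_digit_gaps(sample))
```

It is worth diffing the result against the original before overwriting anything, since a blanket regex like this cannot tell a real space from an OCR artifact.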