Why the messed up spacing in numbers in downloaded subtitles?

Whenever I download a subtitle for a TV show or movie, there’s almost always a space added after the first digit in any number sequence (e.g. “1 986”, “v1 2”).

I could understand this as some sort of compromise among the different ways languages separate thousands and decimals, but it appears even when no separator is appropriate.

What’s going on here? I can understand many of the oddities that occur with optical character recognition, but not this one.

But that is just it–it is an OCR issue. There is no purpose, it is just a mistake of the scanner. Blame the Tesseract people.

Just a guess, but if you are talking about OCR’d text:
many fonts have the numeric digits set to a fixed width (so that columns of numbers look right). This means that narrow digits like 1 take just as much print space as a wide one like 8. And this often confuses OCR programs into thinking there is an extra space character before or after a digit like 1.

Might that be what is happening here?

It could also be a missing dot. With version numbers, for example, you would usually see “v 1.2”, not “v12”, so the space could simply mark where a dot was dropped.

Some operating systems do not care for commas or other punctuation marks at all and some assign specific purposes to periods or other punctuation marks.

That’s the most likely explanation, though I’m puzzled why a media publisher would use a fixed width font in subtitles (very rarely more than 2 lines showing at a time).

No, I’m listening to the English audio and they definitely said “vee twelve”. Also, it was in an episode of Top Gear, so V12 is very likely, while a mention of version 1.2 of something is almost inconceivable.

This happens when people rip the subtitles from DVDs and convert them to a text-based format like SRT. The DVD subtitles are really just a series of images, so OCR is needed to convert them to text; this often results in the errors you’re describing.

There is an excellent program called Subtitle Edit that will clean up these kinds of errors.
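
If you would rather script the cleanup yourself, here is a minimal sketch in Python. It is not how Subtitle Edit does it, just a guess at a workable heuristic: it assumes the stray space only follows a "1" that starts a number, as in the examples above. The function name fix_split_numbers is made up for illustration, and the rule will also join text that was legitimately "1 2", so check the result by eye.

```python
import re

# A "1" that starts a number (no digit right before it),
# followed by exactly one space and then another digit.
_SPLIT_ONE = re.compile(r"(?<!\d)1 (?=\d)")


def fix_split_numbers(line: str) -> str:
    """Remove a spurious space after a leading "1" (heuristic only)."""
    if "-->" in line:
        # SRT timing lines contain "-->"; leave them untouched.
        return line
    return _SPLIT_ONE.sub("1", line)


if __name__ == "__main__":
    for sample in ["Built in 1 986.", "It has a V1 2 engine."]:
        print(fix_split_numbers(sample))
```

Run it over each line of the SRT file and write the results back out. Lines such as "21 3" are left alone because the "1" there does not start the number.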