Why the messed up spacing in numbers in downloaded subtitles?

dstarfire · August 5, 2015, 3:01am

Whenever I download a subtitle file for a TV show or movie, there’s almost always a space added after the first digit in any number sequence (e.g. “1 986”, “v1 2”).

I could understand this as some sort of compromise among the different ways languages separate thousands and decimals, but it appears even when no separator is appropriate.

What’s going on here? I can understand many of the oddities that occur with optical character recognition, but not this one.

JKilez · August 5, 2015, 4:10am

But that is just it: it is an OCR issue. There is no purpose; it is just a mistake of the scanner. Blame the Tesseract people.

Tim_T-Bonham.net · August 5, 2015, 4:17am

Just a guess, but if you are talking about OCR’d text: many fonts set the numeric digits to a fixed width (so that columns of numbers line up). This means that a narrow digit like 1 takes just as much horizontal space as a wide one like 8, which leaves extra blank space around it. That often confuses OCR programs into thinking there is a space character before or after a digit like 1.

Might that be what is happening here?

Nava · August 5, 2015, 5:29am

It could also be a missing dot. With version numbers, for example, you would often get “v 1.2”, not “v12”, so the space could be where a dot was dropped.

Ranger_Jeff · August 5, 2015, 6:19am

Some operating systems do not care for commas or other punctuation marks at all and some assign specific purposes to periods or other punctuation marks.

dstarfire · August 7, 2015, 2:09am

That’s the most likely explanation, though I’m puzzled why a media publisher would use a fixed-width font in subtitles (there are very rarely more than 2 lines showing at a time).

No, I’m listening to the English audio and they definitely said “vee twelve”. Also, it was in an episode of Top Gear, so V12 is very likely, while a mention of version 1.2 of something is almost inconceivable.

erysichthon · August 7, 2015, 8:52am

This happens when people rip the subtitles from DVDs and convert them to a text-based format like SRT. The DVD subtitles are really just a series of images, so OCR is needed to convert them to text; this often results in the errors you’re describing.

There is an excellent program called Subtitle Edit that will clean up these kinds of errors.
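
For a sense of what such a cleanup involves, here is a minimal sketch in Python. It is not how Subtitle Edit works internally; the function name `fix_digit_spacing` and the rule itself are my own assumptions, based only on the pattern described in this thread (a single digit, one stray space, then more digits).

```python
import re

# Assumed heuristic (hypothetical, not taken from any real tool):
# a lone digit that is not preceded by another digit, followed by
# exactly one space and then another digit, e.g. "1 986" or "v1 2".
_STRAY_SPACE = re.compile(r"(?<!\d)(\d) (?=\d)")


def fix_digit_spacing(text: str) -> str:
    """Remove the space after a leading single digit in a number run."""
    return _STRAY_SPACE.sub(r"\1", text)


if __name__ == "__main__":
    for line in ["It was built in 1 986.", "That is a v1 2 engine."]:
        print(fix_digit_spacing(line))
```

This is deliberately naive. It would also join numbers that are legitimately separate ("chapters 2 3 and 4") and numbers written with a space as the thousands separator, so a real cleanup pass should let you review each change rather than apply it blindly.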
