Anyone seen/tried the YouTube caption function?

I just ran across a new YouTube feature that’s pretty amazing if you think of the technology behind it. You can upload a video and a text file that is a transcript of the video, and YouTube will process both and link them so the text displays along with the video when played. Here’s an example I uploaded:

I think you have to click on the caption button after the video starts.

It doesn’t make a caption crawl like on TV, but displays the text below the video window and bolds the part that matches in real time.

You can upload a “caption” file with time tags, or a simple text file, and YouTube figures out where the text matches the speech in the video.
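For anyone who hasn’t seen a caption file with time tags, here is a rough sketch of what writing one might look like. I’m assuming the common SubRip (.srt) layout purely for illustration; I haven’t confirmed that this is the format YouTube expects, and the function names are made up.

```python
def format_timestamp(seconds):
    """Convert seconds to the HH:MM:SS,mmm form used by SubRip (.srt) files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SubRip text from a list of (start_seconds, end_seconds, text).

    Assumption: SubRip is used only as a familiar example of a
    time-tagged caption format, not as a statement of what YouTube accepts.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(to_srt([(0, 2.5, "Welcome, everyone."), (2.5, 6, "Let's get started.")]))
```

The plain text file option skips all of this: no time tags, just the words.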

My text caption file was not a perfect transcript; it had been slightly edited to remove speech stutters and junk, but that didn’t cause any problems. Neither did the speakers’ names that preceded some lines.

One oddity, at least in my trial, is that most “the” words got translated to “xe” or sometimes “the xe”.

The next step would be for YouTube to create a text transcript from the video alone. Now that would be great for government meetings, so you could search for key words, find the text, then find the related video. I’m sure that will come in time.
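The search idea can be sketched in a few lines once the text is tied to times in the video. This is only an illustration: the cue list and the find_keyword name are hypothetical, and the sample lines are invented.

```python
def find_keyword(cues, keyword):
    """Return (start_seconds, text) for every cue whose text contains keyword.

    Matching is a case-insensitive substring test, so "budget" also
    matches "budgets".
    """
    needle = keyword.lower()
    return [(start, text) for start, text in cues if needle in text.lower()]


# Hypothetical timed transcript of a meeting: (start_seconds, text).
cues = [
    (0.0, "The meeting will come to order."),
    (12.5, "First item: the budget for road repairs."),
    (47.0, "Any questions about the budget?"),
]

for start, text in find_keyword(cues, "budget"):
    print(f"{start:7.1f}s  {text}")
```

Each hit gives a time you could jump to in the video.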

Pretty amazing!

Discussed tangentially here.

Yeah, but that’s automatic voice-recognition software, which is the implementation of the future, to be sure. My example used an existing text file, and the interactive link between text and video was done (rather well, I think) by YouTube.

I leave it on all the time, but it works with only a very small percentage of videos. I use subtitles with everything, and wish real life had them.

I’ve tried it a few times and the results were hilarious, like machine translation from English into Urdu or Welsh and back into English.

Did you (or someone) type everything on the screen? I’m just curious whether some of the things showing up are artifacts or typos. For example, there are quotes that are preceded by “XE” and I’m not sure why.

Other than that, it seems that YT just needs to work on the timing. The captions tend to disappear before I have time to finish reading them. But I have no idea if that’s a setting you can adjust, or whether there is a different way to transcribe the file, or what. Also, I really only watched a few minutes of some random middle part of it.

I feel the last three posts have been missing the main point.

The text file I uploaded already existed as a transcript. YouTube linked the text to the video, with a few oddities (I don’t know why “the” became “xe” so often). There were no typos because the original file was spellchecked and proofed several times.

So this kind of caption does NOT rely upon voice recognition software to write the text, just to determine where the links should be. Except for the oddities noted above, YouTube did NOT write the text.
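To make the distinction concrete, here is a toy sketch of how linking an existing transcript to speech could work in principle. I don’t know how YouTube actually does it. The sketch assumes a hypothetical recognizer that produces rough words with times, and it only borrows the times, never the words.

```python
from difflib import SequenceMatcher


def align(transcript_words, recognized):
    """Attach times to transcript words by matching them to recognizer output.

    recognized: list of (word, time_seconds) from a hypothetical,
    error-prone speech recognizer. The transcript text is kept as written;
    words with no match get None as their time.
    """
    tr_words = [w.lower() for w in transcript_words]
    rec_words = [w.lower() for w, _ in recognized]
    matcher = SequenceMatcher(a=tr_words, b=rec_words, autojunk=False)

    times = [None] * len(transcript_words)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            times[block.a + k] = recognized[block.b + k][1]
    return list(zip(transcript_words, times))


# The recognizer heard a stutter ("um") and got the last word wrong.
transcript = ["The", "budget", "is", "approved"]
heard = [("the", 0.0), ("budget", 0.4), ("um", 0.9), ("is", 1.1), ("approve", 1.3)]
print(align(transcript, heard))
```

The displayed text stays exactly as typed; recognition errors can only affect where a word is placed in time, which fits with an edited transcript still lining up.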

Voice recognition that produces the text from audio alone is another situation, and can hardly be as reliable with present technology.