Facebook Video Checking

When I submitted a video on Facebook of my visit to the Oklahoma Aquarium, I had to wait a minute or two before it was released for viewing. Since I don’t think someone is actually looking at my video to make sure it’s appropriate, there must be a software application that can make that determination. I think YouTube must have the same kind of program too.

So how does the software decide whether something is inappropriate or not?

I assume that inappropriate subjects would be those that are sexual in nature, showing graphic violence such as a beheading, or something showing you how to make a bomb or kill someone…

Or it’s checking for copyright violation.

Right… but that seems like an even tougher thing to check for in a minute or two.

They use complex pattern matching software, but it’s designed to run quickly. If it weren’t, no one could ever post anything to YouTube. It’s not the best that it could be, but given the time constraints that’s the trade-off. I believe they continue to run some checks after the video is posted, and videos can be pulled after they’ve gone live.

So it takes a random frame (or frames) from my video and does pattern matching to see if it’s appropriate in a few seconds? That’s some amazing technology. I tried to Google it but couldn’t find an example showing how it’s done. It may be proprietary to Facebook/YouTube and they don’t want people to thwart their system.

It’s more complex than that, typically creating a sort of digital fingerprint that can be matched without comparing individual frames of video or snippets of audio.

Here’s a Wikipedia page that gives you an overview: Acoustic fingerprint (Wikipedia)

The system has a database of digital fingerprints of copyrighted material from known sources. On YouTube, you have to apply and send them material to fingerprint. The video on the page has a little breakdown of how it works.

When you upload a video, a fingerprint is created. Then it’s checked against the database of known copyrighted material. Really just like a real fingerprint.
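To make that concrete, here’s a toy sketch in Python. It is not how Facebook or YouTube actually do it (their systems are proprietary). The function names (`fingerprint`, `register`, `identify`), the window size, and the sample data are all made up for illustration. The idea it borrows from real fingerprinting is to record only whether the signal goes up or down from one chunk to the next, so a louder or brighter copy still produces the same fingerprint.

```python
def fingerprint(samples, window=4):
    """Toy fingerprint: one bit per pair of neighbouring windows.

    A bit is "1" if the average level rose from one window to the next,
    otherwise "0". This is an illustrative assumption, not a real
    production algorithm.
    """
    averages = [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]
    return "".join(
        "1" if cur > prev else "0"
        for prev, cur in zip(averages, averages[1:])
    )


# Hypothetical database: fingerprint -> title of the known material.
KNOWN = {}


def register(title, samples):
    """Store the fingerprint of material supplied by a rights holder."""
    KNOWN[fingerprint(samples)] = title


def identify(samples):
    """Return the matching title, or None if the fingerprint is unknown."""
    return KNOWN.get(fingerprint(samples))


original = [10, 12, 11, 13, 40, 42, 41, 43, 20, 22, 21, 23,
            60, 62, 61, 63, 5, 6, 5, 6]
register("Copyrighted Song (hypothetical)", original)

# A louder copy of the same material has the same rise/fall pattern.
louder_copy = [s * 1.5 + 3 for s in original]
# An unrelated signal that only ever falls.
unrelated = [50] * 4 + [40] * 4 + [30] * 4 + [20] * 4 + [10] * 4

print(identify(louder_copy))
print(identify(unrelated))
```

The upload is never compared sample by sample against every known item. Only the short fingerprint is looked up, which is why the check can be fast.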

It’s not that “hard” (I mean, relatively - cuz I couldn’t do it!) to do the checking because the number of items you’re checking against is finite. Then I’m sure there are some algorithms to speed it up - stuff that would quickly eliminate a big chunk of fingerprints from the search. The above-linked YouTube video says it has just 3 million items. That’s not very big in terms of data at all.
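One way that kind of quick elimination could work, purely as a guess at the general approach: sort the known fingerprints into buckets by their first few bits, then only do the careful comparison inside the one bucket the upload lands in. The sketch below assumes 32-bit integer fingerprints. `FingerprintIndex`, `PREFIX_BITS`, and the example values are hypothetical names and numbers, not anything from a real system.

```python
from collections import defaultdict

FINGERPRINT_BITS = 32  # assumed fingerprint size
PREFIX_BITS = 8        # assumed number of leading bits used as the bucket key


def hamming(a, b):
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


class FingerprintIndex:
    """Buckets fingerprints by prefix so most entries are never compared."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, name, fp):
        key = fp >> (FINGERPRINT_BITS - PREFIX_BITS)
        self.buckets[key].append((name, fp))

    def search(self, fp, max_distance=3):
        """Return (name, distance) of the closest entry in the bucket, or None."""
        key = fp >> (FINGERPRINT_BITS - PREFIX_BITS)
        best = None
        for name, known in self.buckets.get(key, []):
            distance = hamming(fp, known)
            if distance <= max_distance and (best is None or distance < best[1]):
                best = (name, distance)
        return best


index = FingerprintIndex()
index.add("song_a", 0xA1B2C3D4)
index.add("song_b", 0x12345678)

# Same as song_a except for two flipped low-order bits (e.g. re-encoding noise).
print(index.search(0xA1B2C3D4 ^ 0b101))
```

With an 8-bit prefix, each lookup skips roughly 255 out of every 256 stored fingerprints if they are spread evenly. The catch in this simple version is that noise in the prefix bits sends the upload to the wrong bucket, so a real system would presumably need something more forgiving.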

They’re probably also automatically transcoding your video. Though that can be done quickly, it’s probably a longer process than checking it for copyright violations.

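You’re working with binary (0 and 1) data, so each bit you process (theoretically) reduces the number of possible matches by 50%. In that idealized case, about 22 bits are enough to single out one of 3 million items, since 2^22 is roughly 4.2 million. Some of the earliest developments in computer science were about sorting and/or comparing large sets of data.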

Search for Morse code trees for an easy example of how this works.
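Here’s a small Python sketch of a Morse code tree, in case it helps. The codes are the standard International Morse Code letters; the function names (`build_tree`, `decode`, `candidates_after`) are just made up for this example. Each dot or dash picks one branch, so every symbol you read throws away the letters on the other branch.

```python
MORSE = {
    "A": ".-",   "B": "-...", "C": "-.-.", "D": "-..",  "E": ".",
    "F": "..-.", "G": "--.",  "H": "....", "I": "..",   "J": ".---",
    "K": "-.-",  "L": ".-..", "M": "--",   "N": "-.",   "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.",  "S": "...",  "T": "-",
    "U": "..-",  "V": "...-", "W": ".--",  "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def build_tree(table):
    """Build a binary tree of nested dicts: '.' is one branch, '-' the other."""
    root = {}
    for letter, code in table.items():
        node = root
        for symbol in code:
            node = node.setdefault(symbol, {})
        node["letter"] = letter
    return root


def decode(tree, code):
    """Walk the tree one symbol at a time; return the letter or None."""
    node = tree
    for symbol in code:
        node = node.get(symbol)
        if node is None:
            return None
    return node.get("letter")


def candidates_after(table, prefix):
    """How many letters are still possible after reading this prefix."""
    return sum(1 for code in table.values() if code.startswith(prefix))


tree = build_tree(MORSE)
print(decode(tree, "-.-"))            # K
print(candidates_after(MORSE, ""))    # 26 letters before reading anything
print(candidates_after(MORSE, "."))   # 13 left after a single dot
```

The first symbol alone cuts the 26 letters down to 13, which is the "each bit halves the possibilities" idea in miniature.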

They can also just read a 1-pixel-wide strip through the middle of the video. From that strip you basically have a “barcode”. It is highly unlikely that any given 30 seconds of video is even marginally close to another unrelated video.
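A rough Python sketch of the barcode idea, with a lot of assumptions: frames are plain nested lists of grayscale values (0 to 255), the strip is the middle column of each frame, and both videos have the same number of frames and the same height. The names `middle_strip`, `barcode`, and `barcode_distance` are made up here, and whether any real site works this way is not something I know.

```python
def middle_strip(frame):
    """Return the 1-pixel-wide middle column of a frame (a list of rows)."""
    mid = len(frame[0]) // 2
    return [row[mid] for row in frame]


def barcode(frames):
    """Lay the middle strips side by side: one column of pixels per frame."""
    return [middle_strip(frame) for frame in frames]


def barcode_distance(code_a, code_b):
    """Mean absolute pixel difference between two equally sized barcodes."""
    total = 0
    count = 0
    for strip_a, strip_b in zip(code_a, code_b):
        for pixel_a, pixel_b in zip(strip_a, strip_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


# Two tiny made-up "videos": 2 frames each, 2 rows by 3 columns per frame.
video_a = [
    [[10, 20, 30], [40, 50, 60]],
    [[70, 80, 90], [15, 25, 35]],
]
video_b = [
    [[200, 210, 220], [230, 240, 250]],
    [[5, 5, 5], [5, 5, 5]],
]

print(barcode(video_a))
print(barcode_distance(barcode(video_a), barcode(video_a)))
print(barcode_distance(barcode(video_a), barcode(video_b)))
```

A copy of the same video gives a distance of zero (or near zero after re-encoding), while an unrelated video lands far away, so a simple threshold on the distance would separate the two cases.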