The Cato Institute has reviewed the existing research and come out in favor of the tech companies:
Professionally, I haven’t worked on this issue, but I have worked as a developer in big tech, I know who would be working on this stuff, I’ve researched the technologies in use, and I’ve thought a fair amount about the problem. I believe I can give a fairly factual answer to the question.
One thing to note is that the majority of the people who actually develop the tools for automated moderation are Chinese and Indian. They’re just trying to achieve whatever policy aims Legal hands to them.
Legal will tell them to do something very specific like, “Remove all content that incites violence.”
The way this gets done is that they’ll create two sets of data: 1) a large body of general content from millions of users, and 2) content that human moderators flagged and removed for inciting violence.
From there, they’ll use various techniques to programmatically determine which words, phrases, etc. differentiate the flagged posts from the general set.
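To make that concrete, here’s a minimal sketch of the kind of pipeline I’m describing, assuming scikit-learn-style tooling. The data, labels, and model choice are illustrative assumptions on my part, not anything I know a specific company to use:

```python
# Hypothetical sketch: learn which words and phrases differentiate posts
# flagged by human moderators from general content. Data, labels, and the
# model choice are illustrative, not any company's actual pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

general_posts = [
    "meeting friends for lunch tomorrow",
    "check out the photos from my hike",
]
flagged_posts = [
    "example of a post human moderators removed for inciting violence",
    "another removed post",
]

texts = general_posts + flagged_posts
labels = [0] * len(general_posts) + [1] * len(flagged_posts)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # words and short phrases
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
```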
In essence, the AI (done well) just replicates the thinking of the original human moderators. Neither the Chinese nor the Indian developers are going to fiddle with it beyond that. They have no motive to target any particular political entity; they’re just doing what Legal asked and trying to flag content that’s probably an incitement to violence. And while the large body of American developers might have a political motive, the way you create these sorts of systems doesn’t really leave room for personal politics to intervene, nor would anyone try. They’re professional coders, not political spies sent in by the DNC. Thinking that they would is silly. As said, all anyone will do is take historical moderation decisions and use them as a data set for detecting illicit posts.
So the question is: Who are the human moderators and what is their political leaning?
I believe that the answer varies, but I strongly suspect that, in general, it’s a large body of Indians, Filipinos, and stay-at-home moms in the Midwest.
These are not bastions of American Democratic Party support. And there are vastly fewer of them than the job would require: when they were generating the data that would later be fed into the AIs for training, they were, in effect, being asked to moderate the entire Internet by hand, which is completely infeasible for the number of people hired to do it.
It is very likely that these moderators, at that point in time, saw and acted on only the content that was patently criminal. They would have been receiving content flagged by other users or by very simple moderation algorithms that looked for basic things like the word “bomb”. They would not have wanted to dive into moderating decisions that took mental effort or time, or that would be controversial. They had enough on their plate, and there were enough postings by people threatening to murder people that they didn’t need to get involved with more complex content.
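Those “very simple moderation algorithms” are worth being concrete about, because they set the baseline the AI improves on. A hypothetical sketch of the kind of crude pre-filter I mean:

```python
import re

# Hypothetical sketch of a crude pre-filter: anything containing one of a
# handful of alarming keywords gets sent to a human moderator.
ALARM_WORDS = re.compile(r"\b(bomb|kill|shoot)\b", re.IGNORECASE)

def needs_human_review(post: str) -> bool:
    return bool(ALARM_WORDS.search(post))
```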
We can safely expect that the data which was used to train the bots was generated by non-partisan people and that they really only looked for patently illegal content.
The AI, even though it is doing little more than repeating what the human moderators do, can actually do more than they did, because it outputs a measurable signal. The moderators gave a binary answer: allowed or barred. The AI can and will give a confidence rating: 0.1%, 20%, 76.43%, 99.94%, etc.
The (likely) Indian or Chinese developer will run the AI over a set of billions of posts and have it rate every single post on how illicit the AI believes it to be. Then that developer will look at some posts that are at 50% confidence of containing nefarious content and decide from the sample, “Does that seem right?” If not, he’ll try 60%, 70%, 80%, and so on, until he finds a value that produces output that looks, on average, questionable enough to be worth flagging to a moderator, while not producing so much output that the moderators would never be able to double-check it.
That developer will then continue raising the confidence rating until he finds a value (e.g. 95%) where nearly everything the AI outputs contains enough obviously illicit material that you’re better off just auto-banning it rather than flagging it to the human moderators. The poster can always file for a review to get their content restored.
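Put into code, that threshold hunt looks roughly like the sketch below. It builds on the earlier hypothetical classifier, and the 0.80 / 0.95 cutoffs are made-up numbers for illustration:

```python
# Hypothetical sketch of the threshold hunt: score posts with the classifier
# from the earlier sketch, eyeball samples at rising cutoffs, then pick a
# "flag to a human" threshold and a higher "auto-remove" threshold.
import random

posts = general_posts + flagged_posts          # stand-in for billions of real posts
scores = model.predict_proba(posts)[:, 1]      # confidence each post is illicit

def sample_at(threshold, k=20):
    """Pull a few posts above a cutoff so a developer can ask 'does this look right?'"""
    candidates = [p for p, s in zip(posts, scores) if s >= threshold]
    return random.sample(candidates, min(k, len(candidates)))

FLAG_THRESHOLD = 0.80     # above this: send to a human moderator
AUTOBAN_THRESHOLD = 0.95  # above this: remove automatically (poster can appeal)

def route(score: float) -> str:
    if score >= AUTOBAN_THRESHOLD:
        return "auto-remove"
    if score >= FLAG_THRESHOLD:
        return "flag for human review"
    return "allow"
```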
That would be stage one of AI moderation. Facebook and other platforms have almost certainly arrived at this point and are or were doing it in this way. There is no room for nor reason to expect bias to be involved.
Stage two of moderating content, which I don’t know whether anyone has done, would be to extend this process: look at all of the prior posts of users who later posted criminal content and analyze them for markers that might indicate a person of that mentality.
Having done this, you can start to rate users on an index of “likelihood to be a person who goes off the deep end”.
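If anyone did build such an index, a crude version might look like the sketch below: the same kind of classifier, but trained on users’ prior posting histories, labelled by whether the user later posted something patently criminal. This is pure speculation on my part, not a description of any real system:

```python
# Hypothetical "stage two" sketch: a per-user index learned from posting
# histories, labelled by whether the user later posted patently criminal
# content. Entirely speculative and illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each entry: all of a user's earlier posts joined into one document.
histories = [
    "earlier posts of a user who later posted criminal content",
    "earlier posts of a user who never did",
]
later_offended = [1, 0]

user_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
user_model.fit(histories, later_offended)

def deep_end_index(user_history: str) -> float:
    """'Likelihood to be a person who goes off the deep end', from 0.0 to 1.0."""
    return float(user_model.predict_proba([user_history])[0, 1])
```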
This index is a useful one for moderation because the AI is, fundamentally, stupid. The AI is really just looking for keywords and phrases that seem to be related to a desire to murder people. It’s wildly more advanced than simply looking for the literal word “bomb”, but it is still, in essence, just doing that. There are lots of ways to create content that will ping the AI’s radar while being completely benign.
Benign posts waste the time of the content moderators. You either need to hire more of them to separate the wheat from the chaff, which costs money, or you need to improve the AI to do a better job of detecting nefarious intent.
But AIs - even ones that use neural nets and stuff that should allow them to approximate human thought processes - can’t replicate what a human content moderator can do because they have no context.
If you raise a human, from birth, in a gravity-free black void, pumping food directly into his stomach, you can flash random phrases through his brain and reward him when he squeezes his right hand for some and his left hand for others, and he’ll start picking out patterns that let you feed arbitrary phrases through and get decent output. But that human won’t know what those words actually mean. He won’t know that they are words. He’ll have no concept of what a word is, let alone what it refers to.
We are not taught vision. Our body takes in signals like electromagnetic wavelengths and air-pressure waves, turns those signals into an “experience”, and builds a way of understanding them. It invents the color purple out of detecting both blue and red wavelengths at the same time, despite there being no actual wavelength for purple. Red goes off toward infinity in one direction and blue goes off in the other. Between red and blue are orange, yellow, and green; there is no purple between them. As said, the brain invents an imaginary way of interpreting the signals it receives that allows us to visualize and interact with that information usefully.
A human brought up in a void, having phrases flashed through his brain, would simply interpret those phrases the way we do colors. They indicate something, but they’re just a signal, not a meaningful concept, not a language. To turn signals into language, you need context. You need someone to point to a chair and say, “That’s a chair.” You need to have experienced gravity and sat in a chair, so that you can appreciate the chair’s purpose and use. This mixture of sensations and experiences lets you relate your visual and sensory systems to your memory and create a conceptual object known as a “chair”. Minus all of that, living from birth in a giant endless blackness with no gravity, you’ll interpret “That’s a chair” as nothing more than flickering colors. Maybe you’ll recognize some of the signal patterns as ones you’ve seen before in combination with other signal patterns, but that’s all it will mean to you.
An AI is at this sort of disadvantage. It has no way of gaining any genuine understanding of what it is absorbing. It’s just looking at very simple patterns and trying to guess whether the patterns match data of type A or data of type B more closely, but in a way that is completely devoid of context. It’s just seeing flashes of color and chucking stuff to the left or right, based on what it has been trained to do. It is more effective than a simple word search - looking for the literal word “bomb” - but it is still very limited in its ability to moderate content intelligently.
The important thing to take away from that is that it can’t get better. Unless you try to invent true AI and give it the ability to experience life and gain an understanding of the real world and the sorts of contexts that humanity deals with, you’re always going to have the context-gap problem. True AI, if it were ever created, would pose ethical questions of slavery, and the effort to create it is far too complicated for this purpose.
So, if you want to improve the output of your AI moderator, something like a “likelihood to be a person who goes off the deep end” index can be useful.
If I have a user with a long posting history who has never posted anything questionable, then a single post with an 85% confidence rating of containing illicit content is probably not illicit. Maybe the user posted a comment discussing the history of lynching in the US, making it clear that lynching is bad; that’s not a concern.
If, on the other hand, we have a poster whose posts have struck the same sorts of patterns as other people who later went off the deep end, and their newest post has an 85% confidence rating of including illicit content, then you will probably want to flag that post for human moderation.
With stage two, you can boost and suppress the illicit content algorithm with a secondary algorithm that takes a longer view of things and is looking for a more vague “general pattern of creepiness”.
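Concretely, the boosting and suppressing could be as simple as shifting the flag threshold based on the user’s index. The sketch below is illustrative; the thresholds and the 0.15 adjustment are made-up numbers:

```python
# Hypothetical sketch of combining the two signals: the per-post confidence
# from stage one and the per-user "deep end" index from stage two.
def route_with_history(post_score: float, user_index: float) -> str:
    flag_threshold = 0.80
    if user_index >= 0.70:       # history resembles past offenders: be stricter
        flag_threshold -= 0.15
    elif user_index <= 0.10:     # long, clean history: be more lenient
        flag_threshold += 0.15

    if post_score >= 0.95:
        return "auto-remove"
    if post_score >= flag_threshold:
        return "flag for human review"
    return "allow"

# The 85%-confidence example from the text:
print(route_with_history(0.85, 0.05))   # clean history  -> allow
print(route_with_history(0.85, 0.80))   # risky history  -> flag for human review
```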
I don’t know whether Facebook or anyone has moved into stage two, or plans to, or has had that idea. But, conceptually, they might (or might be doing something else that would have a similar outcome).
With stage two, all of this is still unbiased and still fundamentally traces back to the moderation choices of human moderators, mostly based in India, the Philippines, and the Midwest. But the odds that an extremist will have posts removed that weren’t strictly illegal become higher. Once you have instituted stage two, you will have reduced the amount of chaff the human moderators are dealing with, because the system will screen out a lot more of the benign content, and content that previously scored too low on the confidence rating to do anything with beyond flagging for a human will now be removed automatically. The human moderators will move from mostly dealing with patently illegal content to largely dealing with content that’s more questionable and context-driven; they’ll be looking more at the TOS and less at the laws, and moderating things on the basis of hate speech and general bullying.
With stage two, extremists will find their stuff getting removed a lot more regularly.
But, almost certainly, that content will go away because the person is an extremist and so pinged the AI’s radar as fitting the profile of a person who would be likely to eventually go off the deep end and post something patently criminal.
That’s not partisan, but the extremists it catches may well crowd to one side of the political spectrum. In that case, the algorithm, through the purely impartial actions of foreigners with no political motives, working purely off laws restricting violence, nudity, etc., could indeed end up targeting one party more than the other. But it will also target members of the Earth Liberation Front more frequently than members of the American Hiking Society. It’s purely looking for wackos, regardless of ideology.
So, in that respect, I would not worry about the big tech companies. Again, this isn’t a thing that I have personally worked on, but I can’t envision any reasonable way for partisan politics to have intervened, given the scale of the task and the people who would actually be doing the brunt of the work.
Where I think that the tech companies are being…dishonest or misleading, is in saying that they can’t do better and that the technology is just still a work in progress.
If I recall correctly, there’s about 1 police officer per 1,000 people in the US, and we can expect that that’s roughly what it takes to get a fairly lawful society at the level we’re all comfortable living in.
Online, the ratio of moderators to users is probably 1 to 100,000 or 1 to 1,000,000 - though I grant that I’m just guessing. While AI may make that ratio more reasonable than it seems, if the target should be 1:1,000 it probably doesn’t help that much beyond the simple system of having other users manually flag content (which is, in essence, the same as a citizen calling 911 to report a crime).
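To put rough numbers on that gap (the user count and both ratios are my own assumptions, purely for illustration):

```python
# Back-of-the-envelope arithmetic on the moderation gap. The user count and
# both ratios are rough assumptions for illustration only.
users = 3_000_000_000

police_style_ratio = 1 / 1_000        # ~1 officer per 1,000 people
guessed_current_ratio = 1 / 100_000   # my guess at today's moderator ratio

print(f"{int(users * police_style_ratio):,} moderators at a police-like ratio")   # 3,000,000
print(f"{int(users * guessed_current_ratio):,} at the guessed current ratio")     # 30,000
```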
Facebook could just hire more moderators. They are not poor. And while, yes, the technology can and will get better, and could possibly become good enough that it would require very few or even no human moderators to manage, we are not at that point yet and the job needs doing.
The problem with the social media platforms isn’t that they’re biased - on that, they’re clean - it’s that they’re trying to pretend the Internet can’t be policed because it just has “so many people”. That’s not the way it works. If 1 in 4 people is a neurosurgeon, it doesn’t matter whether you’re looking at a group of 400 people or 4 billion people: you’ll always have one neurosurgeon for every three non-neurosurgeons, eager to get their trepanning on. There aren’t too many posts or people on the Internet; you’re just trying to be cheap and not hire enough moderators for them all.
(Granted, some companies, like YouTube, are operating at a loss, so it’s unreasonable to expect them to hire more people. But, contrariwise, that simply indicates that it may be time for the government to subsidize these efforts where needed, or to create its own policing force for the Internet along with standardized protocols that allow that force to operate independent of platform.)