AIs getting confused

Is there a way to tell an AI that it’s getting confused? My main newsfeed is Google News, which is obviously compiled by an AI. It’s obvious because it makes mistakes that no human would ever make, or at least none that anyone likely to be employed by Google would.

The mistakes are in how it groups articles. Those about the same topic are grouped together, but sometimes it gets confused by the same word used for different topics. The latest, which I saw today, is that it grouped stories about the planet Venus with stories about the Venus of Willendorf.

So is there a way for humans to instruct an AI that it’s making a mistake?

Sure: training. Without wanting to anthropomorphise too much, an AI (assuming we are talking about things like convolutional networks and such) is a bit like a child. It starts off knowing nothing, it has to be trained, and it will make naive mistakes, which can be corrected by further training.

The mistakes might be surprising to, and unanticipated by, humans, since it’s not learning in exactly the same way that humans do.

One example I have been experimenting with this week is a set of algorithms that generate pictures from a text description - the thing has been trained on a set of tens of thousands of example images which have all been tagged and described for content and context, so it ‘knows’ what text fits what images, sort of.
In very simple terms, you feed it a text description of what you want, and it starts with a rectangle of random noise, then modifies it, while another algorithm ‘scores’ the result based on how well it fits the text; the highest-scoring candidate gets taken forward for the next round of modifications. Eventually, you end up with something that does fit the text, but it can be pretty weird.
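Very roughly, that loop can be sketched like this. It’s only a toy hill-climbing version, not the actual system I’m using; score_text_image_match is a made-up stand-in for the real scoring model, and real systems steer the image using gradients from that model rather than blind random tweaks, but the keep-the-best-candidate idea is the same:

```python
import numpy as np

def score_text_image_match(image, prompt):
    # Stand-in for the real 'viewing' model that rates how well the
    # image matches the text prompt; it's just a stub so the sketch runs.
    return float(np.random.rand())

def generate(prompt, iterations=200, candidates=16):
    # Start with a rectangle of random noise.
    best = np.random.rand(64, 64, 3)
    best_score = score_text_image_match(best, prompt)

    for _ in range(iterations):
        # Propose several small random modifications of the current best image...
        proposals = [np.clip(best + 0.05 * np.random.randn(*best.shape), 0, 1)
                     for _ in range(candidates)]
        # ...score each proposal against the text...
        scored = [(score_text_image_match(p, prompt), p) for p in proposals]
        top_score, top = max(scored, key=lambda s: s[0])
        # ...and carry the highest-scoring candidate into the next iteration.
        if top_score > best_score:
            best, best_score = top, top_score
    return best

image = generate("the USA flag")
```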

I asked it to give me a picture of the USA flag, which it tried to do, but it seems like the only images in its training dataset containing the USA flag were pictures of food with one of those tiny stars-and-stripes toothpick flags in them, or maybe product packaging bearing the USA flag.
It drew me a distorted USA flag, topped with macaroni & cheese.

But training takes time and a fair amount of human effort. I’m wondering about being able to tell it directly that the two Venuses are not the same thing. The news is a fairly fast-moving target, and they don’t have the luxury of taking the time to train their AI on all the different ways it messes up.

Training doesn’t necessarily require any human effort at all - it depends on how the thing learns. There are algorithms now that not only learn how to do a thing, but also figure out for themselves what the learning objectives are.

In the specific case you’re talking about, though, that training could be done through a response from the people consuming the news - for example, clicking a ‘not relevant’ button. These feedback mechanisms already exist on multimedia platforms such as the various social networks.

In the context of AI (or, more specifically, supervised learning), training simply means giving the AI some kind of input together with the correct output resulting from that input. The AI can then use this input-output mapping to modify its existing rules about how to turn input into output so that they fit the cases the AI has seen so far a tiny bit better. This happens all the time, with human input.

A good example is spam filters, which were one of the first practically useful applications of machine learning rolled out to consumers. Each time you find a spam e-mail in your inbox and click “Classify as spam” (or, conversely, you find an e-mail that you’re really interested in in your spam folder and click “Not spam”), you’re doing two things: You move the message between the spam folder and the inbox, which is the immediately obvious result; but, more importantly in the long run, you’re giving the filter algorithm a bit of information about the correct classification of this particular message. The algorithm will then modify its rules (which may be weight factors in a neural network) a tiny bit so that it classifies such messages a little better. Same with the feedback buttons in newsfeeds that Mangetout mentions. In the long run, all these individual bits of information expand the overall training data that has been used to build the model, and make it better overall at doing its job.
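To make that concrete, here is a minimal, purely illustrative sketch of that kind of online update: a toy filter whose weights get nudged a little every time a user clicks “spam” or “not spam”. Real filters use far richer features than a three-number vector, but the principle is the same:

```python
import numpy as np

class SpamFilter:
    def __init__(self, n_features):
        self.weights = np.zeros(n_features)

    def predict(self, features):
        # Estimated probability that a message is spam, given its feature
        # vector (e.g. word counts): a sigmoid over a weighted sum.
        return 1.0 / (1.0 + np.exp(-features @ self.weights))

    def feedback(self, features, is_spam, learning_rate=0.1):
        # One "Classify as spam" / "Not spam" click = one labelled example.
        # Nudge the weights a tiny bit toward the correct answer.
        error = float(is_spam) - self.predict(features)
        self.weights += learning_rate * error * features

# Each click adds one more bit of training information over time.
filt = SpamFilter(n_features=3)
filt.feedback(np.array([1.0, 0.0, 1.0]), is_spam=True)   # user clicked "Classify as spam"
filt.feedback(np.array([0.0, 1.0, 0.0]), is_spam=False)  # user clicked "Not spam"
```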

Yep - and also, the inputs you don’t explicitly provide, but that arise naturally out of your behaviours when consuming the media - such as whether or not you clicked through from ‘The Planet Venus’ to ‘The Venus of Willendorf’, and how long you stayed on that page after clicking through, etc.

If a lot of people clicked through and subsequently appeared to be interested, then associating the two articles is arguably not a mistake.

The other thing to consider is that the algorithm may also inject a bit of deliberate randomness into its behaviour - it can’t necessarily predict what will work, so it may be configured to try different things (in this case, recommending articles that don’t seem to match), and measure the effectiveness of those things.
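A common recipe for that is “epsilon-greedy” exploration: most of the time show whatever the model scores highest, but every so often show something at random, and let the resulting clicks (or lack of them) feed back in as training data. A toy sketch, with made-up interest scores standing in for whatever the real model predicts:

```python
import random

def recommend(articles, predicted_interest, epsilon=0.1):
    # predicted_interest maps each article to the model's estimated interest score.
    if random.random() < epsilon:
        # Exploration: occasionally try an article the model would not normally
        # pick, to find out whether its prediction is actually wrong.
        return random.choice(articles)
    # Exploitation: otherwise, show the article the model likes best.
    return max(articles, key=lambda a: predicted_interest[a])

articles = ["The Planet Venus", "The Venus of Willendorf", "Mars rover update"]
scores = {"The Planet Venus": 0.8, "The Venus of Willendorf": 0.2, "Mars rover update": 0.5}
print(recommend(articles, scores))
```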

As an end-user (or end-product, depending how you view the likes of Google, Meta, etc.), no, you don’t have a way to give the AI a specific correction. The companies themselves may not even be able to, because that’s just not how the AI works. It’d be nice, I agree, because some of these algorithms seem hellbent on giving you the exact opposite of what you want. Is Instagram suggesting gross pimple popping videos and monster truck photos you don’t want? Apparently the algorithm thinks that reporting those posts or tapping “not interested” or “see less often” means that you want more More MOAR. Since you had to tap/click on those to report them, it registers that as engagement, and ho boy do the algorithms ever LOVE engagement. I managed to purge the worst recommendations from my feeds (mostly) by only looking at images/videos that are closer to what I am actually interested in. A few “not interesteds” seem to help, but only while also engaging with desirable content. They don’t care if you think it’s dumb, or that you have a bad experience, or that you hate it, so long as it drives engagement, which means more eyeballs on their ads and more revenue.

This is, I think, not an accurate description. It’s true that the online giants like Google, Amazon and Facebook are primarily interested in you in order to make money off you; but their way of doing that is not simply by increasing engagement quantitatively. They also (or even primarily) want to improve engagement qualitatively, by giving you the content you’re interested in. Trainable AI that learns your preferences is used for this purpose.

Google, as a general rule, collects as much data about you as possible, and that data is used to make predictions about what kind of YouTube videos, ads, or search results you might be interested in. You’re providing this data in a lot of different ways. I fully agree that it is something to be concerned about from a privacy perspective, but if you just look at what they deliver you have to admit that they’re quite good. For most users, Google’s search results are qualitatively much better (in terms of what users are really interested in) than those of any other search engine, and that is precisely why Google is so dominant. The same with other recommender systems such as Facebook’s algorithms that determine which posts to show you, or Amazon’s “related items” suggestions.

These companies make money not by showing you indiscriminately as much stuff as possible, but by making a good selection of which stuff to show you, based on their estimate of your taste. They’re doing this by collecting lots of data, which you provide actively or inadvertently, and then using this data to train an AI. And they’re doing a very good job at it. If it seems otherwise to you, that is likely due to perception bias: you remember the one time you got a suggestion that was way off course, but you don’t remember the twenty times the suggestions were spot on.

Evolved algorithms (I refuse to call them “artificial intelligence”) are super literal, about aspects of the subject matter you might not consider and which might not even be “extractable”. Your “Venus” example is obvious: it’s grouping things together that contain multiple instances of the word Venus. The only fix is additional training material containing articles about both the planet and the goddess, properly coded; the algorithm will then find what, within its capabilities, is the best way of sorting the largest number of them correctly. But how will it deal with articles about both the goddess and the planet? Or articles mainly about the planet that mention the goddess and happen to be about Greek researchers? Or articles about several of the gods that mention the various planets and objects named after them?
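For what it’s worth, “additional training material, properly coded” looks something like this in miniature: a toy classifier that learns to separate the two Venuses from co-occurring words, and which, exactly as described above, has no good answer for a mixed article; it will force it into one bucket anyway.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny hand-labelled training set: headlines about the planet vs. the figurine.
texts = [
    "Venus flyby reveals sulphuric acid clouds in the planet's atmosphere",
    "NASA probe will study volcanic activity on the planet Venus",
    "Venus of Willendorf figurine analysed by archaeologists in Vienna",
    "Paleolithic Venus figurine carved from oolitic limestone",
]
labels = ["planet", "planet", "figurine", "figurine"]

vectoriser = CountVectorizer().fit(texts)  # bag-of-words features
classifier = MultinomialNB().fit(vectoriser.transform(texts), labels)

# Clear-cut cases work; an article touching on both still gets forced into one bucket.
print(classifier.predict(vectoriser.transform([
    "Astronomers photograph the planet Venus at dawn",
    "Researchers compare Venus figurines with depictions of the planets",
])))
```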

The more complex the situations it can handle, the more opaque the decision structure. Microsoft’s algorithmic image description system famously labeled most green fields “field with sheep”, because people take a lot more pictures of fields with sheep than of empty fields, so that was what it was trained on. But as other examples on that page show, complex and unusual situations are really hard to train this type of algorithm for. We call it “machine learning”, but it acquires no generalizable knowledge. That is sometimes true of humans as well, but for these algorithms it is always true.

Until it gets “contaminated.” A few weeks ago DesertRoomie decided to watch Rat Race on YouTube (free with commercials) and forgot to switch from my YT account to hers. Mine is still recommending movies of a similar ilk and I don’t know when it is going to give up.

I don’t want to start a debate about proper usage of the term “artificial intelligence”, but I’d like to mention that, to me, it seems that people constantly move the goalposts as to what it would take to call an algorithm artificially intelligent. For a long time, people considered chess the pinnacle of human intelligence. Then came computers and reached a level of skill at the game that put even the strongest human players in a situation where they don’t stand a chance against AI. So people said, no, playing chess is not what it takes to be “intelligent”, it’s something else - perhaps something trivial that humans do subconsciously and effortlessly but that seemed hard to do for computers, such as recognising faces or voices. Along came pattern recognition algorithms that can do exactly that; so people needed to come up with a new definition for what it would take for an artificial intelligence to be recognised as such. The current prime candidate for that seems to be creativity, such as being able to write a novel or compose a symphony. I have no doubt that, some time this century, algorithms will be able to do exactly that, and people will come up with yet another definition of intelligence that algorithms will not meet (yet).

Generally, the field of AI, or at least public debate about it, is plagued by a lot of terminological uncertainty. Are the terms “AI”, “strong AI” and “general AI” synonymous? Or is “AI” a subset, and “strong/general AI” is a subcategory that has not been achieved yet? I personally find the latter usage preferable, so I don’t see a problem applying the “intelligence” label to the algorithms the OP is talking about. They’re not perfect (nothing ever is), but they’re trainable, they do a reasonably good job at what they’re supposed to do, and they get better day by day.

Relevance of training data (and subsequent bias) notwithstanding, I don’t think it’s correct to say that machine learning always fails to acquire generalisable knowledge, but perhaps you could explain what you mean by ‘generalizable’, in case I am misunderstanding you?

It will, eventually. The one instance where the algorithm detected a preference of your account for Rat Race will be crowded out by the large amounts of data that Google collects about your taste every day, and the effect that this one instance had on the recommender will decay.

I don’t want to create a long derail either, but I’ll just say this: Yes, goalposts have been moved. In my opinion, because the goals were bad goals. Intelligence to me requires generalizable knowledge. When talking about computers beating humans at chess, futurists weren’t overestimating the complexity of chess; they were underestimating the power of a focused, brute-force approach in a world of exponentially growing computer power.

And yet, today’s leading strategy boardgame algorithms don’t rely on brute force. I can’t speak for chess, but as far as go (which I’m more familiar with) is concerned, the leading AIs don’t simply brute-force their way through a tree graph. In their training stage, they try out patterns of moves by playing games against themselves, see what works and what doesn’t, and build on that. Go is not a solved game in the sense that the game tree has been computed through from beginning to end, yet the machines are clearly superhuman.

I’ve been tinkering with GAN-generated images recently, and I know (in broad strokes) what it’s doing: the ‘generating’ algorithm mimics shapes and objects and scenes and styles in such a way that it fools another ‘viewing’ algorithm into giving it a high score, based on what the viewing algorithm has learned from extant examples of training images…
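In code, that generator-versus-viewer tug of war boils down to something like the loop below. This is only a toy sketch on one-dimensional numbers rather than images (and it’s not the model I’ve been playing with), but the shape of the training loop is the real thing:

```python
import torch
import torch.nn as nn

# Toy 'real' data: 1-D samples instead of images, to keep the sketch tiny.
def real_batch(n=64):
    return torch.randn(n, 1) * 0.5 + 2.0

# The 'generating' algorithm: turns random noise into a candidate sample.
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# The 'viewing' algorithm: scores how real a sample looks (1 = real, 0 = fake).
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = real_batch()
    fake = generator(torch.randn(64, 8))

    # The viewer learns to give real examples high scores and fakes low scores...
    d_loss = (bce(discriminator(real), torch.ones(64, 1)) +
              bce(discriminator(fake.detach()), torch.zeros(64, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # ...while the generator learns to produce output the viewer scores as real.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```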

That process isn’t so very different from a whole lot of art. When I try to paint a painting in the style of Bob Ross, am I ruled out from being defined as ‘creative’? I would say not, and especially not if, along the way, I happen to make some random mistake that turns into a brand new technique for me.

Creativity (at least one part of a definition of it that I find useful) involves trial and error, and evaluation of results, trying again, failing a lot, and selecting things that work, or that are surprising etc. That’s not the whole thing, but it’s definitely within the bounds of ‘creative work’, IMO.

Machine learning can already be ‘creative’, in a very reasonable sense that perhaps doesn’t rank up there with Picasso and Dali, but competes with a meaningful portion of human ‘creatives’.

That should be the way they do it, but more and more that’s not the case. Just look at Google/YouTube’s removal of the dislike counter. All that does is ensure people watch bad videos longer before realizing they’re bad videos, so Google gets more ad plays. “Dislike bombing” and not wanting to hurt the feelings of creators is a BS argument, because those metrics are still available to the creators; they’re just not viewable to the public.

In a broad sense it makes sense for them to serve up content that closely matches what you want. However, beyond a base level of competency, it’s more important to them that you play it for whatever reason, even by accident, rather than like it.

That’s becoming less true, whether because competitors are getting better, or Google’s results are becoming more infested with ads, or promoted/gamed results.

But like I said before, they only need to make a good enough selection to bring you back, then they throw as much indiscriminate stuff at you as possible. They think my tastes are to watch the same video I just finished watching 10 minutes ago, or to purchase more of the same product I just bought. This is where the stupidity of the algorithms shines through. Sometimes my entire list of recommendations is things I literally just looked at within the last day. How is that helpful? They seem to think it’s better because they know I already looked at them, so maybe in some bottom-feeding way they just hope to eke out a few repeat views.

It would be so helpful to be able to say “don’t show me any dog pictures” or “don’t show me makeup tutorials”, because those are things already encoded in the metadata, but they won’t allow it because the algorithm thinks you want those things, so why would they jeopardize even one potential view? This is what I mean when I say that engagement is more important than actual usefulness or user experience. A recent Accidental Tech Podcast episode relayed a story about one of the video streaming services (Hulu?) that made it harder to resume watching a TV series after closing the app. Instead of the app popping up a “pick up where you left off” button when relaunched, the user has to scroll through the list of available shows and find it again before they can resume it. At a corporate level they deem this a success because it boosted engagement by something like 20%, i.e. more views of other shows, or at least more previews. The problem is that’s a bad user experience, and people don’t like it, but the powers that be don’t care; they want the eyeballs.

Take the image captioning system as an example. As I said, these algorithms are pretty much black boxes as far as the details go once they go beyond textbook examples with two layers and four nodes (or the equivalent parameters for other ML approaches), but my impression is that the system is incapable of learning what a “sheep” is and then applying that to contexts that weren’t in the training data. If there are no examples of people walking sheep on a leash in the data, then all sheep on leashes will be tagged as something that was in the data, most likely a dog, even though to a human it is very clearly a sheep and not any kind of dog.

You can add in more data, more nodes, a longer evolution time for the algorithm, and at some point I’m sure this kind of algorithm will be capable of spitting out a description of any image that is as accurate as a human could manage, even if the image is weird, unusual and novel, though I don’t think we have that kind of computing power currently. And all we have then is a narrow mimicry of one particular task humans can do. To me that is an algorithm, and not an intelligence. Though I freely admit that “an intelligence” is a concept for which I find it exceedingly hard to conceive of a valid test.