“I’m sorry, Dave, I can’t do that.” AI “kills” human operator in simulation

Robert Miles has a bunch of YouTube videos about AI safety and has been on Computerphile a few times, and from what I’ve seen of those videos, this is pretty much to be expected. (In googling to confirm his name, his last tweet was shown, and it relates to this topic). When you set an AI’s objective, anything that you put in its way of accomplishing that objective the AI will figure out a way around if it can. The AI will always be resistant to attempts to change its objective, because if it’s reprogrammed to have another objective, it will fail its current objective. That may seem a bit strange, but in general people work the same sort of way - while some people may seek help with certain psychological impulses, their seeking of help is part of their overall goal of staying healthy and they recognize those impulses are counterproductive to their goal. It’s very unlikely that people will voluntarily undergo therapy of any kind that would result in them, for instance, not loving their children.

Thus, if a AI knows that the operator is able to change its objective, then it will make sure that it cannot communicate with the operator. If the operator gives it a command that runs contrary to its current objective without changing that objective, then it will simply be ignored, and if the operator has power over the AI, it makes sense for the AI to eliminate the operator.

Given that, it does make more sense that this was only on the level of a thought experiment, because it’s those kinds of thought experiments that are at the heart of AI safety research that I’ve watched a bunch of YouTube videos about.

A sufficiently advanced thought experiment is indistinguishable from a simulation, and a sufficiently advanced simulation is indistinguishable from the real thing. The AIs are now so advanced that they’re capable of running on our wetware, pretending to be mere thought experiment.

The worst part is this isn’t even the most embarrassing AI-related news story this week.

Not only do people work the same sort of way, we often celebrate it. Think James Kirk and the Kobayashi Maru test.

I, for one, welcome our new drone overlords.

…of course, the actual funniest thing to happen out of this non-incident was this exchange:

Movie idea: points are awarded to drone ai by operator for destroying proper targets, and ai goal is to accumulate points. Dependent on the operator to achieve its goal, it is non- hostile towards him- and instead through trial and error develops a flirtatious personality that tries to seduce him.

Human or AI, trying to game the system is universal.

So basically Doki Doki Literature Club with drones?

Indeed - and misalignment of objectives, plus emergent self-preservation, deception and hiding of motives have already been observed in real-world experimentation with AI algorithms.