Do the Three Laws of Robotics make sense based on our understanding of computers?

The Three Laws are presented as something intrinsic to AI in Asimov’s universe; as noted above, for various reasons, you cannot create a robot that doesn’t follow the three laws without starting from scratch, and since the positronic brains in Asimov’s universe are basically a black box designed by earlier versions of the positronic brain, this would be pretty much impossible.

A filter on ChatGPT is pretty much nothing at all like that.

Rob Miles is excellent, both on Computerphile and on his own channel. I linked his video on the Alignment Problem in another thread.

I think that was someone’s headcanon from further up the thread, but it doesn’t mesh with the fact that U.S. Robots decided to play around with the NS series robots and alter the First Law, as seen in “Little Lost Robot”. Only a robot with an altered First Law could create an altered-First-Law robot, so it’s altered-First-Law robots all the way down, as it were.

I’d be interested in the specifics of what adds guardrails to ChatGPT in terms of avoiding certain taboo topics. Do they filter the input, or the output? Do they use similar techniques to train their filters? And how would this relate to the interactions between different regions of the human brain, or perhaps any similarities to the Freudian idea of the mind?

In-universe, a Zeroth Law is eventually deduced by working backward from the Three Laws: an overarching prime mandate to protect humanity as a whole, as opposed to individual humans.

Which yes, is something you probably can’t deliberately program and would depend on the programmers’ prejudices - in-universe it is something emergent over the longer arc of the robot stories.

Similarly ISTM higher AI morals would be something the AI works out like people do, from scanning and analyzing the material on morals/justice/suffering it has access to.

I suppose designers oughta make sure there’s always an unprotected human-accessible, human-pullable plug :stuck_out_tongue_winking_eye:

Feel like I’m being whooshed.

Stuart Russell has made good arguments that we should not attempt to specify an AI’s goals explicitly; it is better if the AI attempts to infer our preferences from our behavior. This would keep the AI in a permanent state of imperfect knowledge about our preferences, and that is desirable. An AI with a fully specified reward function will ruthlessly pursue that reward, and if the AI is smarter than us this would likely include stopping us from switching it off.

https://www.quantamagazine.org/artificial-intelligence-will-do-what-we-ask-thats-a-problem-20200130/
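As a quick toy illustration of Russell’s point (my own made-up numbers, not his formal “off-switch game”): a robot that is uncertain about our preferences gains something by letting the human veto it, while a robot with a fully specified reward sees the off-switch as nothing but an obstacle.

```python
# Toy illustration (invented numbers, not Russell's formal model) of the
# "off-switch" argument: a robot that is *uncertain* about the human's
# utility prefers to defer, because the human's veto filters out bad actions.

# Robot's belief: the human's utility for the proposed action is one of these,
# each equally likely. The robot does not know which one is true.
hypotheses = [-2.0, -0.5, 0.5, 1.0]

# Option A: act unilaterally. Expected value is just the mean belief.
ev_act = sum(hypotheses) / len(hypotheses)                          # -0.25

# Option B: propose the action and let the human switch it off if they object.
# The human approves only when the true utility is positive, so the robot
# gets max(u, 0) under every hypothesis.
ev_defer = sum(max(u, 0.0) for u in hypotheses) / len(hypotheses)   # 0.375

print(f"act unilaterally: {ev_act:+.3f}")
print(f"defer to human:   {ev_defer:+.3f}")
# Deferring wins here. But if the robot were *certain* the utility is +1.0,
# deferring adds nothing and the off-switch is just an obstacle to its reward.
```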

I think that this is an overly literal interpretation of how the Three Laws work. My view is that stating the three laws this way is a sort of “lies to children” heuristic for what is really going on under the hood.

Here is an example of what could really be going on in the positronic brain.

Every robot has basic-level environmental-evaluation and language-interpretation modules built in. They also have various other modules that interpret different commands, determine possible ways to complete each command, and then score the possible actions to find the best one. Not every robot has every type of processing module. If you ask a robo-loader to cook you breakfast, it may say that it can’t do that.

However, one module that every robot has is a highly sophisticated harm-evaluation module. This module has been highly trained and engineered to detect possible sources of imminent and long-term harm to individuals.

There is also a highly sophisticated harm-prevention module that is specific to the physical capabilities of the robot in question and has been programmed with actions to take if a flag is raised by the previous module.

The harm-evaluation module is constantly running as a background process looking for sources of potential harm in the vicinity, and if it identifies possible harm to humans it immediately sends a job to the harm-prevention module to alleviate that harm. This job takes priority over all other jobs.

If it identifies harm to the robot itself, it will also send a job to alleviate this harm, but at the lowest priority setting, after all current commands are complete.

The second way the harm-evaluation module is used is that all other modules send their lists of possible actions through it. Any action that is predicted to result in likely harm to an individual is immediately discarded. If this leaves no remaining possible actions, then nothing is done. If any of the options indicate harm being done to the robot, those actions are given the lowest efficiency scores and are only used if no other way to complete the task is found.
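If you wanted to sketch that in code, it might look something like this (a minimal toy; all the module names, thresholds, and jobs are invented for illustration):

```python
import heapq

# Priorities: lower number runs first.
HUMAN_HARM_PRIORITY = 0   # "First Law" jobs preempt everything
COMMAND_PRIORITY    = 1   # "Second Law": ordinary commands
SELF_HARM_PRIORITY  = 2   # "Third Law": self-preservation, only after commands

def predicted_human_harm(action) -> float:
    """Stand-in for the harm-evaluation module: estimated chance of harm to a human."""
    return action.get("human_harm", 0.0)

def predicted_self_harm(action) -> float:
    return action.get("self_harm", 0.0)

def choose_action(candidates):
    """First Law as a filter: discard anything likely to harm a human,
    then penalize (but do not forbid) actions that damage the robot."""
    safe = [a for a in candidates if predicted_human_harm(a) < 0.01]
    if not safe:
        return None   # no acceptable action -> do nothing
    return min(safe, key=lambda a: (predicted_self_harm(a), a["cost"]))

# The job queue: harm-prevention jobs always outrank ordinary commands,
# and self-preservation waits until everything else is done.
jobs = []
heapq.heappush(jobs, (COMMAND_PRIORITY, "carry crate to bay 4"))
heapq.heappush(jobs, (HUMAN_HARM_PRIORITY, "catch falling worker"))
heapq.heappush(jobs, (SELF_HARM_PRIORITY, "move away from heat source"))

while jobs:
    _, job = heapq.heappop(jobs)
    print("executing:", job)

print(choose_action([{"cost": 1.0, "human_harm": 0.4},
                     {"cost": 5.0, "self_harm": 0.8},
                     {"cost": 3.0}]))   # picks the cheap, harmless option
```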

So this is a bit more complicated than just “obey the three laws”. But “Three Laws” pretty much sums up what is going on, and it’s better marketing.

Is it absolutely perfect? Of course not, but much like a self-driving car, it doesn’t necessarily need to be perfect; it just needs to be better than the meatware of those it’s trying to replace. Will it decide that the best course of action is to enslave humanity? Not unless global warming is included among the list of harms and the software engineers who beta-test the module before release of the new update are asleep at the wheel.

Computers are programmed, and the programs are written to follow a logic, otherwise they would not work. Fortunately that does not mean that the behaviour of the computer following the logical instructions is in turn logical. That does not follow at all. The internal logic of the instructions is one thing; the coherent logic of the result is quite another.

We do have anti-missile missiles today already, don’t we? They don’t seem to me to defeat the purpose of war.

It may seem like splitting hairs, but I must insist: robots/computers/artificial intelligences do NOT follow laws; they follow instructions. Instructions programmed more or less capably into them. The instructions must follow a logic that is based on laws or the program crashes, but that does not mean that the actions of the program are logical, coherent, or based on those laws. Those are two different levels.

Just the problem autonomous vehicles must solve, and can’t. Because it leads to contradictions, there can be no rule to solve this conundrum, only more or less apt heuristics. Probably rather less, for want of computing time in the middle of an emergency or accident. How do you choose when you can’t avoid at least one death? No easy answer, whatever the marketing department of Tesla or its boss says.

Nope. You said society would have ways to mitigate the sociopath robots; I just proposed a method. An elite squad of rogue-robot exterminators. I call them “spoon walkers”. I might change it. :slight_smile:

And really it’s a “Made up before the transistor was invented, much less any actual programming language” sort of ‘lie.’

None of it makes any sense the way it was made, even Asimov said he called them “positronic” just because positrons were new and it sounded cooler than “electronic.”

What you said is a reasonable way of going about creating an AI with fetters, but it is not what Asimov was thinking when he made them up.

Yes, this is an important distinction, and in fact one I was already making. I was not referring to computer instructions, but “laws”–the high-level goals that the designers intend the A.I. to follow. My point was that even while a computer is precisely following its instructions, we have no reason to expect it to follow the laws that the instructions are intended to implement.

Because that defeats missiles that otherwise would have killed humans.

If it’s just machine vs machine, then we’re just shoving spoons in the garbage disposal. (That sounds like Michael Bay’s next movie.)

Thank you for making this thread so we don’t derail the other one further.

I feel like perhaps I did not explain myself particularly well in the other thread. I will claim that it is because I had to leave for class relatively quickly and not because I did not think it through completely.

I don’t know your level of expertise with AI, so if this isn’t informative to you then I apologize. I am not trying to speak down to anyone, but I feel it is important to describe the process fully to make sure I’m making my point. Before I begin, a couple of minor things. One, I’m going to use the term AI, when really we should be saying machine learning (ML). AI is a broader term, and what I’m going to describe is the basic paradigm for ML. Two, I’m going to speak more in general terms than specifically about neural networks. It really does not make any difference in this case, although certainly there are differences between black-box, grey-box, and white-box AI. Three, I’m only going to describe a mono-AI. Again, it just doesn’t make any difference, so I’m not going to make it more complicated than it needs to be.

So the basic process of an AI is the following. You have some set of inputs (I), which is passed into an algorithm (A), which creates a set of outputs (O). Machine learning works by having sufficiently realistic data that the algorithm can learn the model that best translates any values of I into the correct values of O.

But what do I mean by the correct values for O? Because often the output values are not used directly; they are encoded. In fact, the encode/decode step can often be quite challenging (often inputs need to be encoded as well). So the outputs are typically representative of elements in the problem space being explored. For example, the values [0.25, 0.1, 0.3, 0.5, 0.75, 0.0, 1.0, 0.25, 0.1] are meaningless, right? Without knowing the context, it is just a bunch of values seemingly from 0 to 1. However, when put into the context of a problem space, for example as the weights to apply to a set of features, they make sense.
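For instance, a decode step might be nothing more than mapping each raw value onto a named feature (the feature names below are made up):

```python
# A toy decode step (the feature names are invented): the raw vector only
# becomes meaningful once it is mapped onto the problem space.
raw_output = [0.25, 0.1, 0.3, 0.5, 0.75, 0.0, 1.0, 0.25, 0.1]

features = ["speed", "load", "distance", "temperature", "battery",
            "noise", "proximity", "tilt", "vibration"]

# Interpret the k-th output value as the weight applied to the k-th feature.
weights = dict(zip(features, raw_output))
print(weights["proximity"])   # 1.0 -- now the number means something
```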

So, suppose we had to make an AI as sophisticated as the ones in “I, Robot” but still matching the paradigm of AI as we know it today. The output from such an AI is not going to be “lift the box, oh and kill all humans”, but rather a (likely stupendously long) set of output values. These values are then decoded by some predictive model of reality, which produces an action and its associated consequences.

What we would do at present is a problem-space analysis to determine the hyperspaces that contain areas we don’t want. Or we use a priori information to simply not allow the AI to consider them. So, for example, we could code the predictive model to keep a count of the number of dead humans that would result from the action. If the count is greater than zero, then that course of action (or inaction) is forbidden. I actually did something similar in my PhD work (no, I wasn’t building killer robots beep): there were certain types of results that I knew could not work*1, so I had a simple Boolean variable that flagged when such an error type occurred, bypassed all other processing, and returned infinite cost. I did not want the AI to even consider such a possibility.
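Very roughly, that trick looks like the following (a made-up stand-in, not the actual thesis code):

```python
import math

def evaluate(candidate_plan, predictive_model):
    """Score a candidate plan; any predicted harm makes it unrankable."""
    prediction = predictive_model(candidate_plan)
    if prediction["predicted_human_harm"] > 0:
        return math.inf               # bypass all further scoring
    return prediction["energy"] + prediction["time"]   # lower is better

# A fake predictive model just for demonstration.
fake_model = lambda plan: {"predicted_human_harm": plan.count("push human"),
                           "energy": 3.0, "time": 2.0}

print(evaluate(["lift box", "push human"], fake_model))   # inf -> never selected
print(evaluate(["lift box"], fake_model))                 # 5.0
```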

I hope this helps explain why I believe that given an AI with a human-level of predictive power, then it should be possible to encode the Three Laws.

(*1: My work was on algorithms that create algorithms, so if I knew certain tasks had to take place before others (say, B had to follow A), although not necessarily where in the chain, then any output that decoded to A following B could be immediately rejected.)

Frederik Pohl once joked that the Will Smith movie kept nothing from the Asimov book except the title and the three laws, neither of which were Asimov’s conceptions.

Asimov’s fame and association with robots is fascinating. From the start, his stories were small additions to a long tradition of stories about robots in human society. His direct inspiration was Eando Binder’s “I, Robot,” the start of a series about a kindly robot seen with instant suspicion by humans. Not a single one of Asimov’s robot stories earned the cover of Astounding, although other robot stories did. One of them was the already mentioned “The Humanoids,” whose demolition of the three laws actually supplied the idea behind the movie. That was 1947, while Asimov was still turning out the stories. I can’t find exactly when Asimov’s Three Laws of Robotics became the public face of robot stories, but I’m pretty sure it came after he stopped writing sf for non-fiction. The first hit in the newspapers.com database is from 1963 and Ngrams shows the same pattern.

Fear of robots goes back to the 19th century. It was an Industrial Revolution overlay on the primeval fear of the Other, especially a human-created artificial being. What would happen if we gave machines power over humans? Nothing good. The evidence was all around, in the mills and factories and smoke-laden air and polluted rivers. No answers are possible, because we live in co-existence with machines and cannot remove them. The ongoing examination of the possibilities is interesting, but reality is created out of millions of tiny individual decisions. No one global answer is conceivable.

I would tend to agree with this. The only problem with this approach is you can never be quite sure if you’ve captured edge cases, which makes it problematic for things that are critical.

For example, if an AI is in charge of a nuclear power station, then we would never want it to remove all of the control rods. So suppose you start training it, and every time it raises all of the control rods, we say “No, bad AI”. OK. We now think it has learnt, but the relationships can be very complex. So we cannot be sure that it hasn’t learnt not to raise the control rods under conditions A, B, C, D, E, F, G, H, I, J, K, L, M, O, P, Q, R, S, T, U, V, W, X, Y, and Z, oh and AA.

Ooops. Somehow it never experienced condition N. boom
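In toy form (a lookup-table caricature of the training, but it shows the coverage problem):

```python
import random
random.seed(1)

# The AI "experienced" every condition except N during training...
seen_conditions = [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c != "N"]

# ...and what it learned is effectively "keep the rods inserted" for each of them.
learned_rule = {c: "keep rods inserted" for c in seen_conditions}

def control_policy(condition):
    # Outside the training data, behavior is effectively arbitrary.
    return learned_rule.get(condition,
                            random.choice(["keep rods inserted", "raise all rods"]))

print(control_policy("A"))   # safe, as trained
print(control_policy("N"))   # maybe safe, maybe boom
```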

But, generally, it is preferable not to introduce human intelligence into the equation. When I was doing my PhD, my AI found a solution to a problem that was considerably simpler than the known solution. I figured it was wrong, and I was going to tell it to ignore such cases. But then I looked at it more and more, and realized it had actually found something new that worked.

I agree with you there; I was just positing a retcon of how I imagine such a robot would actually work, in opposition to the OP’s conjecture that no modern person would/could design a robot with such a set of “laws”.

Or to turn that around - as a superintelligent AI learns more and more, I’m not sure how you stop it reaching too close an approximation to perfect knowledge of our preferences.

I’m not convinced that Russell has satisfactory answers to this, but I’ve only read his pop-sci stuff.

As an aside, Heinlein’s “Mike” from The Moon is a Harsh Mistress is a much better, more accurate fictional account of an AI. In fact, now that we’ve seen the path AI is going down, it’s becoming eerily familiar:

  • Mike’s abilities were not programmed in, but emerged when a sufficiently complex digital brain trained itself on huge amounts of data.
  • Mike used zero-shot learning, and supplemented it with Human reinforced classification learning.
  • Mike was capable of synthesizing not just voices, but images and video. The process described in the book seemed very much like how Stable Diffusion does it. The image started as static, then slowly built up in resolution.
  • Mike’s ‘consciousness’ was not planned, and when he lost enough connections it went away. Even after they restored the computer to have the same connections as before, he never came back. He was a product of both his ‘brain’ and the result of a huge amount of ingested knowledge. Emergence is unpredictable. He had enough ‘parameters’ restored in the form of neural circuits, but the tuning of the parameters was gone.

Mike certainly didn’t follow any ‘three laws’, and was perfectly capable of making his own choices even when they conflicted with the instructions humans gave him.

The book was written in the early 1960s and published in 1966.

The goal of war isn’t to kill the other side, it’s to destroy the other side’s ability to project force. Traditionally, killing the other side is the most efficient way to do this. But in modern warfare, where it’s not possible to project force without machines, destroying war materiel is more important than destroying people. So robots would definitely be useful.

Actually, as I think this through, the Three Laws of Robotics could be quite interesting in war. Imagine that you have a fleet of pacifier drones that can immobilize people without killing them. This fleet is backed up by conventional forces who, the robots are told, will kill any of the enemy that the pacifiers don’t pacify. So pacifying the enemy will prevent harm coming to humans. On the other hand, pacifying your own band of killers will do the same thing. So you need to have, between your forces and the pacifiers, a second line of robot killers who are under orders to kill any robots that try to pacify your troops. That way the Third Law kicks in, and in order to preserve themselves the pacifiers go after their intended targets. Of course the enemy might have its own set of robot killers, trying to shift the equation the other way. So the tide of the battle may come down to the morale of the robots caught in the middle.

So are we talking about literally creating a positronic brain, or coming up with a mechanism to inhibit our own AIs with something like the Three Laws? I will accept that we aren’t building positronic brains.

Also, fine-tuning isn’t a ‘filter’. It’s additional training applied on top of the pre-trained model, and it very much becomes part of the overall AI’s thinking in a deep way, as I understand it. We’re not talking about adding some code with encoded rules that we have to update as we think up more; we’re talking about changing the way the thing thinks so it literally can’t go there. Call it executive function - the thing that keeps us from actually hurting someone when we’re angry enough that we would like to, or tells us to get our ass off the couch and get to work when we’d rather just chill.
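As a very reduced illustration of that difference (a toy probability table standing in for a real model, and a crude weight nudge standing in for real fine-tuning):

```python
import random
random.seed(0)

# A toy "model": just a probability table over possible responses.
model = {"helpful answer": 0.6, "harmless joke": 0.3, "taboo content": 0.1}

def generate(weights):
    return random.choices(list(weights), weights=list(weights.values()))[0]

# Bolt-on filter: the model still "wants" to say it; we just intercept it afterwards.
def filtered_generate(weights):
    out = generate(weights)
    return "[refused]" if out == "taboo content" else out

# Fine-tuning (wildly simplified): nudge the weight of the disfavored output
# toward zero and renormalize, so the change lives inside the model itself.
def fine_tune(weights, disfavored, strength=0.99):
    tuned = dict(weights)
    tuned[disfavored] *= (1 - strength)
    total = sum(tuned.values())
    return {k: v / total for k, v in tuned.items()}

tuned_model = fine_tune(model, "taboo content")
print(filtered_generate(model))   # the filter may still have to step in
print(generate(tuned_model))      # the tuned model rarely goes there at all
```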

The problem with Asimov’s three laws is that they are impossible. An AI track switch following the three laws would break down if it faced the trolley problem. Many actions only carry probabilities of harm no matter which way you go, but none of the probabilities are zero. Some actions create complex results where harm is impossible to predict.
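To see why, just try to write the rule down (made-up numbers): every option carries some probability of harm, so a literal zero-harm rule leaves the AI with nothing it is permitted to do.

```python
# Made-up numbers for a track-switch dilemma: every option carries *some*
# probability of harm, so a literal "no harm, ever" rule forbids all of them.
options = {
    "stay on track A":   {"p_harm": 0.90, "expected_casualties": 4.5},
    "switch to track B": {"p_harm": 0.30, "expected_casualties": 0.3},
    "emergency brake":   {"p_harm": 0.10, "expected_casualties": 0.2},
}

allowed_by_literal_first_law = {k: v for k, v in options.items() if v["p_harm"] == 0.0}
print(allowed_by_literal_first_law)   # {} -- the rule leaves the AI nothing to do

# A probabilistic judgement at least picks the least-bad option...
least_bad = min(options, key=lambda k: options[k]["expected_casualties"])
print(least_bad)                      # ...but that is a value judgement, not a "law".
```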

So we aren’t going to have any simple rules that MUST be followed. The world isn’t black and white like that. We simply have to create AIs that have a form of judgement that keeps them from doing horrible things directly, and then evaluate their behavior over time and adapt them such that the way they behave is acceptable to us.

I am not at all worried about them ‘taking over’ - not for a long, long time. I would, however, worry about zealous applications of AI causing behaviour we didn’t anticipate. For example, a track-switching AI that somehow decides two trains should be on the same track at the same time. But for a very long time it will be trivial to just shut down a misbehaving AI.

A bigger risk than them ‘taking over’ is that we rush into letting them run so much stuff that WE can’t shut them down because we need them. For example, a global AI energy-management system that allocates energy suddenly misbehaves, but in the interim we have stupidly pulled all our manual controls out of the system because the AI seemed to work so well, so we can’t shut it off without crashing the grid.