Thank you for making this thread so we don’t derail the other one further.
I feel like perhaps I did not explain myself particularly well in the other thread. I will claim that it is because I had to leave for class relatively quickly and not because I did not think it through completely.
I don’t know your level of expertise with AI, so if none of this is new to you then I apologize. I am not trying to speak down to anyone, but I feel it is important to describe the process fully in order to make my point. Before I begin, a couple of minor things. One, I’m going to use the term AI, when really we should be saying machine learning (ML). AI is a broader term, and what I’m going to describe is the basic paradigm for ML. Two, I’m going to speak in general terms rather than specifically about neural networks. It really does not make any difference in this case, although certainly there are differences between black box, grey box, and white box AI. Three, I’m only going to describe a mono-AI. Again, it just doesn’t make any difference, so I’m not going to make this more complicated than it needs to be.
So the basic process of an AI is the following. You have some set of inputs (I), which are passed into an algorithm (A), which creates a set of outputs (O). Machine learning works by having sufficient realistic data such that the algorithm can learn the model that best translates any values for I into the correct values for O.
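To make that concrete, here is a minimal sketch (names and data are mine, purely illustrative; the learner is a trivial gradient-descent linear model, just to show the shape of I → A → O):

```python
# Minimal sketch of the I -> A -> O paradigm (hypothetical, illustrative only).
# A trivial linear model trained by gradient descent; real systems differ,
# but the shape is the same: learn the mapping from I to the correct O.

def train(inputs, targets, steps=1000, lr=0.01):
    """Learn weights that best translate each input vector I into its target output O."""
    n_features = len(inputs[0])
    weights = [0.0] * n_features
    for _ in range(steps):
        for I, O in zip(inputs, targets):
            prediction = sum(w * x for w, x in zip(weights, I))  # A(I)
            error = prediction - O
            # Nudge the weights toward producing the correct O for this I.
            weights = [w - lr * error * x for w, x in zip(weights, I)]
    return weights

def predict(weights, I):
    return sum(w * x for w, x in zip(weights, I))

# Example: learn O = 2*x1 + 1*x2 from a handful of samples.
model = train([[1, 0], [0, 1], [1, 1], [2, 1]], [2, 1, 3, 5])
print(predict(model, [3, 2]))  # close to 8
```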
But what do I mean by the correct values for O? Often, the output values are not used directly; they are encoded. In fact, the encode/decode step can often be quite challenging (inputs often need to be encoded as well). So, the outputs are typically representative of elements in the problem space being explored. For example, the values [0.25, 0.1, 0.3, 0.5, 0.75, 0.0, 1.0, 0.25, 0.1] are meaningless, right? Without knowing the context, it is just a bunch of values seemingly between 0 and 1. However, when put into the context of a problem space, for example as the weights to apply to a set of features, they make sense.
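As a toy illustration of that decode step (the feature names and the decode rule here are mine, purely for illustration):

```python
# Toy decode step: the raw outputs mean nothing until the problem space supplies context.
raw_outputs = [0.25, 0.1, 0.3, 0.5, 0.75, 0.0, 1.0, 0.25, 0.1]

# Hypothetical feature set for some problem space.
features = ["size", "colour", "shape", "texture", "weight",
            "speed", "priority", "cost", "risk"]

# Decode: interpret each output as the weight applied to the matching feature.
decoded = dict(zip(features, raw_outputs))
print(decoded)  # {'size': 0.25, 'colour': 0.1, 'shape': 0.3, ...}
```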
So, suppose we had to make an AI as sophisticated as the ones in “I, Robot” but still matching the paradigm of AI as we know it today. The output from such an AI is not going to be “lift the box, oh and kill all humans”, but rather a (likely stupendously long) set of output values. These values are then decoded by some predictive model of reality, which produces an action and its associated consequences.
What we would do at present is a problem space analysis to determine the regions of the hyperspace we don’t want, or we use a priori information to simply not allow the AI to consider them. So, for example, we could code the predictive model to keep a count of the number of dead humans that would result from the action. If the count is greater than zero, then that course of action (or inaction) is forbidden. I actually did something similar in my PhD work (no, I wasn’t building killer robots, beep); rather, there were certain types of results that I knew could not work*1, so I had a simple Boolean variable that flagged when such an error type occurred, bypassed all other processing, and returned infinite cost. I did not want the AI to even consider such a possibility.
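A rough sketch of what that kind of hard constraint might look like, with names assumed by me (predict_consequences stands in for whatever predictive model of reality does the decoding):

```python
import math

# Hypothetical hard constraint in the cost function.
def evaluate(candidate_outputs, predict_consequences, base_cost):
    consequences = predict_consequences(candidate_outputs)
    # Forbidden region: any predicted human deaths, whether from action or inaction.
    if consequences["humans_harmed"] > 0:
        return math.inf  # bypass all other processing; this option is never considered
    return base_cost(candidate_outputs, consequences)

# Dummy usage with a stand-in predictive model:
dummy_model = lambda outs: {"humans_harmed": 0, "boxes_lifted": 1}
print(evaluate([0.2, 0.9], dummy_model, lambda outs, cons: 1.0 - cons["boxes_lifted"]))  # 0.0
```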
I hope this helps explain why I believe that, given an AI with a human level of predictive power, it should be possible to encode the Three Laws.
(*1: My work was on algorithms that create algorithms, so if I knew that certain tasks had to take place before others (say, B had to follow A), although not necessarily where in the chain, then any output that decoded to A following B could be immediately rejected.)
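As a hypothetical sketch of that ordering check (names assumed, not my actual PhD code):

```python
import math

# Hypothetical ordering constraint: task "A" must occur somewhere before task "B"
# in the decoded chain of actions, though not necessarily immediately before it.
def violates_ordering(sequence, first="A", second="B"):
    seen_second = False
    for step in sequence:
        if step == second:
            seen_second = True
        elif step == first and seen_second:
            return True  # A decoded after B: the required ordering is broken
    return False

def cost(sequence, base_cost):
    if violates_ordering(sequence):
        return math.inf  # reject immediately, skipping all other processing
    return base_cost(sequence)

print(cost(["A", "X", "B"], len))  # 3 (valid ordering, evaluated normally)
print(cost(["B", "X", "A"], len))  # inf (A follows B, rejected outright)
```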