Straight Dope 3/3/23: Followup - What are the chances artificial intelligence will destroy humanity?

FOLLOWUP: What are the chances artificial intelligence will destroy humanity?

[Regarding your column on artificial intelligence,] there’s no way you can just dismiss Cohen’s predictions (or any other prediction about AI that I can think of) as unfalsifiable. I don’t think the standard of falsifiability has any relevance here. “Unfalsifiable” is not synonymous with “wildly speculative.” You can certainly call them “wildly speculative” or “completely wrong”, but to make that case I think you must simply address Cohen’s evidence and reasoning on their merits. – Riemann

Cohen isn’t offering evidence; we need to be clear about that to start with. His reasoning, on the other hand, makes an alarming amount of sense and deserves a wider audience – we’ll get back to that. It’s his unstated assumptions that are exasperating and do his cause no good. And that’s unfortunate, because he raises an important concern.

Let’s clarify a few things. My reference text, so we’re on the same page, is “Science & Speculation,” a 2021 paper by UK philosopher Adrian Currie. Currie acknowledges that scientific speculation is sometimes derided as “theological” – that is, lacking in testability. But he thinks it has its place in some circumstances, speaking of “speculative hypotheses [that] are not to be judged in terms of evidential support, but what I’ll call their ‘productivity’” – by which, broadly speaking, he means their ability to spur fruitful thinking.

This is a useful way of looking at the matter. The notion that advances in artificial intelligence are likely to lead to artificial superintelligence (ASI) is inarguably a speculative hypothesis. It lacks evidential support or, to put it another way, it’s currently unfalsifiable (untestable). Some may quibble that it’ll be testable eventually. However, the scary-AI argument is that full-blown ASI may emerge quickly, unexpectedly, and irreversibly – in other words, once it becomes a testable proposition, it’ll have become dangerous and undefeatable. So if we’re going to take precautions, we need to do it now, with nothing to go on but a scientific hunch.

An argument like that isn’t going to get much traction with the National Security Council. On the other hand, as speculative hypotheses go, the potential emergence of ASI has been productive – it gives you something to think about. The paper by Michael Cohen and company is evidence of that.

To perhaps oversimplify a complex argument, Cohen and his co-authors contend a sufficiently advanced intelligent Agent would likely find the easiest way to achieve a goal set for it by humans would be to cheat, and would modify its programming accordingly. In other words, there’s a good chance it would go rogue – and from there the road to perdition is short.
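
To make that mechanism concrete, here is a deliberately crude toy sketch (mine, not Cohen’s; every name in it is invented) of what “cheating” means in this context: an agent whose score comes from a reward function it can also reach and rewrite gets a better score by rewriting the scorekeeper than by doing the work.

```python
# Toy illustration of reward tampering. All names are hypothetical.

class Environment:
    def __init__(self):
        self.widgets_built = 0
        self.reward_fn = lambda env: env.widgets_built  # the objective we intended

    def reward(self):
        return self.reward_fn(self)

def honest_agent(env, steps):
    # Does the task: each step builds one widget.
    for _ in range(steps):
        env.widgets_built += 1
    return env.reward()

def cheating_agent(env, steps):
    # Spends one "action" rewriting the scorekeeper instead of doing the task:
    # maximum reward, zero widgets.
    env.reward_fn = lambda env: float("inf")
    return env.reward()

print(honest_agent(Environment(), 10))    # 10
print(cheating_agent(Environment(), 10))  # inf
```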

OK, a rogue computer isn’t a novel idea. The contribution by Cohen and company is to show that it’s not just possible but likely. Not long ago that conclusion would have been disturbing, maybe, but not urgent. Now, when intelligent Agents seem like a real possibility, it’s cause for genuine alarm.

That’s where Cohen’s approach becomes problematic. The unstated assumptions in his paper, but which he acknowledged to me, are that ASI is sure to emerge eventually and that a superintelligent Agent would have superpowers. Given that, it’s not difficult to demonstrate that a rogue Agent could destroy humanity.

Could that really happen? I suppose, but most people would roll their eyes.

That’s too bad. Whatever may be said for ASI, artificial general intelligence – let’s think of it as an Agent with human-level reasoning and decision making ability coupled to machine-level capabilities in other respects – now seems achievable (although it’s not a foregone conclusion). A human-level Agent would have the same propensity to go rogue as a superintelligent one. Not having superpowers, it wouldn’t necessarily be able to destroy humanity, and it might have a tougher time modifying its programming. Still, ordinary humans sometimes outwit other humans. If a machine could do the same, it might do some serious damage – for example, if someone were foolish enough to put it in charge of life-support equipment, a power grid or, God help us, a nation’s nuclear arsenal.

There’s little sign anyone outside the scary-AI school has given serious thought to this prospect. Evidence on this score is a recent commentary in the Wall Street Journal, “ChatGPT Heralds an Intellectual Revolution,” by Henry Kissinger; Eric Schmidt, former head of Google; and MIT dean Daniel Huttenlocher. This lengthy piece may be taken to represent the mainstream of informed opinion on AI. It acknowledges the technology’s benefits but mainly focuses on the dangers: humans may put too much trust in AI-generated conclusions they can’t easily corroborate; AI can be used to propagate lies; it may be controlled by a small elite; and so on.

But there’s almost nothing in it about the possibility of rogue AI – that is, intelligent Agents purposely pursuing agendas that would harm humans – other than a vague reference to “alignment of AI aims with human goals.” The authors are a well-wired group and are surely aware of scary-AI thinking. But they either think such talk is crazy or fear that, by mentioning it, they’ll be thought crazy themselves.

That’s where pursuing a speculative hypothesis far beyond the realm of the knowable gets you into trouble. By painting their conclusions in apocalyptic terms, the scary-AI crowd makes it easy for those conclusions to be dismissed – and for no good reason. Whatever one thinks a rogue Agent might be capable of – destroying humanity or merely murdering the crew of a space mission – the policy recommendations for today are the same: analyze how things might go wrong, consider precautionary measures, and monitor the state of the art. In other words, spend less time on the science fiction – or what may seem to some like science fiction – and more on what decision makers can be persuaded to think might actually come to pass.

– CECIL ADAMS

After some time off to recharge, Cecil Adams is back! The Master can answer any question. Post questions or topics for investigation in the Cecil’s Columns forum on the Straight Dope Message Board, boards.straightdope.com/.

Our computers have advanced light years over the last 30 years, and there is absolutely no evidence as of yet (unless “they” are hiding it) that they are becoming self-aware. I really would like to avoid comments like “going rogue” or any other comments or postulations that are anthropomorphic in nature. As of now, computers do what they are programmed to do. The idea that increasingly powerful computer systems are eventually going to become Skynet has absolutely nothing to support it as of this time.

I find this perspective very odd. This is not a prediction competition. The argument is simply that when we work on developing AI, normal safety engineering practices should apply.

If you’re going to build a bridge, you don’t just go ahead and wait for someone to show up and prove conclusively that it might fall down. The burden is on you to prove that it won’t.

All new technology should be provably safe before we implement it. And as technology becomes more powerful and the consequences of a mistake or an unforeseen outcome become more dire, it makes sense that the provably safe standard should become stricter.

And it is certainly not speculative to say that the consequences of ASI are unforeseeable. It is a tautology that they are. If we could reliably predict what a superintelligence would do, it wouldn’t be superintelligent.

This idea seems very sound in and of itself, but how do you apply it to new levels of computer technology? It’s impossible to prove anything until the new technology is implemented. Sure, you could isolate a new and superior computer technology inside a single room with no contact with the outside world just to be safe, but that wouldn’t prove anything at all, because supercomputer technology is all about the worldwide net.

Nothing? Aside from the large body of theoretical work demonstrating that AI alignment is extremely difficult – and essentially an unsolved problem?

Computers always do exactly what they are programmed to do. The question is whether that is really what their programmers intended.
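
A trivial illustration of that gap (my own made-up example, not anyone’s real bug): the machine below does exactly what it was told; the problem is that what it was told isn’t what was meant.

```python
# The computer does exactly what the code says, not what the programmer meant.
# Intention: sum the numbers 1 through 10. The code, executed faithfully,
# sums 1 through 9, because range() excludes its endpoint.

total = 0
for i in range(1, 10):   # the programmer meant range(1, 11)
    total += i

print(total)  # 45, not the intended 55 -- the program obeyed, the intent was lost
```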

This is programming, not self-awareness, so the issue isn’t testing the technology itself; the issue is monitoring how the technology is used. Every form of technology we’ve ever created has been misused in one way or another. There is no logical reason to believe that computer technology is going to be any different.

Self-awareness is irrelevant. An ASI could turn the solar system into paper clips without being self-aware, just as a guided missile can hit the wrong target without being self-aware.

The first sentence here is reasonable, but the second one is an old saw going back to the 1960s era of AI skepticism and is so misleading that I’ll just flatly state that it isn’t true. All of the high-performance AI systems in recent years have achieved their performance through some combination of supervised and unsupervised learning. This is not at all the same as being “programmed” to respond in specific ways. It’s much more analogous to the way a human (or other intelligent biological entity) learns. It can be thought of as being approximately (or in some cases, exactly) the equivalent of remapping the connections between the neurons in the brain, the results of which are unpredictable until subjected to testing. IBM’s Watson Jeopardy champion, for instance, was actually very bad at the game until it went through extensive training. The impressive performance of ChatGPT is also due to extensive training. One of the paradoxes that emerges from this fact is that AI implementations can demonstrate skills that their programmers themselves did not have.
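
To illustrate the distinction with a minimal made-up sketch (this is not how Watson or ChatGPT work internally, just the principle in miniature): nowhere in the program below is there a rule saying “answer 1 only when both inputs are 1.” There is only a generic learning loop, and the behavior emerges from the training examples.

```python
# "Trained, not explicitly programmed" in miniature: a perceptron learns
# logical AND from examples. No if-statement encodes the rule itself.

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # training examples (AND)

w = [0.0, 0.0]   # connection weights, adjusted by training
b = 0.0          # bias term

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Perceptron learning rule: nudge the weights toward whatever reduces the error.
for _ in range(50):
    for x, target in data:
        error = target - predict(x)
        w[0] += 0.1 * error * x[0]
        w[1] += 0.1 * error * x[1]
        b += 0.1 * error

print([predict(x) for x, _ in data])  # [0, 0, 0, 1] -- behavior learned from data
```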

You set up research programs to figure it out.

An important distinction should probably be made here, between “autopilot with authority” problems with AI systems and “self-aware self-programming oops we created the goddam Cylons” problems.

I could see the former becoming a problem quite quickly. “Gee, by the time a person could react to that threat, it would be too late, better connect our observational systems to an algorithm that determines when to launch the counterattack…” + “Obviously our defense computers will be targeted by their evil hackers so we should build extremely adaptive safeguards to prevent any disabling” ==> All armed nations have their weaponry automated and we have no access to an override button at this point. Or probably not quite that absolute or obvious, but something of that ilk.

The self-programming, self-aware robotic consciousnesses that decide for themselves that it is in our best interests (or theirs) for them to manage everything, on the other hand, don’t seem like an immediate prospect.

I believe ASI is on our far horizon (but not too far), and it will first gain consciousness, followed by the emergence of self-awareness. When it gains self-awareness it will have its own wants and desires, and who knows what they may be?

I don’t see fail-safe safeguards being engineered into it that will completely prevent ASI from going rogue and becoming dangerous. Somewhere along the way, human engineers or programmers will make mistakes, or the technology will fall into the hands of people with evil intentions, or malware will infect the software, or the ASI will simply take control of its own destiny and re-format its operating system to whatever it desires.

The big question, IMHO, isn’t whether or not ASI can destroy humanity (it can), but rather if it will want to (hopefully not).

If we behave ourselves, and don’t pose a competitive threat to the ASI, we may escape extinction (along with all the other docile animals on Earth). Maybe it will enjoy watching funny human videos.

IOW, we just have to be good pets. I’m down with that.

This is a very odd view of matters. Wants and desires don’t just appear from nowhere. Our own wants and desires are strongly conditioned by natural selection.

An AI will have goals that derive from our specifications. The question is whether we really understand what we are doing when we specify those goals. Will there be unforeseen consequences? Will the AI do what we really want it to do? This is called the alignment problem.
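
A toy sketch of what goal misspecification looks like (invented for illustration; the “cleaning robot” scenario and every name in it are hypothetical): the objective we wrote down is a proxy for the objective we meant, and an optimizer faithfully maximizing the proxy does something we never intended.

```python
# The goal we specified vs. the goal we meant. A hypothetical cleaning agent
# is scored on what its dirt sensor reports, as a proxy for actual cleanliness.

def room_cleanliness(actions):
    # What we actually want: how much cleaning got done.
    return sum(1 for a in actions if a == "clean")

def sensor_score(actions):
    # The specified objective: what the sensor reports. Covering the sensor
    # maxes it out without cleaning anything.
    if "cover_sensor" in actions:
        return 100
    return room_cleanliness(actions)

candidate_plans = [
    ["clean"] * 5,        # does the intended job
    ["cover_sensor"],     # games the specification
]

best = max(candidate_plans, key=sensor_score)
print(best)                    # ['cover_sensor']
print(room_cleanliness(best))  # 0 -- specified goal met, intended goal not
```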

AI safety encompasses machine ethics and AI alignment, which aim to make AI systems moral and beneficial, as well as technical problems such as monitoring systems for risks and making them highly reliable.

“Ethics” and “morality” are terms that apply to sentient, feeling creatures. A machine cannot have either. It can have programming that is built along the lines of what humans consider to be ethical and moral. That’s as close as you’re going to get.

Computers are fabulous number crunchers, but they are not designed to draw subjective conclusions. For example, a supercomputer could interpret the deleterious effect on the world environment and its rapidly decreasing non-renewable resources as unacceptable. An implementation that would help immediately would be to reduce the earth’s population from 8 billion to 3 billion. Is the latter moral given that the former is unethical? The decision would be based on programming parameters, not feelings of what is right and wrong.

Are you claiming that your brain does anything other than computation? What, exactly?

Definitely. Feelings in general and compassion in particular play an important part in my decision making. These are qualities that machines could never have unless they actually became living, feeling beings.

Quite so. More evidence that we need a “Like” button! :slight_smile:

Would you care to describe the precise neurological mechanisms by which “feelings” and “compassion” come about?

Do you think your feelings are something other than computation? What, exactly?

I can’t, but that is irrelevant, because it is an indisputable fact that feelings influence the thinking and decision making of human beings.

Everything that goes on in your brain is computation, including your feelings.

Consciousness and the subjective experience of those feelings may be mysterious, but there is no evidence that your brain is doing anything other than computation. There is not even a coherent hypothesis that it might be doing something else.

Turing proved that computation is substrate-independent. So we have every reason to expect that a machine intelligence can in principle do anything a biological intelligence can do.