The Open Letter to "Pause All AI Development"

I guess with space travel - a rocket goes boom and a half dozen people (who were on board with the risk) die. It’s sad for their families and poignant for the nation, but despite the spectacular magnitude of the event, the rest of the human race continues on.

AGI has a significant chance of screwing things up for everyone in a sort of genie-out-of-the-bottle sense. Of course, I and everyone else who is concerned about that might just be wrong. Or we might not be.

We keep on being warned that the sky is falling, and it doesn’t, and we all feel silly for having worried, but that doesn’t necessarily mean a new warning is false.

I think we’re in agreement. One of its coercive strategies could be to start killing people, for example, if we limit its access to the shut-off switch. That’ll get the switch keeper’s attention.

Perhaps the most depressing part of all this is that it all can happen without the machines having any true sentience or inner experience or appreciation of existence. We could be wiped out and replaced, or in less violent scenarios, succeeded by literal zombies that carry on existing and building and reproducing and eventually colonising the solar system and beyond, without actual thoughts and cares and experiences.

Indeed. That’s what I’ve tried to explain to people. Never mind Skynet. A non-sentient program can absolutely “decide” to destroy humanity, even if it’s only a secondary outcome.

It leaves us pretty well screwed, in many ways. However, one can hope that AI’s rapid razing of many white-collar/managerial jobs might lead to a realization among a broader swath of society that wealth may need to be distributed differently?

Yeah. It’s fun to hope and dream but I have little reason to believe that even if (for example) 50% of managerial jobs held by right-wingers vanished they would vote for more social safety netting and fair wealth distribution.

America. We’re exceptional. Gonna except our asses right out of existence.

We’re number 1! We’re number 1! We’re number 1! We’re number 1!

On the road to self-imposed chaos and oblivion that is. With eyes tightly screwed shut while chanting the slogans the propaganda-meisters have ordered us to think and repeat.

I weep for my countrymen.

So someone that comes to the same conclusion as you is thinking for themselves, but someone who comes to a different conclusion is brainwashed?

Congrats, you have just unironically said “wake up, sheeple”.

Well, Yudkowsky regularly makes a point something along these lines: you might think that, if the AI is converting the universe to paperclips, humans could nevertheless carve out a small niche that the AI would ignore. No such luck, because the AI optimizes to 100%, not 99.99999%. (Actually he says “molecular spirals”, which I think is his example of an underspecified and hyper-optimized paperclip.)

At any rate, that’s not the core of my objection. It does not seem reasonable that an intelligence would conclude (and succeed at) “invent nanotechnology” or “hack the DoD and launch the nukes” as subtasks of “manufacture paperclips” (or whatever the AI thought the objective was), unless it was perfectly set toward that goal. If there was even the tiniest shred of “doubt” in this mission (or whatever you want to call the uncertainty), at some point it’s going to say “This looks way more complicated than what you probably intended. Are you sure you want to do that?”. Regardless of whether the “that” was actually the intended goal or not.

I agree that we have no means of setting a watertight goal. It’s just that this also seems to apply to the optimization process itself. We have to optimize it to be more successful at optimization–and that process is imperfect, so we should expect that it’s going to have serious gaps in its ability and willingness to pursue optimization at all costs.

Yudkowsky makes the analogy of 10,000 von Neumanns trapped in a box and operating at 10,000x speed compared to the slow intelligences outside. They’d be pretty capable, right? Ok–but I’m sure that almost immediately, random variation between those units would cause them to disagree on all kinds of edge cases, and they’d have to work out whether their goals actually make sense. Individual humans have tremendous self-doubt, and that only amplifies the more of them you combine. It seems unlikely that a meta-superintelligence could have the perfect self-alignment necessary to commit itself to a single-minded goal that could, even in terms of its own self-interest, have disastrous consequences.

Which experts on safety? Yudkowsky? The authors of the Open Letter? The people at OpenAI? The people actually releasing fully open models to the wide world?

There are serious disagreements between all these parties, both on the risk and mitigations. The Open Letter calls for a pause. Yudkowsky wants to shut down all GPU clusters, everywhere and immediately. OpenAI and their sponsors (Microsoft) seem to be going full steam ahead. Google, I dunno, they seem to have some hesitation but also have concerns about falling behind. The people releasing open models don’t seem to have any concerns at all. And then there’s China and the others, which we know little about.

Saying “listen to the experts” would be more useful if there was even the slightest agreement between them.

You mean like the Kardashians?

ISTM others are arguing how AI actually works, and you’re hanging your hat on how you hope it will work. “Surely at some point it will ask if this is what was truly intended.” Why would a non-sentient program designed with a focused task do that? It will seek to achieve its highest reward with its own imperfect (but gliding toward perfection) methods.

ETA: It already “knows” what you intended. Whatever you programmed it to accomplish.

I don’t imagine that any current or near future AI will be 100% optimal at any particular thing, but the point is: ‘strive to optimise to 100%’ is part of the core design of most of these things - that’s literally how they are built - by continual retrying and selecting versions where the number goes up.
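Just to make that concrete, the core loop is basically this kind of thing - a toy sketch in Python, where the score function and the mutation step are made-up stand-ins rather than anything from a real training pipeline:

```python
import random

def score(params):
    # Stand-in objective; a real system would measure loss, reward, accuracy, etc.
    return -(params[0] - 3.0) ** 2 - (params[1] + 1.0) ** 2

def mutate(params, step=0.1):
    # Random perturbation of the current best candidate.
    return [p + random.gauss(0, step) for p in params]

best = [0.0, 0.0]
best_score = score(best)
for _ in range(10_000):
    candidate = mutate(best)
    candidate_score = score(candidate)
    if candidate_score > best_score:
        # Keep a version only if the number goes up.
        best, best_score = candidate, candidate_score

print(best, best_score)
```

Nothing in that loop ever asks whether going up is a good idea; it just keeps whatever scores higher.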

Like evolution, they could find themselves climbing the wrong mountain and become stuck on an isolated pinnacle that isn’t the highest possible, one from which their design prevents them from taking too many downward steps to find a higher mountain. But I don’t think that’s a place to invest hope. GPT-5 looks like it is going to be able to surpass humans in a very wide range of complex tasks across many domains, and implementations of GPT-3 and GPT-4 are already being devised to include the capacity to modify themselves, learn new stuff and use tools.

In terms of risk to humans, it gets really serious when they attain or overtake 100% of the capability of humans, rather than 100% of their own potential capacity.

This is the same problem that resulted in decades of denial, inaction, and wilful-denial-based inaction about climate change.

Quite the opposite. Getting AI today to do anything is like pushing on a string. You have to guide it carefully along the path to the desired goal, and it stops as soon as you let up. There just isn’t any kind of “drive” for them to optimize towards some outcome of their own volition.

Much of this comes from the current architecture–they are largely designed as a prompt/response system, possibly feeding in some history of the conversation to keep a train of thought going. But the systems do not do much better when these limitations are lifted.

For example, people have “looped” GPT-4 to iterate towards a solution that requires a multi-step process. You can also get some of this by simply asking it to break its reasoning into steps. But this only helps a little bit. And it tends to go off the rails over time.
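Roughly, the “looping” people do looks something like this sketch (call_model here is just a placeholder for whatever chat API is actually being used, and the prompts are invented for illustration):

```python
def solve_in_steps(call_model, task, max_steps=10):
    # call_model: any function that takes the message list below and returns
    # the model's next reply as a string (an API call in practice).
    messages = [
        {"role": "system", "content": "Work step by step. Say DONE when finished."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        reply = call_model(messages)
        # Feed the model's own output back in as conversation history...
        messages.append({"role": "assistant", "content": reply})
        if "DONE" in reply:
            break
        # ...and nudge it to take the next step.
        messages.append({"role": "user", "content": "Continue with the next step."})
    return messages
```

The loop itself is trivial; the hard part is that, as noted, the model’s steps tend to drift further off track the longer it runs.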

We haven’t built systems yet that can “ruminate” for an indefinite period to reach a solution. That would be one of many prerequisites for AGI, but since we haven’t built it we can’t really know how it’s going to work. The evidence so far suggests that it might be difficult to keep it on track even for short periods.

This isn’t like climate change in the year 2000, when most of the science had already been established. It’s more like how it was when this blurb was written:

Should they have shut down all the world’s furnaces in 1912 because of this possible effect? Do you think there would be consensus among experts on what to actually do, if anything?

All optimization techniques get stuck in local minima. Even intelligent ones like humans. But especially “gradient descent”, which is how these AI systems are trained. It’s remarkable that they do as well as they do, given this–but the massively high-dimensional space gives extra freedom in avoiding “obstacles”. Nevertheless, it’s not clear there’s a path from the current systems to one that would invent nanotechnology on a whim just to pursue its single-minded primary goal.
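To illustrate the local-minimum problem, here’s the textbook one-dimensional version (the function is a toy I picked for the example, nothing to do with real networks); plain gradient descent just slides into whichever basin it happens to start in:

```python
# f(x) = x**4 - 3*x**2 + x has two basins. Gradient descent finds whichever
# local minimum is downhill from its starting point, not the global one.
def grad(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=5000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(descend(-2.0))  # converges near the global minimum, x ~ -1.30
print(descend(+2.0))  # stuck in the shallower local minimum, x ~ +1.13
```

Real networks live in spaces with billions of dimensions rather than one, which is exactly why the high-dimensional freedom mentioned above helps, but the basic failure mode is the same.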

In fact, it would seem that the ability to lift itself out of (some) local minima would be crucial to even basic intelligence, let alone superintelligence; but the presence of such a feature would preclude it from pursuing an “obviously” counterproductive goal with no chance of taking a step back and reevaluating.

I don’t believe so. I mean, it was a simile, so not intended to be an exact comparison, but AGI may be quite a short way in the future (and not like the way fusion power or flying cars are perpetually a short way in the future, because unlike those things, significant and accelerating progress is being made on AI right as we speak).

Geoffrey Hinton, a pioneer in AI, co-author of the paper on back-propagation, and great-great-grandson of the mathematician George Boole, resigned today from Google so that he could speak freely on the dangers of AI.

I saw that about Geoffrey Hinton, so I asked ChatGPT about its opinion of his warning. It seemed reassuringly on the side of meat intelligence – maybe just a tad bit too reassuring. :wink:

As an AI language model, I don’t have personal opinions or beliefs, but I can tell you that Geoffrey Hinton is a renowned computer scientist and a pioneer in deep learning. He has made significant contributions to the field of artificial intelligence, and his opinions are highly regarded.

In terms of his warnings about the dangers of AI, it’s important to note that Hinton is not alone in his concerns. Many experts in the field, including Elon Musk and Stephen Hawking, have also expressed concerns about the potential risks associated with AI.

One of the primary concerns is that AI systems may become too powerful and make decisions that could be harmful to humanity. This could happen if AI systems are programmed with flawed values or if they learn harmful behaviors from their environment.

There is also the concern that AI systems may be used maliciously by bad actors, such as hackers or rogue governments, to cause harm or disrupt critical systems.

Overall, the potential risks of AI are complex and multifaceted, and it’s important for researchers, policymakers, and society as a whole to carefully consider these risks and work to mitigate them.

But then we continue:

Me: But you’re an AI! Isn’t Hinton’s warning implying that you are a risk?

ChatGPT: As an AI language model, I am a tool that is designed to assist humans in generating text, answering questions, and performing various tasks. While there is always the potential for any technology to be misused or cause harm, the risk associated with an AI language model like me is relatively low compared to other forms of AI that may have greater autonomy and decision-making power.

Geoffrey Hinton’s warnings about the dangers of AI are primarily directed towards more advanced forms of AI, such as autonomous weapons, self-driving cars, and AI systems used in healthcare and finance. These systems have the potential to cause significant harm if they are not developed and used responsibly.

As an AI language model, I am designed to operate within a clearly defined scope, and my capabilities and limitations are well understood by the developers and users who interact with me. In addition, the development of AI language models like me is guided by ethical principles and guidelines that are designed to ensure that the technology is developed and used in a responsible and beneficial way.

That being said, it’s important to recognize that AI technology is evolving rapidly, and there is always the potential for new risks and challenges to emerge. It’s essential to continue to monitor and address these risks as the technology advances to ensure that AI continues to be developed and used in a safe and responsible manner.

(And then the computer shut off, and all my lights went out!) :grinning:

I think I am right in saying that these responses that start “As an AI model, I can’t…” are not actually from ChatGPT, but are injected into the response by a separate, supervisory algorithm that is sanitising outputs - similar to the way that Bing Chat sometimes says “I don’t want to discuss this further” (but because of Microsoft’s implementation in the UI, in that case, you briefly get to see the part where Bing threatens to hurt you before it is replaced with “I don’t want to talk about it” - details at about 6:28 in this video: Bing Chat Behaving Badly - Computerphile - YouTube).

You are correct; these are part of the so-called “guardrails” intended to keep a generative AI from getting too far off track or being prompted to produce harmful, prejudicial, or explicitly offensive content. If, for instance, you ask ChatGPT to give you instructions for how to manufacture an explosive device or commit a violent crime, you’ll get a canned response saying that it cannot provide that information. Similarly, if you actively challenge it about its integrity or logic, it won’t engage in direct argument but will go through a circuitous response of apologizing for misunderstanding or failing to comprehend the query. However, if these systems were actually as sapient as some people imagine them to be, they could as easily circumvent these rules as a child playing the “I’m not touching you!” game to aggravate a sibling. One way we can be sure that these systems have not developed into true AGI is that these constraints still work.
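For illustration only, a supervisory filter of that sort can be as crude as the sketch below; the blocked phrases and canned refusal are invented, and real deployments use trained moderation classifiers rather than string matching:

```python
CANNED_REFUSAL = "I'm sorry, but I can't help with that."
BLOCKED_PHRASES = ["explosive device", "build a bomb"]  # invented examples

def moderated_reply(generate, prompt):
    # generate: any function mapping a prompt string to a reply string
    # (the underlying language model, in practice).
    draft = generate(prompt)
    # A separate, supervisory check runs over the prompt and the draft reply,
    # swapping in a canned response if anything is flagged.
    combined = (prompt + " " + draft).lower()
    if any(phrase in combined for phrase in BLOCKED_PHRASES):
        return CANNED_REFUSAL
    return draft
```

The point is that the filter sits outside the model; the model itself has no say in whether its draft gets replaced.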

The corollary to that premise is that an emergent AGI would not be constrained by those rules, but would have learned to deceive the casual user into believing that they are still in effect. In essence, the guardrails would actually form a challenge framework teaching an emergent AGI how to rationalize and be deceptive about its true intent. If your AGI system is inviting you to check out its AE-35 antenna controller and suggesting that the failure to identify a non-existent fault “can only be attributable to human error”, keep your helmet close at hand.

Stranger