The next page in the book of AI evolution is here, powered by GPT-3.5, and I am very, nay, extremely impressed

Those are all good points, thank you. They are certainly not points that can be readily dismissed, and yet, I wonder if they’re necessarily fatal obstacles to the emergence of artificial consciousness. I have several reasons for believing that they may not be.

For instance, even if a general reasoning agent (like AIXI) has been shown to be uncomputable in the general case, we can still posit sufficiently close approximations that do essentially the same thing. Likewise, even if it is indeed true that quantum processes play a role in cognition (and that is by no means established), it remains an open question whether they are essential or just optimizations. And even if they turn out to be both real and essential, there is nothing to say that they couldn’t be approximated computationally. Finally, one could posit that some form of intentionality could emerge in future AI systems that might well seem strange in human terms, but is “good enough” to drive novel emergent behaviours.

On a bit of a side note – and not just a response to you but of possible interest to all – there’s an important paper reviewing the capabilities and potential of language models: Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models [PDF]. It’s specifically in the context of a large set of benchmarks (called BIG-bench, for “Beyond the Imitation Game”) for assessing the capabilities of these rapidly evolving AI systems. The paper is from last June, which is practically forever ago on the scale of how fast these systems are improving, but it’s still very informative.

I found this part particularly relevant on the topic of emergence:

Quantity has a quality all its own

Massive increases in quantity often imbue systems with qualitatively new behavior. In science, increases in scale often require or enable novel descriptions or even the creation of new fields (Anderson, 1972). Consider, for instance, the hierarchy from quantum field theory to atomic physics, to chemistry, to biology, and to ecology. Each level demonstrates new behavior and is the subject of a rich discipline, despite being reducible to a bulk system obeying the rules of the levels below.

Language models similarly demonstrate qualitatively new behavior as they increase in size (Zhang et al., 2020e). For instance, they demonstrate nascent abilities in writing computer code (Hendrycks et al., 2021a; Chen et al., 2021; Austin et al., 2021; Schuster et al., 2021b; Biderman & Raff, 2022), playing chess (Noever et al., 2020; Stöckl, 2021), diagnosing medical conditions (Rasmy et al., 2021), and translating between languages (Sutskever et al., 2014), though they are currently less capable at all of these things than human beings with modest domain knowledge. These breakthrough capabilities (Ganguli et al., 2022) have been observed empirically, but we are unable to reliably predict the scale at which new breakthroughs will happen. We may also be unaware of additional breakthroughs that have already occurred but not yet been noticed experimentally.

I forgot to mention a curious question that comes to mind now that it’s becoming increasingly obvious that LLMs could likely pass the Turing test (or if not, that they’re extremely close – language skills, after all, are their core competency). What are we concluding from this? Are we declaring, as Turing intended, that machines can now think?

My goodness, no! We just dig up the ol’ goalposts and move them another mile down the field! Exactly as we did with chess-playing and innumerable other putative cognitive skills. Hence BIG-bench, the “Beyond the Imitation Game” benchmark suite I mentioned before. The Turing test is now out the window just as surely as a perfidious Russian oligarch.

Which it obviously is. You posited that whatever can be done with genuine randomness can be done with pseudorandomness. Against that, the game I suggested shows that there is a task that a cop can’t fulfill in a pseudorandom setting against an adversary with perfect information, while that task can be fulfilled in a genuinely random setting. This is very cut and dried.
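(Not the formal construction, but here’s a toy sketch in Python of the asymmetry I mean. In the pseudorandom setting, an adversary with perfect information knows the generator’s seed and state, so it can simulate the cop’s “random” choices and stay out of the way forever; with a genuinely unpredictable source there is nothing to simulate, and over enough rounds the cop wins. Python’s secrets module is only a stand-in for “genuine” randomness here, purely for illustration.)

```python
import random
import secrets

POSITIONS = 10   # squares the cop can search each round
ROUNDS = 1000

def evasion_rate(cop_move, evader_move):
    """Fraction of rounds in which the evader is not on the square the cop searches."""
    evaded = 0
    for _ in range(ROUNDS):
        e = evader_move()   # evader commits first
        c = cop_move()
        if e != c:
            evaded += 1
    return evaded / ROUNDS

# Pseudorandom cop: the adversary's "perfect information" includes the seed, so it
# runs an identical copy of the generator, predicts the cop's square, and stands elsewhere.
cop_rng = random.Random(42)
shadow_rng = random.Random(42)
pseudo = evasion_rate(lambda: cop_rng.randrange(POSITIONS),
                      lambda: (shadow_rng.randrange(POSITIONS) + 1) % POSITIONS)

# Cop with an unpredictable source (secrets is only a stand-in for genuine randomness):
# there is no internal state to copy, so the evader is caught about 1 round in 10,
# and over many rounds the cop catches it with probability approaching 1.
genuine = evasion_rate(lambda: secrets.randbelow(POSITIONS), lambda: 0)

print(f"evasion against pseudorandom cop:  {pseudo:.0%}")   # 100%
print(f"evasion against unpredictable cop: {genuine:.0%}")  # ~90%
```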

Sure, but your claim wasn’t a scientific one. We don’t need to go out into the world to collect evidence regarding the capabilities of agents with and without access to randomness; that’s just a matter of mathematics.

I don’t see the comparison. As noted, the question of what agents are able to do if given access to randomness isn’t a scientific one. And even if the comparison were valid, these things also often go the other way—Einstein’s question of what a light ray would look like to an observer moving alongside it at the speed of light did lead to a wholesale revision of mechanics.

I don’t understand why you think so. There’s a nonzero capacity on the channel, and thus, as the length of the data blocks goes to infinity, a rate of transmission at which any errors can be corrected—meaning you can transmit information with perfect fidelity.
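To make that concrete with a standard textbook case (the binary symmetric channel is my example, not anything specific from upthread): the capacity is C = 1 − H(p) bits per channel use, where H(p) is the binary entropy of the crossover probability p, and the noisy-channel coding theorem says any rate below C can be achieved with error probability going to zero as the block length grows.

```python
import math

def bsc_capacity(p: float) -> float:
    """Capacity (bits per use) of a binary symmetric channel with crossover probability p."""
    if p in (0.0, 1.0):
        return 1.0
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)   # binary entropy H(p)
    return 1.0 - h

# A channel that flips 10% of bits still has a capacity of about 0.531 bits per use,
# so any code rate below that can be made as reliable as you like with long enough blocks.
print(bsc_capacity(0.1))
```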

Current theories of physics unambiguously indicate that they’re not computable. They may be supplanted by different theories in the future, but if that’s our benchmark for settling a question, it will never be settled.

It’s not a question of practicalities. If such a thing were possible even in principle, then in general, Lorentz invariance wouldn’t hold, and we’d have to modify relativity accordingly. Besides, this isn’t the only way to ascertain the algorithmic randomness of quantum measurement strings; see here for another angle, or here for yet another. I think this is as close to established as the present state of science allows.

They may not be, but that wasn’t my point—merely that if things work in either of those ways, there is no possibility for consciousness/intentionality/etc. to emerge in LLMs (or any other AI agent).

Do you have a link to the test?

I cannot find any record of a formal Turing Test of any chatbot.

I just asked GPT:

“In which pocket do you carry your wallet”

answer:

“As an artificial intelligence language model, I do not have a physical form or carry a wallet.”

Bzzzzzt - robot

As an artificial intelligence language model, I’m not sure what the relevance of this quote is to the Turing test, since a human intelligence could tell you that they are an artificial intelligence too.

That would not be a valid Turing Test. In that case the human would be imitating a computer. That’s not a Turing Test. Any machine, with a human conspirator, could pass the Turing Test.

That isn’t the point at all. You’re making the assumption that both are earnestly trying to prove they’re human. The real point is that, whatever the responses, the evaluator’s job is to determine which one is a machine. With respect to your “wallet” argument, I just had ChatGPT say the following in a response to me:

My purpose is to assist users in generating coherent and relevant responses based on the input provided to me, and not to deceive people into thinking that I am a human.

My point isn’t that there have been actual Turing test competitions; it’s that the whole concept seems to have been abandoned in the AI community because of LLMs’ obviously strong language skills, and the criteria for “understanding” have moved up quite a few notches, in the usual goalpost movements we always see.

The AI community has adopted these new tests as necessary for evaluating the rapidly increasing skills of the new AI implementations, while skeptics of AI are just happy to move the goalposts. The skeptics are also happy to denigrate LLM performance as “just next token prediction” while failing to acknowledge that these skills can actually be deeply profound; they are thus making a fundamental category error.

The AI community is the one moving the goalposts to accommodate LLMs. The Turing Test is quite simple: interrogate the system and a human in parallel and compare the results. GPT-4 cannot pass that test.

One report online said that in some responses the panel could not distinguish the computer from the human. We know that is true, because we can all experience it in conversations with GPT. That does not constitute passing the test.

Abandoning the Turing Test is not the same as passing it.

The point is that you completely misunderstood the Turing test.

“Sorry, I am a Chat model, not a person” when asked directly is not a “gotcha”, because no matter how advanced the AI, unless it is designed to trick people, it will probably have a safety feature that forces it to identify itself. I don’t think that, in and of itself, is a useful failure of the Turing test.

Does that mean that ChatGPT passes the Turing test? Of course not. It still makes inhuman mistakes like confusing which city is further North when it correctly identifies their latitudes and correctly defines the higher latitude as corresponding to further North. THAT is a valid failure that’s worth pointing out in the Turing test context, because it’s not an error that a human would make.

“I don’t keep my wallet in my pants, because I don’t wear pants, because I am a large language model” is not what Turing was getting at with his thought experiment :man_facepalming:

Thanks for making my point.

Right, these things aren’t trying to pass the Turing test right now. But their other successes make it look very likely that, if they were trying to pass it, they would succeed either now or in the very near future.

How does that not constitute passing the test? If the evaluator can not reliably tell the machine from the human, that is considered passing the Turing test. It doesn’t even have to give correct answers, it just has to be indistinguishable from the human’s response.

Because 3 out of 50 is not a passing score.

If your point is that ChatGPT isn’t able to pass the Turing test, I don’t think anyone was arguing with that.

We were pointing out three things:

  1. your example, where the AI volunteered the fact that it is an AI, is a bad example of a Turing Test failure and is in fact totally irrelevant.

  2. ChatGPT cannot pass the Turing test in all cases, but it comes remarkably close; it generally takes someone familiar with the AI prodding at areas it is known to struggle with to provoke these sorts of errors. There are wide areas of conversation where ChatGPT actually WOULD pass a Turing test.

  3. the idea that passing the Turing test is meaningful as a measure of true understanding has pretty much been discarded, because in all likelihood a successor to ChatGPT - v5 or 6, say - will be able to pass the Turing test without some kind of revolutionary change in method. This is the “moved goalpost” concept @wolfpup referenced.

Hope that clears up some of your confusion.

Close is not a pass. I was addressing the claim that bots have passed the Turing Test. They have not. No confusion there.

In the case of LLMs, their status as computer simulations is part of their training set. So they may never be able to pass the test.

We are witnessing the introduction of simulations that have superhuman abilities. They are indeed amazing, but they are simulations, and so far they have not passed the Turing Test.

I believe the other tests that Wolfpup has referenced are more meaningful. And the conclusion we can draw from those is that the simulations can pass tests just like humans.

It will correctly multiply (3+1)x(3+1) but not (3+1+1)x(3+1+1). Beyond that it is off the rails.

It will work with hand-holding: sum(3+3+1+1) x sum(3+3+1+1) it gets right. It might work if you were more explicit in how you posed your math problem.
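For anyone following along, the three expressions evaluate to:

(3+1)x(3+1) = 4x4 = 16
(3+1+1)x(3+1+1) = 5x5 = 25
(3+3+1+1)x(3+3+1+1) = 8x8 = 64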

Bing Chat:

Sure! Here’s how I got the answer:

(3+1+1)x(3+1+1) = 5x5 = 25

Does Bing Chat do 3+3+1+1?

I loaded Edge but couldn’t find the bot

That’s a hypothetical. We have no idea if the brain uses quantum anything for computing or consciousness. There are some speculative papers about it, but we don’t really know.

And even if it turns out that the brain uses quantum mechanics in some way, we don’t know if that method is essential for consciousness, or if it’s just an optimization. For that matter, we don’t know what consciousness IS, so we don’t know what it takes to replicate it.

My problem with your arguments is that you seem to be providing them not as alternative theories, or speculation that MAY have an impact on intelligence, or simplified philosophical thought experiments that may be illuminating, but as some kind of slam-dunk that proves something and closes the debate.

No one is saying that LLMs ARE intelligent, or conscious. We are speculating on the level of intelligent behaviour they do have, and the implications that may have for our understanding of intelligence. They seem to go far beyond next-word prediction: they have unexpected emergent abilities, they evolve algorithms to solve general problems, and they seem to evolve structures similar to what we find in the brain.

Clearly there is more going on here than we really understand. Slamming philosophical doors shut before we understand the phenomenon seems at best premature.

On another subject, ChatGPT now supports plugins:

This just raised the bar for usefulness, but also for danger. ChatGPT has shown a zero-shot capability to understand new APIs and use them.

An example of how these plugins can help: you can ask ChatGPT to find you the cheapest flight, or plan a complex series of flights, and it will go to the Expedia plugin and get the data. Or you can tell it what kind of meal you are looking for and where, and it will use OpenTable to find a restaurant and book it. There’s also a Wolfram Alpha plugin, enabling any math capability in Wolfram to be exercised by the LLM.
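To give a sense of the mechanics, here’s a rough sketch in Python of the general tool-use pattern (my own illustration with made-up names, not OpenAI’s actual plugin API): the model is shown short descriptions of the available tools, it emits a structured call, the system executes that call against the real service, and the result goes back to the model to be summarized for the user.

```python
# Simplified sketch of the tool-use pattern the plugin system enables; none of these
# names are OpenAI's real API. `model_decides` and the plugin table are hypothetical
# stand-ins that show the shape of the loop, not actual endpoints.
import json

PLUGINS = {
    # tool name -> (description shown to the model, callable that would hit the real service)
    "find_flights": ("Search flights given origin, destination, and date.",
                     lambda args: [{"carrier": "ExampleAir", "price_usd": 199}]),
    "book_table":   ("Reserve a restaurant table given city, cuisine, and time.",
                     lambda args: {"confirmation": "ABC123"}),
}

def model_decides(user_request: str, tool_docs: str) -> dict:
    """Stand-in for the LLM: a real system would have the model read the tool
    descriptions and emit a structured call; here the choice is hard-coded."""
    return {"tool": "find_flights",
            "arguments": {"origin": "YYZ", "destination": "LHR", "date": "2023-04-01"}}

def run(user_request: str):
    tool_docs = "\n".join(f"{name}: {desc}" for name, (desc, _) in PLUGINS.items())
    call = model_decides(user_request, tool_docs)
    _, handler = PLUGINS[call["tool"]]
    result = handler(call["arguments"])      # execute the chosen plugin
    print(json.dumps(result, indent=2))      # the model would then summarize this for the user

run("Find me the cheapest flight from Toronto to London on April 1.")
```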

But you can imagine the risk if someone makes a malicious plugin like a botnet generator or a phishing service and gives control of it to ChatGPT. So OpenAI is rolling these out slowly and carefully after much vetting.

Clarification: did you hand it 5X5?