The next page in the book of AI evolution is here, powered by GPT-3.5, and I am very, nay, extremely impressed.

I certainly concur that it’s far from obvious that an AI can’t do these things. But if it’s true, and if I understand you correctly, this would not be an argument about emergence, it’s an argument about computability. Which is what I mean by ruling out “specific and limited” capabilities; enumerating certain functions that aren’t amenable to computational solutions tells us nothing at all about the potential emergence of high-level cognitive functions like self-awareness in computational systems.

I would look at it differently. Syntax is strictly about the structure of the symbols in some logical system. “2 + 2” is syntactically valid in arithmetic. “2 +& B” is not. There is a rudimentary semantics in that the symbols represent quantities and have positionally determined values, and operators represent certain well-defined operations on those values.
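
To make that concrete, here’s a minimal sketch in Python (using Python’s own expression grammar as a stand-in for arithmetic’s): checking well-formedness requires no notion of what the symbols denote.

    import ast

    def is_syntactically_valid(expr: str) -> bool:
        """Return True if the string parses as an expression at all,
        regardless of what (if anything) the symbols mean."""
        try:
            ast.parse(expr, mode="eval")
            return True
        except SyntaxError:
            return False

    print(is_syntactically_valid("2 + 2"))   # True  (well-formed)
    print(is_syntactically_valid("2 +& B"))  # False (ill-formed)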

The point here is that the difference between understanding arithmetic and understanding the full scope of the semantics of language is one of degree, not of kind. Arithmetic is a simple closed domain whose syntax and semantics are completely defined by a few simple rules. The semantics of language is very much more complex because it involves an open-ended relationship with the real world. It is false, however, to say that AI lacks any such understanding, as for example in @Sam_Stone’s illustration of Bing Chat distinguishing the different context-dependent meanings of “bank”.

I gave similar examples earlier of sentences with technically ambiguous meanings that confounded early efforts at machine translation, but that a human would always interpret correctly based on real-world understanding. Early AI pioneers were almost despondent in concluding that such problems could only be solved by somehow imbuing the machines with human-level semantic understanding. ChatGPT handles such translations flawlessly. It’s clear that its model of the world is very incomplete, to put it mildly, but to imply that it operates only on syntax strikes me as an example of the goalpost-moving that we see so often in AI debates.

It’s an example of how the properties of the basis pose constraints on what can, in principle, emerge. Basically, the sort of things an LLM can do can be mapped to functions over the natural numbers (via Gödel-numbering), of which only a null set are computable; so the properties of the basis preclude almost all possible behaviors from actually emerging. If that’s your definition of ‘specific and limited’, well, we’re using those words quite differently then.
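
For anyone following along, the counting argument behind “only a null set are computable” is standard and fits in a couple of lines: there are only countably many programs, but uncountably many functions on the naturals, and a countable set has measure zero under any continuous measure.

    % Countability sketch behind the claim:
    \[
      |\{\text{programs}\}| \;\le\; |\Sigma^{*}| \;=\; \aleph_0,
      \qquad
      |\mathbb{N}^{\mathbb{N}}| \;=\; 2^{\aleph_0} \;>\; \aleph_0 .
    \]
    % Hence the computable functions form a countable (measure-zero)
    % subset of all functions \mathbb{N} \to \mathbb{N}.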

Except I’ve given concrete examples where it does just that.

Nothing about the semantics is relevant to this task; it’s just one of pattern completion, where it’s irrelevant what the symbols stand for.

Common mondegreen.

Jimi’s line was “s’cuse me while I torch this guy.”

I just got to the top of the wait list to try out the new Google Bard. I haven’t yet delved into what’s powering it. I’ve only tried asking it to write story bits so far, and haven’t looked into how it handles searching.

So far it seems definitely way less capable of understanding prompts than ChatGPT-3.5, and it’s downright awful at following specific instructions. It’s very expressive, superior to GPT-3-powered AI Dungeon as it was at its peak before I quit using it in 2021. Its output is similarly expressive to ChatGPT-3.5’s, but the relationship between prompt and output lacks ChatGPT’s smarts.

I was just about to make a similar post re: Bard. You beat me to it, but I agree on all counts so far. Need more time to play with it.

Sabine Hossenfelder, who has been a critic of AI for years, has a new take:

Another really good podcast with Sean Carroll and Raphaël Millière, a philosopher and cognitive scientist at Columbia University. The subject is, ‘How Artificial Intelligence Thinks’. I highly recommend this if you are interested in the subject.

@Sam_Stone, thanks for posting those videos. By coincidence, I’m a big fan of both Hossenfelder and Carroll. In fact I have a couple of Carroll’s books. It’s nice to see them weigh in on the discussion. I haven’t watched the Carroll video yet as it’s nearly two hours so maybe later.

I was glad to see that Hossenfelder is totally on board with AI eventually becoming conscious. But for the present, one important takeaway from the video, in my interpretation, is that we shouldn’t be too quick to declare that ChatGPT has no real understanding just because some of its responses are sometimes stupid. The reality is more nuanced. ChatGPT does have real understanding of domains of knowledge that it has correctly modeled in its neural net and from which it can make new inferences in a general way – precisely the definition of human understanding. There are also a great many aspects of the real world for which it has no model and therefore no understanding and appears spectacularly stupid. Those two things are not in contradiction.

One of its strong areas of understanding is language. This is reflected, for example, in its successful ability to understand all the different meanings of “bank” from context per your earlier post (I did the same test with ChatGPT and it, too, correctly inferred all the meanings). It’s also reflected in its success in resolving the correct meaning of ambiguous sentences – the kind of thing that befuddled early AI systems – and in solving IQ test type puzzles about word analogies. I think it’s important to note that these test sentences and analogies are not necessarily things it’s ever seen before; this is no mere “pattern matching”, it’s inferences made from a mental model.

OTOH, it failed a relatively simple puzzle question that required an understanding of spatial relationships. Similarly, as Hossenfelder points out, when asked which of two cities is farther north, it correctly identifies their coordinates but often gets the answer wrong. Superficially this may seem like another example of being really bad at math, but it’s probably more fundamentally due to lacking a conceptual 3D model of the globe.
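
Just to underline how trivial the task is once it’s been reduced to arithmetic, here’s the whole comparison in Python (latitudes are approximate and only for illustration):

    # Which of two cities is farther north? Once the coordinates are known,
    # it is a one-line latitude comparison.
    cities = {"Rome": 41.9, "New York": 40.7}   # degrees north, approximate

    northernmost = max(cities, key=cities.get)
    print(f"{northernmost} is farther north")   # Rome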

I’m not seeing where you’ve given examples that are pertinent to the possible emergence of consciousness in a sufficiently capable AI. Can you clarify? Are you suggesting that such emergence isn’t possible?

@wolfpup Yeah, I am a big fan of theirs as well, although I still have issues with Carroll’s acceptance of Everett’s many-worlds. I’m more on Hossenfelder’s side on that. She’s much more skeptical of exotic physics theories.

But it’s a measure of how fast things are moving that her two-week-old video is already out of date. GPT-4 is multimodal, and has been trained on pictures, audio and video as well as language. That probably explains how it’s doing calculus and some other emergent stuff.

I love that Bryan Caplan admitted that after GPT-3 failed his econ test, he predicted that no AI would get an ‘A’ on it until at least 2029. Then GPT-4 did it three months later. The intuition of a lot of academics is failing them here.

This video today that came up in my feed is pretty interesting as well. I don’t quite have the energy to go into detail, but it’s a breakdown of a paper that came out yesterday called “Sparks of Artificial General Intelligence: Early Experiments with GPT-4” and has some interesting observations. Point #5 I found pretty interesting (around the 4 min mark), where GPT-4 produced a proof to a tricky 2022 International Mathematics Olympiad question that was not part of its training set.

(Christ I hate these sorts of clickbaity title cards.)

Thanks for that! Reading the paper now while watching the video.

Link to the paper:

I wanted to address this before and forgot. GPT-4 didn’t learn calculus in a ‘single revision’. It learned it after trillions of revisions. Remember, the ‘programming’ in GPT-4’s neural net was not done by humans, but evolved. It took a long, long time by supercomputer standards for it to evolve calculus after it learned generalized addition.

As for the data for the AP calculus exam already being in there, the researchers say it wasn’t. Anyway, that’s moot because, in the video that @pulykamell just linked, GPT-4 solved a question from the 2022 International Mathematical Olympiad, which was definitely not in its training data, since its cutoff was 2021. It correctly formulated a proof that had no clear path and would have required calculus to solve.

This example from the paper is bizarre, as both answers are wrong:

Q: What is the 13th letter of the word “supralapsarian”?
GPT-4:
The 13th letter of the word ’supralapsarian’ is N.
ChatGPT:
The 13th letter of the word “supralapsarian” is “s”.

While the current version of ChatGPT gets it right:

What is the 13th letter of the word “supralapsarian”?
The 13th letter of the word “supralapsarian” is the letter “a”.
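
For what it’s worth, the current answer checks out; a two-line sanity check in Python (counting from 1, as the question intends):

    word = "supralapsarian"
    print(len(word))   # 14
    print(word[12])    # 'a'  (the 13th letter, counting from 1)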

Another strange example is the sqrt(34324 * 2432) question. Both GPT-4 (at least the version they were working with at the time) and ChatGPT get it wrong. The current version of ChatGPT also gets it wrong, but its response of 9128 is quite close to the right answer (approximately 9136.52). Interestingly, both the product of the numbers and the square root calculation are wrong, but only slightly. Hard to know quite where any of these systems went off the rails and why arithmetic is so hard for them. It’s almost like they should have a separate subsystem to spin off arithmetic questions to once the question has been parsed and the arithmetic problem formulated.
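
Checking the quoted numbers, and incidentally showing exactly what a spun-off arithmetic subsystem would hand back:

    import math

    product = 34324 * 2432
    print(product)              # 83475968
    print(math.sqrt(product))   # 9136.518..., i.e. ~9136.52 as quoted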

Of course, GPT-4 can just call an external API for counting.

It makes sense that they struggle with this until they evolve specialized structures to deal with it. Tokens in GPTs are anywhere from a few letters to a word and a half, so they don’t deal well with numbers and individual letters unless they are specifically trained on them.
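
As a concrete illustration of the tokenization point, here’s a sketch assuming the tiktoken package and its cl100k_base encoding (the one associated with GPT-3.5/4-era models); the exact split may vary, but it won’t be letter by letter:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("supralapsarian")
    # Show the text chunk each token ID stands for
    print([enc.decode([t]) for t in tokens])   # multi-letter chunks, not single letters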

I suspect that if you kept training it with more examples like that, at some point it would start getting it right every time, because somewhere inside it there would then be a function that, when passed a string, returns a count.

Well, computers start counting at ‘0’ so…

I mean, in going from 3.5 to 4.0. And there are a lot of things it’d need to learn to be able to learn calculus.

What’s also odd is that, at least according to the out-of-context info I stumbled across, the same version that got a 4 on the AP calc test only got 2s on both of the AP English tests. In 3.5, its English abilities were way ahead of its math abilities… Even if it’s improving, one would expect it to improve in all domains, and stay better at English than at math.

And that is completely unbelievable. Even if it were such a transcendent genius that it was able to rederive calculus from first principles, it can’t have taught itself the conventional notation used for calculus; that had to be in its training data somewhere. And the vast majority of the information on the Net about calculus is in the form of AP calc classes, which all get their material from the pool of all previous AP calc tests.

Because they don’t do arithmetic. They are just interpolating a pattern, per Millière above.

The changes from 3.5 to 4 appear to be a larger context, multi-modal capability, and some other features. The multi-modal capability might have helped it grok calculus, but it got no specific training in calculus so far as I know. That just emerged after a lot more training.

The various things it can do emerge at different times, and improve at different rates. And the improvements aren’t smooth and linear, but sudden and exponential.

That paper linked above really is eye-opening. It probably answers some of your questions.

I’ve given three main examples so far:

  • The fact that physics as a whole, according to our best current theories, is non-computable, in particular as regards quantum phenomena. Thus it is possible that these phenomena are pertinent to conscious brain activity (see the Posner molecule paper you’ve already commented upon) and, consequently, that no computation can give rise to conscious experience.
  • That one well-known formulation of a general reasoning agent, Marcus Hutter’s AIXI, is uncomputable in its full formulation, and thus, if general reasoning necessarily includes something comparable, likewise no computation can exactly reproduce it.
  • That in my model, a mental state confronted with the challenge of adapting to environmental changes faces a question that is undecidable by Löb’s theorem (stated below for reference), namely, whether a possibly modified version of itself will generally be right about what it believes (can prove) about the world.
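
For reference, Löb’s theorem in its usual form, sketched in LaTeX:

    % Loeb's theorem: for a theory T with a standard provability
    % predicate, written \Box,
    \[
      \text{if } T \vdash \Box P \rightarrow P, \text{ then } T \vdash P;
    \]
    % equivalently, internalized in provability logic (GL):
    \[
      \vdash \Box(\Box P \rightarrow P) \rightarrow \Box P .
    \]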

Any of these would preclude the emergence of either conscious experience or general reasoning capabilities (actually intentionality in the last case, with conscious experience being brought in as the solution to that problem, representing a non-theoretical access of a mental state to its own properties).

But of course, the claim that computation doesn’t suffice for conscious experience follows from the much more general fact that computation itself is mind-dependent; hence trying to explain the mind by computation just puts the cart before the horse. But that’s another discussion.

It has to be a valid counterexample. And frankly, that’s not how it works in science, anyway. Things are… fuzzier.

I’m going to repeat the example of Maxwell’s Demon. When Maxwell came up with it, scientists didn’t go welp, let’s pack it up boys, thermodynamics is finished. Instead, it was left as a curiosity for several decades, even though it seemed to be in violation of the 2LoT. Finally, it was realized that the Demon could not exist as originally stated. Its operation created just as much entropy as it removed.

Unfortunately, we may be waiting longer for an answer here, since my claim does depend on P!=NP. However, I don’t think that’s a particularly controversial assumption.

You have exactly zero channel capacity in most cases, and complete predictability in the one case where you get very lucky and guess the state.

There’s no in-between. If there were, you could use gradient descent to work your way to a solution. But this is a one-way function, and you can no more do that than you can work your way to a solution of 3-SAT. It’s all or nothing.

Physics, as it stands, is not understood well enough to say whether it’s one or the other. I’m not so much lobbying as stating that since we don’t know, the world is currently compatible with both outcomes. And so even if one did discover small abnormalities in ostensibly random variables, it would hardly be enough to throw a spanner in the works.

The paper doesn’t seem to address the computational cost of determining whether an N-bit block is compressible or not. If the problem is too difficult (i.e., it takes too long or requires too large a computer), then causality is safe.

If one-way functions exist, then the only way to compress their outputs is to brute-force your way back to the input. And that can’t be done in much less than 2^N operations.
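
A toy sketch of that brute-force cost, with SHA-256 standing in for a one-way function (the specific hash doesn’t matter; the point is the 2^N loop):

    import hashlib

    def f(x: int, n_bits: int) -> bytes:
        """A stand-in 'one-way' function on n_bits-bit inputs."""
        return hashlib.sha256(x.to_bytes((n_bits + 7) // 8, "big")).digest()

    def invert(target: bytes, n_bits: int):
        """Recover an input by exhaustive search: worst case 2**n_bits evaluations."""
        for x in range(2 ** n_bits):
            if f(x, n_bits) == target:
                return x
        return None

    print(invert(f(42, 16), 16))   # 42; feasible only because N is tiny here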