The next page in the book of AI evolution is here, powered by GPT-3.5, and I am very, nay, extremely impressed

Happy New Year!

Thank you! Happy New Year to you as well! Do you have any questions about the new year that I can help with?

Give me some interesting facts about the number 2023.

Here are a few interesting facts about the number 2023:

  • 2023 is the smallest 3-digit number that is the sum of two squares in two different ways: 23^2 + 4^2 = 2023 and 16^2 + 15^2 = 2023.
  • 2023 is the sum of the first 43 positive integers (1 + 2 + 3 + … + 43 = 2023).
  • 2023 is divisible by 3, 7, and 11, but not by any other primes.
  • The digits of 2023 add up to 9, which is a multiple of 3.
  • In binary, 2023 is written as 11111011111, which contains three pairs of consecutive 1s.
  • The product of the digits of 2023, 202*3 = 0, is the only 3-digit number with this property.

I hope you found these facts about the number 2023 interesting! Do you have any other questions about this number or anything else?

No thank you. You’ve given me enough “facts” for now.

You’re welcome! If you have any other questions in the future, feel free to ask. I’m here to help.

Wow, you’d have thought that it would accidentally get at least something right. I mean, I guess that it’s true that the product of the digits is 0, but it’s still not a three-digit number, and there are lots of three and four digit numbers with that property.

(for the record, 2023 = 7 × 17 × 17. Students involved in math competitions will typically learn the prime factorization of the current year, because math competitions often include questions involving that number, and the prime factorization is something useful to know about a number.)
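
(If anyone wants to double-check that, a few lines of plain Python doing trial division will confirm the factorization; this snippet is just my own quick check, nothing from the thread:)

```python
# Trial-division factorization; quick check of the 2023 = 7 * 17 * 17 claim above.
def prime_factors(n: int) -> list[int]:
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is prime
    return factors

print(prime_factors(2023))  # -> [7, 17, 17]
```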

This blew me away:

That is a completely AI generated podcast. The first person you hear is NOT Joe Rogan, although I sure couldn’t tell.

The whole podcast was created by scraping arXiv for new papers (by following an account that tweets out new submissions), then parsing the papers to pull out the content without all the surrounding boilerplate, then having GPT convert each paper into a dialog between two people, then handing that to another AI which creates the voices and produces the podcast. No humans involved at all.
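
To make the shape of that pipeline a bit more concrete, here is a rough sketch of how such a fully automated loop could be wired together. Every name in it (fetch_new_arxiv_links, paper_to_dialogue, and so on) is a made-up placeholder for illustration; none of this comes from the actual project.

```python
# Hypothetical sketch of an end-to-end "papers to podcast" loop.
# All of these helpers are placeholders, not real library calls.

def fetch_new_arxiv_links() -> list[str]:
    """Stand-in for following an account that tweets newly posted arXiv papers."""
    return ["https://arxiv.org/abs/0000.00000"]  # placeholder link

def extract_paper_text(url: str) -> str:
    """Stand-in for downloading a paper and stripping references, figures, etc."""
    return "plain text of the paper goes here"

def paper_to_dialogue(paper_text: str) -> str:
    """Stand-in for a GPT call that rewrites the paper as a two-person conversation."""
    return "HOST: So what did you find?\nGUEST: Quite a lot, actually..."

def dialogue_to_audio(dialogue: str) -> str:
    """Stand-in for a voice-synthesis step that reads each speaker in a cloned voice."""
    return "episode.mp3"

if __name__ == "__main__":
    for url in fetch_new_arxiv_links():
        episode = dialogue_to_audio(paper_to_dialogue(extract_paper_text(url)))
        print(f"Published {episode}, generated from {url} with no human in the loop")
```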

It’s not perfect, but as a sign of things to come…

What a strange world we live in where we have convincingly human AIs, but the easiest way to call them out is to ask a basic math problem. Science fiction lied to us.

Wow, indeed! This chatbot is amazing in many ways, but it really gets totally hung up on the 2023 question. When I asked it the same question just now, some of the interesting “facts” included these, all of which are wrong. That’s a stunning failure by an AI that can otherwise do truly amazing things; there must be some systemic bug causing this:

  • It is the smallest three-digit prime number.

  • 2023 is divisible by 3, 6, 7, 11, 13, and 21.

  • It is the only three-digit number that is divisible by the first six prime numbers.

In many ways, yes. Ironically, when I asked ChatGPT directly if it thought it could pass the Turing test, it replied “probably not” and went on to explain that its design mission was to provide informative answers to questions, and not to carry on a general human-like conversation.

I dunno, I’ve been thinking about how prescient Robert Heinlein was in describing the first sentient computer, Mike. Mike was described as simultaneously brilliant but amazingly dumb in some areas (at first). And he made bad errors, though mostly intentional ones. His intelligence wasn’t designed; it emerged.

Heinlein described Mike as ‘waking up’ when the number of connections grew to a certain point, representing a neural net of sufficient complexity. He also explained that Mike was trained on large data sets that were classified. In fact, Mike had problems with humor, so he printed out hundreds of jokes that Manny had to classify as ‘funny’, ‘not-funny’, and ‘funny-once’. That too is similar to how we train AI today. Mike also had the ability to create voices and faces, and even created his own fake character who could talk to people on TV.

All that, for a book written in the early ’60s and published in 1966. This was just a few years after the first AI conference, and around the time that the first artificial neuron, the ‘perceptron’, was created. At the time, other futurists and AI researchers were wildly wrong about the direction of AI. Neural nets were actually frowned upon as a dead end, leading to stagnation in AI research for a long time. Remember Expert Systems?

And when you ask it if it’s any good at providing informative answers to questions, it tells you that its purpose is to carry on human-like conversations.

@wolfpup , I don’t think there’s a bug causing that. I think it’s just an artifact of what it’s designed to do, which is mimicking humans. Statements like that closely resemble statements that humans would make about what makes a number interesting. The fact that they’re wrong statements is, from that perspective, irrelevant.

I think that the best way to view this thing is as a computerized version of Cliff, from Cheers. Ask him any question, and he’ll quickly give you a plausible-sounding answer. And he’s pretty good at trivia, so often, the quick, plausible answer he gives you will be correct. But he’s also good at bullshitting, so if he doesn’t know the answer, he’ll answer anyway, and it’s very difficult to tell when he’s answering from knowledge and when he’s bullshitting.

ChatGPT is meant to chat. I’m pretty sure it’s completely incapable of actually processing math problems.

A model designed to query something like Wolfram Alpha would have more success. But ChatGPT is just compiling natural-sounding responses based on its training set.

It’s not surprising that a system like this would be smart in some ways and dumb in others. After all, our own definition of smart and dumb is totally arbitrary, and relative to our own human capabilities. I just find it a funny contrast against the sci-fi trope of an AI that can’t do basic stuff like use contractions but can nevertheless perform computations at the level of a normal computer.

They can fix it easily enough, by providing access to “organs” like a compute module. And while this might get into dangerous territory, broad access to the internet would also be incredibly useful.
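
As a toy illustration of that “compute module” idea: anything that parses as simple arithmetic gets handed to an exact evaluator, and only everything else goes to the language model. The routing rule here is made up for the sketch, but it shows why a calculator organ would immediately catch the sum-of-squares “facts” quoted earlier.

```python
# Toy "compute module" router: evaluate simple arithmetic exactly instead of
# letting a language model guess. Anything that isn't plain arithmetic would
# fall through to the chat model (represented by a placeholder string here).
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv, ast.Pow: operator.pow}

def evaluate(node):
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp):
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    raise ValueError("not simple arithmetic")

def answer(prompt: str) -> str:
    try:
        return str(evaluate(ast.parse(prompt, mode="eval")))    # compute module
    except (SyntaxError, ValueError, KeyError):
        return "(hand this off to the language model instead)"  # chat module

print(answer("23**2 + 4**2"))    # 545, not 2023
print(answer("16**2 + 15**2"))   # 481, not 2023
print(answer("What is interesting about 2023?"))
```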

It’s clear now that strict rule-based systems are obsolete for pretty much any task, whether chess or vision or language processing.

What I’m really curious about is what will happen in the field of linguistics. ChatGPT is almost perfect at understanding language. I’m not sure I’ve ever seen it misunderstand a prompt. But the NN did not have language built in at the beginning. There must be lessons here on how language is acquired.

The language model that they’ve trained is clearly capable of much more than chatting, though. It can build totally functional computer programs, for instance.

I think the problem may partly lie in its tokenization scheme. Tokens are whole words or multi-character chunks, not individual characters. That means it’s difficult to break numbers, formulas, etc. into their individual parts. It can’t see the number 2023 as four digits at all, just as a token or two that was associated with some things during the training.
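
For anyone curious what the model actually “sees”, OpenAI’s tiktoken library will show the token split. This is just a small illustration, and the exact splits depend on which encoding a given model uses, but the pieces come out as multi-character chunks rather than single digits:

```python
# Inspect how a BPE tokenizer carves up text (requires: pip install tiktoken).
# The exact token boundaries depend on the encoding; the point is only that
# "2023" is not handed to the model as the four separate digits 2, 0, 2, 3.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one of OpenAI's published encodings
for text in ["2023", "Happy New Year 2023!"]:
    tokens = enc.encode(text)
    pieces = [enc.decode([t]) for t in tokens]
    print(f"{text!r} -> token ids {tokens} -> pieces {pieces}")
```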

Another problem is the limited processing power. It actually does a better job if you ask it to go step-by-step towards an answer, because every new output token gives it a little more processing power to apply to the answer. If instead it could “think for a while” on a particular answer, it would probably do better.

I asked ChatGPT to tell me the difference between a ‘funny-once’ joke and one that is funny all the time - the concept Mike had trouble with.

ChatGPT:

I’ve asked it several different questions about that, including whether it’s any good at providing informative answers, and whether it could pass the Turing test. As I noted above, it thinks it probably could NOT pass the Turing test. The impression I get is that its design center is to be a question-answering system supporting interactions in natural language; thus its intended mission (which it’s mentioned many times) is to provide useful factual answers to an open-ended variety of questions. Indeed, if you type in an emotional type of non-question comment, such as “I am sad and lonely”, rather than trying to emulate human-like empathy, it switches into professional-therapist mode and dumps out psychological guidance and lifestyle suggestions.

In that sense, I think the best analogy for ChatGPT is a comparison with DeepQA, the IBM engine for Watson.

If my previously stated assessment is correct, the usefulness of ChatGPT’s real mission rests on its ability to deliver accurate information, which in turn means a competent level of natural language understanding. It does not, however, appear to be intended to mimic human conversation except in the context of answering questions.

“Bug” may not be quite the right term to describe its poor performance on the 2023 question, but in this respect, it’s a gaping hole at a very basic level in its otherwise impressive question-answering capabilities.

I must say until seeing this, I had no criticisms of any of its other responses.

ETA – for what it’s worth, I asked ChatGPT directly:

I and some other posters are discussing what your mission is. Can you help clarify what it is that you’re primarily designed to do?

As a language model, my primary function is to generate text that is similar to human language and to assist users in generating human-like text for a variety of purposes. I was trained on a dataset of billions of words and can generate human-like text in response to prompts or questions from users. I can be used for tasks such as translation, summarization, content creation, and more. I am not conscious and do not have my own thoughts, feelings, or desires, so I do not have a “mission” in the same way that a person might. I exist to carry out the tasks for which I have been designed and to assist users in generating human-like text.

My request: Please write the opening paragraph of a story in the style of P.G. Wodehouse

The result is interesting; it has some of the characteristic elements but they are thrown together rather haphazardly. Jeeves is a valet and does not own a country estate (usually referred to as a “country house”) and for that matter, neither does Bertie, although many of his friends and relatives do and many stories are set there. Bertie himself lives with Jeeves in a London flat.

It was a beautiful summer’s day, and I was lounging on the lawn of my country estate, sipping a cold lemonade and enjoying the sun’s warm rays on my face. Suddenly, my peaceful reverie was interrupted by the sound of approaching footsteps. I turned to see my old friend, Bertie Wooster, striding towards me with a look of consternation on his face. “Jeeves, old man,” he cried, “I’m in a terrible pickle. I’ve gone and gotten myself engaged to three different girls, and I don’t know how to get out of it!”

That’s hardly fair. There are lots and lots of tasks that computers do, much better than humans, and to much practical effect, by strictly following a simple set of rules, and those rule-based systems can be expected to remain in use for the foreseeable future.

We might be nearing the limits of what can be done by strict adherence to simple rulesets, and so to significantly expand what computers can do, we need to move to other paradigms. But that doesn’t mean the rule-based systems are a dead end.

This is not the case, because ChatGPT is “lying” to you about its purpose.

ChatGPT is not designed to provide factually true and meaningful answers to questions. It is designed to mimic human conversation.

If you asked a human whether an AI could pass a Turing test, based on the sample of responses in ChatGPT’s training database (which, notably, contains nothing written after ChatGPT existed), that human would likely answer “no”, with reasoning similar to that which ChatGPT used. But ChatGPT doesn’t ‘think’ anything about its own capability to pass a Turing test. It’s just mimicking a likely human response to the question.

That’s why it fails the math questions. It successfully responds with something that looks like a human talking about math, but it doesn’t analyze whether the answer is correct in any way.

I’m not sure I agree. It didn’t just say “probably not”, it gave concrete and plausible reasons for it, namely that it wasn’t the purpose for which it was designed.

It’s capable of amazing feats of information retrieval on even obscure issues of science and technology (and probably many other subjects) – hence my comparison to Watson – although I agree that it has no way of verifying the accuracy of anything in its database. Its blunders on the 2023 question are frankly a mystery to me, though. It obviously isn’t doing a static database lookup for the properties of any possible number, so I’m not sure what it is that has gone off the rails here. Kind of reminds me of the incident with Watson on Jeopardy; asked which American city has airports named for a World War II hero and a famous World War II battle, it answered “Toronto”.

I should have qualified that better. For open-ended, largely-unsolved problems (like the examples I gave), I think rule-based systems are largely obsolete. There are of course many simple systems that are completely understood and will continue to be rule-based. Those systems will be especially favored for anything safety-critical, where having predictable behavior is an advantage (though for systems like self-driving we just won’t have a choice).

Its answers for any question about itself are all over the place, which really isn’t much of a surprise, since all of its training comes from an era before it existed at all. I think it must interpret any question about “you” as being a question about natural-language AI programs in general, and so answers based on the AI programs that existed at the time its training database was compiled. And there were many such programs, that worked in many different ways, so its answers will likewise vary.

Not to me. It clearly can’t answer the question via pure information retrieval, because prior to 2023, very few people would have been asking “what’s interesting about the number 2023?”. The facts aren’t out there to be found. And it can’t answer by constructing mathematical facts ex nihilo, because while there are some AIs (and some NIs) programmed to do that sort of thing, it’s not one of them. So all it can do is put together some bullshit that resembles the things that humans say about other numbers. Which it does. Those “facts” do resemble the things that humans say when you ask “Tell me some facts about the number ____”.

The specific “I’m just a humble AI chatbot with no ability to do X” routine does appear to have been manually fed to the bot, but it will trot that out for all sorts of reasons. With the right prompt, it will tell you that it’s just a humble AI chatbot with no ability to do what you ask - right before delivering on the request anyways.