ChatGPT et al, the creators don't know how it works?

I doubt the ‘we don’t really understand how it works’ pitch is the one he presents to the backers when he’s asking for another billion dollars.

It's just good theater for a PR session.

Sounds a lot like “we need to pass this bill so we can know what’s in it”.

“Give us teh moniez so we can figure out what this AI can do.”

‘Push the button and see what happens’

No, it’s a feature that’s pretty much inherent to neural networks.

I think the CGP Gray video you linked above pretty clearly explains the difference.

No, it's not. It would be difficult to size a sigmoid or program back propagation if you didn't know how it works.
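
For concreteness, here's roughly what "sizing a sigmoid and programming back propagation" involves, sketched in NumPy with made-up numbers (a toy illustration, not anyone's actual code):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(y):
    # Derivative of the sigmoid, written in terms of its output y = sigmoid(x).
    return y * (1.0 - y)

# One back propagation step for a single sigmoid neuron (illustrative values only).
x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.1, 0.4, -0.2])   # weights: engineers choose how these are initialized
b = 0.0
target = 1.0
lr = 0.1                         # learning rate

y = sigmoid(np.dot(w, x) + b)         # forward pass
err = y - target                      # gradient of squared error w.r.t. y (up to a constant)
w -= lr * err * sigmoid_grad(y) * x   # chain rule: nudge each weight downhill
b -= lr * err * sigmoid_grad(y)
```

That much is perfectly well understood; the dispute is about what happens when you wire up billions of these and let training set the weights.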

I’ll refer you back to this post by @engineer_comp_geek that explains what you are missing:

But I suppose I was not specific enough:

It's inherent to sufficiently large neural networks, not neural networks in principle.

I’m not missing anything. The brilliant engineers who designed GPT controlled the parameters of every node in the system. They specified the activation window and the back propagation of every node in the net. It was not anything as ethereal as the simulation of neurons. They are not related.

It's an engineering problem with a solution. The 'gosh, we don't know how it works' line is just good theater.

This is the key. Of course we know how it was built since we built it. We know how it was trained since we trained it. But there is not remotely any practical way to know how or why ChatGPT produced any specific response. There is no way to predict ahead of time how well it will work. Perhaps most intriguingly, there is no way to predict the spontaneous emergence of new problem-solving skills on which it had never been explicitly trained, which can appear quite suddenly purely as a function of scale.

Then why did they invest a billion dollars to train it? Just to see what it would do?

In a nutshell, that’s pretty close to the truth. I mean, this is a research project, and that’s pretty much the nature of research. The justification for research funding is generally in the form of some evidence that it will be productive, but there are never guarantees. The Large Hadron Collider cost billions, and the justification was basically that cool things will likely happen at unprecedented levels of particle energy, and thus new discoveries will be made. The justification for the OpenAI research is that intelligent machines capable of natural language conversation and reasoning have many commercial applications.

Yes, and they had to convince somebody that they had a reasonable probability of success. Since this is a commercial product, I don’t believe it was funded by the American Altruists Association.

This is a strawman argument. You’re not responding at all to what @wolfpup is saying.

To demolish your strawman, OpenAI had done work on previous models (ChatGPT 2, for example) and showed that increasing the size of the neural network and amount of training time it received resulted in corresponding increases to the performance of the language model. Graphing the performance vs resources available to the model, they were able to show that the largest models they could produce were still improving rapidly with size, with no sign of diminishing returns.

They used this as evidence that increasing the size of the model further would result in even better results, and thus secured funding for ChatGPT 3.
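
Schematically, that "graph performance vs resources and extrapolate" step is just a power-law fit in log-log space. A toy sketch with invented numbers, not OpenAI's actual data:

```python
import numpy as np

# Invented (model size, validation loss) pairs standing in for results from smaller models.
sizes = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
losses = np.array([4.2, 3.8, 3.3, 2.9, 2.5])

# Scaling curves look roughly linear in log-log space: log(loss) ~ slope * log(size) + intercept.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)

def predicted_loss(n_params):
    return np.exp(slope * np.log(n_params) + intercept)

# Extrapolate to a much larger model to argue the curve hasn't flattened out yet.
print(predicted_loss(175e9))
```

A falling fitted curve tells you a bigger model will probably score better on the training objective; it tells you nothing about which new capabilities, if any, will show up at that scale.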

Thanks for making my argument. They knew what they were doing.

My understanding is that the probability of success was largely premised on smaller, simpler models that showed the language model concept to be promising. But if you’re trying to suggest that the researchers or funding providers knew exactly how ChatGPT would perform in a massively scaled-up version, that’s absolutely not the case. Even today the research continues and new surprises continue to appear.

That’s not remotely the same thing as knowing how the assembled and trained system would perform, or even why it responds the way it does. For all anyone knew, it could well have been a total failure, proving only that scale alone doesn’t significantly improve cognitive skill. Instead it proved the opposite.

Well yes, that’s the nature of development. I’m not sure what the argument is here. You are familiar with R&D. I am familiar with R&D funding. We both know that there is a large degree of risk in development and total success is a pleasant surprise. We also know that there is an element of theater in presenting to the media.

Sometimes that happens, but sensationalism is usually the creation of the media itself, not the researchers. I haven’t seen anything like that coming out of OpenAI. That there are mysteries and surprises in ChatGPT like the emergence of novel skills isn’t “theater”, it’s fact.

That… is not the point I was arguing against, whatsoever.

It’s like the difference between understanding the mechanics of evolution, and understanding what individual genes do, versus being able to program with DNA the way we do with computer code. Or having an understanding of both neurons and psychology versus understanding the complete mechanics of the human brain.

We understand the mechanics of neural networks, at an individual neuron level. We also have some ideas of how they evolve over time when trained, in a general sense.

That doesn't mean that we "understand how they work", any more than we understand how consciousness arises in the brain or how to read DNA like a book.

Absolutely right - the individual nodes are very easy to understand. The thing that's difficult to understand is what they're actually doing together, when 'together' means a network of 175 billion parameters all interacting with each other.
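
If it helps to see it: a single node is just a dot product and a squashing function, a few transparent lines of code. The opacity only appears when you compose enormous numbers of them. A toy sketch (sizes invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def node(inputs, weights, bias):
    # One node: a dot product plus a squashing function. Completely transparent.
    return np.tanh(np.dot(weights, inputs) + bias)

# A toy network: 12 layers of 64 nodes each, with random weights.
layers = [rng.standard_normal((64, 64)) * 0.1 for _ in range(12)]
x = rng.standard_normal(64)

# Any single node is trivial to inspect...
first_node_output = node(x, layers[0][0], 0.0)

# ...but the network's behaviour is the composition of all of them, and nothing
# in the individual multiply-adds tells you *why* the final output is what it is.
for W in layers:
    x = np.tanh(W @ x)
```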

The engineers at OpenAI had strong reason to believe that interesting new capabilities would emerge when they went from ChatGPT 2 to ChatGPT 3, and from 3 to 3.5, and from 3.5 to 4, because new capabilities had emerged every previous time when they had made the network or training dataset larger. But they didn’t even know what those new capabilities would be. And they certainly didn’t know how those capabilities would work.

This is quite simply factually incorrect, and I don't know why you keep saying it. There are far too many parameters for any human to have specified every one of them. The system set its own values for those parameters in the process of digesting the training data set.
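
Concretely, "set its own values" just means the weights fall out of a gradient-descent loop over the training data rather than out of anyone's spec. A deliberately tiny, hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for a training corpus: noisy samples of y = 2x + 1.
X = rng.uniform(-1, 1, size=200)
Y = 2 * X + 1 + 0.05 * rng.standard_normal(200)

# The engineers choose the shape of the model and the learning rate...
w, b = 0.0, 0.0
lr = 0.1

# ...but the values of w and b are set by the data, one gradient step at a time.
for _ in range(500):
    err = (w * X + b) - Y
    w -= lr * np.mean(err * X)   # dLoss/dw for mean squared error (up to a constant)
    b -= lr * np.mean(err)       # dLoss/db

print(w, b)   # ends up near 2 and 1: values nobody typed in
```

Scale that up from two parameters to 175 billion and you have the situation @wolfpup is describing: the procedure is fully specified, but the resulting values and behaviour are not.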

Read what you quoted. Are you not familiar with software programming techniques?