ChatGPT et al, the creators don't know how it works?

It’s unclear if you’re being deliberately obtuse or just fundamentally don’t understand what you’re talking about, but you’re wrong.

Yes, I’m familiar with programming. From this familiarity, I know that it’s possible to write programs that set the values of various parameters on their own.

You can understand how something works in general, without understanding exactly how it works for each particular case.

If you wrote a computer program A that generated another computer program B that was 175 billion lines of code, you might have a little understanding of how program B works, but not a lot. There’s no way to read and grok that much code. This is exactly how machine learning or AI works.

After you’re done, you might feel very confident that you could write computer program C that generates computer program D that is even better.

Then if you studied part of computer program B or D, you might be surprised by some of the logic that got generated – taking advantage of patterns you would never have considered. This is the case with AlphaZero learning to play chess – it made moves long rejected by masters.
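
To make the “program that sets its own parameters” point concrete, here is a minimal sketch of the kind of loop that training boils down to (plain Python; the toy task and all the numbers are invented for illustration). Nobody hand-writes the final parameter values; the loop finds them.

```python
import random

# Toy "training data": we want the program to discover y = 3x + 1 on its own.
data = [(x, 3 * x + 1) for x in range(-10, 11)]

# Two parameters the program will set by itself.
w, b = random.random(), random.random()
lr = 0.001  # learning rate

for step in range(10_000):
    x, y = random.choice(data)
    pred = w * x + b
    err = pred - y
    # Nudge each parameter in the direction that reduces the error
    # (stochastic gradient descent on squared error).
    w -= lr * 2 * err * x
    b -= lr * 2 * err

print(w, b)  # ends up close to 3 and 1, though no human typed those values in
```

An LLM is, loosely, this same loop scaled up to billions of parameters, which is why reading the resulting “program B” is hopeless even though the loop that produced it is simple.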

Companies go to the market for investment all the time without fully knowing how a developing product works. The drug industry does it constantly. They have a promising drug candidate that seems to show some efficacy in limited trials. They have a rough idea that it works by acting on some active site in some important pathway in the patient or pathogen, and there is a non-zero chance that the drug could be a very big thing. Prior to COVID, Moderna was a minnow in the field with little success to show for its efforts.

ChatGPT is not a great deal different. There are a lot of unknowns. So far the basic architecture of the system has scaled and delivered interesting and valuable jumps in capability at each step. Nobody believes this will go on forever. It is something of a surprise to some that it made it this far. Where the break point will be is hard to say. There will be all manner of limits to scalability. Just providing enough useful training data, or funding enough human trainers to build the next iteration, may limit things. The claimed market value is getting into silly money already. Clearly enough people are willing to bet (probably other people’s) money on the company. But it could fizzle. There is quite a step from where they are to creating a product that makes a couple of billion a year in revenues. Yet that is where their valuation puts them. Nor are they the only player.

Investing at this level is risky at best. The smart people putting money in know this. They have much lower expectations than the stupid people investing. However sometimes the difference in expectations becomes badly skewed. Lots of people who should know better get hooked on finding the “next big thing”. Theranos anyone?

A friend of mine was offered Amazon stock near IPO. He didn’t take it. However, as I pointed out to him, he would have sold the day he doubled his money. So he isn’t really a loser.

One candidate for a limiter of development is just lack of quality data. At some point we will just run out of stuff to feed them.

Adding parameters only gets you so far, and that’s related to how much data you have. Having more parameters than data points leads to overfitting, and some of that appears to be necessary for emergence, to give the network more degrees of freedom. But at some point, adding more parameters hits diminishing returns.
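
As a toy illustration of that parameters-versus-data point (numpy; the numbers are made up purely for illustration): give a model more free parameters than you have data points and it can fit the training data perfectly, noise included, then fall apart off the training set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five noisy training points from a simple underlying trend (y = 2x plus noise).
x_train = np.linspace(0, 1, 5)
y_train = 2 * x_train + rng.normal(0, 0.1, size=5)

# A degree-1 fit has 2 parameters; a degree-4 fit has 5 parameters for
# 5 points, so it can thread every point exactly, noise and all.
small = np.polyfit(x_train, y_train, 1)
big = np.polyfit(x_train, y_train, 4)

x_new = 1.5  # a point outside the training data
print(np.polyval(small, x_new))  # close to the true trend (about 3)
print(np.polyval(big, x_new))    # can be well off it: the extra parameters fit the noise
```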

We may wind up with AIs hitting the wall at just ‘really smart human’, while also being able to be fine-tuned to be expert in specific subjects. But they will slowly add to their data as they interact with the world. Who knows where that goes. That might depend as much on how stupid we are as on how smart they become.

I think we likely have a ways to go here.

For one thing, some data can be synthesized. If we wanted an LLM to be much better at math, one could simply generate endless variations of math problems.
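
A sketch of what “endless variations of math problems” could look like in practice (plain Python; the format is invented for illustration): generate problem/answer pairs mechanically and feed them in as extra training text, with the answers guaranteed correct by construction.

```python
import random

OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}

def make_problem() -> str:
    """Generate one synthetic arithmetic training example as plain text."""
    a, b = random.randint(0, 999), random.randint(0, 999)
    op = random.choice(list(OPS))
    return f"Q: What is {a} {op} {b}?\nA: {OPS[op](a, b)}"

# An effectively endless stream of correct, machine-checkable training text.
for _ in range(3):
    print(make_problem(), end="\n\n")
```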

Another is that they don’t seem to have used anywhere close to the actual amount of material out there. ChatGPT was allegedly trained on 570 GB of data. But that’s not even enough to cover, say, the full archive at newspapers.com (300M pages). And that’s largely US-only. A worldwide archive of newspapers must surely be many terabytes worth.

It’s not clear to me that there’s a weighting for high-quality sources. Perhaps the LLM could learn this on its own with some training samples, but in any case, surely we want to give more influence to “the classics” (whether Shakespeare or Tolkien).
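
One crude way such a weighting could work (a sketch only; the sources and weights below are made up, and I have no idea what is actually done in practice) is to sample training documents in proportion to a quality score, so “the classics” simply get seen more often during training.

```python
import random

# Hypothetical corpus: (document, quality weight). Higher weight = sampled more often.
corpus = [
    ("Hamlet, Act I ...",                5.0),
    ("The Lord of the Rings, ch. 1 ...", 4.0),
    ("random forum post ...",            1.0),
    ("spammy SEO page ...",              0.1),
]

docs, weights = zip(*corpus)

def next_training_doc() -> str:
    # Weighted sampling: the quality score acts as a multiplier on how
    # often a source contributes to training.
    return random.choices(docs, weights=weights, k=1)[0]

print(next_training_doc())
```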

Finally, we may be able to get it to talk to itself. Without any constraints, surely this would quickly go into the weeds, or at least drift to some non-human language. But with an adversarial design and a means of grounding it with the rest of the corpus, perhaps there are some possibilities here.

And the pool of available data may even be shrinking:

Flutterward69, writer of My Little Pony-Twilight slash fiction, upwards of 20 readers on Archive of Our Own: “I don’t want my writing stolen to train an AI! I want a way to opt out! Also, I want to be paid!”

So it’s pretty much an advanced/more sophisticated version of those chatbots like “Evie” and others that the YouTube gamers had fun with years ago?

The title of the thread reminded me of some commercials for prescription medicines that said “we don’t know exactly how it works yet, so it may or may not relieve your symptoms; consult your GP” right at the end.

You then get into trouble with the information content of synthetic data. And the question of what you want the AI to do.

And that is a big question. To be more than a parlour trick, the AI needs to do something people will pay for. Something that there don’t already exist high quality tools for. Teaching it arithmetic is a parlour trick. I have a free app on my phone that does that. Teaching it real mathematics is a vastly harder deal, and we have well developed tools for that, ones that have an element of solidity and traceability in their operation.

Training data for advanced purposes may be a much more difficult question. Feeding it the collected works of romance and crime novels isn’t going to get you much traction. I guess you could train it to write trashy novels. There is probably some chance it could be good at that. But that won’t pay the bills.

The wider question of making an AI that can do something really useful in an economic manner has yet to be answered. Lots of people are betting it can be done. Things like its applicability as a coding aid are, IMHO, overblown. It is probably good at regurgitating boilerplate code. But that just points us at the verbosity of many modern languages and the opportunities for languages with better expressive capability. Any sort of mission-critical code you can forget.

Maybe it will put yet another generation of menial wage slaves out of a job.

It occurs to me to mention a favourite story, from Stanislaw Lem’s The Cyberiad:
The First Sally (A), or Trurl’s Electronic Bard.

Trurl the constructor constructs an electronic automaton that writes poetry. Fed with the entire history of the universe to date, in order to understand the nature of things, it writes poetry on demand.

It doesn’t end particularly well for the bard.

I have quite literally never heard a drug commercial end this way; cite? Was it a commercial advertising participation in a medical study maybe?

I am highly dubious of the claim that the FDA would approve a drug when “we don’t know exactly how it works yet”.

Well, my understanding is nobody knows exactly how lithium carbonate works, but it’s a classic mood stabilizer that is still commonly prescribed. I suppose it depends on what “how it works” entails.

Fair - I didn’t mean “we have a complete understanding of the underlying physical processes”, but you’re right that this is one interpretation of “how it works”.

However, lithium carbonate is known to work in specific circumstances, which were figured out through scientific testing (double blind experiments, etc). No commercial for bipolar medication is going to say “since the chemical process by which lithium carbonate treats bipolar symptoms is not fully understood, consult your doctor before taking this medicine”.

I was pondering that also, but I think this only works if you measure intelligence in terms of attained works rather than capability to do stuff. Assuming we haven’t completely stagnated, humans have more capability potential than is expressed by what they have already attained; we’re not limited by what we have done or seen, but by what we can do.

If AI works the same, then it’s not necessarily limited to what we have trained it with.

It is used for homeopathic medicines and devices like copper bracelets. It’s a non-claim that gets around the FDA.

There are at least 3 things we can mean by “how it works” in this context:

  1. How does natural language processing and deep learning produce useful answers?
    This is the one we know the best. We understand these algorithms in detail, mathematically (there is a small numerical sketch of the core operation after this list).

  2. How did a given deep learning system generate a specific result?
    This can be difficult to know, because explaining how a result was generated is a harder question than just producing the result, and interpretability has lagged behind the algorithms for just doing useful stuff.
    Also, what kind of answer are we expecting? Surely not a sentence of natural English. We basically need to invent visualization and statistical descriptions that can usefully, and ideally intuitively, describe the training and processing of a neural network.
    For sure, this has already started, but if you have a good idea for how to do this then let me know how I can invest in your startup :slight_smile:

  3. Some analyses that I have seen (video) have suggested that GPT-4 exhibits behaviours that the creators can’t explain, like learning to use external software.
    Now, this might be sensationalism, or marketing, or whatever. But it nonetheless reflects a third kind of “how it works”. If it’s even 10% true then wow – things might be about to heat up pretty fast.
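
For the first sense above, a minimal numerical sketch of the kind of math that is fully understood (numpy, with made-up toy numbers): the attention step at the heart of these models is just a few matrix operations. What is hard to interpret is not this formula but the billions of learned weights that feed it in a real model.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Toy sequence of 3 tokens, each represented by a 4-dimensional vector.
Q = rng.normal(size=(3, 4))  # queries
K = rng.normal(size=(3, 4))  # keys
V = rng.normal(size=(3, 4))  # values

# Scaled dot-product attention: each token mixes in information from the
# others, weighted by how relevant they look to it.
scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
output = scores @ V
print(output.shape)  # (3, 4): one updated vector per token
```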

I believe it’s true in the sense that they know the code underneath but don’t completely understand how it learns. Robert Miles did a video on GPT-3, which learned basic math by consuming massive amounts of human-written text; something which was very unexpected.

Edit: It’s an interesting video but jump to 11:10 if you just want to view the math portion.