That’s a straw-man argument. I never claimed that LLMs don’t have understanding. I agree that there’s no way to determine the qualia of an AI (or, for that matter, of any human other than ourselves), and that absent that, all we can use to define “understanding” is the capability to answer questions, and that AIs do have that capability.
The claim I made is the completely different claim that they’re bullshitters. One can bullshit with or without understanding, and one can understand with or without bullshit, and that’s equally true of humans and machines.
Of course one could make the argument that a human brain, which is the current gold standard for “true” intelligence, awareness and understanding, could in theory be grown in a vat from cloned cells; and if the only data this brain ever had was the text and image data used to train LLMs, it might have an equally flawed understanding of reality.
A difficult thought experiment, because a human brain, no matter how it’s raised or trained, can’t process the sheer amount of data needed to train a ChatGPT-style AI. The end result is (probably) similar, but human brains are trained with a much smaller, but carefully selected and interactive, data set.
I think one advantage we still have (along with classical algorithms) is the ability to process logic deterministically instead of probabilistically estimating outcomes. LLMs can’t do that on their own… unless logical axioms are themselves an emergent property of language (are they?). They can offload it to a helper tool, e.g. by writing a short program that runs the calculation with a standard algorithm or math solver. But a human brain, given an education in how logic works and a set of assertions, should be able to evaluate those assertions on its own without outside information.
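For concreteness, here’s a minimal sketch of that hand-off, assuming a deliberately simple case: a short Python program that checks a propositional entailment by brute-force truth table, the kind of deterministic calculation an LLM would delegate rather than estimate. The premise and conclusion are hypothetical examples, not anything from the discussion above.

```python
# Minimal sketch: deterministic truth-table check of a propositional entailment.
# Hypothetical claim being checked: does (p -> q) and (q -> r) entail (p -> r)?
from itertools import product

def implies(a: bool, b: bool) -> bool:
    return (not a) or b

def entails(premise, conclusion, n_vars: int) -> bool:
    """True iff the conclusion holds in every truth assignment where the premise holds."""
    return all(
        conclusion(*vals)
        for vals in product([False, True], repeat=n_vars)
        if premise(*vals)
    )

premise = lambda p, q, r: implies(p, q) and implies(q, r)
conclusion = lambda p, q, r: implies(p, r)

print(entails(premise, conclusion, 3))  # True -- holds in all 8 assignments
```

The point of the example is only that the answer falls out of exhaustive evaluation, not out of a probability over next tokens.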
Or can we…? Now I wonder if that’s just a pattern of estimations I absorbed from school, like any other belief system. Hmm. Certainly I can and do make frequent errors in logic too. Hmm.
What I said is not a straw man argument, but otherwise I agree with your first paragraph. As for the second paragraph, you must have some novel definition of “bullshit” in mind and I honestly have no idea what it is.
My view on this is pretty straightforward. In assessing the value of any response to a question, whether by an AI or by a human, there are two and only two criteria one must use:
1. Is it relevant, i.e., is it responsive to the question actually asked?
2. Is it factually accurate?
If both (1) and (2) are true, then by any reasonable definition, the response is not “bullshit”.
Also by any reasonable definition, if both (1) and (2) are true, then the responder has exhibited understanding, even if that understanding is sometimes trivial. But in more and more cases recently, the understanding exhibited by GPT is very far from trivial and indeed in some cases profoundly deep.
I already gave you my definition of bullshit, which is not a completely novel one: Bullshit is what you produce when you have no particular regard for the truth. You can’t evaluate any single statement this way, since bullshit can sometimes be true. You have to evaluate a corpus of work.
But what you’re describing is the process, not the product. The product is what these discussions are all about, and the process is always the object of attack by AI skeptics and their ever-moving goalposts. Objective evaluations like the Turing test or the Winograd schema challenge are based on the product, not on what you happen to think about how the product was achieved.
I’m talking about the product, too. You can’t evaluate any one answer to determine whether it’s bullshit. But you can evaluate an entire body of answers to determine whether it’s bullshit.
In some sense, no evaluation made by a human can ever be judged totally “objective”. Objectivity often just means evaluating the product without regard for the process that produced it.
Admittedly, that’s not perfect, hence newer AI evaluation criteria like the Winograd schemas. There’s no subjectivity at all in these scenarios. They test the degree to which an AI understands a real world that it has never directly experienced.
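To make the “no subjectivity” point concrete: a Winograd schema is a sentence pair like the classic trophy/suitcase item, “The trophy doesn’t fit in the suitcase because it is too big / too small”, where a single word flips which noun the pronoun refers to, and each item has one gold answer. Scoring is then a purely mechanical comparison of the product against the key. A rough sketch, with a hypothetical resolver stub standing in for whatever system (AI or human) is under test:

```python
# Illustrative sketch: Winograd-style scoring is a mechanical, product-only
# comparison against gold answers. The items below are the classic
# trophy/suitcase pair; `resolver` stands in for the system under test.
ITEMS = [
    ("The trophy doesn't fit in the suitcase because it is too big.", "it", "trophy"),
    ("The trophy doesn't fit in the suitcase because it is too small.", "it", "suitcase"),
]

def score(resolver) -> float:
    """Fraction of items where the resolver picks the gold referent."""
    correct = sum(resolver(sentence, pronoun) == gold for sentence, pronoun, gold in ITEMS)
    return correct / len(ITEMS)

# A baseline that always answers "trophy" scores exactly chance (0.5), which is
# why above-chance accuracy is read as evidence of real-world understanding.
print(score(lambda sentence, pronoun: "trophy"))  # 0.5
```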
No, it’s objective. The whole point is that the test is blinded, so whatever criteria the tester uses apply equally to the AI and the human.
Take the Pepsi Challenge as an analogy (i.e., Coke vs. Pepsi in a blinded test). Even though the individual criteria might be subjective (does the person prefer something sweeter, or more carbonated, etc.?), it is still an objective statement whether people can tell the two apart.
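And whether people can tell the two apart is itself a plain statistical question: count correct identifications in the blinded trials and ask how likely that count would be under pure guessing. A quick sketch with made-up numbers (15 correct out of 20 trials is a hypothetical example):

```python
# One-sided binomial test: could the blinded tasters' hit rate be pure guessing?
from math import comb

def p_value(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more correct answers out of n when guessing with chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical data: 15 correct identifications in 20 blinded trials.
print(round(p_value(15, 20), 3))  # ~0.021 -- unlikely under guessing, so they can tell them apart
```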