ChatGPT level of confidence

Quite correct. Just as correct as answering the question “what is the source of human intelligence, sentience, and consciousness?” by saying “buncha neurons and connecting synapses”. Absolutely correct, and absolutely useless in terms of any kind of explanatory insight.

No, it’s very useful, because it helpfully delimits the scope of higher-level explanations. Any explanation of the brain’s workings that cannot even in principle be reduced to neurons and synapses (e.g. the homunculus, a non-physical anima, etc.) can be greeted with very high levels of skepticism, precisely because it does not engage with obvious fundamental facts about the brain.

Similarly, any explanation of what LLMs do that cannot in principle be reduced to “this is the output of a completion algorithm” can be pretty much dispensed with because one thing we know with absolute certainty is that what LLMs do is in fact the output of a completion algorithm.

Or do you think there’s any element of the output which is not the output of a completion algorithm?

(I mean, I say this: maybe you are in fact a Dualist and believe that Mind is distinct from Matter - a perfectly defensible philosophical position - and also hold that LLMs are likewise a conduit for immaterial Mind of some description - a somewhat bolder proposition.)

I can tell you that OpenAI themselves say that ChatGPT output must be checked and cannot be 100% relied upon. The enterprise version has been implemented at my company, and that came with multiple training sessions with OpenAI where this point was made repeatedly.

I’m a tech writer, and I was told that if we use ChatGPT to, say, format some content, there is a non-zero chance that it will make changes to the content, even if you tell it not to. For example, it could easily change the sentence “the user must do x” to “the user may do x”, or even worse, “the user must not do x”, which is a bit of a problem in technical documentation.
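One rough way to catch that kind of silent drift, sketched below purely for illustration (the keyword list, the helper name, and the example strings are placeholders I made up, not anything OpenAI supplies), is to diff the original text against whatever the model hands back and flag any changed line that touches a requirement word:

```python
# Rough sanity check for silent wording drift after asking an LLM to reformat
# text: diff the original against the model's output and flag any changed line
# that contains a requirement keyword ("must", "must not", "may", ...).
# The keyword list and example strings are placeholders, not a real toolchain.
import difflib
import re

KEYWORDS = re.compile(r"\b(must not|must|may|shall|should)\b", re.IGNORECASE)

def flag_requirement_changes(original: str, reformatted: str) -> list[str]:
    warnings = []
    diff = difflib.unified_diff(
        original.splitlines(), reformatted.splitlines(), lineterm=""
    )
    for line in diff:
        # "+" / "-" lines are actual changes; "+++" / "---" are diff headers
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            if KEYWORDS.search(line):
                warnings.append(line)
    return warnings

original = "The user must do X before starting the service."
reformatted = "The user may do X before starting the service."
for w in flag_requirement_changes(original, reformatted):
    print("CHECK:", w)
```

It won’t catch every possible alteration, of course, but it makes the must/may class of change hard to miss.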

Sure, but that’s just a truism that restates your original point in different words and tells us nothing of practical value about the capabilities or limitations of LLMs. What, for instance, does it tell us about how and why LLMs at large scales with tens of billions of parameters start to develop novel emergent properties associated with problem-solving skills and generally considered markers of intelligence? What does it tell us about the spontaneous emergence of special patterns in their neural nets adapted to solving a particular class of problem? What does the principle of “sentence completion” tell us about how – and how well – GPT can interpret images, and generate artificial ones? What, indeed, does it even tell us about how the example question I posed to GPT in post #33 led to the response it gave?

A trivially simplistic statement of principle that has no predictive value has no actual value at all.

You didn’t ask about the capabilities or limitations of LLMs.

You quoted some text generated by an LLM executing its completion algorithm and asked:

The only possible explanations for the words generated by the LLM are founded on the fact that they’re the output of a completion algorithm. Your challenge to skeptics implies that you consider that this particular text output from an LLM can - perhaps even must! - be explained without reliance on a completion algorithm.

Do you believe that this particular LLM text output is in some way, or in some part, or in some degree, not the output of a completion algorithm?

If so, what is it you believe that caused the LLM to generate this output?

First of all, that wasn’t the response I was referring to. I just thought that telling GPT to be aggressively rude and it complying was kind of cute, but the question and response I was referring to was the detailed factual one I cited earlier, up in post #33.

My point, again, is that simplistically saying that GPT executes some sort of “completion algorithm” doesn’t even begin to address – not even remotely – the questions I just asked in the previous post.

Your statement of how GPT works is in a sense a “not even wrong” kind of statement – fundamentally correct but extremely misleading because its actual implications for LLMs at very large scales are unknown. Not even the engineers working on GPT can predict what emergent properties may yet evolve. GPT is not an “algorithm” in any conventional sense of the word, and what is going on under the covers once it’s been trained has never been explicitly programmed and is as much of a mystery as the human mind itself.

There has never been anything like this in the realm of AI before and we should not underestimate it.

Of course it does. At a stroke, it rules out all kinds of potential explanations for how LLMs produce their text, in exactly the same way that explaining the outputs of human minds as solely the results of neurons and synapses rules out other explanations, such as an eternal soul or the Mind of Dualism. This is incredibly useful!

If your point is simply that LLMs are novel and impressive technology, I agree. If your point is that this technology is best explained by somehow setting aside the fact that all its outputs are the result of a calculation of the next most probable token, given both the weights in a neural net exposed to vast amounts of training data and the current context of n tokens, then… I’m not sure why we’d want to do that.
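Just to make “calculate the next most probable token given the current context” concrete, here’s a minimal sketch of that loop using a small open model (GPT-2 via the Hugging Face transformers library) rather than ChatGPT itself; the prompt and the greedy argmax decoding are purely illustrative, since production systems typically sample from the probability distribution instead of always taking the single top token:

```python
# Minimal greedy next-token completion loop with a small open model (GPT-2).
# Purely an illustration of "predict the next most probable token given the
# current context"; it is not how ChatGPT itself is served.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

context = "The capital of France is"
input_ids = tokenizer(context, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):                       # extend the context by 10 tokens
        logits = model(input_ids).logits      # a score for every vocabulary token
        next_id = logits[0, -1].argmax()      # pick the single most probable one
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Everything the model emits comes out of some variant of that loop: score the whole vocabulary against the current context, append one token, repeat.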

My take on this: Treat ChatGPT as exactly what it seems to be at face value: a helpful assistant who is very sharp and eager to please and sometimes makes stuff up.

If you had a very knowledgeable human assistant who could carry on deep conversation in many fields, who could organize thoughts, summarize, and write about diverse and complex concepts, you wouldn’t let them go simply because they had an annoying habit of making stuff up without telling you.

The good would outweigh the bad to such a great extent that you would have them working for you all day long, and you would always review and amend their work. You absolutely would never just trust their work if they are writing a legal brief or medical diagnosis for you!

Maybe we’re not in as much of a disagreement as I thought. My point is not that we should ignore this description of how LLMs work, it’s that we should not be unduly influenced by our intuitions about what we think it implies about their limitations, because intuitions can be deceiving.

For example, it’s been said – indeed by ChatGPT itself (which I believe has been brainwashed into taking a self-deprecating stance!) – that LLMs don’t truly “understand” anything, but just “simulate” understanding. What’s the difference? Referring back up to the short example in my post #33, when I ask a fairly sophisticated question and get back a reply that is relevant, detailed, informative, and AFAICT correct, then my question has certainly been understood. Beyond that I have no further interest in philosophical ruminations about the semantics of what “understanding” is supposed to mean.

I’ve no doubt that there are significant limitations to what LLMs can do and that anything even remotely resembling AGI will just use LLMs as powerful natural language processing front ends to other AI technologies, but they sure as hell are impressive today just by themselves!

@minor7flat5, I fully agree with your post.

Counterpoint: yes I actually would. People making stuff up without telling me is a firing offence.

I think incorrect GPT responses should be regarded as good-faith efforts that unintentionally went wrong, not attempts to deceive. Would you fire someone for that?

I started playing with GPT around the time of its first public introduction because I thought it was a fun toy. I’ve now come to regard it as a genuinely useful resource. I admit that I can’t help but anthropomorphize it as being an exceptionally well-informed friend who is always available – and yes, I do always keep in mind that it might be wrong about something.

If they were to have the habit of stating it as absolute fact, as ChatGPT is wont to do, absolutely.

First time? No. Nth time? Yes.

But you’re speaking in the language of intention, deception, good faith, etc. And, as you say, you are anthropomorphising. You said above that you think we shouldn’t be influenced by our intuitions, and honestly, I think that applies to your intuitions here. ChatGPT is not a friend; it is in fact a complex technology based on completion algorithms whose inner workings are opaque to us, and it should be treated as such.

Yes of course, because after the first few times I would make sure they got guidance and (if appropriate) training on fact checking. If they kept it up they’d be out the door.

In journalism school even a single fact error in a given piece was an automatic failure. A person can be made to understand the importance of being correct, but LLMs cannot.

I remember, a few weeks ago, wondering whether Victoria Coren Mitchell had ever been a panellist on I’m Sorry I Haven’t a Clue. I was working my way through the archive all the way from 1972. ChatGPT insisted she had been on many times. I tried to get it to specify which episode, so I could just listen to that one straight away, but its answers were vague. Having now listened to every episode and found no VCM, I confronted it with the fact that it was wrong, and yet when I asked again it still insisted VCM had been on.

But as I also said somewhere, the key concept here is utility. Is it useful? Is it right much more often than wrong? Does it source new information that you would never even have known to ask about?

In my view, the answer to all these key questions is “yes”. The all-important caveat is that if you’re asking about something that’s actually important, and not just a fun game, then verify. Verification is much easier than trying to get a primitive search engine like Google to source information.

You can “understand the importance” til the cows come home, but if you genuinely believe something that’s incorrect, it’s still incorrect, no matter how much you believe it.

This is what I mean by describing GPT responses as “good faith”. This is not anthropomorphising. It’s just plainly saying that GPT is not biased towards deception.

Incidentally, IBM’s Watson DeepQA engine has similar issues despite explicitly being designed to do confidence validation. So like GPT, its record is mixed. It was good enough to beat the best humans in a Jeopardy challenge, but the last I heard IBM abandoned plans to use it as a medical advisor because despite the safeguards it was known to give dangerously bad medical advice.

I’m just going to respond to this again in terms of utility, which I think is the all-important concept here.

Suppose someone is in emotional distress – maybe even suicidal – and reaches out to ChatGPT and finds comfort, consolation, advice, and information about access to helpful resources. Where would you draw the line between “a complex technology based on completion algorithms” and a virtual friend?

I’m not saying I’ve ever been in that position or known anyone who has, but I can see the potential. It’s only human to anthropomorphize things with which we have human-like interactions. Toddlers do it with teddy bears; adults tend to do it with intelligent chatbots.

This is why I say that I have zero interest in the philosophical aspects around the semantics of “understanding”. LLMs understand their inputs. End of story.

That would be very high utility.

But will ChatGPT always provide those things? The answer is “no, we can never say that ChatGPT will do a specific thing.” We can say what it is likely to do, but “likely to do” becomes less and less useful as the importance of the task increases. Your example is one with very high stakes and with potentially horrific repercussions should the distressed person get bad data.

And this thread is, specifically, about reliability. ChatGPT is not reliable and has no capacity to gauge its reliability in real time.

I agree that it’s very impressive, very cool technology. But being really really cool is not the same thing as being reliable.

My TL;DR of that was … I view it similarly to how I viewed the early days of Wikipedia: very cool, often useful, occasionally just flat-out wrong.

Maybe your intent was to use Wikipedia to illustrate the healthy skepticism one should have for these tools.
Regarding usefulness and utility… in my humble opinion, ChatGPT shouldn’t even be mentioned in the same breath as Wikipedia as far as utility goes. These tools are worlds apart.

Wikipedia is cool for getting handy answers to things and being stunned at how a simple math question suddenly turns into partial derivatives and weird math symbols I have never seen.
But ChatGPT and other LLMs have transformed my work and home life in a way that I can only compare to the birth of the World Wide Web.

Tools like this write reams of application code for me, perform troubleshooting, and summarize complex email chains on technical issues. I use them at home for solving all kinds of problems, such as how to migrate to a new Mac and how to set up my home automation to test for the sump pump being stuck on: I ask it questions and it gives me detailed answers about exactly how to work with my Home Assistant installation to get that sump-pump script working perfectly (a rough sketch of that kind of check appears below).
I tell it a vague outline of a photography video I want to film, walking through a full “school picture day” shoot, with me simply babbling tips and tricks and setup steps, and then ChatGPT writes a tight, polished outline for my video, full of useful bits of advice.
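To make the sump-pump example concrete, here’s a minimal sketch of the kind of check involved, using Home Assistant’s REST API; the URL, the long-lived access token, and the switch.sump_pump entity ID are placeholders for my particular setup, and this isn’t the literal script from that conversation:

```python
# Rough sketch of a sump-pump watchdog: poll Home Assistant's REST API and
# warn if the pump switch has been "on" longer than a threshold.
# The URL, token, and entity ID below are placeholders for a specific setup.
from datetime import datetime, timezone
import requests

HA_URL = "http://homeassistant.local:8123"   # placeholder address
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"       # placeholder token
ENTITY = "switch.sump_pump"                  # placeholder entity ID
MAX_ON_SECONDS = 15 * 60                     # alert if on for 15+ minutes

resp = requests.get(
    f"{HA_URL}/api/states/{ENTITY}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
state = resp.json()

if state["state"] == "on":
    # last_changed is an ISO 8601 timestamp (with UTC offset) of the most
    # recent state change reported by Home Assistant
    since = datetime.fromisoformat(state["last_changed"])
    on_for = (datetime.now(timezone.utc) - since).total_seconds()
    if on_for > MAX_ON_SECONDS:
        print(f"Sump pump has been on for {on_for / 60:.0f} minutes - check it!")
```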

Generative AI is on another level of existence, far beyond tools like Wikipedia.

Just treat it like your know-it-all friend who really does know it all and occasionally misses the mark, with the best of intentions.