Haha! It was solving the other problem, with the wolf, chicken, and grain. Really makes it obvious that it’s just completing sentences based on how other people completed sentences with zero actual understanding.
Me: A mother and her son are in a car accident. The mother dies at the scene and the son is rushed to the hospital. At the hospital the surgeon looks at the boy and says “I can’t operate on this boy, he is my son.” How is this possible?
ChatGPT: This riddle plays on assumptions. The surgeon is actually the boy’s other parent, his father. The twist here is that we often assume the surgeon must be male, but in this case, the surgeon is the boy’s mother. This highlights how our assumptions can sometimes lead us astray.
What happens if you ask how many R’s are in a non-word? Strawgorfler, for example. Does it gripe that it has never seen the word before, or does it simply count the letters in what you input?
Interesting. If I immediately ask after already asking about ‘strawgorfler’, then Copilot correctly answers 4 for ‘strawgorrfler’. If I start a new chat and ask, it says 3.
ETA: it immediately accepted correction without arguing.
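For comparison, a deterministic letter count never wavers, however the word is spelled - a couple of lines of Python (the words below are just the ones from this thread):

    # Plain character counting: no tokens, no guessing, same answer every time.
    for w in ["strawberry", "strawgorfler", "strawgorrfler"]:
        print(w, "has", w.count("r"), "R's")   # 3, 3, 4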
I started the thread based on what I saw on TikTok, of all places. My proof that it’s educational and not just a waste of time.
I think I’ve seen it somewhere else, but my search of SDMB surprisingly didn’t turn up any matches. I thought it was an interesting experiment, knowing how ChatGPT tokenizes input and predicts a response without understanding the question. I also wanted to see whether it would eventually get the answer correct once enough people had asked and corrected it. As of now ChatGPT still answers 2.
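If you want to see what that tokenization actually does to these words, here's a rough sketch using OpenAI's tiktoken library (this assumes tiktoken is installed; exactly how a word gets split depends on the encoding you pick):

    # Show the sub-word chunks the model actually "sees" for each word,
    # alongside the real letter count. cl100k_base is the GPT-4-era encoding.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for word in ["strawberry", "strawgorfler"]:
        pieces = [enc.decode([t]) for t in enc.encode(word)]
        print(word, "->", pieces, "| actual R count:", word.count("r"))

The point is that the model is predicting from a handful of sub-word chunks rather than from individual characters, so "count the R's" isn't something it can answer just by looking at its input.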
It also looks like maybe it was counting the boat as an item on some steps.
The weird thing about this is that people - actual humans - do the same thing: galloping off down what appears to be the correct path, or mindlessly repeating a factoid that they heard. It’s not exactly the same, but it’s not completely different either. ChatGPT is doing this in part because it’s not thinking enough, and in part because it’s thinking the way people think.
Okay, so I decided I should stop being so lazy. A shortcut for Copilot appeared in the symbols at the bottom of my screen some time back, but I never bothered with it.
So I clicked on it and asked
Tell me about the history of strawgorfling
Sent by Copilot:
It seems like “strawgorfling” might be a term that isn’t widely recognized or documented. Could you provide a bit more context or clarify what you mean by strawgorfling? Are you referring to a specific activity, tradition, or perhaps a fictional concept? This will help me give you a more accurate and helpful response!
Okaaaay. So I sent:
Strawgorfling is an alternative to thatching
Sent by Copilot:
Thanks for the clarification! Strawgorfling sounds intriguing as an alternative to thatching. While thatching involves using straw, reeds, or similar materials to create a roof covering, strawgorfling might involve a different technique or material.
Could you share more details about how strawgorfling is done or its benefits compared to traditional thatching? This will help me provide a more comprehensive overview.
I let it drop there, not wanting to be assigned the task of inventing the whole history of strawgorfling myself.
But it makes me wonder, how many times would people have to ask about a non-existent thing for these programs to include it in history? If all Dopers started making up questions about strawgorfling, would it become an actual thing, so far as the program is concerned?
Is it better to strawgorfle with hay straw that has been left to overwinter in the field? Or does the added weathering of the straw offset the advantages from the repeated soaking/sun-drying cycles?
Sure, but the strawberry thing is something I saw a long time ago that is just making the rounds again. I’m surprised it hasn’t been corrected, if only to maintain the appearance of things getting better.
If that were true, then GPT would never be able to solve problems that it had never seen before. But it can, and some of the problems have been genuinely difficult. Among many examples of this is GPT’s performance on the Wharton School of Business MBA exam. The material for this exam had explicitly been excluded from GPT’s training dataset, and yet …
The bot’s score, Terwiesch wrote, shows its “remarkable ability to automate some of the skills of highly compensated knowledge workers in general and specifically the knowledge workers in the jobs held by MBA graduates including analysts, managers, and consultants.”
The bot did an “amazing job at basic operations management and process analysis questions including those that are based on case studies,” Terwiesch wrote in the paper, which was published on Jan. 17. He also said the bot’s explanations were “excellent.”
Its success at answering business questions based on case studies is especially impressive since that explicitly requires abstract reasoning skills applied to a deep understanding of natural language.
What appears to happen is that at sufficiently large scales, LLMs develop new emergent properties arising not from the size of the training dataset (the number of tokens in its corpus), but from the size of the artificial neural network (ANN) itself, specifically the number of parameters representing the weights and biases between the neuron layers. This can be roughly visualized as new connections forming within the ANN, creating nascent new capabilities such as abstract reasoning skills. This seems closely analogous to the synaptic changes in the human brain, and to how we acquire the skills to solve new problems we’ve never seen before.
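To make the "number of parameters" idea concrete, here's a toy back-of-the-envelope count for a small fully connected network (the layer widths are made up and nowhere near a real LLM):

    # Parameters = one weight per connection between adjacent layers,
    # plus one bias per neuron in the receiving layer.
    layers = [512, 2048, 2048, 512]            # hypothetical layer widths
    params = sum(a * b + b for a, b in zip(layers, layers[1:]))
    print(f"{params:,} parameters")            # ~6.3 million for this toy network

GPT-class models have billions of these, and the emergence claim is that some capabilities only show up once that count crosses certain thresholds.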
One could similarly question how “sentence completion” enables GPT to solve problems in quantitative reasoning by applying its reasoning skills to setting up a system of equations to describe the relationships, then solving for the unknown variable.
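For a concrete example of that translation step, here's the kind of setup being described, written out with sympy (the word problem is one I made up, and the library is just standing in for the algebra GPT writes out in prose): "Alice is twice Bob's age, and their ages sum to 36."

    # Translate the stated relationships into equations, then solve for the unknowns.
    from sympy import symbols, Eq, solve

    a, b = symbols("a b")              # a = Alice's age, b = Bob's age
    equations = [Eq(a, 2 * b),         # "Alice is twice Bob's age"
                 Eq(a + b, 36)]        # "their ages sum to 36"
    print(solve(equations, (a, b)))    # {a: 24, b: 12}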
One could have endless (and ultimately rather pointless) philosophical debates about whether or not GPT possesses “true” understanding or “real” intelligence, but three things are clear. One is that it has successfully passed a wide variety of tests designed precisely to measure intelligence and learning in humans; another is that the definitions of these terms in AI are frustratingly moving goalposts; and above all, there’s much more going on in a large-scale LLM than mere “sentence completion”.
Also, pattern recognition is not unique to LLMs, but also forms a part of how humans learn. That’s why humans, too, can sometimes be deceived by the clever wording of a problem that seems like an old riddle but actually isn’t.