The Miselucidation of Whack-a-Mole

“Posting crap” to a math question of my own making? That is my crime?

The AI provided a correct answer. I posted a wrong answer by mistake. I admitted to it and I apologized.

All I am learning here is never to admit to a mistake, or to using AI or any other tool.

Huh? How was it a question of your own making? The discussion was about speed and acceleration of magpods, and you volunteered

Which seems like you are attempting to answer other people, not posing your own question.

Gotta be honest, that’s a calculation I would have expected Bard to get right. But it didn’t. The moral is that you really can’t rely on current-gen AI to do even simple problems unless you know enough to check it.

So yeah, your “crime” (and come on, this isn’t a crime, it’s a stupid stumble on your part) was to post crap. And you did it because you trusted AI to be something it isn’t. At least, not yet.

If you’d just posted that, by itself, and not admitted to using AI, you would still have posted misleading crap. And if you refused to admit your mistake, you’d be taking worse heat than you are. There would have been an immediate pit link about you (confidently posting crap and refusing to admit he’s wrong), not a link to a thread bitching about generative AI. And the only reason you ended up in the pit is because you stood up for AI doing something it isn’t good for. That is, you are in the pit for posting a different type of crap and sticking to it.

Bolding mine. It was not a question asked by the OP.

It did get it right (near enough). I posted the wrong answer.

Also, it was 300-something posts into that thread. I thought there was some leeway that deep into FQ. Am I wrong? I mean, I know we always should be posting reliable info but I was a looooong way from answering the OP directly (as was everyone at that point…heck…I was the one who revived that thread…the original post was in 2013).

I think I deserve a break on a ten-year-old post. I admitted my mistake almost as soon as it happened, and I used AI to do math…not to write my posts here.

It really didn’t. It posted a lot of misleading glurge.

And what “break” are you looking for? You didn’t get a warning. I don’t think you even got a mod note. Instead, you sparked a discussion about why generative AI is not currently appropriate as an answer to a factual question. You’re taking heat now for refusing to acknowledge that there were problems with your post, and that those problems exist because you relied on Bard in an area where you didn’t have enough knowledge to proofread its answer.

Being pitted for it.

You’re not being pitted for making that post. You’re being pitted for sticking up for AI to do stuff that it’s not ready to do.

If you’d said “oops, I thought Bard would do that right, but I guess I was wrong to rely on it”, you wouldn’t be in the pit.

It gave the right answer.

It was my fault I posted the wrong answer. Which I have said…many times.

Naw, don’t take the blame. It was extremely misleading. As it tends to be.

This now sounds like the standard marital spat where they’re not fighting about whatever started it. They’re now fighting about having fought over whatever started it.

Y’all do whatever you want, but I added my 2 cents earlier and have now had all the entertainment I can stand. Enjoy your weekend. Out here.

You’ve been directly instructed by a mod not to use AI in factual questions. This quote sure sounds like you plan to continue doing so, only to lie about doing so, directly or through omission.

Were I a mod, your posting privileges would be under review. As it is, it seems reasonable to flag your posts in FQ from now on, as you are saying here that you will continue engaging in proscribed activities but won’t admit to doing so.

You continue to insist that this is the case even though it has been pointed out to you that both of the results you got from Bard were wrong, and even self-contradictory as indicated above. Neither the first response you posted nor the supposedly corrected second response provided the correct answer for the question of distance or time, while two posters who actually worked through the problem both provided the correct answer for time. The ‘reasoning’ provided by your chatbot indicated that it didn’t actually comprehend the question and was just producing grammatically and syntactically correct gibberish. (For that matter, even the GPT-4 response that @Sam_Stone provided still gave an incorrect answer despite getting the basic logic of the problem correct, because it didn’t grasp the basic procedure of rounding, indicating that it still didn’t have any understanding of what it was doing and was just following a statistical algorithm of answering this question in a form similar to what it was trained on, using the logic implicit in language to produce some simulacrum of basic algebra.)

Your mistake wasn’t posting an erroneous response; it was:

  1. Using a chatbot to do the ‘thinking’ of formulating the (trivial) problem;
  2. Trusting the result from Bard to be correct without making the minimal effort to check it, or even having the sense to realize that the answer was clearly wrong by orders of magnitude;
  3. Continued insistence that the result from the chatbot was correct and that you just clipped the wrong part of it, even after it was shown that all of the different results it provided were errors;
  4. And interpreting all criticism as the work of some kind of elitist cabal intent on keeping you from participating in threads where you aren’t an ‘expert’, even though the vast majority of posters (myself included) are not any kind of expert in the myriad topics that are discussed, nor has anyone demanded more than a basic verification of purportedly factual information.

You have made the mistake that a disturbing number of people are making: assuming by default that results from any LLM are authoritative and correct, even though they are just the result of processing natural language prompts through a statistical model to produce a syntactically cromulent response without any actual fact-checking mechanism. This is understandable, because what these models are being trained to do is provide authoritative-sounding responses to prompts in order to ‘fool’ users into the belief that they are interacting with an actual cognitive system with a theory of mind about the user. Where you fell off a cliff was in your continued, vigorous, obtuse insistence that Bard gave the right answer even after it was thoroughly demonstrated to be wrong (@Chronos even showed his work), thereby sidelining the actual discussion.

The appropriate response would be to acknowledge that current ‘AI’ chatbots, and LLMs in general, are not the right tool to perform even simple calculations reliably and cannot be trusted to give a correct result. Instead, you have doubled and tripled down on insisting that it was right and you just copied the wrong bit, and that you are being persecuted for being ‘honest’, which is risible given that you essentially plagiarized your completely unsolicited initial response from Bard instead of doing a simple multiplication problem yourself. You are an exemplar of how increasing reliance on these unvalidated and unreliable tools is making us all dumber and more prone to bad decisions based upon false confidence in unverified and often wrong ‘facts’. That is your ‘sin’.

Stranger

Absolutely right. I really don’t know why anyone is making that assumption?
Even a little experimentation with these things easily shows that they systematically hallucinate complete nonsense.

What is disturbing is that they are being incorporated into search engines, so you may not even realize that the result you get was chatbot-generated. A trend of which I heartily disapprove.

Absolutely correct again. It doesn’t take much experimentation to verify this.
Wolfram Alpha is a much better tool. It tends to err on the side of saying it doesn’t understand the question, rather than making up fictions.

Whack-a-Mole’s apology reminds me of someone admitting that they tried to shoot someone and so they’re apologizing for missing.

Prometheus was punished by Zeus because he stole fire to give to mankind. He was chained to a rock in the Caucasus Mountains, and every day an eagle came and ate part of his liver. Each night, his liver would regrow, which meant he had to endure his punishment for eternity.

Like one of the sayings of Lazarus Long: “Beware of strong drink. It can make you shoot at revenuers… and miss.”

Well, I’m not knocking you, friend.

But it might be a good idea if we all adopted a voluntary code of conduct rule which says that if we include any information or text which is chatbot-generated, we mention this in the post?

Though as I just noted, with the heinous trend of incorporating the things into search engines, we may not always know… :frowning:

Sometimes one looks for something via a search engine, and finds a page that is bot-generated (but not by the search engine itself). One can usually tell because it’s full of bullshit.

That can certainly happen. But the operative word is ‘usually’?
It depends on whether it’s a topic one has enough knowledge about to make at least a somewhat educated judgement about the plausibility?

If not, it is probably best not to repeat it (though tell that to the Internet Babblers of the past few decades)… :wink:

I’m going to use this as a jumping-off point to make some general comments, not directed at anyone in particular. Excuse this slight digression, but this is the Pit, after all, and this is a real hot button with me.

LLMs have generally been very bad at arithmetic, especially earlier incarnations, although they’ve been getting much better. The salient question isn’t why they were so bad; it’s how they were able to do arithmetic at all, since arithmetic was never explicitly part of their training. The answer is that arithmetic capabilities apparently evolved as an imperfect emergent property once the scale of the LLM became sufficiently large.

I must admit that the disdain in which LLMs are held by some really bothers me, because it’s not justified. It seems to be based on a hazy understanding that they’re “just” stochastic token predictors (essentially, glorified sentence-completion engines). The problem with this dismissive judgment is that our human intuition cannot fathom the impact on these systems’ behaviour when their operational scale becomes extremely large. One measure of scale is the number of so-called “parameters” – essentially, fine-tuned weightings resulting from training that improve not only the coherence of the generated language but also the relevance and accuracy of the responses. GPT-3.5 has around 175 billion parameters; GPT-4 reportedly has well over one trillion. The direct result of this literally inconceivable scale is that new and often unexpected intelligent behaviours appear spontaneously as emergent properties.

To those who claim that ChatGPT and its ilk don’t actually “understand” anything and are therefore useless, my challenge is to explain how, without understanding anything, GPT has so far achieved the following – and much, much more; this is a cut-and-paste from something I posted earlier:

  • It solves logic problems, including problems explicitly designed to test intelligence, as discussed in the long thread in CS.
  • GPT-4 scored in the 90th percentile on the Uniform Bar Exam.
  • It aced all sections of the SAT, which among other things tests for reading comprehension and math and logic skills, and it scored far higher across the board than the average human.
  • It did acceptably well on the GRE (Graduate Record Examinations), particularly the verbal and quantitative sections.
  • It got almost a perfect score on the USA Biology Olympiad Semifinal Exam, a prestigious national science competition.
  • It easily passed the Advanced Placement (AP) examinations.
  • It passed the Wharton MBA exam on operations management, which requires the student to make operational decisions from an analysis of business case studies.
  • On the US Medical Licensing exam, which medical school graduates take prior to starting their residency, GPT-4’s performance was described as “at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations.”

The converse question that might be posed by its detractors is: if GPT is so smart, how come it makes some really stupid mistakes, including sometimes failing to understand a very simple concept that even a child would grasp? The answer, in my view, is simply that it’s not human. We all have cognitive shortcomings and limitations, and we all sometimes misunderstand a question or problem statement, but because an AI’s cognitive model is different, its shortcomings will be different. I strenuously object to the view that because GPT failed to properly understand or solve some problem that seems trivially simple to us, it therefore doesn’t really “understand” anything at all. The fact that it can generally score higher than the vast majority of humans on tests explicitly designed to evaluate knowledge and intelligence seems to me to totally demolish that line of argument, which some philosophers have been harping on ever since Hubert Dreyfus claimed that no computer would ever be able to play better than a child’s beginner level of chess.

That said, I agree that the responses of even the most advanced current LLM cannot be considered reliable. They are very often right, even on complex problems; sometimes they’re right but get some nuance wrong; and occasionally but rarely the product is grammatically correct gibberish. They should not be judged on individual trivial failures, however, but on overall performance, just like people are.

Maybe the best way to think of a modern LLM is as a very intelligent, well-read alien with access to a great deal of objective information about our world and similar reasoning skills, but very different cognitive strengths and weaknesses. Furthermore, this alien has never been taught arithmetic, but has somehow cobbled together its own imperfect understanding of how it works. And finally, this alien is a consummate bullshitter who will just make something up if it doesn’t know the right answer, and not even realize it’s doing it. But I’ve still had some very enlightening and informative conversations with it. It’s particularly good at interactive conversations that zero in on a subject of interest that you didn’t even know existed!

Oh, and for the record, I put the 1g acceleration to 500 mph question to ChatGPT 3.5. It established the right equations, converted everything to uniform units, and came up with the right answer, approximately 22.8 seconds. I half expected a mistake in arithmetic but it got the math right. I’ll post the whole thing if anyone wants to see it. Interestingly, when @Sam_Stone posed the same question to Bing using the GPT-4 framework, it came up with 22.6 seconds, the difference apparently due to working in different units and the rounding errors introduced by the unit conversions.
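
For anyone who would rather check the number than take any chatbot’s word for it, the whole thing is a couple of lines of constant-acceleration kinematics. Here’s a minimal sketch in Python using the 1 g and 500 mph figures from the question (the constants and variable names are just mine):

```python
# Sanity check of the "1 g to 500 mph" figures, starting from rest:
#   t = v / a   and   d = v^2 / (2a)

MPH_TO_MS = 0.44704       # 1 mph in m/s (exact definition)
G = 9.80665               # standard gravity, m/s^2

v = 500 * MPH_TO_MS       # target speed: ~223.5 m/s
t = v / G                 # time to reach it at a constant 1 g: ~22.8 s
d = v ** 2 / (2 * G)      # distance covered while accelerating: ~2.5 km

print(f"t = {t:.1f} s, d = {d / 1000:.2f} km")
```

Rounding along the way (treating 500 mph as roughly 800 km/h, say) or using a slightly different value for g shifts the answer by a couple of tenths of a second, which would plausibly account for the 22.6-versus-22.8-second discrepancy.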

And now, back to your regularly scheduled Pitting. :slight_smile:

You took the words right out of my head.

This kind of AI is essentially an alien that is perfectly capable of reasoning just as a human can, but since it’s not actually human it seems impossibly brilliant in some ways and unfathomably stupid in other ways.