AI is wonderful and will make your life better! (not)

Sure, people lie, deliberately misinterpret, and manipulate information, but nobody is under the illusion that even authoritative human experts are neutral knowledge systems. An intelligent person will always (and generally unconsciously) consider the source of information, and when they aren’t critical about suspect claims it is generally because those claims feed into their preconceived beliefs about how the world works. But AI tools are often treated as unbiased knowledge systems (even though, as noted above, large language models are not general knowledge systems), and their output is treated by many people as authoritative because of the confident tone and the mistaken belief that the chatbot is referencing actual sources instead of internally assembling a cromulent but often error-ridden glurge of text.

The notion that “LLMs don’t intentionally lie and they readily admit mistakes” implicitly assumes a volition that these systems don’t actually have; in fact, they run a prompt through an interpretation algorithm that decomposes it into tokens, which are then fed into a response algorithm that iteratively and recursively assembles tokens into the most statistically consistent response. The LLM doesn’t ‘know’ anything about the truth of a response, and while it will “readily admit mistakes” when challenged that a response was erroneous, it will just as readily ‘admit a mistake’ even when the response was actually correct, because its goal is to provide a response acceptable to the user, not to provide and verify factual information.
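For anyone unclear about what that means in practice, here is a deliberately toy sketch of that loop (Python, with a made-up stand-in for the actual model; none of the names, tokens, or numbers come from any real product): the prompt becomes tokens, and the response is just whichever continuation scores as most probable, with nothing anywhere in the loop that consults a source or checks whether the output is true.

```python
import random

def tokenize(prompt: str) -> list[str]:
    # Real systems use learned subword tokenizers; naive whitespace splitting stands in here.
    return prompt.split()

def next_token_distribution(context: list[str]) -> dict[str, float]:
    # Stand-in for the trained model: assigns a probability to each candidate next token
    # given the context. A real model scores tens of thousands of tokens using weights
    # learned from its training corpus; this toy just invents plausible-looking numbers.
    candidates = ["the", "answer", "is", "500", "seconds", "<eos>"]
    weights = [random.random() for _ in candidates]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(candidates, weights)}

def generate(prompt: str, max_tokens: int = 20) -> str:
    # The autoregressive loop: pick the statistically favored continuation, append it,
    # and repeat. Nothing here ever consults a source or evaluates truth.
    context = tokenize(prompt)
    output: list[str] = []
    for _ in range(max_tokens):
        dist = next_token_distribution(context + output)
        token = max(dist, key=dist.get)
        if token == "<eos>":
            break
        output.append(token)
    return " ".join(output)

print(generate("How long does it take to reach 500 mph at 1 g?"))
```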

You don’t seem to have put much effort into reading the thread, because the thread centers on an incorrect calculation made by an LLM, then a second response that was incorrect and even self-contradictory, and then a third response presented by another poster that was still incorrect in the details, as elucidated in this post:

[quote=“Stranger_On_A_Train, post:31, topic:995332”]
You continue to insist that this is the case even though it has been pointed out to you that both of the results you got from Bard were wrong, and even self-contradictory as indicated above. Neither the first response you posted, nor the supposedly corrected second response provided the correct answer for the question of distance or time, while two posters who actually worked through the problem both provided the correct answer for time. The ‘reasoning’ provided by your chatbot indicated that it didn’t actually comprehend the question and was just producing grammatically and syntactically correct gibberish. (For that matter, even the GPT-4 response that @Sam_Stone provided still gave an incorrect answer despite getting the basic logic of the problem correct because it didn’t grasp the basic procedure of rounding, indicating that it still didn’t have any understanding of what it was doing and was just following a statistical algorithm of answering this question in a form similar to what it was trained on, using the logic implicit in language to perform some simulacrum of performing basic algebra.)
[/quote]

Now it is a completely valid point that LLMs are not specifically trained to perform mathematical operations or to interpret physical mechanics in a quantitative way, so expecting one to be a good physics student just because it can read text and respond conversationally is perhaps not a reasonable expectation, especially by a user who was not qualified to make a critical assessment of the accuracy of the response. But when posed the question, the chatbot doesn’t respond with a caution like, “Hey, I don’t really understand physics so this might be completely wrong, but here is what I came up with”; in fact, it states in the initial response, “Absolutely, I’ve been sharpening my skills in handling physics problems,” and then confidently provides a wrong answer. When challenged, it then offers, “My apologies, my previous response contained an error. While it’s true that it would take 500 seconds to reach 500 mph with 1g acceleration, the distance traveled during that time wouldn’t simply be 500 miles,” which, despite the “My apologies…” preface, is still completely wrong but makes it seem as if it has diagnosed the error and corrected itself. Whether intentional or not, it is a deception machine that lulls a user into acceptance because it is courteous, beneficent, and seemingly trustworthy even as it repeatedly delivers a verifiably wrong answer.
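To put numbers on just how wrong that ‘corrected’ response still is, assume the problem is what that exchange implies: starting from rest and accelerating at a constant 1g until reaching 500 mph, how long does it take and how far do you go? This is just a back-of-the-envelope check of those figures, not the working posted in the thread:

```python
# Assumed problem: from rest, accelerate at a constant 1 g until reaching 500 mph.
# How long does it take, and how far do you travel?
g = 9.80665                  # standard gravity, m/s^2
v = 500 * 0.44704            # 500 mph converted to m/s (~223.5 m/s)

t = v / g                    # time to reach 500 mph
d = 0.5 * g * t ** 2         # distance covered while accelerating

print(f"time: {t:.1f} s")                     # ~22.8 s, nowhere near 500 seconds
print(f"distance: {d / 1609.344:.2f} miles")  # ~1.58 miles, nowhere near 500 miles
```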

You may classify this as a “people problem” but for the naive user–which I’m sorry to say includes many knowledgeable and technically capable people who just don’t really know anything about the details of how a large language model produces its responses–the fact that it provides grammatically correct, (mostly) semantically coherent responses in an authoritative tone leads them down the path of believing that it must be dependable, which of course is what all of the enthusiasts and influencers hawking the technology are essentially saying. In fact, the practical use cases for this technology all assume that it is reliable in the information and ‘analysis’ that it provides, because if you have to go double-check everything it puts out there is no ‘efficiency’ or saving of time over just looking it up yourself. (Never mind that if you are asking a chatbot for caselaw citations to support your brief or references for your analysis document, you really ought to be at least reading through those to know whether they actually support your argument or thesis.)

And increasingly, users aren’t even given the option of whether to trust the model or not, as employers encourage and even mandate its use, pressure workers to rely upon it to meet promised gains in workflow speed, and don’t credit them for the time they spend checking and correcting the errors they find in the output of these LLM ‘tools’. When you get “bad medical self-diagnoses from Google, for instance, or even in the old days, from medical encyclopedias,” that is on the user, because they are making the obvious error of relying on a system that is definitely not an expert interpreting their specific signs and symptoms. But when a chatbot provides an authoritative-sounding diagnosis with no disclaimer, and nobody is accountable for the system providing wrong and even potentially harmful information, that isn’t just a “people problem”; it is a systemic problem with a technology being promoted and used by the general public in a safety-critical role for which it has not been validated and which it is frankly fundamentally unsuited to fill.

Stranger