AI is wonderful and will make your life better! (not)

AlphaFold is a legitimate demonstration of a valuable application of technologies under the broad banner of ‘artificial intelligence’, but it should be noted that it is a purpose-specific application of “deep (machine) learning” technology with a transformer architecture. It is not a large language model (LLM) or general-purpose generative AI; nobody is trying to assert that it has a ‘spark of consciousness’ or is somehow on the cusp of artificial general intelligence, and it is possible to develop a validation protocol for its models against an established baseline. This is exactly the kind of application that an artificial neural network (ANN) should be used for, because it can integrate over many different possible molecular configurations until it develops implicit algorithms for finding workable patterns that produce biochemically useful proteins. In that sense, it encodes useful knowledge in the specific domain in which it functions.

The problem with LLM ‘chatbots’ is that most people think of and use them as general knowledge systems when they are trained and function as language manipulation systems. They work as well as they do because there is logic and contextual information built into the use of language, and if there is enough text in the training set they can provide a simulacrum of a knowledge base, but it isn’t actually a verifiable base of factual information. Worse yet, because the primary function of an LLM is to provide a syntactically correct textual response to a prompt, it won’t just respond, “I don’t have this information in my training set”; in fact it has no comprehension of whether the correct information is in the training data or not. It synthesizes a response that may or may not be semantically correct depending on whether the appropriate answer is represented in its training data with some degree of frequency. For trivial questions it will probably give a correct answer (although GPT-5 is apparently ‘hallucinating’ even on some simple things), and for more complex questions it will often give a response that is superficially correct but doesn’t demonstrate any comprehension of deeper context, but unless it hits some hard-coded limit it will never just say, “I don’t know anything about that topic.” It will literally just make something up, famously including fabricated citations to papers and caselaw that do not exist.

The reason this is an issue is technically a ‘people problem’ of users not understanding that they need to verify all factual information and carefully consider any reasoning presented in a response, but the ‘killer app’ usage of these systems is precisely to do all of that grunt work for you in the name of efficiency. So workers using these tools are under pressure to get as much productivity out of them as possible, which inherently means (mis)placing faith that they can be trusted to provide correct information and analysis. Enthusiasts are predisposed to see them as being ‘mostly right with an occasional hiccup’, but in fact the errors are fairly frequent and very significant if they are being used for any critical application, and because there is no kind of baseline to validate these systems against, all of the evaluation of how good they are is essentially anecdotal. And if we can’t actually have that kind of confidence in them, then they are essentially a novelty, at least for the applications to which they are put. A reliable natural language processing system could be paired with an actual knowledge base system, but it would still need specific constraints to prevent ‘hallucination’ in interpreting how to query and use the knowledge system, or else you are left with the same fundamental problem of reliability. In any case, it isn’t a capability that justifies the tens of billions of dollars of investment just to create a refrigerator that can casually shoot the shit with you while you drink a beer.
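
To make that last point concrete, here is a purely hypothetical toy sketch (in Python, my own invention, not any vendor’s product) of what pairing a language front end with an actual knowledge base could look like: answers come only from a curated store of facts with provenance, and the system explicitly declines otherwise. Note that this toy deliberately sidesteps the hard part, namely interpreting a free-form question into a query against the knowledge base, which is exactly where ‘hallucination’ can creep back in.

```python
# Purely hypothetical sketch (a toy example, not any vendor's product) of
# pairing a natural-language front end with a verifiable knowledge base:
# answers come only from a curated store of facts, and the system declines
# rather than synthesizing something plausible-sounding.
from typing import Optional

# A stand-in for a real, curated knowledge base with provenance.
KNOWLEDGE_BASE = {
    "boiling point of water at sea level": ("100 C", "CRC Handbook"),
    "speed of light in vacuum": ("299,792,458 m/s", "SI definition"),
}

def answer(question: str) -> str:
    """Return a sourced answer if the fact is in the knowledge base; otherwise refuse."""
    fact: Optional[tuple] = KNOWLEDGE_BASE.get(question.strip().lower().rstrip("?"))
    if fact is None:
        # The crucial constraint a bare LLM lacks: an explicit "I don't know".
        return "I don't have this information in my knowledge base."
    value, source = fact
    return f"{value} (source: {source})"

print(answer("Speed of light in vacuum?"))       # sourced answer
print(answer("Who will win the World Series?"))  # explicit refusal
```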

Stranger

Since you feel that quoting “sentence fragments” takes things “out of context” (though I only did it once, quoting a sentence in two parts) I’ll quote entire paragraphs this time to make sure you can’t make that claim again.

The MAGA reference didn’t “come out of nowhere”. It was a direct counterpoint to your claim that though people may perpetuate counterfactual information, they’re generally willing to admit when they’re wrong or ignorant and are held accountable for mistakes. This is, at best, a hilarious over-generalization in the present political climate, where lying is a way of life and the most senior politicians aren’t accountable for anything. Conversely, LLMs don’t intentionally lie and they readily admit mistakes. More importantly, though, the conversational model is a very effective way of seeking information. Compared to Google and its annoying AI that can’t be turned off, GPT is almost like a nurturing classroom environment.

Thank you for linking to that other thread, which I had mostly forgotten about. I absolutely stand by all the points I made there. In fact I encourage others to read it because I think it’s an informative thread. I assume your claim is that someone somewhere in that thread had an example of correcting GPT and it made a second mistake. I didn’t see that, though I may well have missed it, but I’d remind you that my only claim was that I, personally, haven’t seen that happen, though I’m sure it does. So what? The fact that LLMs currently aren’t reliable isn’t in dispute. What’s in dispute is the claim that they’re not useful.

This is a genuine problem, but the salient question is whose problem is it? In particular, who is it that’s supposed to be accountable? By no stretch of any reasonable logic should the engineers and research scientists producing this amazing and almost immeasurably important and useful technology be the ones to be blamed when ignorant or malicious actors abuse its capabilities.

I think it’s pretty clear that when AI is used improperly, the accountability must be either with the user, or, depending on the circumstances, with the vendor or service operator providing the AI product. If one can postulate circumstances where accountability is lacking, it’s only because the appropriate regulatory framework doesn’t exist yet.

The claim that AI is “dangerous” because naive users unaware of its limitations will act on bad advice is not unreasonable, but this, too, is a “people problem”, not intrinsically an AI problem. People get bad advice all the time. AIs have the advantage that the harm can be mitigated, at least through clear disclaimers, or maybe in some cases by not providing certain kinds of information at all. This isn’t fundamentally different from people getting bad medical self-diagnoses from Google, for instance, or even in the old days, from medical encyclopedias.

An example of this type of human accountability was the commercialization of IBM’s Watson DeepQA engine for the health care industry. IBM had high hopes that it could be trained to act as a kind of medical consultant to physicians and drug developers, capable of the kind of intelligent deep search through medical archives that wasn’t practical for humans. IIRC, its usefulness was impaired by occasional mistakes, in an area where mistakes are absolutely not acceptable, and there was a general lack of confidence within the medical community. There were other reasons for the failure of IBM Watson Health, but the point is that the accountability was there: IBM pulled the plug on the project. Though that’s not the end of the story of AI in health care; several years ago, Microsoft and a group of leading health care providers and academic institutions formed the Artificial Intelligence Industry Innovation Coalition with the goal of providing recommendations and resources for the responsible adoption of AI in the health care industry.

“Spaghetti posting” is a way I’ve seen that termed on other forums (in a “don’t do that” context); I think it describes the practice well.

One family claims AI made their life significantly worse.

The parents of a teenage boy who died by suicide are suing OpenAI, the company behind ChatGPT, alleging that the chatbot helped their son “explore suicide methods.” The lawsuit, filed on Tuesday, marks the first time parents have directly accused the company of wrongful death.

Messages included in the complaint show 16-year-old Adam Raine opening up to the chatbot about his lack of emotion following the death of his grandmother and his dog. The young man was also going through a tough time after being kicked off his high school’s basketball team and experiencing a flare-up of a medical condition in the fall that made in-person school attendance difficult and prompted a switch to an online school program, according to the New York Times. Starting in September 2024, Adam began using ChatGPT for help with his homework, per the lawsuit, but the chatbot soon became an outlet for the teen to share his mental health struggles and eventually provided him with information regarding suicide methods.

Gift link to a New York Times article about the same sad story.

I’d read that. Incredibly horrifying story.

ETA: the lawsuit itself has a lot more of the ‘conversations’ between Adam and ChatGPT, and they’re really painful reading.

Yeah, but setting up the adversarial system and tuning it would have taken even more time. Examining and fixing the code myself took about 20 minutes. I already had a pretty good idea what the problem was from how the output behaved.

In reality I should have just not had any of the models try to debug it, since they really weren’t doing anything but hallucinating away. If I had just let the AI do what it was actually good at (spit out code) and not tried to get it to do something it’s actually pretty bad at (logic), I would have saved time.

I publish videos on YouTube. YouTube is continually trying to push new AI features on me - including:

Automatic translation of video titles, automatic translation of video captions, then automatic text-to-speech dubbing of the AI-translated captions.
Whilst there are ways in which this might seem like a good idea in principle, it’s a terrible implementation in practice. The translation of the captions is generally not all that good, and the TTS voice they used is robotic and annoying, but worse, it was delivering the wrong language to people. Most of my audience expects to hear me in English, even if they are not native English speakers (it might even be the reason they tuned in - to practice their English listening), but the YT system decided that if you were in Germany, you’d obviously prefer to hear a crappy robot trying to say in German something approximately like the meaning of what I said. English speakers in non-English countries found it difficult or impossible to get the English version of the video; the same went for people using VPNs.
Fortunately, that feature could be turned off and the damage it had done could be reversed by deleting the translated versions.

More recently, they’ve decided to add some sort of ‘AI enhancement’ to some videos - these are videos recorded on real cameras, showing real humans doing stuff, and they’ve applied some filters or something that makes them look a bit like the footage is fake.

And there’s a feature in YT Studio that’s been around a while but keeps disappearing and coming back, where I am offered a selection of prewritten AI-generated replies to viewer comments.

This is a feature that simply doesn’t need to exist. Viewer interaction is a human thing - you do it because you care enough to do it, or because you want to do it, and if you don’t feel the need, you simply don’t do it.

But on top of being a completely unnecessary feature, it’s a very stupid one. At best, the prewritten replies are the most lifeless, anodyne ‘I agree with the words you wrote’ sort of garbage, and at worst, they are damaging - offering me ways to slightly offend people for no reason, or ways to make myself look like an idiot. I had one recently where the commenter suggested using a spoon for something; the suggested AI-written replies included (I kid you not): ‘I would love to try that, but I don’t know how they work’. YouTube’s AI thinks I don’t know how spoons work.

Other YouTubers have seen the same sort of thing - for example Drew Gooden was offered an automated reply to the effect that he didn’t know how shoes work.

It would be easy to ignore, except for the fact that the whole of the suggested reply is one big, clickable button - the largest clickable element in the UI.

It might be fun to make a video where you tell viewers “I’m going to use that button in the replies, make a comment and see what happens.”

There’s a YouTube channel called HGModernism where they did that, and as an experiment they also tried replying to every comment with a fake endorsement for ‘Tooth-Hurty Toothpaste’, proving that the comment bot could be trained (after a while, it started recommending Tooth-Hurty Toothpaste, although somewhat incoherently, because incoherence is kind of its thing).

Sure, people lie, deliberately misinterpret, and manipulate information, but nobody is under the illusion that even authoritative human experts are neutral knowledge systems. An intelligent person will always (and generally unconsciously) consider the source of information, and when they aren’t critical about suspect claims it is generally because those claims feed into their preconceived beliefs about how the world works. But AI tools are often treated as unbiased knowledge systems (even though, as noted above, large language models are not general knowledge systems), and their output is treated by many people as authoritative because of the confident tone and the mistaken belief that the chatbot is referencing actual sources instead of internally assembling a cromulent but often error-ridden glurge of text. The notion that “LLMs don’t intentionally lie and they readily admit mistakes” implicitly assumes a volition that these systems don’t actually have; in fact, they run a prompt through an interpretation algorithm that decomposes it into tokens that are then fed into a response algorithm that iteratively and recursively assembles tokens into the most statistically consistent response. The LLM doesn’t ‘know’ anything about the truth of a response, and while it will “readily admit mistakes” when challenged that a response was erroneous, it will also admit a mistake when challenged even when the response was actually correct, because its goal is to provide a response acceptable to the user, not to provide and verify factual information.
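
To make that pipeline concrete, here is a deliberately tiny toy sketch in Python (a bigram frequency model of my own invention; real LLMs use learned transformer weights over subword tokens and sampling, but the control flow is the same in spirit): decompose the prompt into tokens, then keep appending whichever continuation is statistically most consistent with what came before. Notice that ‘true’, ‘false’, and ‘I don’t know’ appear nowhere in the loop.

```python
# Toy illustration only: a bigram "language model" built from raw
# co-occurrence counts. Not how a production LLM is implemented, but the
# generation loop has the same shape: tokenize, then repeatedly append the
# statistically most consistent next token.
from collections import Counter, defaultdict

corpus = (
    "the answer is forty two . "
    "the answer is unknown . "
    "the answer is forty two ."
).split()

# "Training": count which token tends to follow which.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(prompt, max_tokens=8):
    tokens = prompt.split()              # crude "interpretation" step: prompt -> tokens
    for _ in range(max_tokens):
        candidates = follows.get(tokens[-1])
        if not candidates:
            break                        # nothing ever followed this token; just stop
        # Always emit the most frequent continuation; "true", "false", and
        # "I don't know" appear nowhere in this loop.
        tokens.append(candidates.most_common(1)[0][0])
        if tokens[-1] == ".":
            break                        # end of "sentence"
    return " ".join(tokens)

print(generate("the answer"))            # -> "the answer is forty two ."
```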

You don’t seem to have put much effort into reading the thread, because you should note that the thread centered on an incorrect calculation made by an LLM, followed by a second incorrect and even self-contradictory response, and then another poster presented a third response that was still incorrect in the details, as elucidated in this post:

[quote=“Stranger_On_A_Train, post:31, topic:995332”]
You continue to insist that this is the case even though it has been pointed out to you that both of the results you got from Bard were wrong, and even self-contradictory as indicated above. Neither the first response you posted, nor the supposedly corrected second response provided the correct answer for the question of distance or time, while two posters who actually worked through the problem both provided the correct answer for time. The ‘reasoning’ provided by your chatbot indicated that it didn’t actually comprehend the question and was just producing grammatically and syntactically correct gibberish. (For that matter, even the GPT-4 response that @Sam_Stone provided still gave an incorrect answer despite getting the basic logic of the problem correct because it didn’t grasp the basic procedure of rounding, indicating that it still didn’t have any understanding of what it was doing and was just following a statistical algorithm of answering this question in a form similar to what it was trained on, using the logic implicit in language to perform some simulacrum of performing basic algebra.)
[/quote]

Now it is a completely valid point that LLMs are not specifically trained to perform mathematical operations or to interpret physical mechanics in a quantitative way, so expecting one to be a good physics student just because it can read text and respond conversationally is perhaps not a reasonable expectation, especially for a user who is not qualified to make a critical assessment of the accuracy of the response. But when posed the question, the chatbot doesn’t respond with a caution like, “Hey, I don’t really understand physics so this might be completely wrong, but here is what I came up with”; in fact, it states in the initial response, “Absolutely, I’ve been sharpening my skills in handling physics problems,” and then confidently provides a wrong answer. When challenged, it then offers, “My apologies, my previous response contained an error. While it’s true that it would take 500 seconds to reach 500 mph with 1g acceleration, the distance traveled during that time wouldn’t simply be 500 miles,” which, despite the “My apologies…” preface, is still completely wrong but makes it seem as if it has diagnosed the error and corrected itself. Whether intentional or not, it is a deception machine that lulls a user into acceptance because it is courteous, beneficent, and seemingly trustworthy even as it repeatedly delivers a verifiably wrong answer.
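
Just to put numbers on how far off that is, here is a quick back-of-the-envelope check (a sketch in Python; I don’t have the exact wording of the original problem in front of me, so this simply assumes constant 1 g acceleration from rest):

```python
# Sanity check of the chatbot's claim quoted above ("500 seconds to reach
# 500 mph with 1g acceleration"), assuming constant acceleration from rest:
#   t = v / a          (time to reach speed v)
#   d = v**2 / (2*a)   (distance covered while doing so)
MPH_TO_MS = 0.44704      # metres per second in one mile per hour
G = 9.80665              # standard gravity, m/s^2

v = 500 * MPH_TO_MS      # 500 mph in m/s  ->  ~223.5 m/s
t = v / G                # ->  ~22.8 seconds, not 500 seconds
d = v**2 / (2 * G)       # ->  ~2,547 m, i.e. about 1.6 miles, not 500 miles

print(f"time     = {t:.1f} s")
print(f"distance = {d:.0f} m ({d / 1609.344:.2f} miles)")
```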

You may classify this as a “people problem”, but for the naive user–which I’m sorry to say includes many knowledgeable and technically capable people who just don’t really know anything about the details of how a large language model produces responses–the fact that it provides grammatically correct, (mostly) superficially coherent responses with an authoritative tone leads them down the path of believing that it must be dependable, which of course is what all of the enthusiasts and influencers hawking the technology are essentially saying. In fact, the practical use cases for this technology all assume that it is reliable in the information and ‘analysis’ that it provides, because if you have to go double-check everything it puts out there is no ‘efficiency’ or saving of time over just looking it up yourself. (Never mind that if you are asking a chatbot for caselaw citations to support your brief or references for your analysis document, you really ought to be at least reading through those to know whether they actually support your argument or thesis.)

And increasingly, users aren’t even given the option of whether to trust the model or not, as employers encourage and even mandate its use, pressure workers to rely upon it to meet workflow speed targets, and don’t credit them for the time they spend checking and correcting the errors they find in LLM ‘tools’. When you get “bad medical self-diagnoses from Google, for instance, or even in the old days, from medical encyclopedias,” that is on the user, because they are making the obvious error of relying on a system that is definitely not an expert interpreting their specific signs and symptoms. But when a chatbot is providing an authoritative-sounding diagnosis with no disclaimer, and nobody is accountable for the system providing wrong and even potentially harmful information, that isn’t just a “people problem”; it is a systemic problem with a technology being promoted and used by the general public in a safety-critical role for which it is not validated and which it is frankly fundamentally unsuited to fill.

Stranger

Okay, but the face in the tortilla really, really, really looks like Jesus, so that proves the existence of God.

Given the evident truth of that claim it still begs the question: the face of which god?

Stranger

Well, I very recently used AI to write a browser plugin to automate a repetitive task I have. The “trust” there is a binary works/does not work. And it works.

For lawyers submitting case law…no way (it’s been in the news…you can’t have AI hallucinations there). Likewise doctors, although I have read that AI is better at spotting abnormalities in x-rays/scans. Still, you definitely need a doctor to verify every time.

For articles? It’s all over the place. I’d still say it needs a human to double-check, but I’ve seen some media that seem to skip that step entirely (some even admit to it, although it’s in the small print). And that opens a whole new line of questions…what bias is involved in the prompt given to the AI (it can be subtle or overt)?

Not sure about other fields (e.g. accounting, engineering, etc).

The only thing I trust AI to do in my accounting business is (a) The aforementioned “scrape data off of PDF’s”, (b) producing a 10-word snippet based on an accounting-based picture so it can be posted to various marketing channels, and (c), “I have a technology and privacy section in this proposal I need to write. Here are links to the privacy statements of our major software platforms, give me 250-500 words of good-sounding bullshit, thanks”.

No, saying that “LLMs don’t intentionally lie and they readily admit mistakes” does not imply volition or agency, though it could easily be misinterpreted that way. What it describes here is behaviour that is intrinsically neutral and lacks any discernible bias. And your description of the token processing that achieves that behaviour is, again, like describing human intelligence in terms of neuron firing and synaptic connections – it tells us nothing about the large-scale emergent properties.

You’re right, I didn’t. You linked to one of my own posts, and I focused on that, and some of the preceding and following posts I made, all of which I still fully support. As for the mathematical question itself, as I mentioned there, I posed the same question to GPT and got the correct answer, so there’s that. And all the kerfuffle was nearly two years ago, with GPT 3.5. We’re now into the era of GPT-5 in a field that is advancing with spectacular rapidity.

Anybody saying that LLMs “must be dependable” should be promptly corrected. There is certainly a tendency for naive users to believe them to be dependable for the reasons you state, but have you actually seen any credible authorities “promoting” that idea? OpenAI has made a major point of the fact that its products can and do make mistakes.

And your entire final paragraph about employers mandating the use of LLMs or other forms of AI is absolutely a “people problem”, and more precisely, a problem of incompetent management misusing a shiny new tool, the same way that in the past they’ve regarded business process methodologies like CMM to be a panacea for incompetent employees. It’s not the tool’s fault, and it’s certainly not a reason for panicked calls for a moratorium on AI development (it’s too late for that, anyway). What it may suggest is something I alluded to earlier – new regulations governing its use. These will probably eventually happen, and likely to happen sooner in other parts of the developed world than in corporatist America.

The AI I use (Claude) posts that it makes mistakes at the bottom of every response it gives (in small print, of course).

Doubtless that is a cover-their-ass thing, but it is there, and they are not claiming it cannot be wrong.

Users should not rely on Claude as a singular source of truth and should carefully scrutinize any high-stakes advice given by Claude.

Eh…the problem here is that the bias is introduced by the human asking the AI a question. The AI has no bias of its own, but it will try to give you the answer you want. So there is bias coming through the AI (really from a human, but the AI is enabling it).

Not to mention it is entirely possible for someone to skew an AI by deciding what datasets it trains on and can access. Imagine Nazi Germany training an AI. I doubt anyone would think it was neutral with regards to Jews (as one example).

I’d like to see some evidence for that as I’ve never seen it. GPT is “biased” in favour of politeness, but despite some claims to the contrary, I’ve never seen it actually distort facts in order to please the user. On the contrary, in several cases where I questioned its responses, after sycophantic ravings about how insightful my objections were, it went off and gathered further evidence to support the correctness of its original assertion.

The thing is, GPT will answer the question it is asked.

E.g., “What are Donald Trump’s successes?” “What things has Donald Trump done that cause concern?”

And so on.

Sure, the AI will do its best without bias, but the question is biased, so the answer has to be too, since the AI will not push back against the person asking the question. Ask a loaded question, get a loaded answer.

Sorry, what is the point of using AI if its information is not reliable? Is it just laziness, or unwillingness to put the time into doing something properly, or is there some benefit I’m not seeing?

I’m not convinced that racing through tasks is the best way to complete them.