AI is wonderful and will make your life better! (not)

I don’t understand how we took a turn from wolfpup’s “it can figure out things humans can’t” to “it can do things we can’t explain”, but wolfpup is correct: LLMs do things we can’t explain. It’s why Anthropic CEO Dario Amodei famously said this year:

Anthropic CEO Admits We Have No Idea How AI Works

“When a generative AI system does something, like summarize a financial document, we have no idea, at a specific or precise level, why it makes the choices it does — why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate,” the Anthropic CEO admitted.

On its face, it’s surprising to folks outside of AI world to learn that the people building these ever-advancing technologies “do not understand how our own AI creations work,” he continued — and anyone alarmed by that ignorance is “right to be concerned.”

It’s why last year, Sam Altman said “we certainly have not solved interpretability.”

Or maybe that’s not the discussion here, and I misread it.

Hey, this is the Pit – where discussions can meander at will! :slight_smile:

But you raise a very good point that is well worth noting. I only disagree with the idea that the lack of clear explanation for AI behaviour should be cause for concern. I consider it to be fascinating rather than concerning. And really, the concept isn’t unprecedented in the realm of computing and there’s nothing particularly mystical or necessarily worrisome about it.

Consider the analogy of running simulations, for example. Say you’re designing a 4-lane divided expressway. How much traffic can it handle before it becomes congested, by how much will traffic slow down, and at what point? What would be the effect of adding an extra lane in each direction? You could look up existing stats, but beyond that, the only way to answer questions about stochastic phenomena, as opposed to ones that can be modeled algorithmically, is to run the simulation.

I think this is a reasonable analogy to not knowing what response a large-scale ANN will produce until you run it, and even then, not really understanding why it produced the response that it did. What I think is truly amazing is how, at very large scales, LLMs appear to acquire unexpected levels of intelligence – in particular, strong problem-solving skills – that aren’t readily explainable and were not explicitly part of their training.

I say this as someone who has run a lot of simulations: that’s a terrible analogy. If I run a series of simulations and capture the random numbers used to seed each one, I can backtrace every one of them, and more importantly, reproduce them.

You can’t backtrace the decisions of an LLM or reproduce its output; you can figure out which components were activated, but that’s about as far as it goes.
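Here’s roughly what I mean, as a toy Python sketch (the traffic model and every number in it are made up purely for illustration): capture the seed, and the whole stochastic run is reproducible, bit for bit.

```python
import random

def simulate_congestion(seed, lanes=4, mean_demand=6000, hours=24):
    """Toy stochastic traffic model: count the hours in which random demand
    exceeds a made-up per-lane capacity. Numbers are purely illustrative."""
    rng = random.Random(seed)        # the captured seed makes the run reproducible
    capacity = lanes * 1800          # vehicles per hour, an illustrative figure
    congested = 0
    for _ in range(hours):
        demand = rng.gauss(mean_demand, 1500)
        if demand > capacity:
            congested += 1
    return congested

seed = 20240917
run_a = simulate_congestion(seed)
run_b = simulate_congestion(seed)    # replay with the same captured seed
assert run_a == run_b                # same seed, same trajectory, every time
print(run_a, run_b)
```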

What I think is truly amazing is how, at very large scales, LLMs appear to acquire unexpected levels of intelligence – in particular, strong problem-solving skills – that aren’t readily explainable and were not explicitly part of their training.

LLMs do not truly “acquire intelligence” or “problem-solving skills.” Their emergent abilities are statistical patterns learned from vast swaths of text, not conscious reasoning or understanding. What looks like intelligence is entirely a byproduct of next-token prediction, not a novel skill the model consciously develops.

It’s a perfectly fine analogy. If you captured the randomization “seeds” that were invoked by a GPT response, you would necessarily get the exact same output, barring the effects of cosmic rays or mysticism. If computing were not reliably deterministic, we wouldn’t have computers. The same could be said for the human brain, except that it’s difficult to capture all the inputs.
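To make that concrete, here’s a toy sketch (not how any production GPT stack actually samples, and real GPU serving adds floating-point wrinkles, but the principle is the same): draw “tokens” from a temperature-scaled softmax, and fixing the RNG seed gives the identical sequence every time.

```python
import numpy as np

def sample_tokens(logits, n, temperature, seed):
    """Toy temperature sampling: softmax over logits, then draw n 'tokens'.
    Illustrative only; a pretend four-word vocabulary stands in for a real model."""
    rng = np.random.default_rng(seed)      # the captured "seed"
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return rng.choice(len(logits), size=n, p=probs)

logits = np.array([2.0, 1.0, 0.5, -1.0])   # stand-in for a model's output scores
out_a = sample_tokens(logits, 10, 0.8, seed=42)
out_b = sample_tokens(logits, 10, 0.8, seed=42)
print(np.array_equal(out_a, out_b))        # True: same seed, same output
```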

Absolute bullshit, and I didn’t say or imply anything about “consciousness”. My claim of acquired problem-solving skills is based on the fact that, like, they solve problems – problems that intelligent adults might struggle with, and which the AI has never seen before but somehow thinks its way through to a solution. If you deny this then you’re denying an in-your-face reality. I’m not even going to bother to cite yet again the dozens of advanced intelligence tests that previous versions of GPT have passed.

If you want to argue against Amodei and Altman, go for it, but you have no idea what you’re talking about.

Putting this in a new post because it’s fascinating, it’s AI, and it’s not LLMs:

The future of hurricane forecasting is AI : NPR

One season of forecasting does not a great model make, but it’s a really interesting start.

James Franklin, a former branch chief at the National Hurricane Center, analyzed how the forecast models performed this year, and says Google’s DeepMind outshone them all. “The model performed very, very well, which was very impressive,” he says. “It was the best guidance we saw this year.”

Artificial intelligence has been used in weather forecast models for some time. Google’s DeepMind, though, marks a significant step forward, one that suggests AI may soon overtake the physics-based models meteorologists have long relied on.

Of course, there always has to be some hype. The Bluesky thread they link to shows Google’s model in second place for intensity and first place for track - which is great - but the only graph they show (track), as well as their pull quote, suggests it’s a 100% winner.

But still.

I’m always willing to learn. So please tell me what part of what I said about computing being necessarily deterministic is wrong.

It does nothing of the sort. It supports the fact that “emergent” is poorly defined, and different people make claims based on different definitions.

Really, your entire response is not helpful. I asked for one example of an emergent ability that AI researchers cannot explain. I’m not asking for multiple papers where I must find the one example you think is unexplained, especially when it’s clear you didn’t read the papers yourself. I’m not asking for breathless hype from the CEO. I’m not asking for things where the exact path to an answer can’t be shown; I’m asking for new abilities that researchers cannot understand.

Whatever, it’s a hijack. We can drop it.

And I gave you half a dozen. Apparently you didn’t like them. You didn’t ask for papers that I had written myself or necessarily even read. I gave you what you asked for, yet you’re still whining.

Ah, the ultimate put-down, the intellectual argument to end all arguments that’s been used ever since the invention of the internet, and even before.

But the thing is that, at least recently, I haven’t been making any particularly technical arguments, just observational ones. It’s an objective fact that ChatGPT (even in its older version – GPT-5 is far superior to GPT-4) has cleared hurdles that would challenge most humans. To refresh your memory, for instance:

Unlike you, I read enough of them to know they did not answer my question.

I really wouldn’t trust the statements of people who stand to make a massive profit off of investors’ belief that AI is a super special technology that will completely disrupt the economy. I try to never, ever forget these guys are trying to sell something.

I’ve been down this road before. His cites on this subject are invariably garbage, in many cases contradicting his own claims. I assume he’s posting whatever ChatGPT spits out. And I pretty much skip over any post produced by an LLM (except the poetry; it amuses me).

Wolfpup appears to be an intelligent person, which is why I believe humanity is doomed. If a Doper can be so hoodwinked by a text-generating machine, I think the general public is pretty much screwed. When you have a billion people working on a project at once, it’s inevitable they will find things that are not working as expected. For evidence of that, look at the premature release of Diablo IV, sigh. (Unfortunately not much intelligence went into that case.) If it’s true that LLMs are doing stuff we can’t currently explain, I’m quite certain that, in time, people will figure out why they’re getting unexpected results.

In the narrow cases in which LLMs perform well on exams, it’s because they were trained on that very data. It’s not “thinking,” it’s regurgitating the information it was fed. Passing the bar when you’ve been given all the answers to the bar is not that impressive to me. The claim that larger language models just get exponentially better when they get scaled up appears to be contradicted by recent attempts to scale up. If you look at the early growth curve of a human, you’d expect people to end up 30 feet tall by the time they die. But things don’t always work that way. A relationship that appears to be exponential may not be when the rest of the data comes in.

I think LLMs have limited use cases; I recently got one to draft an SOP for federal grant roles that I was avoiding writing because I couldn’t figure out how to structure it. Of course it required my actual knowledge to go through it and correct anything that didn’t work. So for language-processing tasks like drafting text documents and, say, searching the Uniform Guidance 2 CFR Part 200 for procurement standards, I can see that potential. If the makers of LLMs were selling them based on their capacity to reduce administrative overhead, I wouldn’t have such a problem with them.

I am not an expert on LLMs, which is why I find this so frustrating. Just a little bit of investigation listening to actual experts talk about this has revealed that the claims about this technology are greatly exaggerated. Cal Newport calls it “vibes-based reporting.” It exists to sell things.

I’m fairly confident the researchers wolfpup talked to one time would be much less confident in what’s going on than he appears to be, because that’s how most scientists are. They tend to be careful not to exaggerate or assume facts not in evidence. I think right now we really need these scientists.

I see this type of comment periodically on the Dope and I don’t understand it. AFAIK, any link to a Bluesky or X post is just like any other link: you click on it and you view the post. Easy-peasy. Where did the idea come from that you have to be a registered member to read Bluesky and X posts?

But things change fast on the web. I guess I could be wrong because it’s changed or something.

I see these comments a lot and they always mystify me.

Bluesky users have the option to restrict post visibility to registered accounts. If you don’t have a Bluesky profile, or if you’re not signed into it, then you’ll get a message about “this user has limited access” when you try to view such a post. It’s not a global setting and many Bluesky people have all their posts visible by default, which is why you haven’t seen it.

On Xitter, the situation is a little different. There are content gates that limit visibility where the poster specifies the Xit contains, say, adult material (or where the system flags it by other means). In that sort of case, you must specify in your Xitter account that you affirmatively want to see that kind of stuff in order to pass the gate. If you don’t have a profile, you can’t make such a specification, and you can’t pass the gate.

Also, on Xitter, even if the original post is visible to all without filters, the replies are always hidden unless you’re signed into an account. Part of the value of Xitter was the conversation that would follow an initial comment (or at least it used to be until the platform was overrun by trolls, Nazis, and bots), so engaging in the community, even just as a reader, requires an account.

Edit to add: here, for illustration, is a Bluesky post requiring sign-in.

A while back someone complained that they couldn’t read my link to a Bluesky post, so now I just include it as a default comment. Sorry if that was confusing to you.

I’ve mentioned several times that I use ChatGPT to help with lesson preparation. Since I teach a lot of beginning classes, it’s nice to have it churn out exercise problems.

But, it’s absolutely stupid at times.

Today, I was teaching a set of 14 vocabulary words. I had it create basic example sentences and translate them into Japanese. The translation was passable.

Then it all went to hell. I wanted a second set of sentences. It refused to believe there were 14 vocabulary words. It insisted there were only five.

After we got that straightened out, I asked it to create different examples, but it basically gave me the same ones again.

When I ask it to make multiple-choice quizzes, sometimes it makes all the choices the same. If I give it a list of words for fill-in-the-blank exercises, it puts the answers in the same order as the word list.

I use my account on my cellphone, my laptop, and the computer at school. The OS on the computer at school is in Japanese, so it wants to address me in Japanese.

I teach both Japanese and Taiwanese students, and it mixes up which language to use with which students.

It can do some things really well but gets others absolutely wrong.

It’s funny that Sam Altman, who lies so much, says something true that isn’t believed. But he’s earned that distrust. It’s a little more unfair to Amodei, though - he and a bunch of people left OpenAI because they felt that Altman wasn’t committed to safety and transparency, and they set up Anthropic as a public benefit corporation…but still, tech CEOs gonna lie regardless.

Anyway. Mechanistic interpretability is a whole ongoing effort to explain the outputs of AI, especially LLMs. Neural nets have always been a black box, but the sheer number of calculations going into these particular models makes the box even blacker. I like Alberta Tech’s description of this…

…well heck, after a long while of being able to link YouTube videos, I can’t. It’s her video “Nobody Knows Why LLMs Work.”

Things like this happen to me when I’m working on code, and sometimes I just have to tell it “let’s start a new topic: ignore everything previous in this conversation”, and then re-enter the prompt. Not great, just a workaround.
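If you’re hitting this through the API rather than the chat window, the equivalent of “ignore everything previous” is simply not resending the accumulated history. A rough sketch, assuming the OpenAI Python SDK; the model name and prompts are just placeholders:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The old thread: many turns of increasingly confused back-and-forth.
stale_history = [
    {"role": "user", "content": "Write a function that parses the log file."},
    # ... dozens of follow-ups, corrections, and dead ends ...
]

# Instead of appending "ignore everything previous" to the stale thread,
# start clean and send only the prompt you actually care about.
fresh = [{"role": "user", "content": "Write a function that parses the log file."}]

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model name
    messages=fresh,        # no stale context to drag the answer off course
)
print(response.choices[0].message.content)
```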

Amodei’s quote came up amid the discussion on emergent abilities and I think it might have created some confusion. It is not an argument for emergent abilities – just an honest admission that we don’t have any idea how it works. I think this is what you meant as well, @Maserschmidt?

Anthropic has some excellent material breaking down LLMs and trying to understand how they manage to make any sense at all. That said, if I have learned anything from The Big Kahuna, it’s that everyone is selling something. In this case Amodei has to sell investors on the value of understanding LLMs.

I mostly agree with this. The hype is out-of-control – fed by companies selling something and companies with FOMO. They all saw what happened to companies that missed out on the internet or mobile phones.

The backlash is also overblown, but that’s understandable. The current crop of LLMs is capable of a limited set of use cases; however, these use cases are extremely common. It’s not just coding and IT, summarization, and translation; there are a lot of business processes that consist of manually shuffling documents in and out of databases.

We’re starting to see the tool development (apps that make use of LLMs) catch up. They can add context, data-sources, and UI to make LLMs useful and accurate for a large number of tasks. In this way, LLM progress stalling out is a good thing because it forces companies to figure out how to actually use LLMs well.

This is what is still missing for a task like grant writing. A tool that handles the boilerplate, helps organize thoughts, checks for errors and inconsistencies, but then gets out of the way for the core content would be ideal.

Yes, that’s what I meant. I try to avoid the entire topic of ‘emergent capabilities’…

That’s wise. I was asking wolfpup specifically about his “emergent abilities” statement because it’s such a loaded topic. He likes to use it as an example of how wonderful AI is, but he’s clearly just repeating some marketing hype and either isn’t interested or doesn’t understand the details.

It’s frustrating because we were just discussing the importance of defining “emergent” less than a month ago. And as @DMC said at the time:

Any complex system that’s developed via repetition of feedback loops is likely to develop emergent behaviors, i.e. things that it wasn’t taught to do.

In a biological system, most of these will be useful and important because natural selection selected for them. In technological systems, a lot of junk will survive.

In manmade systems, what we’re selecting for is the ability to impress the developers of the LLM. So it would be remarkable if there were not emergent behaviors in this category. However, LLM developers are notorious hypemasters who make products to impress other hypemasters.

Given that this is the selective pressure in effect, and given that it has little to do with correctness or accuracy, these machines should exhibit behaviors that impress people who are easily impressed by tech marvels and aren’t particularly skeptical about them. And that’s exactly what we see.