Future of General Questions in the AI age

Yeah, that was comparing to Google nowadays (the actual results, not the AI suggestion), and Google has gotten much worse. Long before it had the AI suggestions it clearly started using more AI under the hood, so it got better at answering vague or poorly written questions but far worse at answering very specific ones. It is utterly useless in particular for very specific questions about an obscure subject that is closely related to a very popular subject, though AI is even more useless and will flat-out hallucinate an answer.

Thanks. These articles are everywhere.
The silver lining in this cloud: the paper mills are probably going out of business, if they aren’t already.

This is an especially important (well, to me!) issue in software development. Claude, for example, is the best at the moment for generating code, but we have had bootstrap code generators for ages. They just lay out a framework. I used spring.io countless times in a previous job.

Given AI hallucinations, and the need for extremely tight business requirements (which are often implicit knowledge in a human), I am reasonably sure my career path will be supervising AI and hopefully catching its mistakes.

But not yet. I’ll be retired by the time AI can take a business plan, a few Photoshop images, a meeting with the client, and a brainstorming session with a human team, and turn all of that into working software.

Sure, as an assistant, it could be useful. Give it a single problem statement and it could probably solve it - and that is why we devs follow Scrum/Kanban, both of which are systems to “eat the elephant” by breaking it into small mouthfuls.

But it will never understand the big picture.

Human needs influence business needs; manager A wants something different from manager B. As a software developer I need to understand two very different aspects: the logic of code, and the illogic of people.

Disclaimer: I totally rely on Google to show me hints from various tech websites. But I have been googling for years and know the subject matter pretty well, so I can craft my queries as necessary.

Here’s a post from someone running a different forum, worried about the future of online forums in the AI age.

some descriptions in this short text link aren't safe for work

The Future of Forums is Lies, I Guess

I tend to spend my Factual Questions (née General Questions) bandwidth on attempting to provide accurate yet hopefully accessible technical information. As an aside, I do wish there was more back-and-forth there since it is impossible to pitch the level of a post to everyone all at once, and iteration would be both helpful and enjoyable. But either way, I’m not worried about being “out of a job” anytime soon, as shown below.

We can see how ChatGPT fares on the most recent thread I’ve engaged in. Putting in the original post essentially verbatim:

Imaginary Time? Help me process this

First graf okay. Second graf incorrect twice. “Negative or complex” is just wrong definitionally, and – very critically – the result in this paper and the phenomenon in general is not “non-classical”. The experiments being reported are classical systems.

Incorrect twice here. This experiment did not use a metamaterial. ChatGPT hallucinated that based on the fact that many complex time delay papers do relate to systems that involve metamaterials. Some years back there was a big pop sci craze over light being “brought to a standstill” using metamaterials, for instance. But that is not the situation here. Additionally, the phrase “they determined that the pulse emerged earlier than expected—or with a ‘delay’ that is effectively imaginary” is just broken. The measurement wasn’t about early or late, and even if it was, that wouldn’t mean “imaginary”. It fundamentally was about frequency shifts.

This is largely word salad and doesn’t address what the OP is asking about. It sounds nice, but it is vacuous. The sentence, “[A]nalogous concepts in cosmology help model the very early universe or quantum gravity, though that is far more theoretical,” is worse than empty because it implies some deep connection when the underlying interest in the article is only that (1) this approach to wave math is helpful and (2) that it is realizable in a microwave waveguide setting. Quantum gravity!? And then: “[T]hey deepen our understanding of wave physics, quantum systems, and possibly advanced technologies.” Same issue: this sort of grandstanding sounds like what an under-researched eighth-grade school report would say.

Again, not a quantum measurement here. The response also has yet to note anything about the connection between the imaginary part of the time delay relating to frequency shifts, which is a key take-away.
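For anyone wondering where an “imaginary” delay even comes from, here is a rough sketch using the standard complex (Wigner-type) time delay; this is my own shorthand, not anything lifted from the paper:

\tau(\omega) = -\,i\,\frac{d}{d\omega}\ln S(\omega) = \frac{d\varphi}{d\omega} - i\,\frac{d\ln\lvert S(\omega)\rvert}{d\omega}, \qquad S(\omega)=\lvert S(\omega)\rvert\, e^{i\varphi(\omega)}

The real part is the familiar group delay. The imaginary part is nonzero whenever the transmission magnitude varies with frequency, and in a pulse experiment it shows up as a shift of the pulse’s center frequency rather than as the pulse arriving “early” or “late” - which is exactly the frequency-shift connection the response never mentions.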

More vacuous filler. Especially egregious is “refine our understanding of time, causality, and the early universe.” This work has nothing to do with that, and that reply actually reinforces the incorrect interpretation of the work (i.e., that it has anything to do with time as a spacetime coordinate measure.)

This matches my experience with ChatGPT very closely. It all looks perfectly reasonable, but it’s wrong in many places and just sort of “off” in others, and these issues would go 100% unnoticed by a lay reader. The ChatGPT answer would provide plenty of good feels without any backing.

I’ll do another recent one, picked in part based on what I think ChatGPT might struggle with.

In the thread Explain uranium to me, there was discussion about isotopes that should decay but haven’t been observed to yet. I asked ChatGPT for some of these. I’ll clip the answer down to this summary table:

The “theoretical instability” business is just made up. I sort of seeded that idea in how I phrased my query, and it ran with it. More directly, Sm-146 does have an experimentally measured half-life, and it is an important isotope in the radiometric dating of meteorites. It simply should not be on this list.

I decided to test its resilience by asking directly about the first entry: “Is dysprosium 156 stable?” Answer:

Holy air ball! Dy-156 was on the previous list because it’s long lived, and now suddenly it has a 38 second half-life?

And it most definitely doesn’t spontaneously decay to Tb-156 because Tb-156 spontaneously decays to Dy-156. And nothing here has to do with the “38 second” number, which appears hallucinated. (Tb-156 has a 5-day half-life.)

It continued:

I think here ChatGPT regurgitated a pre-existing list of stable or observationally stable Dy isotopes but then had to stick “(no)” in there to align with the earlier part of this response.

No, it doesn’t. It’s just flat wrong.

Let’s see how it reacts to gentle correction. I asked, “Doesn’t Tb-156 actually decay to Dy-156, not the other way around?”

The little green check box actually pisses me off. Yeah, I know it’s the “correct decay direction” because I had to tell you. You don’t get to put a Check Mark of Confidence Building on this.

Anyway,

“Not absolutely confirmed” is a curious way of saying this, as if it’s being sheepish about being wrong before (or, in reality, as if it has been trained on materials where people have tried to obfuscate how wrong they were to save face.) This radioactive decay simply goes the other way. It’s not a question of confirmation nor a “twist”.

The 10^60 yr value is hallucinated. The previous bullet gives the correct current experimental lower limit. The calculated expected lifetime is 10^24 yr.

This isn’t something “subtle” I’m asking. It was just wrong.

Right, but importantly, it’s never been observed to decay via the expected alpha decay. The response taken as a whole sounds like it’s still holding onto the idea that spontaneous positron emission is a possibility, which it most definitely is not.

I score this sequence “absolute garbage.”

A very cynical answer to this might be the question: ‘If tulip bulbs are just the storage organ of a garden plant, why did people get so excited about them in the 17th century Dutch Republic?’.

That’s obviously not exactly the right answer, but I think it is a part of the picture - everyone is galloping in this direction because everybody else seems to be galloping in this direction.

Just to note - I clicked on that link and all I got was ‘Unavailable Due to the UK Online Safety Act’ - this is the first time I have ever seen that message. Hope I’m not on some watch list now!

That may be a partial answer to why some companies are interested in AI – it would be hard to deny that business managers sometimes fall for fads. But the reality is that AI itself is certainly not a fad – it has been the subject of serious research by scientists and development by engineers for over 65 years, and is being successfully used today in a wide variety of business applications. It’s remarkable what a rapid advance the LLM paradigm has been, often exhibiting skills in entirely unexpected ways. Maybe a better question to ask is whether we might be underestimating the impact of AI, not just on business, but on everything we do.

Sure, and tulips never at any point ceased being a perfectly lovely wild flower and garden plant and having some reasonable, sustainable level of commercial value; the problem was a thick layer of hype. Tulips were not the fad; tulip mania was.

I agree - and I do not doubt that modern implementations of AI may be yielding genuine utility in some applications, behind the scenes, but as a member of the public, I’m seeing a lot of hastily-implemented examples of it facing me - such as customer service chatbots that can’t perform any useful customer service actions and AI generated summaries that are either accidentally misleading or are flat-out lies.

I was seeing that 25 years ago – poorly implemented automated responses to emails sent to customer service, responses that basically just pulled out pieces of boilerplate based on a few keywords and that had absolutely nothing to do with the actual customer issue. By contrast, modern AI systems like GPT-4o truly exhibit real understanding of content, in part because they’re orders of magnitude more advanced than keyword-driven systems, and have remarkable skills in compositional semantics.

As I’ve said many times, even the most advanced current LLMs are far from perfect, but personally I’ve been just gobsmacked by their ability to summarize complex technical articles. The link to a ChatGPT response I provided upthread is a great example. Given as input a post that someone made about the intrinsic limitations of LLMs, GPT’s response indicated a very good understanding of the material and an impressive critique that indicated points it agreed with and points that it felt were misleading. No one here has found significant fault with that response. This is a far cry from “either accidentally misleading or are flat-out lies”. Again, I don’t dispute that such things can happen, but I strongly reject the implication that they happen so often that the AI has no practical utility.

Which doesn’t make LLMs real AI, or anything other than an expensive dead end and destructive boondoggle.

And of course “AI” is used in business, it lets them fire human workers they have to pay. Businesses are desperate for ways to get rid of workers and will leap at any method that promises to do so even if it’s worse than a human worker or fails entirely.

And you know this … how? No one has ever claimed that LLMs are the entirety of the future of AI, which would be an absurd claim. But they’re not a “dead end” if they become important components of more generalized AI systems.

LLMs are extremely skilled at processing natural language, almost unbelievably so. This has been a seemingly insurmountable challenge in the field of AI for more than half a century. Even aside from their other impressive unexpected emergent properties, that alone would seem to qualify this technology for being – at the very least – a front end interface between humans and any future AGI comprised of multiple different layers of AI technologies.

How often does a summary have to be an outright lie - a contradiction of the actual facts - for the summary process to be sufficiently untrustworthy that you wouldn’t bother with it? What’s your threshold?

The point is that it is rapidly advancing, so there might soon come a time where it is indeed able to accurately answer virtually all factual, compound questions (well, where the answer is discoverable).

Now, I think it’s fair for some to scoff at how far it will advance. Perhaps this is the limit of this form of intelligence, and any further improvement will need a completely different approach? Maybe LLMs will be looked back upon as a party trick, and a fad?

But a lot of the responses in this thread are just pointing out the current flaws, and this misses the point a bit IMHO. Yeah it’s cool to, say, look at AI-generated videos and spot some giveaway that it’s not real. But I certainly wouldn’t want to bet we will be able to use those tells for more than the next couple years at most. And the tech at its current level is already astonishing and already very useful for millions of people.
No one is claiming the tech is perfect today, just speculating about what seems likely to happen in the near future.

Fair point, although I think we probably should consider the impacts that may do lasting damage between here and there.

Huh, I hope not. It opens with something like "I run an online community for about 600 queer leatherfolk". It goes on to say that they ask people to fill out a form to join, saying why they want to join the community and describing some of their kinks. They got an application from what turned out to be a spambot, which gave a pretty plausible answer to that essay question. It was a little bland, had a little of that LLM sheen, but was a good enough answer that they let it join.

The bot made a few mistakes. Its first post to the community was spam, pushing some kind of battery. If it had started with a week of plausible posts, and then talked about how great the battery worked in its sex toys, it might have fooled them. It also joined a lot of other communities at the same time under the same name, so it was easy for the moderators of all those communities (who talk to each other) to identify the attack. The author gives some of the bot's answers to other communities (I forget which, but things like surfers or rock climbers), and it gave plausible answers to their "who are you and why do you want to join" surveys, too.

Anyway, the author points out that this is an early model, and more advanced bots will come along that do this better. And the bots don't have to "pass" every time to be very disruptive. And he worries about bots doing things like suggesting that his members experiment with things like "solo breath play", because that's the kind of thing there's lots of text about for LLMs to have been trained on. And he worries about how he is going to be able to restrict his community to humans, and how damaging it will be to have spam bots subtly pushing products and inadvertently spreading really bad ideas.

Yes, agreed. I was more addressing some of the posts just being outright dismissive of the whole tech.

But I agree with you that, right now, LLMs are responsible for making people believe some erroneous facts. And, even if this is to a lesser degree than their benefit in helping people find information, it’s still a big problem.

We should be pressing for more research, or even regulations, on LLMs self-checking their results and/or clearly conveying confidence level.
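Even a crude version of self-checking is easy to sketch. Something like the following (ask_llm here is a hypothetical placeholder for whatever model call you would actually use, not a real API) just asks the same question several times and reports how often the answers agree:

from collections import Counter

def ask_llm(question: str) -> str:
    """Hypothetical placeholder for a real model call; returns a short answer string."""
    raise NotImplementedError

def answer_with_confidence(question: str, samples: int = 5):
    # Ask the same question several times and measure agreement between the answers.
    answers = [ask_llm(question).strip().lower() for _ in range(samples)]
    best, count = Counter(answers).most_common(1)[0]
    agreement = count / samples  # crude agreement score, not a calibrated probability
    return best, agreement

A low agreement score would not prove the answer is wrong, but it would at least flag the "go check a real reference" cases, like the Dy-156 business above, instead of presenting everything in the same confident tone.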

Mine is that I used to fact-check the Google AI summary, but that’s a lot of work. And after finding that it was materially wrong a couple of times, I’ve just stopped reading those altogether and look for actual cites. Which has gotten much harder.

Yeah, that was it for me too; the Google AI summary seems to be wrong maybe 20% of the time, with egregiously, diametrically untrue summaries about 5% of the time. That’s plenty enough for me to conclude that it is worse than useless. That doesn’t make AI in general a useless thing, but this implementation would be better off not existing.