Future of General Questions in the AI age

And, despite the discussion’s descent into a generic “AI wonders, pitfalls, limitations, and so on” thread, the OP asked about a very specific application: generating FQ responses.

It is not of current value for that application.

Of course some fraction of member responses aren’t of value either. But at least they aren’t so wordy about it. And as users in a reasonably sized community we come to recognize whose FQ responses can be believed and whose should be doubted. Many come sourced with authoritative citations.

Businesses can sometimes prefer AI output of poorer quality than an average paid worker’s if it is much cheaper per widget. (Or if it increases the productivity of an average worker.) We, however, cannot drive FQ production costs any lower than their current cost of zero.

And for business use, if it is better than the average worker it is great, even if it makes mistakes. For many use cases, good is more than good enough. AI-generated posts are not such a use case.

FQ also offers utility to many of us that asking an AI does not: a real user may understand the question and expand the answer beyond its narrow factual scope, share a broader understanding of the subject or of closely related items we enjoy learning about, and a few posters may end up having some informed, on-point discussion about the subject that the rest of us benefit from.

Of course there is also the aspect that part of participation here, including FQ, is a social interaction, even if it is in virtual space:

I can go online and play Go against the computer, dialed to my level of play, where I have a possibility of winning … it is not as enjoyable as playing online against my son, who plays Go.

I could listen to a recording of music in very high quality; I still enjoy a live performance, flaws and all, more.

Also, other than naming a couple of kinks and using slightly suggestive words to describe people who engage in specific kinks, there wasn’t anything particularly NSFW in that link. I assume it was flagged because of its domain or something.

Also, when a member responds with an incorrect answer to a factual question, the objective of the board is still met, in that it serves to combat that person’s ignorance.

Indeed - it would of course be possible to replace all of this with automated bots asking synthetic questions, being responded to by more bots congratulating the first bot, or nitpicking it, etc, but what would be the point?

But how about playing the hypothetical?

What would be the place of FQ if/when AI advances to a point that the vast majority of FQ OPs would get an on point factually correct response from widely accessible AIs?

It is not currently there, but when/if it is?

Yeah the forum probably withers away.

And in that advancement hypothetical, please keep AI out of IMHO, GD, etc.!

I do not want to read the opinions of and debate with an AI in any probable near term future.

FQ right now is wonderful because this board has a broad range of people with deep understanding of individual subjects (even if we as individuals are complete morons about other ones). Ask an esoteric, even poorly worded, question and someone here is likely to understand the question, know the answer, and be able to explain it in a way that most of us can understand. Right now AI cannot reliably do that, but when it can, then yeah, “just AI it” might be what happens. But opinions and debate? NO. NO. NO! :smiley:

Yes, well summarized, and I agree.

There might be a time when FQ shrinks considerably as fewer questions reach beyond the level where an AI can give a comprehensive response (even though I agree there is still a social function to asking humans those questions).

What might happen is that, where people want a question answered but also want to hear some anecdotes, asides, or spin-off questions, they ask it in GD or IMHO. Which, in that hypothetical, would probably have specific rules on the use of AI.
Anyway, this is a tower of speculation, so I’ll leave it there.

@wolfpup, I took a look at the ChatGPT self-reflection exercise you linked to. To add my two cents there, that ChatGPT response is a good example of its ability to provide cogent-seeming prose, but that’s precisely what it’s best at, and the leap from “that’s cogent-seeming” to “that’s actually meaningful” is a leap that I fear users take way too freely.

That particular exchange is about a topic that is inarguably debatable, so it’s a relatively low bar IMO for a generated response to sound reasonable. The “Debate Team” judges will happily clap at the articulate stance presented. To my eye, a critical examination of such a tool requires cases where correctness can be concretely checked.

So, I would be interested in your take on the two examples I posted above, if they aren’t TL;DR.

(As an addendum to those posts, but not to distract from them, I was tickled by “…is considered observationally unstable — it has not been observed to be stable”. In the original post I focus on the factual inaccuracies, but this line shines a light on the lack of understanding of the words. The grammatical construction is reasonable (and is used) in the inverse case – i.e., “observationally stable” and “not observed to decay [be unstable]” – but the other way around is nonsensical. Something that is “observationally unstable” is just “unstable”, and the phrase “has not been observed to be stable” betrays a lot. “Have we seen it being stable yet? No? Let’s keep watching to see if we can catch it being stable.”)

I see real practical value in the tools, and I make use of them in suitable ways. But however impressive juggling 7 or 8 (or – wow! – now 9) balls at once is, it still can’t know whether it’s juggling balls made of plastic or of dung. And users often don’t know either, what with all the distracting juggling.

I see your point but I’d say that’s only partly and not entirely true. What’s true is that when GPT applies comments like “misleading” when critiquing some of the arguments, it’s making a value judgment and one could conceivably argue otherwise. But that isn’t the point. The point is that it supports its “misleading” label with a cogent and factually correct description of how LLMs really work. It exhibited both an impressive understanding of the original argument and a deep understanding of LLM internals.

In the first example, on imaginary time, my only (superficial) knowledge of the subject comes from a book of transcripts of Stephen Hawking’s Cambridge lectures. I don’t have a clue what the hell the University of Maryland research with microwave pulses has to do with this. The post that you fed GPT as a prompt was from a Popular Mechanics article with a breathless headline suggesting that scientists have discovered “a Brain-Bending Version of Time That Shouldn’t Exist”.

This suggests several things to me. One is that throwing the cosmological concept of imaginary time into the same prompt as the Popular Mechanics article is confusing, and perhaps illustrates the known fact that GPT responses are extremely sensitive to the exact formulation of the prompts. I would also note that much of what it got wrong in that response was about the microwave experiments, not the cosmological concept, and it may have got it wrong due to factual errors in source materials, maybe the Popular Mechanics article itself or similar publications. One thing that you and I both know is that when popular media report on scientific discoveries, they can be relied on to get about 80% of it wrong.

As an exercise to test my theory, I asked ChatGPT a simple and straightforward question. I just asked it to explain Stephen Hawking’s model of imaginary time as it relates to cosmology and why the concept is useful.

Rather than post the entire response, here are the highlights that I think are informative and accurate.

  • It immediately identified this concept as the Hartle-Hawking “no-boundary” proposal

  • It summarized the concept as follows: If real time is a line from past to future (with a beginning and possibly an end), imaginary time behaves like a spatial dimension—it doesn’t have a preferred direction and can loop around smoothly. Hawking used this idea to reframe the beginning of the universe. By rotating time into the imaginary axis (a trick called Wick rotation), spacetime becomes Euclidean rather than Lorentzian, meaning all four dimensions act like space. (A brief sketch of that rotation follows this list.)

  • As to why the concept is useful, it correctly stated one major reason: it eliminates singularities. Hawking and Hartle proposed that if we look at the universe in imaginary time, there is no singularity. Instead, the universe is finite but unbounded, like the surface of a sphere.

  • A second reason that I think is correct is that it bridges quantum mechanics and cosmology: imaginary time makes quantum models of the universe more mathematically tractable. It allows for a quantum cosmological wavefunction of the universe.
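
As an aside, for anyone curious about the Wick rotation mentioned above, here is a minimal sketch of the standard textbook substitution (my own illustration, not part of ChatGPT’s output):

$$
t \;\to\; -i\tau, \qquad ds^2 = -c^2\,dt^2 + dx^2 + dy^2 + dz^2 \;\longrightarrow\; ds^2 = c^2\,d\tau^2 + dx^2 + dy^2 + dz^2
$$

With the minus sign gone, all four terms enter the metric with the same sign, which is what “Euclidean rather than Lorentzian” means: imaginary time τ behaves like just another spatial direction.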

Your second example, about isotope decay, is one about which I have so little knowledge that there’s little I can say about it. I do note that you said that you intentionally picked a subject that you “think ChatGPT might struggle with”. I’m willing to take your word that apparently it did struggle, and made mistakes. The question remains: how often does it do this, versus how often it provides genuinely useful information and genuinely useful summaries and analyses of content?

I’d say it’s just awkward wording, which is unusual since articulateness is one of GPT’s strengths. It’s the kind of slip that native English speakers make all the time without being accused of “lack of understanding of the words”. It’s just saying that the element has been empirically observed to be unstable. One might, if appropriate, feel obliged to add that there are no known conditions under which it becomes stable.

I’ve no skin in these discussions so long as our community continues to recognize that “AI” is not ready to have its responses regurgitated as FQ responses, and that it is not to be used as responses in other fora at all.

But those two explanations are supposed to be defenses of how wonderful it currently is?

It seems to me these things are more highlighting its significant limitations.

Its difficulty in responding to the intended meaning of a prompt, misdirected in the wrong direction by the prompt’s exact formulation? Human to human misunderstanding of intended meaning is bad enough!

And the second is the biggest. You and I both know to take a pop-sci source as a starting point at most, and then to dig deeper into more authoritative sources to fact-check it before believing it. It sounds like current AI doesn’t? That is surprising if true, if it treats all information as equally valid. And if true, it is much worse than even the lowest-quartile posters here!

On the second point, you’re making a fundamental category error here. There is a categorical distinction between the intrinsic capabilities of an LLM and the corpus it’s been trained on; flaws that stem from the corpus say nothing about the capabilities of an LLM trained on well-curated materials. ChatGPT at this point is a technology demonstration and proof of concept, not a certified teacher or advisor.

On the first point, sensitivity to the formulation of prompts, do you think that humans are immune to that? How about survey questions that are slanted to elicit a particular kind of response? How about the questions asked in police interrogations?

Why on earth should this be surprising? How could you possibly believe that an AI has some intrinsic capability to distinguish between the quality of different sources unless that was part of its training – which in a technology demo I’m sure was not.

LLMs should be thought of as a human-machine language interface that, due to the massive scale of its ANN, has also developed some impressive emergent capabilities. But just like a human, its knowledge is based on, and limited by, what it’s been trained on. And, distinctly unlike a human, it will likely become just one layer in a future AI with many other components.

I wasn’t necessarily expecting that you did or didn’t; I attempted to do all the required analysis work already so that folks could (assuming for argument’s sake you trust my analysis) comment strictly on the mistakes and mischaracterizations laid bare.

And they are egregious. Switching to your different example, reported only through your human filtering, is much less telling to me, and it is also beside the point of the original example(s).

[quote=“wolfpup, post:87, topic:1020443”]
I would also note that much of what it got wrong in that response was about the microwave experiments, not the cosmological concept, and it may have got it wrong due to factual errors in source materials, maybe the Popular Mechanics article itself or similar publications.[/quote]
But… isn’t that one of the purported great strengths? “Hey, here’s an article that talks about imaginary time. What’s that all about?” And then it gets many things wrong, frames the point of the article incorrectly, and throws in grandiose conclusions with no basis. I don’t follow how this can be so easily dismissed. And most of the issues do not directly appear in the source material provided, and certainly not in the underlying journal article (which is linked from the source materials). Where are the goalposts if “Please help me understand this article” is not within them?

As above, if you’re willing to (for argument’s sake) trust my analysis, your personal experience with the topic shouldn’t be a limitation in addressing ChatGPT’s performance on the task. The main points in this second example are that (1) I asked a very simple question about long-lived isotopes, and it put something on the list that simply is incorrect, (2) I asked a very simple and direct question about one of the entries on the list, and it gave an egregiously incorrect response, and (3) when corrected, it tried to mask the flat incorrectness with smoke and mirrors.

And to be sure, I had to do significant research on some of the mistakes in the second example to adjudicate them confidently – well beyond any regular user’s ability.

In my experience, it makes mistakes as often as not, or at least that’s the general scale. It’s certainly very far from “rare”. The corresponding question remains: what rate of mistakes are you willing to tolerate in a production setting?

In my view, the mistake rate is astronomically higher than what is acceptable for how the tool is getting used in practice.

I picked my most recent FQ post as the first example as some sort of “unbiased” FQ example. I picked my almost-next most recent post as the second example, skipping only a couple of cases where the answer was so eminently Google-able or textbook-lookup-able that it would have been uninteresting as an example. But that again seems to be where folks want to put the goalposts – the slightly harder-to-get-at questions – and it crashed fatally on the task.

Disagree. It is rolled out and being used massively, sometimes wisely in appropriate applications, and sometimes without its current limitations being recognized. It’s on the excessive hype side of the technology growth curve, will have a fall as it fails in its current state to meet the hype, and then continue to develop as a tool.

Yup. GIGO. It feeds on all it can eat.

Obviously not; I said as much: “Human to human misunderstanding of intended meaning is bad enough!” … but we are not talking about prompts intended to mislead.

Huh. Part of educating a real intelligence, a key part, is that basic skill: critical thinking. It honestly seems like an easy skill to train an AI in: learn what sorts of sources are more reliable than others and weight those inputs more highly. I know to trust the quack blog less than a selection of articles in highly respected journals; why can’t an AI learn such? Why can’t it be taught the skills to recognize misinformation that we are trying to teach middle schoolers? Yes, I am surprised that is not done.
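
To make concrete what “weight those inputs more highly” might look like, here is a rough, purely illustrative sketch; the source categories, scores, and function names are invented for illustration and are not anything any actual lab is known to use:

```python
import random

# Purely illustrative reliability scores per source type (invented numbers).
SOURCE_RELIABILITY = {
    "peer_reviewed_journal": 1.0,
    "reputable_news_outlet": 0.7,
    "popular_science_magazine": 0.5,
    "random_blog": 0.1,
}

def weighted_sample(documents, k):
    """Sample k training documents, favoring more reliable sources.

    Each document is a dict with a 'source_type' key; documents from
    higher-reliability sources are proportionally more likely to be drawn.
    Unknown source types fall back to a middling default weight.
    """
    weights = [SOURCE_RELIABILITY.get(doc["source_type"], 0.3) for doc in documents]
    return random.choices(documents, weights=weights, k=k)

# Toy usage: a tiny corpus sampled with source-aware weights.
corpus = [
    {"source_type": "peer_reviewed_journal", "text": "..."},
    {"source_type": "random_blog", "text": "..."},
    {"source_type": "popular_science_magazine", "text": "..."},
]
batch = weighted_sample(corpus, k=2)
```

Real data-curation pipelines are of course far more involved than this, but the basic idea of down-weighting less trustworthy sources is not exotic.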

No. The human is trained as above, trained in critical thinking. It isn’t just about “volume volume volume” for human intellects; it is about filtering that volume to narrow attention to a meaningful focus, and maybe making novel, valid connections between domains.

Probably. I expect so. But I am bad at predicting the future. It is not there now, in any case.

If AIs are that advanced, aren’t they basically people? Aren’t you, then, just being the “We don’t serve their kind in here” guy?

I’m not saying they’re going to get there in the near future, but in your hypothetical - if an AI registered as a poster to answer FQ questions, but was also capable of engaging in GD, how would you even know?

I mean, it’s not like we don’t already have a few (presumably) human posters who seem to have great difficulty with such common concepts as empathy, other emotions, social cues etc.

In fact, being tuned to be polite at all times and always admit its mistakes, GPT in particular seems to have a better persona than some posters on this board. :grin:

Billionaires are not any smarter than the rest of us. Maybe some of them have useful skills that helped them make money (while others lucked into it, were born into it, or cheated their way into it), but they can be incredibly dumb:

Travis Kalanick, the founder of Uber who no longer works at the company, appeared on All-In to talk with hosts Jason Calacanis and Chamath Palihapitiya about the future of technology. When the topic turned to AI, Kalanick discussed how he uses xAI’s Grok, which went haywire last week, praising Adolf Hitler and advocating for a second Holocaust against Jews.

“I’ll go down this thread with [Chat]GPT or Grok and I’ll start to get to the edge of what’s known in quantum physics and then I’m doing the equivalent of vibe coding, except it’s vibe physics,” Kalanick explained. “And we’re approaching what’s known. And I’m trying to poke and see if there’s breakthroughs to be had. And I’ve gotten pretty damn close to some interesting breakthroughs just doing that.”

No, Mr Billionaire, you’re not on the verge of incredible discoveries with ChatGPT, you’re just a gullible idiot caught up in the latest trend.

“no longer works at the company”

Gee, I wonder why that is.

Clearly, a mystery for the ages. Someone should look into that.

Egotistical dumbshit who expects the world to bend to his whim is pleased and gratified by an obsequious chatbot that validates his inflated sense of self. News at 11.

In fact they are significantly stupider on average. Being surrounded by toadies and people who do things for you isn’t good for the mind.

And you don’t need any actual skill to be rich; just a combination of inheriting money, luck, and amorality. It’s other people who do the actual work.

Yep, that breakthrough is coming any day now. It’s very plausible that a man with no physics background (so he wouldn’t even know new physics if a chatbot could somehow derive it) is about to change physics!
In unrelated news, I have a bridge hyperloop to sell you.

It does make me wonder, though, whether “hallucinations” might actually be useful in some fields for just coughing up new ideas, but that’s a ways outside the scope of the OP.