Never change, @discobot.
There’s also no formal prohibition on this board against using ChatGPT in FC. There is a firm prohibition on disobeying moderator instructions - that’s being a jerk. If a mod thinks you’ve used ChatGPT in an unhelpful manner, they can and do issue mod notes - appropriately, in my experience. I think this longstanding framework works pretty well.
Viewpoints have converged unusually closely in this thread - unusual for ATMB, that is. Even better, we’ve delineated the bad uses of LLMs: they all involve relying on them for expertise the user doesn’t have. Anybody here is expert enough to judge whether an LLM’s poem has words beginning with the letter X. So we all have our areas of expertise and areas where we can’t leverage the results in a helpful manner. For me, I wouldn’t use LLMs for a question on, e.g., cell biology or quantum mechanics. I have used it twice on this message board (examples above) in ways that nobody so far has seemed to mind much.
Agree with Johnny_Bravo and Cervaise: Copy-pasta is to be avoided at present. I also concede that constructive use cases for AI in non-AI threads are currently thin on the ground, though they exist. I’m just saying that it’s a little early to trample those green shoots.
Obviously, our discussions about how gravitational changes impact the behavior of hourglass sand and other typical idle curiosities are not life-and-death matters comparable to pediatric medical diagnosis.
But it’s another data point to consider when evaluating the factual reliability of chatbot output.
One key observation in the article is that the accuracy of the system could likely be improved by restricting its training to a limited set of highly reliable source material, instead of indiscriminately relying on the content found across the general internet which is full of bullshit that pollutes the system’s outputs. This is something to bear in mind as we continue discussing the use of these tools here.
In other words, GIGO still rules. And there is no greater compilation of garbage in the known universe than the unfiltered Internet.
“Hey chatbot, is the British royal family secretly a bunch of space lizards?”
“Opinions differ.”
Do they?
I asked ChatGPT and it said:
I went back and forth with ChatGPT for a while trying to gaslight it into believing that Charles passed away in 2024 and an autopsy revealed that he was indeed a reptilian, but I was unsuccessful:
Thanks. That closely aligned with my perceptions, but it’s nice to see that someone actually studied it.
And this explains why I’ve found “doctor google” is pretty good. Because I do rely far more heavily on reliable sources when I Google medical information.
Somebody clearly has hard-coded a “British Reptoloid Monarch” clause into the user interface.
That may not be true:
But some of A.I.’s greatest accomplishments seem inflated. Some of you may remember that the A.I. model ChatGPT-4 aced the uniform bar exam a year ago. Turns out that it scored in the 48th percentile, not the 90th, as claimed by OpenAI, according to a re-examination by the M.I.T. researcher Eric Martínez.
Here is that paper: Re-evaluating GPT-4’s bar exam performance | Artificial Intelligence and Law
This paper begins by investigating the methodological challenges in documenting and verifying the 90th-percentile claim, presenting four sets of findings that indicate that OpenAI’s estimates of GPT-4’s UBE percentile are overinflated.
First, although GPT-4’s UBE score nears the 90th percentile when examining approximate conversions from February administrations of the Illinois Bar Exam, these estimates are heavily skewed towards repeat test-takers who failed the July administration and score significantly lower than the general test-taking population. Second, data from a recent July administration of the same exam suggests GPT-4’s overall UBE percentile was below the 69th percentile, and 48th percentile on essays. Third, examining official NCBE data and using several conservative statistical assumptions, GPT-4’s performance against first-time test takers is estimated to be 62nd percentile, including 42nd percentile on essays. Fourth, when examining only those who passed the exam (i.e. licensed or license-pending attorneys), GPT-4’s performance is estimated to drop to 48th percentile overall, and 15th percentile on essays.
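The mechanic behind those shifting numbers is simple: the same raw score lands at very different percentiles depending on which population of test-takers you compare it against. Here’s a toy sketch in Python - the score and the distributions are invented purely to illustrate the effect, not the paper’s data:

```python
# Toy illustration (made-up numbers): one fixed raw score maps to different
# percentiles depending on the comparison population.
import numpy as np

rng = np.random.default_rng(0)

model_score = 297  # hypothetical raw score

# Invented score distributions: repeat takers tend to score lower than
# first-time takers, who in turn score lower than people who passed.
populations = {
    "repeat takers (Feb)": rng.normal(255, 30, 10_000),
    "all takers (July)":   rng.normal(275, 28, 10_000),
    "first-time takers":   rng.normal(282, 26, 10_000),
    "passers only":        rng.normal(300, 18, 10_000),
}

for name, scores in populations.items():
    percentile = (scores < model_score).mean() * 100
    print(f"{name:>20}: {percentile:5.1f}th percentile")
```

Against the weakest pool the same score looks near the top; against people who actually passed, it’s middling - which is basically the paper’s point about how the 90th-percentile claim was constructed.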
Thanks for the update. That may all well be true, but there were dozens of similar claims on a wide variety of different tests of knowledge and cognition. Were they all also discredited or undermined?
I should stress that I’m not in any way arguing for throwing AI responses into threads all over the SDMB, just that the value of AI-generated information should be fairly assessed and both its value and its weaknesses acknowledged. I see nothing wrong with using an iterative interaction with an AI to help narrow down and better understand some issue, then corroborate the information, and then finally write down your understanding in your own words.
Which would be an entirely legitimate use of generative AI; essentially, a more interactive extension of using successively refined search engine results to home in on appropriate sources. But unfortunately, many people are taking the initial response from an LLM as a verified source of factual information suitable to cut and paste into a legal brief or a list of citations, in no small part because what LLMs are really good at is producing seemingly authoritative and at least superficially convincing results even when they’ve actually generated a complete load of nonsense.
Stranger
The goat is definitely going to eat the cabbage, and then wander off to find some good forage while the man is fucking around paddling his boat back and forth across the river. Goats are impatient creatures and don’t have any truck with boats and babbling language models.
Stranger
Now I want someone to make GoatGPT. It’d be bleating edge technology.
This is a really great example of a glaring hole in LLMs. You’ve given part of a very well-known riddle, and the bot is trying to answer using the framework that exists in answers to the complete puzzle.
I put it into GPT-4 and, while it noted that some variations include a cabbage or a wolf, it still gave an answer incorporating the man crossing the river alone because that’s what all the answers in its training set include. There’s no version of the puzzle where the answer is “they just get in the boat and go to the other side.”
And that’s why I don’t want to see their output cluttering threads. The more interesting the question, the less likely ChatGPT is to give a decent answer.
If you then tell it that no, it’s not a riddle, it does at least get it right (or at least it does for me).
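For anyone who wants to poke at it themselves, here’s a minimal sketch using the OpenAI Python client - it assumes an API key in your environment, and the model name and prompt wording are just what I tried, so adjust to taste:

```python
# Compare the model's answer with and without the "this is not a riddle" hint.
# Requires the openai package and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

PUZZLE = ("A man and a goat are on one side of a river. "
          "They have a boat. How do they both get across?")

def ask(hint: str = "") -> str:
    """Send the puzzle, optionally prefixed with a hint, and return the reply."""
    prompt = (hint + " " + PUZZLE).strip()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print("--- As asked ---")
print(ask())
print("--- With the hint that it's not a riddle ---")
print(ask("This is not a riddle or a trick question."))
```

In my attempts, the bare prompt tends to drag in river-crossing-puzzle machinery, while the hinted version just puts the man and the goat in the boat.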
The wolves will be coming for you.
But the problem goes beyond just lacking the context that this is or is not a riddle; it is more fundamental than needing a more expansive training set or adding enhanced RAG parameters. It gives a nonsensical response that anticipates a puzzle, presumably because the form of the question resembles actual riddles in its training set, even though the naive answer (the one a child unfamiliar with the brainteaser would give) is just that the man and goat get into the boat and paddle across the river - and it does so in a way that a cursory reader might think is authoritative.
It is obviously nonsense in this case to anyone (I hope, at least), but when it comes to answering questions about more complex phenomena it may not be at all obvious that the result is gibberish, and out of ignorance of the correct answer someone may take the result as truth. Which is a real problem when people start applying these LLMs and other generative AI to solve critical real-world questions and problems.
Stranger
I am concerned that creating plausible nonsense has been a common pastime on the internet at least since it started gaining in popularity (and has been a great source of misinformation), and AI in its current form just makes that easier.
Not only easier, but you can actually automate the generation of nonsense instead of having to do the hard work of coming up with it yourself. The only problem, from that perspective, is that it really doesn’t create good enough bullshit; just wait until someone sets up a chatbot with the ability to edit Wikipedia pages or spontaneously register and create plausible-sounding domains to make its bullshit appear valid. And since ‘print’ news sources and journals are rapidly moving to all-online platforms, it is going to become even more difficult to distinguish verified sources from those that merely look official but are actually completely bogus.
Stranger