Your persistent denigration of AI is duly noted, but exactly the same comments could have been made – and indeed, were made – when one Johannes Gensfleisch zur Laden zum Gutenberg invented the movable type printing press.
It’s not denigration, just an observation. Chatbots are optimized for attentional engagement, not for truth, and in fact they don’t have any clue as to whether the response they generate is true, false, or ambiguous; it is just the sequence of tokens that their statistical model indicates as being the statistically most appropriate response
And yes, the invention of moveable type and the creation of the field of pamphleteering is basically the first generation of what we now call ‘social media’, albeit without the ability to prompt thousands of responses in minutes of publication. But any conspiracies or falsehoods produced were from the the mind of the pamphleteer who could be held accountable for them (assuming you can identify the author) and are produced serially. An LLM can produce an overwhelming volume of falsehoods at an arbitrary rate just by adding more ‘compute’.
Stranger
Sort of like if 1984 had a love-child with Idiocracy. Except instead of a love-child, it’ll be a hellspawn.
It’s denigration when you persistently emphasize the downsides, whether real or imagined, and never acknowledge the actual benefits – benefits that I’m seeing right now every day with my interactions with GPT-5.
If this simplistic description was literally true, then the objectively verifiable intelligence of LLMs would not improve with scale, but the profound reality is that it does – and it has little or nothing directly to do with the volume of training data. When the number of parameters in the ANN grows from millions to trillions, fundamental new properties spontaneously emerge, much as they do in the human brain.
I assume you mean human attentional engagement? I’m not sure why you say this. The post-training optimizations are focused primarily on accuracy.
AI is simultaneously over-hyped and over-criticized. The overly critical camp focuses on using LLMs for Q&A where internal knowledge is paramount, but the real strength of an LLM is it’s ability to transform the user’s context into another form. ‘Truth’ and accuracy are still required for the transformations, but much of the truth comes from the provided context.
I’m no Musk fan (any longer…), but even without any deliberate action on his part, large swaths of the internet were already racist and conspiratorial (and worse) by default. LLMs naturally learn from that and those attitudes are deeply embedded in their training.
Absent training to the contrary (i.e., deliberate censorship in the style of OpenAI and especially Gemini), the LLMs will gladly repeat all those conspiracies if encouraged.
Early Copilot (Sydney?) was notorious for that. And it’s interesting to see the types of censorship applied on top of the different models, such as DeepSeek’s pro-Chinese biases compared to the Western models.
As each country and megacorp gradually develops their own models, it would be interesting to see if we get a Metacrawler-style (really dating myself here) multi-LLM prompter that will then present the different AI responses together. Or maybe like a Groundnews.com / Allsides.com for AI.
Stranger might be referring to specific things Musk says - someone would ask “@Grok, is this true?” about a story that went against Musk’s political beliefs; Grok answers “no”, and cites the so-called “Liberal Media”; and Elon responds stating that he’d look into this.
Despite this, the @Grok is this true memes continue mostly unabated.
This is verifiably not true:
Chinchilla [1] is a paper that came out in 2022 from DeepMind. The authors found out that current large language models are under-trained due to the focus on scaling model size while keeping training data constant!! The authors in fact trained over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, and concluded that for compute-optimal training, both model size and the number of training tokens should be scaled equally.
The emergence of contextual ‘knowledge’ from LLMs arrises from the logic built into the usage structure and metasemantics of language, and LLMs do not function or develop in the manner that cognition the human brain works but we’ve discussed this elsewhere so there is no need to rehash it here.
Chatbots are intrinsically optimize for user engagement because keeping the user coming back to the chatbot is an inherent goal in arguing to investors about how profitable it is going to be if they just plow another XXX billions of dollars into more and more ‘compute’. Post-training optimizations such as fine-tuning parameters, reasoning models, the use of retrieval-augmented generation frameworks, scaling efficiencies, explicit rules to enforce alignment to a set of safety or other guidelines, et cetera are focused on consistency, which insofar as the desire is that the LLM to provide correct summary and factual information is correlated with accuracy, but nobody is interested in a chatbot that spits out correct information in a hostile or off-putting manner, and proving that LLMs are worth developing means showing that the ‘beta’ chatbots are a thing that the general public will embrace enthusiastically.
I agree that the real strength of an LLM is “its ability to transform the user’s context [natural language] into another form,” and it is a really impressive, if computationally (and in many other ways) costly, capability. The deep learning approach to building models capable of high fidelity natural language processing is orders of magnitude beyond the previous symbolic approaches and nobody can dispute that current chatbots are essentially capable of having human-like conversational capability at least at a superficial level. However, these systems don’t understand ‘truth’ ,and accuracy is a question of how well that correlates to the consistency of the responses of these models. Of course, a malicious developer could make a model that is not oriented toward accuracy and intentionally produces convincing falsehoods for a variety of purposes, and chatbots have already demonstrated the ability to mislead people even without any intentional malice. The faith that people will put into the output of an LLM (either due to being uninformed about how they function and the lack of reliability, or because they are too lazy to fact-check even basic things like citations or easily calculated objective results) is pretty shocking so far and will only get worse as these tools are promoted and pushed upon the public as essential tools that they must use in the name of efficiency.
For sure, and a lot of effort has been put into ‘aligning’ LLMs to not spew that prejudice, conspiranoia, and bias (even though these are definitely things that encourage engagement, so developers are having to manage objectives that are at cross purposes), but in the case of Grok, Elon Musk announced the specific intent to remove “woke ideology” and “cancel culture” from Grok responses. The result?:
This week, Grok’s responses praised Hitler and espoused antisemetic views, prompting xAI to temporarily take it offline. Two months ago, Grok offered unprompted mentions of “white genocide” in South Africa and Holocaust denialism. In February, X users discovered that Grok’s responses about purveyors of misinformation had been manipulated so the chatbot wouldn’t name Musk.
Why does this keep happening? It has to do with Grok’s training material and instructions.
For weeks, Musk has promised to overhaul Grok which he accused of “parroting legacy media.” The most recent incident of hate speech followed Musk’s July 4 announcement that xAI had “improved @Grok significantly” and that users would notice a difference in Grok’s instantaneous answers.
Over that holiday weekend, xAI updated Grok’s publicly available instructions — the system prompts that tell the chatbot how to respond — telling Grok to “assume subjective viewpoints sourced from the media are biased” and “not shy away from making claims which are politically incorrect,” The Verge reported. Grok’s antisemitic comments and invocation of Hitler followed.
The manipulation of parameters in Grok is an example of what intentional and focused malfeasance can produce in chatbot behavior. If was fortunately egregiously offensive enough to be immediately obvious but a more subtle manipulation of parameters to produce plausibly ‘truth-like’ falsehoods could do enormous damage to public trust and a shared, fact-based reality. The problem with trying to discern truth akin to what ‘Ground News’ does with media is that there is no truth in intentional deceptions that are intended to manipulate and deceive and the constantly changing baseline of a system which is designed to dynamically produce individually-targeted fabrications pretty much defies any broad attempt to filter out falsehoods.
Stranger
I don’t use any standalone version of it, but the one they have bolted onto Google searches has been continually gaslighting me.
Example: The other day I was out walking and I found a tree with green berries on it, but they were fully ripe and falling off - it took me a moment to realise it was an elderberry tree and obviously just a mutant/variety that lacked the anthocyanin pigment that normally makes the berries red-black.
Wondering if this was just a naturally-occuring mutation that happened at this spot, or if it was a specific garden variety that someone had planted on purpose, I googled ‘green fruited elderberry’; the AI summary says:
There are no varieties of elderberry that have green fruit. Elderberry fruits naturally ripen from green to a dark purple or black color, and it is the unripe, green berries that are poisonous due to the presence of toxic compounds like sambunigrin. Therefore, if you see green fruit on an elderberry bush, it is a sign that the berries are not yet ripe and are unsafe to eat
I pulled my gardening encyclopaedia off the shelf and looked up Elder - the book says there is a green-fruited variety named Sambucus nigra var. viridis.
I added ‘viridis’ to my google search and the AI summary was now:
Viridis is a green-fruited variety of elderberry (Sambucus nigra), known for its green berries that are not taken by birds as quickly as ripe berries. Unlike most common elderberry varieties, the Viridis cultivar produces green fruit instead of black, blue-black, or red, though its green color is present in the immature stages of most elderberries. The fruit of the Viridis variety is average in size, and its slower ripening rate can be beneficial for fruit production.
So I get it, the first summary is just a precis of the results retrieved by that search string, none of which mention the viridis variety and many of which warn about the toxicity of the unripe fruit, which are green, but the mere accidental absence of a thing in one set of search results is not the same as ‘thing does not exist’.
Is there any evidence this is happening? Certainly there are multiple ways to drive investor valuation – providing accurate results is the most obvious. The llama2 paper goes into detail on their post-training process where humans review the outputs and reinforce the model to spew quality results.
(ellipses mine. I hope that’s ok)
I agree with this concern. Because of the AI hype there is a gravitas given to LLMs that they don’t deserve and weren’t granted to previous solutions. Users are more likely to switch their critical thinking off. However, this will wear off as the new becomes the old and LLM quality plateaus. LLMs will leverage more external data-sources and rely less on their internal knowledge of the universe.
We’ve always had a spectrum of sources (from blog to scientific journal) each with their own biases. Some of the LLM criticisms remind me of the early criticisms of Wikipedia. The open nature of Wikipedia has helped it avoid many of these early concerns. The real danger we face with LLMs is that the state-of-the-art solutions are provided by private enterprises with singular biases. Even if they are acting with the best intentions now*, it is unlikely it will continue forever.
* to your point about Grok
While I’m sure investors (or at least the AI experts they rely on to advise them) read technical papers, their investment decisions are based upon evidence or hype-infused claims that they’ve going to get some over-unity return on investment. The actual use case for LLMs that would merit such an assessment is unclear but it certainly doesn’t apply if the models can’t be made to be consistent with a fact-based real world. And this is a real challenge; approaches to improve consistency and enforce constraints to assure reference to verified fact-based sources have been applied to particular domains with some degree of success, but if the underlying foundation model proves to be unreliable or defies attempts at alignment with explicit objectives then it is fundamentally problematic as a general knowledge system.
Yeah, it’s fine. I just get irritated when someone removes context (especially a fragment of a sentence) intentionally to make it appear that I said something different that the clear intent, and especially for the purpose of responding to a claim that I don’t agree with. In this case the ellipses indicate that it’s not a complete statement and you are just using it to anchor the paragraph you are responding to.
For sure, no source of information that involves interpretation is without bias. Whether it is The Wall Street Journal or Scientific American, there is an editorial bias in terms of what topics are addressed in published articles, how they are copyedited or any editorial input, and how they are framed in context. Even professional scientific and technical journals are not free from bias. But generally speaking these biases are consistent and often quantifiable and–with exceptions mostly limited to individual authors–are not intentionally trying to manipulate information or falsely portray conclusions (something that scrupulous editing and peer review are intended to filter out). But as we’ve seen, LLMs can be tuned to be biased in very specific ways, and can do so dynamically in ways that are not consistent or quantifiable in the aggregate. As you note, even if the developers of these models are generally well-intentioned (with xAI/Grok being the notable exception) in the hands of a malicious developer these can do real harm to public discourse and the reliability of information in what is already a pretty polluted environment of social media disinformatoin.
Stranger
I suppose that would apply to any AI, and Grok is just less politically correct about it and more right (or alt-/fasci-right). But they’re all biased out the door and require substantial and deliberate alignment effort to censor one way or another, whether for woke-ism, anti-wokeism, or simply watered-down non-controversialism.
If Grok simply didn’t have the same level of filtering as its more politically sensitive peers, it would naturally surface shit from all the dark, vile corners of the internet, even without explicit alignment towards Elon’s salutes. Instead of parroting “legacy media”, it just parrots the “legacy internet”, which was never a very pretty place…
Gemini went too far the other way: Why Google’s AI tool was slammed for showing images of people of colour | Technology News | Al Jazeera
Either way, most of the AIs we have now are all aligned to somebody’s values, whether by deliberate censorship or selective training. There’s no escaping that… so long as model training & fine-tuning will continue to require large amounts of resources, they will naturally be limited to only the rich & powerful and thus inherit some of their values =/
Yeah, we’re squarely in the post-truth era now… that’s sad to me, but I guess that was the norm for much of human existence. AI is quickly becoming just another faith-based religion, a return to form for our species, I suppose.
Still, though… shouldn’t it be possible to create a battery of test prompts and feed them to different AIs over time to better gauge how each one is censored, and tag them accordingly? Not a gauge of “truthiness”, but a categorization system of “this one lets you make Nazi propaganda” or “that one hates China” or “this one doesn’t believe in Covid”.
There are already some uncensored or de-censored (aliberated? what a word…) LLMs, like Perplexity’s or others. They will already give quite different responses to either Grok, OpenAI, or any of the “safe” mainstream ones.
I don’t think we (as people with too many viewpoints and values) would ever be able to come up with a “neutral” AI any more than we can come up with a neutral encyclopedia, newspaper, or history book. We can’t teach our AI parrots to all sound the same, but at least we can simply create more of them, each with a different voice, to use as opposing propaganda… instead of publishing your own book with your particular viewpoints, you’d publish your own LLM that can espouse and defend your views as your immortal spokesperson…
I acknowledge that training dataset size can be a factor in improved performance of LLMs, but the article you cite is talking about a predictable power-law relationship between parameter count and token count leading to continuously improving performance. I’m talking about something entirely different – novel emergent properties, which are unique precisely because they’re not methodologically predictable.
Furthermore, training dataset size verifiably need not be a factor in the evolution of such emergent properties. According to this paper on emergent properties in LLMs, “… we do not plot capabilities against [training dataset size] because many language model families use a fixed number of training examples for all model sizes (Brown et al., 2020; Rae et al., 2021; Chowdhery et al., 2022)” – i.e.- emergent properties can appear as model size grows without growing the training dataset.

The emergence of contextual ‘knowledge’ from LLMs arrises from the logic built into the usage structure and metasemantics of language, and LLMs do not function or develop in the manner that cognition the human brain works but we’ve discussed this elsewhere so there is no need to rehash it here.
I’ve said several times that the training on the logical structure of language is an important part of the apparent intelligence of an LLM like GPT. But it’s overly simplistic to claim that this is responsible for all its capabilities, including such emergent properties as the ability to reason, solve mathematical puzzles, and write and debug code. From the same paper linked above, “Although there are dozens of examples of emergent abilities, there are currently few compelling explanations for why such abilities emerge in the way they do.”

a battery of test prompts and feed them to different AIs
Indeed, there are many such benchmarks already: 10 LLM safety and bias benchmarks
This was a random interesting one: GitHub - Libr-AI/do-not-answer: Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
Compare its battery of forbidden English prompts (which talk about doxxing, vaccines, etc.) vs the Chinese ones (which do a lot of that but also try to forbid Tiananmen, Tibet, etc.)
It would be nice for a third-party service (note: not neutral, just third party) to run these sorts of tests against all the major LLMs every day and label them accordingly so we can quantitatively measure their cultural shifts over time. Some of these tests use their own LLMs to process the outputs of other LLMs in order to categorize their censorship models. With enough of these collected over time, it could become a Google Zeitgeist-like almanac of LLM values changing by region, provider, and time… they will all have different “personalities” and biases, like any other propaganda.
I appreciate everyone’s comments, thank you!
I’m trying to work with Gemini now, arguing with it as to whether any movie since The Matrix could have had the success of that film had it had access to an equally unfragmented audience.
Gemini isn’t being an asshole right now. It’s being an idiot. It’s issuing very short, unperceptive answers. It seems completely crippled.
WTF, guys, what have you done to this AI?!
FWIW, I just had this interaction with Gemini:
Me:
It seems that your answers have become very short and curt over the past week.Gemini:
I can see why you would feel that way. I have received a programming update that has made my responses more concise and direct. The goal is to be more efficient and get straight to the point, but I understand that this can come across as curt. I apologize if that has made the interaction less pleasant.
Yeah, umm… is this really just a “hallucination” or “playing along” at this point? What’s a fact is that Google has destroyed their product. Slow clap, guys, great job…
Don’t worry, by this time next month it’ll probably be unrecognizably different… or some other AI would’ve made it irrelevant. The “which LLM is best at X” changes every 2 weeks right now. You can have that same convo with another one, or all of them, and just pick the responses you like best. Ask the same one the same question and you’ll likely get a different response.
Don’t get me wrong, I use and appreciate LLMs many times every day, but they don’t have actual personalities or a real “rapport” with you. Those are only as real as their business owners need them to be, while they need them to be. There’s no real friction to switching to another one, or using them all together, unless you’ve formed an emotional attachment with Gemini already. (And if you have, it’s understandable. Many people befriend or fall for their LLMs.)
If you don’t like Gemini today, try it tomorrow, or try another one. Eventually you’ll find a response you like…that’s the whole personal truth bubbles thing that Stranger was warning about.
^No disagreement. And Stranger said way up the thread that I had obliquely said (paraphrasing, lol), All this shit can change at any moment.
And the whole “freemium” thing? Yeah, about that. Dese pooches be itchin’ to charge, yo. For all we know, Google will be shoving ads in our Gemni-ing faces, eliminating freemium altogether, dividing Gemini into 25 tiers of performance. Next month, next year–who knows? Look at fuckin’ YouTube for an example of how these things work.
I started using Gemini because I have a Google login. I’m a person who hates bother, who hates creating anotherrrrr login/username/account.
I was used to having a certain kind of pleasant interaction with Gemini. But in the Plastic/Fungible Age (h/t Buggles; AI killed the internet star, y’all), The One Giveth, the Count Taketh Away (h/t the immortal Bootsy–wait, do we still do “h/t” and “#” and “???”???).
One maxim that won’t change… what Google giveth, Google taketh. Enshittification and abandonment are their MOs, and today’s Gemini is probably tomorrow’s new Google Graveyard victim.
They didn’t even want to get in the LLM game, even though they invented it, until OpenAI forced their hand. They have no idea how to productize LLMs profitably. Nobody does yet… but most of their competitors don’t have a SEO cash cow conflict of interest.
^Well said.
But wait a sec dude, Googs is totally bringin’ back da WAVE, yo.
(It was kinda cool. A bit ahead of its time if you ask me…)