ChatGPT Breaking up marriages

Here’s what ChatGPT gave me:


I can translate that into Yiddish, but I’ll tone down the profanity just a bit, since direct vulgarities don’t always map well into Yiddish idiom. Here’s a faithful but natural version that keeps the emotion and tone:

„צייל שוין די ורפֿל, צום ביישפּיל! רבונו של עולם, טשאַט־גפּט, אמתדיק?!“

Transliteration:
“Tseyl shoyn di vurfel, tsum bayshpil! Riboyno shel oylem, ChatGPT, emesdik?!”

Literal meaning:
“Just count the dice already, for example! Lord of the world, ChatGPT, seriously?!”

Hmmm. The implied equivalence between Ribono shel-olam or “Lord of the World”, a conventional Yiddish expression for the Deity, and Jesus Christ raises my eyebrow a bit, but I can see how it could fly as “loosely colloquial translation of commonly encountered exasperated invocation of the Divine as generally personified in this linguistic tradition”.

Ummm… both. Hebrew in the first and last part, English in the middle.

I’m not sure it’s a New Yorker problem.

They could have translated it as Yoshke Ben Pandera I guess but then you have to wonder why anyone would ever exclaim that.

That last sentence is a ridiculous oversimplification. Here is an article – already obsolete because it talks about GPT-3.5 and 4 – that lists some of the impressive professional exams GPT has passed, in many cases acing them, like scoring beyond the 90th percentile of human subjects. This is a comprehensive technical report on GPT-4, with a full list of all the exams it has passed and its scores, starting on page 5. FWIW, OpenAI claims that GPT was never explicitly trained on any of these tests – it was going from general knowledge.

Some of its accomplishments were impressive; for example, in the Wharton School of Business exam, GPT was presented with a case study and, based on what it learned from it, was asked how it would handle a substantively different business problem. It’s the kind of question that challenges human subjects because it requires the ability to generalize and apply a learned abstraction to a new problem. It’s also generally very good at solving problems that have to do with proportions and relationships, the kind that require you to set up and then solve a system of equations. Any of those is a far cry from “just comparing probability values for different possible choices of words”; they are literally the kinds of questions used to measure IQ.
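To make concrete what that “set up and solve a system of equations” kind of problem looks like, here is a minimal Python sketch. The word problem is my own invented example (not one from any of the exams mentioned above); the point is just that the relationships reduce to a small linear system:

```python
import numpy as np

# Hypothetical word problem, invented purely for illustration:
# "Alice is twice as old as Bob; in five years their ages will sum to 40."
# Translating the relationships into equations in the unknowns a (Alice) and b (Bob):
#   a - 2b = 0            (Alice is twice Bob's age)
#   a + b  = 30           (since (a + 5) + (b + 5) = 40)
A = np.array([[1.0, -2.0],
              [1.0,  1.0]])
rhs = np.array([0.0, 30.0])

alice, bob = np.linalg.solve(A, rhs)
print(alice, bob)  # 20.0 10.0
```

The hard part for humans – and, per the paragraph above, the impressive part when GPT gets it right – is translating the prose into the equations; the solving itself is mechanical.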

As I said later, using more precise terminology:

A post on Twitter recently showcased a medical AI’s uncanny ability to diagnose, via image analysis, a floor-of-the-mouth cyst.

The image in question and the correct answer can be viewed here, for those with Twitter access (scroll down a bit for the AI’s “answer”). I agree, it’s a tough one. :thinking: :rofl:

But it is hilariously disturbing.

Modern Hebrew, thanks to its history, has very few swear words of its own, so it’s happy to adapt curses and profanity from any and all languages. Arabic, Yiddish, Russian, English, it’s all good.

“Jesus” and “fucking” are practically Hebrew at this point. “Jesus fucking Christ”, maybe less so, but hey - I’m bilingual.

“First they ignore you, then they laugh at you, then they fight you, then you win.”

Not specifically about AI – it’s attributed to a speech by trade unionist Nicholas Klein in 1918 – but it seems to fit here. AI was largely ignored by the general public as it quietly evolved from the early 1960s onward, and has suddenly become the topic of hot debate now that Large Language Models have burst on the scene. The debate seems to vary between apocalyptic fear (“AI will be the end of civilization as we know it”) and endless mockery (“they’re just stupid sentence completion engines that don’t understand anything”). So we’re somewhere between mockery and the desire to fight it, depending on who you listen to.

My attitude? Neither of those extremes. I regard GPT as a somewhat opinionated smartass but congenial friend who thinks he knows everything, sometimes doesn’t, yet is nevertheless extremely informative and useful. I think it’s overall a net benefit for the world. It will probably find many applications as a natural language front-end, in many cases supported by specialized back-end information providers. LLMs are a really important advance, but certainly not the Holy Grail of AGI.

I was going to say that my favorite homegrown swear word is the way that Israelis call people “Cholera”, but apparently this habit came to Hebrew via Polish:

That pretty much just leaves “Carcass”.

Sigh. These aren’t mutually exclusive characteristics. Yes, the LLM is using a sophisticated and very, very complicated deep-learning model to repeatedly refine its initially constructed answers in order to maximize the probabilistic matches between its answers and the data it’s been trained on. And yes, this is accomplished fundamentally by comparing millions and millions of numerical weight values on the tokenized word vectors to solve optimization problems.

This is how an LLM is often able to give correct answers to questions that are different from the questions it’s been explicitly trained on. It’s using the coordinates of the word vectors in the prompt, and other word vectors in its data, to determine which words are more semantically similar, and which have more opposite or orthogonal semantic import. So it doesn’t need to have processed any specific combination of words, or any specific prompt about those words, in the past in order to be able to come up with an answer to the prompt that achieves the numerically best alignment with the totality of its training data. The LLM is working a gradient descent algorithm (over and over again, on millions of variables), in a very operationally complex way, to be sure, but there’s no particular reason to call that “thinking”.
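As a rough illustration of the vector-geometry idea in that paragraph, here is a toy Python sketch of semantic similarity measured as the cosine between word vectors. The vectors are tiny, hand-invented stand-ins (real embeddings are learned and have hundreds or thousands of dimensions), so this shows only the principle, not how any actual LLM computes anything:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: ~1 means same direction (similar),
    ~0 means orthogonal (unrelated), ~-1 means opposite."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hand-made 4-dimensional "embeddings", invented for illustration only.
embeddings = {
    "king":   np.array([0.9, 0.8, 0.1, 0.0]),
    "queen":  np.array([0.9, 0.7, 0.2, 0.1]),
    "banana": np.array([0.0, 0.1, 0.9, 0.8]),
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))   # high: semantically close
print(cosine_similarity(embeddings["king"], embeddings["banana"]))  # low: semantically distant
```

The point of the toy example is just the geometry: closeness in the vector space stands in for semantic similarity, which is what lets the model handle word combinations it has never literally seen.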

I am not in any way, shape or form dissing the impressiveness of the achievements of LLMs in their ability to simulate human communication. All I’m saying is that this simulation is achieved by means of specific mathematical algorithms (with incredibly dense layers of variable values and processing iterations) whose essential nature is not mysterious, or identical to what human brains do. Lumping them together is comparing apples and oranges.

This isn’t mockery—at least not on my part—and you don’t need to be so defensive on behalf of your “congenial friend”. Just because an LLM is following (extremely complex) programmed instructions to produce its output, which it can do without actually “understanding” the input in the usual sense of the term, doesn’t mean that its outputs aren’t, as you say, “extremely informative and useful”.

If by ignored, you mean misunderstood, sure. Otherwise, I disagree. The general public is aware of many things that fall under AI; it’s just that those in the know used more specific descriptors, so the public didn’t realize that there was a high-level descriptor covering all of the various methods and models. Netflix recommendations, Siri, non-human chatbots, targeted ads, sports analytics, image recognition, and many others have been around much longer than ChatGPT. Most people had even heard of things like “neural nets” and “reinforcement learning”. Whoever is to blame, and for whatever reason, LLMs became “AI” rather than “LLMs” in the public mind. Hell, I’m late to the party compared to many of my contemporaries, but I was doing this before OpenAI was founded. Even the infamous “Data Science is the Sexiest Job of the 21st Century” article in HBR predates OpenAI.

I’ve heard the argument that LLMs are at least a synonym of AI, therefore it was perfectly fine. Sure, if “3” is a synonym of “the set of integers”.

So, no, AI wasn’t largely ignored by the general public, they just didn’t (and still don’t) know what the term really refers to.

As has been addressed in many previous discussions, the “…emergent properties that manifest at the unimaginably large scale of many billions of parameters shaping the behaviour of their artificial neural nets” are not evidence of some underlying deep comprehension of the world or any kind of sapience, but rather an artifact of accurately emulating the structure and usage of language. There are nearly infinite permutations in the ways in which words can be assembled into sentences, and sentences into paragraphs, which meet the rules of grammar and writing structure, but only a tiny subset that are actually evidenced in human writing, because putting random words in grammatically and syntactically correct order would produce prose that is as semantically nonsensical as Lewis Carroll’s Jabberwocky. A heuristic system trained on the vast corpus of human writing and capable of modeling those structures will naturally produce seemingly coherent statements in response to a query just because of the logic and metasemantic structures built into how language is used, as computational linguists have understood for decades. This does not indicate any kind of deep comprehension of the world, but just that for any reasonably coherent prompt there is a narrow range of appropriate responses.

That more recent versions of LLM-based chatbots (especially those utilizing ‘Chain-of-Thought’ recursive ‘reasoning models’ which break down the process of constructing a response into smaller chunks) can produce reasonably good answers to more complicated questions shows that they are very good at manipulating language in a way that is representative of their training data, but it doesn’t mean that they have ‘knowledge’ of the real world, much less actual cognitive processes, for which there is absolutely no evidence nor any mechanism to support them.
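To see the “structure alone gets you surprisingly far” point at miniature scale, here is a toy bigram generator in Python. It is emphatically not how an LLM works (no neural net, no attention, and a tiny corpus invented for the example), but even this trivial word-statistics model produces output that respects local structure while meaning nothing:

```python
import random
from collections import defaultdict

# Tiny invented corpus; the bigram table just records which word followed which.
corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat chased the dog .").split()

bigrams = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1].append(w2)

random.seed(0)
word = "the"
output = [word]
for _ in range(8):
    word = random.choice(bigrams[word])  # pick a word that followed the current one in the corpus
    output.append(word)

print(" ".join(output))  # locally grammatical-looking, but meaning-free, text
```

Scale the same idea up by many orders of magnitude of data and parameters and the output starts to look coherent across whole paragraphs, which is the point being made above.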

That LLM-based chatbots do “a pretty good job of acting like it possesses understanding and analytical capabilities” is really the goal, and also the problem, of the entire business case for chatbots. Nobody – much less large corporations expecting these systems to supplement and ultimately displace grubby human workers in many roles – is going to accept agents that don’t present as confident dispensers of knowledge and analysis, which means that not only are these systems not really capable of assessing the factuality (or not) of their responses, but the ‘tech leaders’ driving the development aren’t even that motivated to align them toward being factually accurate versus just sounding really authoritative. Furthermore, because the current focus for these systems is to make them ‘public facing’ in order to maintain enthusiasm (because they aren’t very good for producing anything of critical value like writing accurate legal briefs or authoring textbooks), makers are actually primarily motivated to optimize the systems for engagement: to provide outputs that provoke an emotional response that causes the user to keep engaging with them. This isn’t any kind of incidental consequence; it is a deliberate objective, as seen in the leaked documents from Meta in the Reuters article linked above.

This isn’t to say that ChatGPT and other broadly trained LLMs don’t have a lot of information which has been integrated into their massive artificial networks (trained, it should be noted, with many orders of magnitude more data and energy than the standard 25-watt brain of the human user with just a few dozen person-years of comparatively snail-like absorption of textual and image data). But their ability to contextualize the data is not based upon any kind of models formulated by experience of the real world; it is entirely dependent upon the statistical organization of words and their implicit associations and metasemantic ‘rules’ represented in the training data, plus whatever reinforcement training has been used to attempt to prune away ‘wrong’ associations. These systems fundamentally have nothing akin to ‘common sense’ in determining whether a statement makes sense or not, which is why they still so often produce errors, sometimes subtle but not infrequently completely and laughably wrong, in ways that even a child with their very limited experience of the world would immediately recognize as nonsense.

This is a bit ironic because ‘behaviorism’ as a school of understanding the psychology of humans and other animals is considered a largely outmoded and unreliable way of predicting responses and estimating intelligence, in part because it often invokes the intrinsic biases of the observer in the interpretation of what a behavior ‘means’. It certainly has no place in cognitive neuroscience, where it is understood that observed behavior is the endpoint of extremely complicated, highly distributed, and somewhat mysterious ‘intuitive’ processes that are built into the functional structure of animal brains (and other parts of the nervous system) with no analogue in LLMs. The “emergent properties that occur at very large scales of computational complexity” are not only a consequence of emulating the complex semantic systems of language (with, again, their own logic and metasemantic rules built into the structure of usage) but also of an enormous amount of active training, trimming, tuning, and filtering to ensure that these systems don’t immediately start producing complete gibberish (as early LLMs did). Humans and other animals naturally ‘learn’ through their experience (in the appropriate social context) in ways that seem to naturally produce appropriate behaviors with a very limited amount of active intervention (most of that in the way of learning the artificial rules and guidelines of society, such as not peeing on the rug or sitting upright at a table), whereas LLMs have to be rigorously controlled to ensure that they don’t develop undesirable and adverse behaviors like responding to a prompt with racist diatribes (except in the case of Grok, where apparently the leader of the company encourages that kind of thing).

This isn’t to say that artificial intelligence using artificial neural network (ANN) ‘deep learning’ heuristic systems isn’t impressive, useful, and probably even critical for solving certain types of problems. AlphaFold has already demonstrated its utility in simulating protein folding and interactions that have defied human researchers using explicit solutions; other systems are used to churn through ‘Big Data’ problems from genetics to astronomy to find patterns and phenomena that a human researcher could never find. These purpose-developed AI systems solve particular problems by use of ‘brute force’ computation because there isn’t a better way to do them. Similarly, LLMs have ‘solved’ the problem of parsing general inputs in natural language effectively, which is something linguists and computer scientists had been trying to do through symbolic approaches for decades with only very limited success in narrow domains. That is pretty amazing in and of itself, but while they’re really quite good at breaking down prompts and providing syntactically competent (if often factually erroneous) responses, this doesn’t make them reliable general knowledge systems even when paired with the ability to retrieve actual factual information from verified sources.

Since we’re discussing individual perceptions and experience, I’ll contribute this anecdote: I regularly review reports from contractors presenting the results of engineering and scientific analyses or positions based upon ‘engineering judgement’. Many of these contractors have clearly been encouraging their workers to use AI to ‘help’ draft these products in the name of efficiency. I can tell when I get something that was generated by an LLM-based chatbot (usually by the introductory paragraph) because, first, the grammar is always a little too perfect for someone who isn’t a professional copyeditor, and second, they almost invariably include information and explanation that is erroneous, incomplete, misleading, or just really oddly presented in a disjointed fashion, like someone trying to summarize a textbook chapter or long Wikipedia article without actually digesting the information. In one case it produced a table (which I didn’t realize that ChatGPT or Copilot could do) with what appeared to be the correct data but confusing labels and utterly nonsensical units.

Of course, nobody bothers to carefully review and edit these products because the entire point of using a chatbot is to ‘save time’, and so we get reports or papers that are filled with errors and inconsistencies, which certainly doesn’t ‘save time’ for me or other people reviewing them. I’ve actually adopted a rule that when I discover that a document was authored by a chatbot, I stop after noting the first three or four serious errors (usually within the first page) and send it back whence it came with pointed comments about how obvious the errors are and that any AI-generated product should be thoroughly reviewed before submission. This has made a lot of people very upset about how disruptive this is and how I’m not doing due diligence in reviewing and commenting on the entire document, but if someone is going to just crank out some AI slop and call it their own work without even doing a quick pass through it, I’m not inclined to waste my increasingly precious time on this planet ‘correcting’ a document generated via statistical word association.

And I’m not some kind of oppositional ‘luddite’ who just won’t look at a new way of doing things; I’ve been using machine learning methods for over fifteen years, have had an interest in developments in machine cognition and AI research since college, and think that a properly constrained LLM could be quite useful as a natural language interface in numerous applications. But the way it is being deployed, and the lofty promises/threats of AI promoters hawking the oncoming revolution with a complete lack of credibility, is galling and frankly worrisome for someone who has seen tech bubbles before and recognizes this as being the Big Momma of overinflated expectations and adverse consequences.

Claiming “extremely informative and useful” is a subjective evaluation depending upon what one considers informative and useful, but ranking LLMs as “overall a net benefit for the world” is a hard sell for something that is often wrong or misleading, that is aligned to emphasize engagement over practical utility and factuality, and that compromises collective intelligence and undermines democratic norms based upon a shared fact-based understanding of the world. These systems also use enormous amounts of energy and fresh water to support the ‘compute’ needed to train and operate them, and have created an entire industry of low-paid humans in developing nations to filter data and actively train the models, often at peril to their emotional health and well-being, not to mention what the end result may be for humans in the workplace who are being forced to adopt these technologies in the name of questionable efficiencies whether they actually produce a better product or not. I keep hearing how these things are going to cure cancer, solve world hunger and conflict, and fix global warming, when in reality we have all the knowledge we need to do these things for ourselves if we simply had the collective willpower to implement them, and without expending progressively more energy, enough to threaten reversing the gains made in the transition to renewable energy sources.

When (and if) artificial general intelligence (AGI) will occur is an open question, but LLMs by themselves are almost definitely not an avenue toward it, not only because of their inherent costs and data requirements to get them to even be as marginally functional as they are currently, but also because there are no actual processes of cognition, or any workable analogue of such, going on within them. They are a really sophisticated prompt-and-response algorithm developed with heuristic training on a massive corpus of data, but they don’t ‘think’ even as much as a dog or a turtle. The increasingly hyperbolic claims of AGI being just around the corner in 2024 2025 2026 (maybe 2027 but definitely before 2028) are just more and bigger hype to justify an industry that can’t possibly realize either the valuation or the amount of capex it requires to be poured into ‘compute’. It’s like tulip mania, only if tulips were being promised as the solution to make everyone rich and happy with full bellies and frolicking puppies all day long.

Stranger

Here are some examples of this. When I took an AI class at MIT in 1971, we studied AI as applied to visual recognition (done), chess (done), understanding a block world (done), mathematics, specifically doing integral calculus (done), and planning of routes (done by Google Maps). So the AI of yesterday is the cellphone app of today. I don’t think anyone would think there is intelligence behind the route you get on your phone, and I hope when people understand these things better they won’t think there is real intelligence behind what ChatGPT produces.

The more interesting question is whether there is real intelligence behind what most of us produce.

Yeah, but what did that guy know? Nothing!

An important emergent property that occurs at very large scales of computational complexity is: hallucination.

I agree that appreciation of this fundamental fact is absolutely critical to developing the appropriate level of respect for LLMs.

They should be shut down until they can deliver guaranteed correctness, not just convincing delivery.

As presently constituted, highly convincing but with utter disregard for accuracy, they are the ultimate “fake news” generation vehicle about absolutely everything, not just current events. They are fast polluting the entire body of human accumulated knowledge and run the risk of destroying us all. Not by taking over and declaring humans superfluous, but rather by overwhelming usable truth about the world with mere nonsense.

As we often say, reality has a reality bias. Stuff designed using the “facts” produced by what currently passes for AI, namely LLMs, will not work. That applies to machines, software, and certainly to public policy and mass public behavior.

You’re only addressing purely text/language based LLMs here. You’re completely ignoring Multi-Modal AI, which obviously does have a wider scope than language (I won’t use the term “understanding”) since it can process images and videos, not just language.

@wolfpup is right that much of the criticism of AI focuses on the state of chat bots from 3 years ago rather than state of the art models.

I mean, totally off topic, but why do Hebrew speakers not have an extensive set of swearwords?

My native language, English, has plenty. My home country’s language, chiShona, has quite a few. My adopted country, South Africa, appears to have a language whose basic purpose is swearing - or at the least, has elevated swearing into a fine art.

My favourite Afrikaans so far is “jy is laer as 'n slang se skote se skaduwee”, which is “you are lower than a snake’s shit’s shadow”.

I would have thought Hebrew, which is somewhat famous for turn of phrase, would be equally creative.

Oh my God, I was typing from my phone and somehow managed to insert an entire non-word into that sentence. Eleextenal? What the fuck?

As for not having too many swear words: I’m sure that ancient Hebrew had a very healthy array of slurs (although I’m sure even then they were borrowing from Greek and stuff). But during the time when Hebrew was not really used as a vernacular, cusses weren’t really preserved, because Hebrew was mostly being used to discuss Torah and the Talmud. Languages like Yiddish and Ladino had plenty of swears, but often those drew more heavily from the non-Hebrew side of the language.

With the revival of the language by Eliezer Ben Yehuda and others, coming up with ways to curse wasn’t exactly a priority.

So aside from words like “Bastard”, “Carcass”, “Penis”, “Son of a Bitch” etc - words that are pretty obvious insults and biblically preserved - you mostly end up with loan words, like “fuck” and “shit” from English and “Your mom’s vagina” from Arabic.