To be clear, I used the term “gibberish” to describe the specific response in this post as it was, in fact, totally wrong. I upgraded my evaluation of @Sam_Stone’s response from GPT-4 to “reasonably talented magpie” because it came closer to the correct answer, even though it had a couple of errors in describing the algebra and badly fumbled rounding to significant figures. Chatbots are certainly capable of providing readable and almost invariably grammatically and syntactically correct responses, although the reliability of those responses is highly variable depending on the specificity of the prompt and just how much ‘inference’ the chatbot has to do to produce a response. They can more or less be treated as Wikipedia articles written by Cliff Clavin after spending an afternoon at the dentist’s office coincidentally reading an article on the topic at hand, delivering some mixture of half-remembered fact and whatever filler seems likely to roll well off the tongue.
We are in “an entirely new era now”, and specifically one in which the confluence of computational power and storage, access to enormous masses of digitized data, and the ability to manage large-scale neural networks via a transformer architecture creates emergent capabilities not available to researchers in the 'Sixties, or for that matter the 'Nineties and 'Aughts. The coincidental development of highly optimized vector processors for interactive graphics displays, and of platforms like CUDA for programming them in parallel at scale, also made for an ideal computing architecture to handle vectorizable data such as natural language text. But if anything, current AI evangelists are even more “overly optimistic” in their projections of what LLM-based systems will do in terms of developing artificial general intelligence (AGI), motivated not only by academic enthusiasm but by the purported fiscal potential (and for some, the elimination of squishy human work units from the work ecology, to be replaced by easily controlled machines not prone to illness, inattention, or demanding labor rights). This has led to the same kind of hype that Silicon Valley technology companies apply to every area of innovation, from biotech to space technologies to virtual reality: taking an impressive but fundamentally limited innovation and blowing it up into the next revolutionary, world-changing development. A critical eye to this behavior observes that there is always a cycle of increasingly hyperbolic evangelism, with any grounded criticism brushed aside as obsolete thinking, followed by a massive deflation or in many cases a total collapse of the nascent industry.
There are enormous amounts of capital and resources (including hydrocarbon-generated power and precious potable water) being put into training these models and running the data centers necessary to make them usable for pseudo-‘agentic AI’ (more on this later), despite the fact that they are not only not making anything like a profit but also lack a substantial use case that would justify this investment. Advocates often claim that these systems are ‘getting better’ as transformer architectures evolve and as systems for retrieval augmented generation and post hoc filtering of output are developed, and to the extent that these systems give more coherent responses that don’t completely go off the rails or immediately start ‘hallucinating’, this is true, albeit at the expense of running into limits of training data and ‘compute’. But these systems still make basic errors in comprehension and contextualization, and still produce erroneous results with fabricated citations or facts. These failures occur because the systems don’t actually have the fundamental capacity to create accurate contextual models of the real world independent of the text (or, in the case of hybrid models, still images and video) in their training set; that is literally the only information they have available, so even if these systems were capable of human-level conceptualization (which they aren’t, but we’ll set that discussion aside for the moment) they would be fundamentally limited in what information they have access to about the world. There is no clear path to making these systems reliable in distinguishing established fact from speculation or error, and not even a really good way to prevent them from producing fabricated citations or presenting ad hoc explanations that aren’t based upon verified information without explicit post hoc filtering rules.
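Since ‘retrieval augmented generation’ gets invoked so often as the fix for this, it is worth seeing how little machinery it actually involves. The sketch below is my own illustration, not any particular vendor’s implementation; the embed() and generate() functions are toy stand-ins for a real embedding model and a real LLM. The structure, though, is essentially all there is: score some stored text as ‘relevant’, paste it into the prompt, and hope the model leans on it. Nothing about the model itself changes, which is why this curbs hallucination rather than eliminating it.

```python
import hashlib
import numpy as np

# Toy stand-ins: a real system would use a learned embedding model and an LLM API.
def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hash each word into a fixed-size vector (a crude proxy for a learned embedding)."""
    v = np.zeros(dim)
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        v[h % dim] += 1.0
    return v

def generate(prompt: str) -> str:
    """Placeholder for the LLM call; in practice this is where tokens get predicted."""
    return f"[model response conditioned on {len(prompt)} characters of prompt]"

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank stored documents by cosine similarity to the query and keep the top k."""
    q = embed(query)
    def score(doc: str) -> float:
        d = embed(doc)
        return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d) + 1e-9))
    return sorted(documents, key=score, reverse=True)[:k]

def rag_answer(query: str, documents: list[str]) -> str:
    """Prepend the retrieved text to the prompt; the model itself is unchanged."""
    context = "\n".join(retrieve(query, documents))
    return generate(f"Context:\n{context}\n\nQuestion: {query}")

docs = ["The mat is red.", "The cat sat on the mat.", "Paris is in France."]
print(rag_answer("Where did the cat sit?", docs))
```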
Just to be clear, I’ve never stated a “position of generative AI being useless”; I have said that it is unreliable, and that there is no clear path to making it reliable enough to undertake ‘mission-critical’ responsibilities or produce work without human oversight and review. The responses that the chatbot produced to @Francis_Vaughan’s criticism were clear and at least mostly correct in their details about how the system works internally (and doubtless reflected in some subset of the training set, albeit restated in a succinct form), but they missed the more fundamental issues touched on in the post, specifically the issue of scope for inference. The post makes an example of asking the LLM to perform a mathematical operation, and then asking it to describe how to perform the operation, which an LLM will treat as distinct and unrelated queries in how it tokenizes the prompts and responds. I’ve actually seen this independently, where a chatbot will give a (mostly) correct answer but a completely wrong explanation of how to perform the calculation, or conversely give a wrong answer but present reasonably clear instructions on how it should be performed. While the chatbot’s statements regarding “implicit statistical inference” and “simulating inference” are true in themselves, they miss the point that a person would understand the inherent relationship between those two questions but an LLM does not, because to it they are essentially serial queries.
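To make that last point concrete, here is a trivial snippet using the tiktoken library (cl100k_base is just a publicly available encoding, not necessarily the tokenizer behind any particular chatbot discussed here). The two prompts, which a person immediately recognizes as two faces of the same task, become two different integer sequences, and each would be answered by its own independent forward pass with no conceptual thread tying them together.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

do_it = "What is 17 multiplied by 23?"
explain_it = "Explain, step by step, how to multiply 17 by 23."

# To a person these are two faces of one task; to the model they are just
# two separate token-ID sequences, each answered in its own forward pass.
tokens_a = enc.encode(do_it)
tokens_b = enc.encode(explain_it)

print(tokens_a)
print(tokens_b)
print("token IDs in common:", set(tokens_a) & set(tokens_b))
```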
I think it is important to understand the difference between how an LLM performs inference and how a person does it. LLMs are, fundamentally, neural networks which have been trained on enormous amounts of textual data (far more than a human would ever be able to consume), using that corpus to set parameter weights across an elaborate network of many layers of nodes, with backpropagation (BP) refining those values so that the system produces a statistically ‘likely’ string of text; the weights and their processing functions create implicit algorithms that produce the response. Once trained, the system takes a prompt, tokenizes it into vectors, and by the computational brute force of the transformer architecture churns out a ‘cromulent’ bit of text consistent with what is statistically represented in its training text and with what it has produced before in the current context frame. A human brain, on the other hand, builds and constantly updates and refines conceptual models of the world, which allow it to be trained upon a tiny fraction of total human knowledge and yet process nearly any text (in a language it is familiar with) on roughly 25 watts of power, with the capacity to reference previous discussions or seemingly unrelated concepts that are linked in a contextual frame. Humans don’t use backpropagation for learning and don’t really have anything like a transformer architecture; to the extent that we understand attentional systems in human cognition, their flow is quite non-linear and non-sequential.
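For anyone curious what that ‘computational brute force’ actually looks like in the loop, here is a minimal sketch. The model_logits() function is a random stub standing in for the billions of trained weights, but the loop itself (tokenize, predict a distribution over the next token, sample one, append, repeat) is essentially all that ‘inference’ means for an LLM.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["the", "cat", "sat", "on", "mat", "."]  # toy vocabulary

def model_logits(context: list[int]) -> np.ndarray:
    """Stub for the trained transformer: in a real LLM this is billions of
    learned weights conditioned on the context; here it is just random scores."""
    return rng.normal(size=len(VOCAB))

def sample_next(context: list[int], temperature: float = 1.0) -> int:
    """Turn logits into a probability distribution and sample one token from it."""
    logits = model_logits(context) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(VOCAB), p=probs))

def generate(prompt_tokens: list[int], max_new: int = 10) -> list[int]:
    """The entire 'inference' loop: predict, sample, append, repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        tokens.append(sample_next(tokens))
    return tokens

print(" ".join(VOCAB[t] for t in generate([0, 1])))  # continues from "the cat"
```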
It seems amazing to most people, and even to people working in the field of generative AI, that a system designed to process and generate natural language text can ‘reason’, but in fact there is an implicit logic in the structure of how language is used, and if you map a sufficiently large corpus of examples of textual language into a BP-modified weighted artificial neural network (ANN) you are going to end up with something that can use language to ‘deduce’ correct answers to at least common questions and ‘simulate inference’. (That these systems can also perform mathematical calculations is also unsurprising, because algebra is essentially a grammar for doing various mathematical operations, and a fairly straightforward one at that, so deriving those operations implicitly from examples is an expected ‘emergent property’ of an artificial neural network.) This is an impressive but not a surprising result, and I find it a little shocking that people working on LLMs are not more conversant with neurolinguistics so as to understand this. However, if such a network were fed a corpus of grammatically consistent nonsense pseudo-text à la Lewis Carroll’s poem Jabberwocky, it would also produce a set of rules for manipulation and ‘inference’, even though there is no real-world meaning or context for the things and actions described within. It would confidently provide a definition for “galumphing” or “borogoves” consistent with their use in the training text, without any comprehension that it is total nonsense.
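You can see the Jabberwocky effect in miniature with even the crudest statistical model of text. The bigram toy below, my own illustration and obviously nothing like a transformer in scale or sophistication, learns only which words follow which in a fragment of the poem and will cheerfully continue it in a fluent-sounding way; meaning never enters into it, and the same is true in principle, if not in degree, of any model trained only on text.

```python
import random
from collections import defaultdict

# A fragment of Carroll's "Jabberwocky": grammatically well-formed nonsense.
corpus = (
    "twas brillig and the slithy toves did gyre and gimble in the wabe "
    "all mimsy were the borogoves and the mome raths outgrabe"
).split()

# Build a bigram table: for each word, which words follow it in the corpus.
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

def babble(start: str, n: int = 12, seed: int = 1) -> str:
    """Continue the text by sampling from observed word-to-word transitions.
    The model 'knows' only which strings follow which; meaning never enters."""
    random.seed(seed)
    out = [start]
    for _ in range(n):
        nxt = follows.get(out[-1])
        if not nxt:
            break
        out.append(random.choice(nxt))
    return " ".join(out)

print(babble("the"))  # fluent-sounding nonsense about toves and borogoves
```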
I’ve been using ANN-based tools for machine learning for about the last fifteen years, and the kinds of patterns they can tease out of large datasets, or the emergent capabilities they can create in terms of filtering algorithms, are very impressive. To be clear, I think it is amazing that LLMs can accurately manipulate a wide variety of natural language text and produce a pertinent response, something which symbolic attempts at machine cognition consistently failed at above some modest ceiling of complexity for decades. Because language is a uniquely human capability (as far as we know, although I reserve judgement about cetaceans), we tend to view it as fundamentally reflective of intelligence rather than a tool of it, and so when something talks it gives the impression of being smart, self-aware, and capable of independent thought even when that is shown not to be the case. I think people—including many researchers—are impressed with these systems because they are a mirror reflecting what people project upon them and the tone of language in their training corpus, rather than critically evaluating if and how they could actually be ‘thinking’ or developing volition.
This does represent a pretty revolutionary jump in the capability of this kind of ‘AI’, and there are real use cases for it, albeit probably not the kind of trillion-dollar industries that would justify all of the investment into AI research and training, and certainly nothing matching the breathless exuberance of evangelists who have been proclaiming what it will be capable of “next year” for about the last five years. I don’t believe for a minute that these systems have a ‘spark of consciousness’ within them, and I am doubtful to the point of near-certainty that this is a path to AGI or even true machine cognition without some kind of radical change in transformer architecture that actually allows the LLM to be a quasi-self-aware, introspective, constantly self-updating system. I don’t believe they are going to start intentionally distracting us and building giant robot factories that will displace human workers en masse (although I’m sure many employers and CEOs are hoping and betting that will come to pass so they can eliminate the human element of ‘Human Resources’); the real danger isn’t that they’ll intentionally take over but that we will assign responsibility to systems that aren’t actually capable of any kind of reliable control or decision-making.
Stranger