As an interesting exercise in a sort of meta scenario, I asked GPT for its opinion on just the initial portion of your post. I find it fascinating that we can do this in today’s world: asking an AI to introspect and to give its opinion on a critique of itself. I’m interested in your reaction to this.
Its opinion was that the part of your post I submitted was a “generally thoughtful and mostly accurate assessment of how large language models (LLMs) like GPT-4 operate, but there are a few areas where the evaluation could use refinement or correction”.
These were the areas:
 Partially Misleading or Oversimplified Claims
- “There is little scope for inference.”
- Oversimplified and arguably incorrect.
 While LLMs don’t perform inference in the symbolic, rule-based sense (as in a logic engine or theorem prover), they do perform forms of implicit statistical inference. For example, they can deduce relationships, perform analogical reasoning, or arrive at conclusions that mimic human inference. The ability of LLMs to answer multi-hop reasoning questions or to perform chain-of-thought reasoning suggests they can simulate inference quite effectively, even if the mechanism is not explicit or interpretable in classical terms.
- “The only state is the prefix phrases passed in on each iteration.”
- Misleading.
 While it’s true that transformer models are stateless in the sense that they don’t maintain persistent memory across calls (aside from fine-tuning or techniques like RAG), within a single forward pass, there’s an internal representation of all previous tokens—the entire context window is available. And within the computation graph of one inference, layers maintain evolving “hidden states” and attention maps across tokens. So while external memory is absent, ephemeral internal state is rich and crucial.
- “The internals can pass a kind of working state down the pipe in a very limited manner.”
- Partly true but underappreciates transformer dynamics.
 Transformers pass information across layers via attention and non-linear transformations. These aren’t “limited” in the way traditional recurrent models are; in fact, one of the advantages of the transformer is its ability to model long-range dependencies very effectively. The “limited manner” phrase underestimates this capability.
- “The ability of the LLM to pass information widely between the huge number of running pipelines is limited.”
- Ambiguous phrasing.
 If by “running pipelines” the author means the parallel processing units in the transformer, then yes, architectural trade-offs exist (e.g., fixed context windows). But within a single forward pass, transformers do pass information widely across the entire input using self-attention. The limitation arises more from context window size and memory constraints than the architecture’s inability to “pass information.”
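To make the last two points concrete, here is a minimal sketch of single-head causal self-attention in plain Python. It is only an illustration, not how GPT is actually implemented: the identity query/key/value projections, the toy two-dimensional token vectors, and the names `softmax` and `self_attention` are all assumptions of mine. It shows that the output at the last position changes when the first token changes, while the causal mask keeps the first position independent of later tokens.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, causal=True):
    """Single-head scaled dot-product attention.

    Assumption for illustration: Q, K and V are the token vectors
    themselves (identity projections), with no learned weights.
    """
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        # With a causal mask, position i only attends to positions 0..i.
        visible = tokens[: i + 1] if causal else tokens
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return out


# Toy context of three "tokens" (hypothetical 2-d embeddings).
base = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
changed_first = [[0.0, 2.0]] + base[1:]
changed_last = base[:2] + [[3.0, -1.0]]

out_base = self_attention(base)
out_first = self_attention(changed_first)
out_last = self_attention(changed_last)

# Changing the first token alters the output at the last position...
last_position_moved = out_base[-1] != out_first[-1]
# ...but changing the last token leaves the first position untouched.
first_position_unmoved = out_base[0] == out_last[0]

print(last_position_moved, first_position_unmoved)
```

In this sketch every position can read from every earlier position in a single step, which is the sense in which information is passed "widely"; the only hard boundary is how many tokens are in the list, i.e. the context window.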
 Summary of Analysis
The author has a decent high-level grasp of LLM architecture and its limits, but they underestimate the model’s ability to simulate inference and oversimplify how internal state and information flow work in transformers. Their language seems shaped by a classical computer science or symbolic AI lens, which may lead to undervaluing the emergent capabilities seen in practice.