Suppose that your proof, as far as it goes, is unassailable; for the purposes of this discussion, let’s grant that it is. Where does that take us, specifically with regard to this conclusion:
… as long as ChatGPT only knows words and their relations, it will never be able to infer from there anything about things and their relations: it produces sentences without any connection to states of affairs—it understands nothing.
I want to explore what this says about what ChatGPT and its successors will or will not be able to do in the future.
What I think may be wrong with your conclusion is that it rests on an unnecessarily constrained view of the concept of “understanding”. We might gain better insight by substituting the more pragmatic term “semantic competence”.
One could argue that ChatGPT possesses no “understanding” as you have defined it, but nevertheless has a high level of semantic competence. That competence enables it, first, to discern nuanced, context-dependent meanings in natural language, and then to act on those meanings – solving difficult problems, translating accurately between languages, and performing many other remarkable tasks that were once thought to lie in the distant future of AI, or to be impossible altogether.
There’s an interesting paper that touches on this and is worth a look. I’ll summarize it, but first a couple of definitions:
Contextualism: the view that context-sensitivity generalizes … There is context-sensitivity whenever a distinction has to be drawn between an expression’s lexical meaning (invariant across occurrences) and its (contextually variable) semantic contribution.
Computational semantic minimalism (CSM): the view that the semantic content of a phrase P is the content that all implementations of P share.
The article argues that while CSM is a familiar characteristic of classical GOFAI (“Good Old-Fashioned AI”), it no longer applies to the deep learning paradigm of deep neural nets. There, the author argues, a new characteristic emerges that he calls “radical contextualism”, which has powerful implications not just for supporting highly nuanced semantic competence but for the machine’s logical processes themselves:
In the case of [artificial neural networks], complex contexts breed metalinguistic vagueness as parts of the machine’s workings are obscured from interpretation. In other words, metalinguistic vagueness is vagueness about what counts as literal and non-literal. ANNs have no need for fixed meanings from which others are modulated …
Thus, there is an indeterminacy about which connections or weights are generated from others. This process seems to be completely pragmatically determined by the machine, allowing for the possibility of decisions on significance being nonstandard (nonlinear) and even ad hoc. For example, in predicting who the next likely president of the United States could be (given a range of candidates in the test set), the machine might focus on characteristics we do not usually consider relevant or salient such as age, gender, race, birthplace, education and so on. It sees thousands of data connections we are unable to appreciate. So what is meaningful in the decision or output might have no obvious parallel in our limited reasoning, like having a particular ancestral background. One of the main differences between GOFAI and deep learning is that the latter can automatically design high-dimensional features to probe the data.
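To make the contrast concrete, here is a minimal sketch of my own (not from the paper), using an off-the-shelf BERT model through the Hugging Face transformers library; the model choice and example sentences are just illustrative assumptions. It shows that the same word receives a different vector in each sentence it occurs in – there is no fixed, context-invariant entry from which the contextual readings are “modulated”:

```python
# Illustrative sketch only: contextual word vectors from a pretrained BERT model.
# Assumes the Hugging Face `transformers` and `torch` packages are installed.
from transformers import AutoTokenizer, AutoModel
import torch

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def vector_for(word, sentence):
    """Return the contextual embedding of `word` as it occurs in `sentence`."""
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # shape: (tokens, 768)
    ids = inputs["input_ids"][0]
    # Locate the first occurrence of the word's token in the input.
    idx = (ids == tok.convert_tokens_to_ids(word)).nonzero()[0].item()
    return hidden[idx]

v_river = vector_for("bank", "She sat on the bank of the river.")
v_money = vector_for("bank", "He deposited the cash at the bank.")

sim = torch.nn.functional.cosine_similarity(v_river, v_money, dim=0).item()
print(f"cosine similarity between the two 'bank' vectors: {sim:.2f}")
```

The two “bank” vectors typically come out noticeably different, whereas a GOFAI-style lexicon would store a single fixed entry for “bank” and disambiguate it by explicit rules.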
In short, we can grant a lack of “understanding” in the sense meant in the OP and still show – as in this excellent overview paper [PDF] that I posted in the other thread – that semantic competence can take us a very long way toward genuinely human-like behaviours, and perhaps even toward AGI (artificial general intelligence).