AlphaFold is a legitimate demonstration of a valuable application of technologies under the broad banner of ‘artificial intelligence’, but it should be noted that it is a purpose-specific application of “deep (machine) learning” technology with a transformer architecture. It is not a large language model (LLM) or broad-purpose generative AI; nobody is trying to assert that it has a ‘spark of consciousness’ or is somehow on the cusp of artificial general intelligence, and it is possible to develop a validation protocol for such models against an established baseline. This is exactly the kind of application that artificial neural networks (ANNs) should be used for, because the network can work through an enormous range of possible molecular configurations until it develops implicit algorithms for finding workable patterns that produce biochemically useful proteins. In that sense, it encodes useful knowledge in the specific domain in which it functions.
The problem with LLM ‘chatbots’ is that most people think of and use them as general knowledge systems when they are trained, and function, as language manipulation systems. They work as well as they do because there is logic and contextual information built into the use of language, and if there is enough text in the training set they can provide a simulacrum of a knowledge base, but it isn’t actually a verifiable base of factual information. Worse yet, because the primary function of an LLM is to provide a syntactically correct textual response to a prompt, it won’t just respond, “I don’t have this information in my training set,” and in fact it has no comprehension of whether the correct information is in the training data or not; it synthesizes a response that may or may not be semantically correct depending on whether the appropriate answer is represented in its training data with some degree of frequency. For trivial questions it will probably give a correct answer (although GPT-5 is apparently ‘hallucinating’ even on some simple things), and for more complex questions it will often give a response that is superficially correct but doesn’t demonstrate any comprehension of deeper context; but unless it hits some hard-coded limit it will never just say, “I don’t know anything about that topic.” It will literally just make something up, famously including fabricated citations to papers and case law that do not exist.
The reason this is an issue is technically a ‘people problem’: users not understanding that they need to verify all factual information and carefully consider any reasoning presented in a response. The trouble with that framing is that the ‘killer app’ use of these systems is to do all of that grunt work for you in the name of efficiency, so workers using these tools are under pressure to get as much productivity as possible out of them, which inherently means (mis)placing faith that they can be trusted to provide correct information and analysis. Enthusiasts are predisposed to see them as ‘mostly right with an occasional hiccup’, but in fact the errors are fairly frequent and very significant if they are being used for any critical application, and the reality that there is no baseline to validate these systems against means that all of the evaluation of how good they are is essentially anecdotal. And if we can’t actually have that kind of confidence in them, then they are essentially a novelty, at least for the applications to which they are being put. A reliable natural language processing system could be paired with an actual knowledge base system (a toy sketch of what I mean follows below), but it would still need specific constraints to prevent ‘hallucination’ in interpreting how to query and use the knowledge system, or else you are left with the same fundamental reliability problem. In any case, it isn’t a capability that justifies the tens of billions of dollars of investment just to create a refrigerator that can casually shoot the shit with you while you drink a beer.
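To be clear about what I mean by pairing language handling with an actual knowledge base, here is a deliberately toy sketch (the lookup table and the crude question-normalization are invented purely for illustration); the point is just that an answer either comes out of a checkable store or the system explicitly declines, which is the behavior the current chatbots do not exhibit:

    # Toy illustration only: answer from a verified store or explicitly refuse.
    # The "knowledge base" and the normalization step are made up for this example.
    KNOWLEDGE_BASE = {
        "author of moby-dick": "Herman Melville",
        "boiling point of water at 1 atm": "100 degrees Celsius",
    }

    def answer(question: str) -> str:
        """Return a stored fact or an explicit refusal; never synthesize a guess."""
        key = question.strip().lower().rstrip("?")
        if key in KNOWLEDGE_BASE:
            return KNOWLEDGE_BASE[key]
        return "I don't have that information in my knowledge base."

    print(answer("Author of Moby-Dick?"))           # Herman Melville
    print(answer("Median lifespan of a quokka?"))   # explicit refusal, not a fabrication

Obviously a real system would put a language model in front of the query step rather than a string match, and that is exactly where the constraints against ‘hallucination’ would have to live.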
Stranger