How useful is a system that often provides incorrect factual information?
From your cite:
The researchers emphasise that these findings represent performance in controlled testing environments rather than real-world clinical practice. “It is important to recognize that these evaluations occur within idealized, standardized testing environments that do not fully encompass the complexity, uncertainty, and ethical considerations inherent in real-world medical practice,” they cautioned in their discussion.
In short, it is good at taking a standardized test that includes some multi-modal elements. That is an impressive feat of replication, but it doesn’t mean the system has any actual understanding of the complex interactions with a human patient in a physical environment.
Sure, your interactions with ChatGPT gave you the nomenclature and jargon to sound like an informed patient, and even some cursory information about appropriate diagnostics (information you probably could have gotten by reading the same online sources that ChatGPT was doubtless trained upon), but that doesn’t make it an expert or reliable system for medical diagnostics. It is repeating the use of language found in the structure of its sources of training data, and provided those are credible sources it is providing cromulent-sounding guidance. But it has no actual knowledge of medical diagnostics or the pathology of disease as applied in a clinical setting; it just has the text and image data from which to synthesize a statistically appropriate response.

Given a sufficiently large base of training data, enough parameters to generate a complicated response, and a “Chain-of-Thought” recursive model that lets it break the parsing of the prompt into manageable segments so that it doesn’t immediately spiral off topic and ‘hallucinate’ a completely inappropriate answer, it can produce a plausible-seeming response that reads like the first pass an attending physician might write in their notes. That doesn’t mean it is actually making a good diagnosis, that it would recognize an obvious anomaly or error, or that it could formulate an appropriate treatment plan.
Stranger