The role of electronic brains

That seems like a very strange point to make, given that a large part of what ChatGPT does is search its repository of training data to guide its responses, and that Microsoft has deployed that exact same LLM technology in the “New Bing” precisely to function as an intelligent search engine. In the same vein, a group of researchers evaluating the performance of GPT-4 on the suite of tests in the US Medical Licensing Exam commented favourably on the potential for its intelligent search capabilities to aid in medical education and clinical decision-making.

Except that it very clearly does solve problems!

  • It solves logic problems, including problems explicitly designed to test intelligence, as discussed in the long thread in CS.

  • GPT-4 scored in the 90th percentile on the Uniform Bar Exam.

  • It aced all sections of the SAT, which among other things tests reading comprehension as well as math and logic skills, and it scored far above the average human across the board.

  • It did acceptably well on the GRE (Graduate Record Examinations), particularly the verbal and quantitative sections.

  • It got almost a perfect score on the USA Biology Olympiad Semifinal Exam, a prestigious national science competition.

  • It easily passed the Advanced Placement (AP) examinations.

  • It passed the Wharton MBA exam on operations management, which requires the student to make operational decisions from an analysis of business case studies.

  • On the US Medical Licensing exam, which medical school graduates take prior to starting their residency, GPT-4’s performance was described as “at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations.”

It’s just absurd to say that “it is not for solving problems”.

At the risk of repeating myself, this description based on a simplistic understanding of how it works is both true and extremely misleading. The misleading part is that it implicitly relies on our intuition to conclude that this is all it can do, which is false because entirely new properties and behaviours emerge – often unexpectedly – as the scale of LLMs grows. GPT-4 is believed to have over one trillion parameters; each parameter represents, in essence, a learned piece of knowledge that drives its behaviour and, in particular, allows it to make inferences and create entirely new content.
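To give a sense of where numbers like “one trillion parameters” come from, here is a minimal sketch of how parameters accumulate in a transformer-style model. GPT-4’s architecture is undisclosed, so the dimensions below are assumptions borrowed from GPT-3’s published configuration (embedding size 12,288, 96 layers), used only to show how the counts scale:

```python
# Toy parameter count for a simplified transformer block (biases,
# embeddings, and layer norms omitted). These dimensions are assumed,
# not GPT-4's actual architecture, which has not been published.

def transformer_block_params(d_model: int, d_ff: int) -> int:
    """Learned weights in one simplified transformer block."""
    attention = 4 * d_model * d_model   # query, key, value, output projections
    feed_forward = 2 * d_model * d_ff   # two dense layers in the MLP
    return attention + feed_forward

# GPT-3-like dimensions: d_model = 12288, d_ff = 4 * d_model, 96 blocks
d_model, n_blocks = 12288, 96
per_block = transformer_block_params(d_model, 4 * d_model)
total = n_blocks * per_block
print(f"~{per_block:,} parameters per block, ~{total / 1e9:.0f}B in total")
```

With these assumed dimensions the blocks alone account for roughly 174 billion parameters, close to GPT-3’s reported 175B; a trillion-parameter model is simply this arithmetic scaled up further.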

This is just false, as stated above, and as shown by all the examples given of ChatGPT’s actual performance in creating content and solving problems.