Predictions a decade or so ago were that competent AI could take centuries due to how insanely complex cognition is.
But we’ve gone from GPT-2 in 2019 to o3 in 2024, which to my untrained, uneducated eye seems comparable to a human with a doctorate in virtually every field (based on how it does on medical tests and the GPQA). GPT-2 was supposedly more like a toddler.
Is it possible we’ve overestimated how hard human level intelligence is to solve? Like we were being chauvinistic and assuming cognition on our level is some insurmountably complex problem (because of how special and amazing homo sapiens are), but really isn’t?
Also, if AI is trained on human data, how do we get AI that transcends human capabilities? Will AI max out at about peak human level for a while? How do you get an AI with cognitive abilities equal to an IQ of 300+ when humans, and the data we produce, max out at around 200?
Or it may be that many humans do not exhibit human-level intelligence; at least, that is what I would think if you went around talking like a chat bot (since you mention GPT bots). I have seen replies to questions, written by presumably real humans, which are probably worse than a chatbot answer.
I dare you to beat the computer at chess. Or poker. Or at correlating obscure facts…
I think it’s partly because we never thought we would have enough processing power and it turned out to be a problem that was solvable given enough money thrown at it.
There was a robotics and AI researcher (sadly I forget his name) who said that the human brain was incredibly complex and that AI should start with something simple. He built bug robots. IIRC one was named Genghis and another was Attila. This was in the early nineties. His bug robots behaved the way real insects would. There is a character in the X-Files episode War of the Coprophages (the roach episode) very loosely based on him.
More likely that AI shows that work in many professions consists mostly of recognizing patterns and finding the proper response to them. We applied the word “intelligence” to those whose experience, training, and pattern recognition found responses that succeeded to various degrees.
Is that all there is to “intelligence,” though? Has AI been shown to be applicable to all disciplines? Do AI’s repeated mistakes, hallucinations, and hackery reveal that understanding is at base different from regurgitating? Can AI learn more in the future if it lacks good access to training materials?
And are we overusing the term “AI” to begin with? “AI” is a term that is being thrown at everything, the way “computers” was an all-purpose term. We could solve every problem with computers, they used to say. Computers were unbelievably useful and their applications incredibly widespread, but we seem to be back at the beginning again, with another set of problems that computers/AI need to solve. Moreover, we got here because of the work of humans, creating something new and different from anything that came before.
Is that the definition of “intelligence”?
Basically, I’m just saying it’s way too soon to tell. We’re back in the 50s, maybe early 60s. Who knows when or if we’ll make it to the 20s again.
This assumption is completely wrong. Chatbots, even the vaunted GPT-4o, make basic conceptual errors in response to straightforward prompts with such frequency that it is questionable whether this approach will ever produce output that is factually reliable enough for any kind of safety- or mission-critical application. Their ability to process language well enough to pass a standardized test is unsurprising when you consider how much logic is fundamentally built into the stochastic structure of language, and that sample questions and answers for such tests are almost certainly part of the training sets that OpenAI and other chatbot makers are so coy about sharing details of.
Processing power is certainly a key component in being able to consolidate the information from massive training data sets into a vast neural network of response functions. But your 25 watt brain has a tiny fraction of that ‘compute’ and operates at an effective clock speed many orders of magnitude slower than a modern digital computer, and yet it manages not only to process language but to construct deep conceptual models of the physical world tied to verbal and written language (and non-linguistic communication) within just a few years of very limited data and direct interaction with only a very tiny part of the physical world. What takes a supercomputer billions of kilowatt-hours (kWh) of power to produce in a chatbot can basically be done by a high school graduate on the equivalent of 40,000 kWh, despite the fact that the human is spending much of their time worrying about how they smell, what they are going to do after graduation, and how to get on with that other human they have a particular amorous affinity for.
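As a rough sanity check of that energy comparison (the figures below are my own approximations, not anything from a study), here is the arithmetic assuming a ~2,000 kcal/day total metabolism and a ~25 W brain over 18 years:

```python
# Back-of-the-envelope check of the human energy budget (all figures approximate).
# Assumes ~2,000 kcal/day whole-body metabolism and a ~25 W brain, over 18 years.
KCAL_TO_KWH = 4184 / 3.6e6                    # 1 kcal expressed in kWh
days = 18 * 365
whole_body_kwh = 2000 * KCAL_TO_KWH * days    # roughly 15,000 kWh by age 18
brain_only_kwh = 0.025 * 24 * days            # 25 W brain: roughly 4,000 kWh
print(f"whole body: ~{whole_body_kwh:,.0f} kWh, brain only: ~{brain_only_kwh:,.0f} kWh")
```

Either way it lands in the thousands to tens of thousands of kWh, which is the point: orders of magnitude below the datacenter figure above.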
The problem of getting software on a silicon substrate to process natural language has been largely ‘solved’, at least insofar as producing a grammatically correct response. The problem of comprehending semantics or creating sapience, on the other hand, has scarcely been touched, and it is unlikely to be addressed with this brute-force Bayesian approach, because ‘common sense’ about the world requires a vastly more complex model than just processing language. What you frequently get when you prompt a chatbot for anything more than a simple reiteration of fact is grammatically correct semantic gibberish, and indeed they even fail at simple reiteration with astonishing frequency despite the outlandish resources used in training them.
A human does not need access to a trillion bytes of data to draw conclusions. Think of how little (and often poor-quality) input a human needs to learn their native language. I think the human superpower is the ability to jump to a conclusion (often correctly) based on a woefully tiny data set.
Early this year, someone told me about an AI program that was said to be able to find general formulas relating special cases. So I sent them a challenge question involving something called the shuffle idempotent. For the record, it is actually a series of idempotents, one for each n in the rational group algebra of the symmetric group S_n. I had formulas for n = 1, 2, 3 (obvious for 2, hard for 3, insanely hard for 4), and had suddenly discovered a general form. They said they would try. That was the last I heard.
I think that’s not entirely clear. The human retina has a bandwidth of about 9 megabits per second. Thus, one eye takes in something like 20 terabytes of data per year. Most of that information is immediately processed and discarded before it even reaches the brain, but I think it’s not a very simple question to say how much data a human has accessed in order to reach an adult level of intelligence.
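For what it’s worth, here is the back-of-the-envelope arithmetic behind that figure, assuming roughly 16 waking hours a day (my assumption, not part of the bandwidth estimate itself):

```python
# Retinal bandwidth integrated over a year (rough figures).
# Assumes ~9 Mbit/s per eye and ~16 waking hours per day.
bits_per_second = 9e6
waking_seconds_per_year = 16 * 3600 * 365
bits_per_year = bits_per_second * waking_seconds_per_year
terabytes_per_year = bits_per_year / 8 / 1e12
print(f"~{terabytes_per_year:.0f} TB per eye per year")  # comes out around 24 TB
```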
Yeah, but it is not obvious that all those gigabytes the eye takes in have much to do with human intelligence. To what extent does all that data help with language learning, for example?
Think about how little evidence the average English-speaking child has access to before they decide that the past tense of “go” isn’t “goed”.
It certainly doesn’t prevent children with congenital blindness from acquiring language or developing intelligence. Comparing visual input in terms of “gigabytes” is pretty meaningless given that the brain isn’t a digital computer and doesn’t process or store visual or other information in discrete bits.
Are there any cites for any of these test scores that don’t come from the AI researchers themselves? For the tests for which I have some knowledge, I have strong reason to doubt that the tests were actually fair and rigorous.
The answer would depend on whether or not the AI is genuinely creative. So far, most complaints about AI are that it just blenderizes existing concepts and regurgitates new mixtures. It never really creates anything truly new and unique.
If you can get an AI to create actual new ideas, art, and inventions, then they can start to exceed our own capacities, which would bootstrap them up to the super-human levels of intelligence all our science fiction wants to have.
I’ve found that many of the strongest proponents of AI seem to have a bone-deep contempt toward human intelligence and humanity in general. Just something I’ve noticed.
You can show a child an apple and they can associate it with the word. And even a blind child can touch and taste the apple. An LLM has to infer the meaning solely from its relationship to other unknown words.
That is surely not the only thing going on with training efficiency, but it’s probably a major one. Another is the lack of feedback—no adult telling them that they’ve gotten things wrong. Perhaps future AIs can be taught by others.
Humans are also not blank slates. We benefit from hundreds of millions of years of evolution in our neural architecture. Even seemingly uniquely human features like language are probably small tweaks on existing mental features.
Yes, the latest OpenAI o3 model just scored 25% on the FrontierMath benchmark:
These are very difficult questions that they went through considerable effort to collect. OpenAI could not have trained on them.
Also, this was a new model that spent enormous compute on each question, thousands of dollars’ worth. Even if they had somehow trained on leaked data, it would not have required so much compute; it could have just regurgitated the data. But that’s not how it behaved.
They’ve always been very good at arithmetic. That has basically nothing to do with being good at math, though. There have, historically, been some computer programs that are competent at some kinds of math, but it’s not something to be expected out of the box.
Not only does that also appear to be from the AI researchers themselves, but it also says that the highest score any AI has gotten is 2%. It also describes all of the problems as being unpublished, and then publishes them.
In many cases, the actual test results were blind-graded by professors at the associated schools, e.g.-
The powerful new AI chatbot tool recently passed law exams in four courses at the University of Minnesota and another exam at University of Pennsylvania’s Wharton School of Business, according to professors at the schools. ChatGPT passes exams from law and business schools | CNN Business
Of course that doesn’t mean the entire process was transparent – there appear to be controversies over precisely what the AI had been trained on. I think academic AI researchers are mostly ethical scientists at heart, but the ethics at companies like Google make me distrust those in for-profit entrepreneurial businesses like OpenAI. Sam Altman is more of an entrepreneurial engineer than a research scientist.
I don’t know what would lead you to such a bizarre conclusion or what causative factors could possibly make it true. I think you’re probably just misinterpreting the common belief in the AI community that human intelligence isn’t a unique property of being human, and that it can be replicated and even exceeded by digital computers. This may upset some humanists but I believe it to be factually true.
The AI community also tends to be dismissive of the alleged risks of AI, not because they don’t care about humanity, but because they think the risks have been greatly overblown. Our entire economy is already irrevocably dependent on computers; this dependency is nothing new, and there’s no going back from it.
While this is true in principle, it’s all too easy to greatly underestimate the richness and power of that relationship (and LLMs are only one approach to AI; there are others for which it isn’t the case). An LLM can describe the various properties of an apple probably better than any single human could, because the detailed knowledge spans multiple academic disciplines. And now with the addition of visual processing, you can show an AI an apple and it can surely tell you what it is.
In fact, on a radio call-in show in which people described how they had used AI, one woman called in to say that she had some gadget she didn’t know how to use (I think it was a new smartphone or something like that) and the AI not only visually recognized exactly what it was, but gave such detailed instructions for using it that she happily copied them over, saying they were better than the user manual.
Again, one can have silly philosophical arguments about whether or not it’s possible for an AI – now or ever – to have a “true” understanding of real-world concepts the way that humans do, but these academic arguments pale in comparison to the actual accomplishments of this supposedly limited AI. I’m unimpressed by arguments that LLMs (or any AI) acquires its model of the world in fundamentally different ways than humans do. Modern airplanes didn’t evolve from birds and they don’t fly like birds, but they do fly – higher, faster, and farther than any bird.
There’s plenty of feedback. That’s basically what supervised machine learning is, and it forms at least a part of most LLMs’ training protocols.
I don’t know about contempt, but many are certainly in the ‘reductionist’ camp of neuroscience, with some arguing that human beings are not ‘conscious’ (for some arbitrarily exclusive definition of ‘consciousness’) and do not have anything approximating ‘free will’ (the less said about those arguments the better). To an extent, this comes with the territory, because in order to simulate human cognitive processes on an intractable substrate using digital computation, those processes necessarily have to be reduced to simple rules that can be used to define linearized models. That the animal brain (and indeed, every biological structure we would identify as a learning system) is quite malleable by nature, and that individual neurons are not just transistors or logic junctions, is lost on them even when it is explicitly described in the literature of neurobiophysics.
This is not to say that a machine cognition system could not be built and function in software-on-silicon, and for narrow definitions of ‘cognition’ this has already demonstrably been done, but it almost certainly wouldn’t have any ‘spark of consciousness’ or be directly comparable to the cognitive processes of complex animals. If something akin to a truly creative, rather than merely synthesizing, machine cognition system were to become functional and enjoy sentience and sapience, it would likely think and act very differently from human beings.
LLMs, of course, are designed to ‘act’ in ways that resemble human-generated text, giving the impression of human-like consciousness through the statistical manipulation of language (and the logic that is built into comprehensible patterns of words and grammar), even though their responses often demonstrate a lack of semantic depth on any topic where they cannot just directly regurgitate statements they have been trained on. This is why they are pretty decent (if factually unreliable) at summarizing basic technical or historical information, but why prompting them to generate ‘creative’ work produces a pablum of tired tropes and often nonsensical actions and descriptions.
Although the ‘language acquisition’ phase of training a large language model (LLM) is largely unsupervised learning (on pre-selected datasets), the fine-tuning that makes an LLM functional in any utilitarian way is definitely supervised (i.e., it receives feedback from human arbiters) to a degree that most companies working on LLMs are reluctant to admit. The same is true for other kinds of generative models, which use humans to categorize images or provide other feedback, often without their express consent or knowledge.
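For anyone unfamiliar with the distinction being drawn here, a minimal toy sketch of the two phases (not any real lab’s pipeline; the model, data, and shapes are all made up) might look like this. The loss is the same next-token prediction in both cases; the human feedback enters through which (prompt, response) pairs the labelers write or approve.

```python
# Toy illustration of pretraining vs. supervised fine-tuning (illustrative only).
import torch
import torch.nn as nn

vocab_size, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def next_token_loss(token_ids):
    # Predict token t+1 from token t; the "label" is just the text itself.
    inputs, targets = token_ids[:-1], token_ids[1:]
    return loss_fn(model(inputs), targets)

# Phase 1: "unsupervised" pretraining -- no human feedback, just raw token streams.
raw_text = torch.randint(0, vocab_size, (50,))        # stand-in for scraped text
loss = next_token_loss(raw_text)
loss.backward(); opt.step(); opt.zero_grad()

# Phase 2: supervised fine-tuning -- same loss, but on (prompt, human-approved reply)
# pairs; the feedback lives in how the data was written, selected, or ranked.
prompt = torch.randint(0, vocab_size, (10,))
approved_reply = torch.randint(0, vocab_size, (10,))  # stand-in for labeler output
loss = next_token_loss(torch.cat([prompt, approved_reply]))
loss.backward(); opt.step()
```

Real systems layer reinforcement learning from human preference rankings on top of this, but even the plain supervised stage already leans heavily on human labelers.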
That site (FrontierMath) is an effort to produce credible benchmarks for assessing the ability of artificial intelligence systems to solve or prove complex mathematical problems (not just do arithmetic). The company, Epoch.AI, does not create its own LLMs or generative AI systems, and its mission is to be critical about claims made by AI developers, specifically in regard to mathematical abilities. The “25%” is just their estimation of the difficulty of those problems in their test set; that is, that they are in the 25% of most difficult (presumably solved) problems in contemporary mathematics, and are sufficiently abstruse as to not have solutions directly described in the literature that an AI system would be trained upon. They state that “Each problem demands hours of work from expert mathematicians. Even the most advanced AI systems today, including GPT-4 and Gemini, solve less than 2% of them.”
I will note that the one thing that LLMs and other generative AI systems are good at is recognizing patterns, which they can do by dint of massive ‘compute’ and the use of Bayesian statistical methods, and they can do so on data sets far too large for a human to integrate and hold in their brain, so there are certainly some aspects of problem-solving in mathematics and associated fields where we would expect a sufficiently advanced AI to be superior to even the smartest Fields medalist. And there is, again, a logic built into comprehensible language (as anyone trained in computational linguistics can confirm), and especially into the narrow occupational jargon of mathematics, where each word has a specific and typically unique meaning. So, a sufficiently powerful AI should be expected to be able to solve many problems just by brute-force application of the ‘rules’ it has compiled by being trained on mathematical proofs and solutions. But that doesn’t mean it comprehends the purpose or application that a proof or solution has in the real world, because its entire conception of ‘the real world’ is already abstracted through the texts, sound, and images it has been trained on. Nor do these systems have the ongoing internal ‘self-correcting’ (or ‘self-creating’) processes that are a default function of animal brains and that produce the internal dialogue and constant integration which we interpret, rightly or otherwise, as sentience, sapience, and (at least intermittently) the qualia of consciousness.
And here we have manifested @Alessan’s observation about the contempt that AI enthusiasts have; when you can’t provide a counterargument to the statement, attack the writer’s competence, credibility, or integrity.