If by “pure scaling” you mean “pre-training scaling”, then sure, pure pre-training scaling is close to dead (although we’ll see what happens when xAI puts an order of magnitude more GPUs on the problem). That seems like a bit of a strawman to me, though. People thought pre-training scaling would have a lot of legs, and it did (several orders of magnitude). But I don’t recall anyone claiming that it would scale forever. When Altman says “resources used to train and run it”, is that contradicting something he said earlier?
and consume one gigawatt of energy per year—or as much as a large American city.
Please kill me. Engineers are forced to take classes in writing; journalists should be forced to take electrical engineering.
and that is just how much it would cost to rent the cloud computing time to train the model, a figure that is largely determined by the energy cost of running so many GPUs round-the-clock for weeks and months
Untrue. An H100 uses about 700 watts; let’s say 1 kW to include the other stuff. That’s about 8,760 kWh per year, or <$1000/yr at industrial rates. But the GPU itself costs ~$40k. The energy cost is basically trivial.
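For concreteness, the same back-of-envelope in a few lines of Python (the $0.08/kWh industrial rate is my assumption; the ~$40k card price is the figure above):

```python
# Back-of-envelope: energy cost vs. hardware cost for one H100-class GPU.
# Assumptions: 1 kW sustained draw (GPU plus overhead, as above) and an
# industrial electricity rate of $0.08/kWh (illustrative; actual rates vary).

power_kw = 1.0                 # ~700 W for the H100 itself, rounded up for the rest of the node
hours_per_year = 24 * 365      # 8,760 h
rate_usd_per_kwh = 0.08        # assumed industrial rate

energy_kwh = power_kw * hours_per_year          # ~8,760 kWh/yr
energy_cost = energy_kwh * rate_usd_per_kwh     # ~$700/yr

gpu_price = 40_000             # rough purchase price quoted above

print(f"Energy: {energy_kwh:,.0f} kWh/yr -> ${energy_cost:,.0f}/yr")
print(f"Energy cost as a fraction of the card price: {energy_cost / gpu_price:.1%}")
# -> under 2% per year: the electricity really is a rounding error next to the hardware.
```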
It has been the slogan of AI for years, now. Ever since the (in-)famous ‘Game Over’ tweet of DeepMind’s Nando de Freitas, we’ve been promised AGI would arrive by merely scaling the existing approaches again and again and again (random google results just to demonstrate the ubiquity of that exact slogan). It was the refrain of Aschenbrenner’s much-publicized manifesto last year. There’s even a t-shirt (made famous by Jared Kaplan, co-founder of Anthropic). (Which incidentally makes clear what is meant by scaling: increasing compute, training data, and parameters, which is what is typically considered to go into the much-vaunted ‘scaling laws’).
I was using AI translation today for teaching, and it’s really a time saver and frustrating at the same time. If you don’t mind that it’s inconsistent and just correct it and move on, then it really does help.
One of my students consistently makes jokes about eating toilet paper, so I incorporated it as a sentence. ChatGPT specifically commented on that as an “unusual” sentence.
Interesting! Can you explain how you use it and how it saves time? I’m curious!
Ok, I’ll acknowledge that some of those people (Kaplan in particular) got a little too excited about scaling laws and somehow forgot that every exponential is actually a sigmoid. But even within your own examples, Nando was saying that we need memory augmentation and more modalities, and Rimschnick was talking about “subsequent reasoning capabilities”, claiming only that building off of LLMs was an obvious choice.
So it seems to me that a good chunk of people, though not all, were interpreting “scale is all you need” as inclusive of stuff beyond pure data and parameter count. Basically, they took it as a synonym for the “bitter lesson”, which doesn’t advocate for any particular ML model, but just holds that any attempt to be too clever, or to build in what seem like obvious choices, will be outcompeted by general approaches that depend on scaling the hardware. Chess with human-inspired heuristics got outcompeted by self-play, computer vision that tried to do explicit feature detection got outcompeted by learned models, and so on.
There is also the economic factor. Even pre-training scaling is not exactly dead. It just seems to be economically dead. It may well be that a quintillion-parameter model could achieve AGI. Who knows? No one can afford that now, but perhaps in a few decades it’ll be achievable. But in the meantime, we have AI companies trying to sell their product now, and it looks like they can get faster results by scaling things other than parameter count. If computation is relatively cheaper than storage, then scaling inference becomes more efficient than scaling parameters. And there are many similar kinds of tradeoffs available.
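To make that tradeoff concrete, here’s a toy sketch using the common rules of thumb that training costs roughly 6·N·D FLOPs and inference roughly 2·N FLOPs per token. Every concrete number in it (model sizes, query volume, sample counts) is made up purely for illustration, not a claim about any real model:

```python
# Toy comparison: spend more on parameters (bigger model, one pass per query)
# vs. more on inference (smaller model, several samples per query).
# Rules of thumb: training ~ 6*N*D FLOPs, inference ~ 2*N FLOPs per token.

def train_flops(n_params, n_tokens):
    return 6 * n_params * n_tokens

def serve_flops(n_params, tokens_per_query, samples=1):
    return 2 * n_params * tokens_per_query * samples

small, big = 10e9, 100e9     # 10B vs 100B parameters (hypothetical)
data = 2e12                  # 2T training tokens for either model (hypothetical)
queries = 1e9                # lifetime queries served (hypothetical)
tokens_per_query = 1_000

# Option A: 10x more parameters, single pass per query.
cost_a = train_flops(big, data) + queries * serve_flops(big, tokens_per_query)

# Option B: small model, but 8 samples per query (inference-time scaling).
cost_b = train_flops(small, data) + queries * serve_flops(small, tokens_per_query, samples=8)

print(f"Big model, 1 pass   : {cost_a:.2e} total FLOPs")
print(f"Small model, 8 passes: {cost_b:.2e} total FLOPs")
# Which side wins depends on query volume and on how many samples the small
# model needs to match the big one -- that is the tradeoff being gestured at.
```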
I think the cooldown of enthusiasm around LLMs is now becoming very palpable. Most recent grand reveals have been met with disappointment and puzzlement, with a recent post by Dean Valentine going viral on LessWrong under the headline ‘Recent AI model progress feels mostly like bullshit’. Everybody seems to be looking for the next big thing (World models? Neurosymbolic approaches (finally)?), with Ilya Sutskever announcing that ‘[t]he 2010s were the age of scaling, now we’re back in the age of wonder and discovery once again’. A recent Salesforce survey reports that 61% of companies are experiencing no or poor ROI from their AI investments, with only 21% citing ‘moderate’ or ‘significant’ returns. AI datacenter construction plans are being put on hold or cancelled (although recent economic volatility may play a role in this development). Even in the coding department, for many now a shoo-in for near-future AI deployment, models still struggle with elementary debugging tasks. Finally, even methods to overcome LLM limitations, such as reinforcement learning with verifiable rewards (RLVR), which has been widely alleged to yield ‘self-improving’ models, seem to be hitting a ceiling, with a recent study showing that such models may become more efficient, but don’t acquire any fundamental new abilities—i.e. the RLVR-augmented model hits on the correct solution more quickly, but the base model can find it as well, simply needing more trials to do so.
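For reference, comparisons like the one in that study are typically made with something like the pass@k metric. Here’s a minimal sketch of the standard unbiased estimator (from the Codex paper), with purely illustrative success counts:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at least
    one of k samples, drawn from n generated solutions of which c are correct, passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers only: an RL-tuned model that is right 40/100 times vs.
# a base model that is right 10/100 times on the same problem.
for k in (1, 8, 64):
    print(k, round(pass_at_k(100, 40, k), 3), round(pass_at_k(100, 10, k), 3))
# At k=1 the tuned model looks far stronger; by k=64 the base model has closed
# the gap -- the "needs more trials, but can find it too" pattern described above.
```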
Of course, GenAI/LLMs are here to stay, and are likely to develop into a valuable tool for their use cases. But right now, I’d say that there does not seem to be a path for them to become a ‘universal’ model, applicable to any and all tasks; much less to yield AGI. What we’ve seen seems to be a kind of 80/20 illusion: the (relative) ease with which we have come 80% of the way fosters an expectation that the remaining 20% will yield just as quickly to further scaling, but as the (obviously imperfect) rule has it, we may be looking at 80% of the work still needing to be done.
Sorta funny how a ~3 month marginal cooldown in progress is seen as a huge deal. Even in the heady dot-com days, things weren’t moving that fast.
I don’t necessarily disagree with the analysis, but I also think a large part of the ROI problem is that companies just don’t know how to make use of the tools. There’s only so much value to having your emails summarized.
What I’ve found personally is that the greatest value comes from automating tasks that would be difficult to justify otherwise. I’ve had AI write tools in minutes that would take me days or weeks otherwise. The tools aren’t good enough to distribute generally–they aren’t polished or debugged enough. They may only work for the specific task at hand. But because they cost almost nothing to make, I can easily justify them.
I’m not sure that many people have really internalized this yet. People hardly think about tools at all, or means of making themselves more efficient. They’re kind of annoying to make and do not, by themselves, accomplish anything useful. They just make things more efficient. But what if you could build tools almost for free?
Just today I had a giant text report that another automated tool had spat out. It was getting pretty unwieldy, especially since I wanted to send the information to others that didn’t really know what part to look for. I just had the LLM write a script which converted the text blob into a nice web page with searching, expanding/contracting section headers, hyperlinking to jump between related sections, and so on. It’s just a throwaway script that no one else will ever use, but it did exactly what I wanted almost immediately.
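Not the actual script, of course, but for a sense of how small such a throwaway can be, here’s a sketch that turns a sectioned text blob into an HTML page with collapsible sections (assuming, purely for illustration, that section headers look like `=== Title ===`; the search and cross-linking bits would be extra):

```python
# Throwaway sketch: convert a big plain-text report into a single HTML page
# with collapsible sections. Assumes headers of the form "=== Section title ===".
import html
import re
import sys

def report_to_html(text: str) -> str:
    parts = re.split(r"^=== (.+?) ===$", text, flags=re.M)
    preamble, rest = parts[0], parts[1:]
    sections = zip(rest[0::2], rest[1::2])   # (title, body) pairs

    out = ["<!doctype html><meta charset='utf-8'><title>Report</title>",
           f"<pre>{html.escape(preamble)}</pre>"]
    for title, body in sections:
        anchor = re.sub(r"\W+", "-", title.lower()).strip("-")
        out.append(f"<details id='{anchor}'><summary>{html.escape(title)}</summary>"
                   f"<pre>{html.escape(body)}</pre></details>")
    return "\n".join(out)

if __name__ == "__main__":
    # usage: python report2html.py < report.txt > report.html
    sys.stdout.write(report_to_html(sys.stdin.read()))
```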
LLMs are still bad at debugging, I’ll grant. That’s actually what they’re worst at, just like junior programmers. So I have to jump in myself from time to time. It’ll be interesting to see what progress is made there.
Seems to me you’ll end up building lots and lots of them, each narrowly scoped to a specific task, poorly understood, and largely untested, yet needing maintenance and integration with changing system environments. I think you’re probably capable of handling the resulting complexity, but imagine something like this deployed in a large corporation’s software ecosystem, with various overworked admins inheriting idiosyncratic tool chests from their predecessors, unsure about what all of it does, how it interlinks, and where even to look if something breaks—to me, that seems more like a recipe for generating huge amounts of software entropy than a value-generating proposition.
Everything except the last one: the code just gets thrown away (unless it proves exceptionally useful, in which case you assign an actual team to it).
It’s like a jig in wood or metal working. If you’re building 1000 of something, then you’d be an idiot not to build some kind of jig to make the cutting or assembly easier. But what if it’s only 5 things? Well… there’s a tradeoff between the time it takes to make the jig and how much time you save.
Unless the jig is essentially free, which it is in this case. So even if you’re just making one thing, it almost certainly saves time. And you just throw it away when you’re done.
I agree that it could generate entropy in the wrong hands–say, a barely competent manager that puts together a half-working prototype of something, hands it off to his team, and then expects them to improve it into an actual product. That kind of thing isn’t likely to work.
I can only say that that hasn’t been my experience in the workplace. Whatever works gets amended, adapted, extended, acquires greebles and blinkenlights, is revamped, added to a larger package, repurposed and twisted. Code where nobody knows what it’s for is left intact for fear that it might break something critical. And so on. But who knows, maybe AI will manage all of this?
In a way that’s actually what I’m already doing with my tools. I have some systems which are fairly opaque, to say the least. Someone, somewhere put them together, but it’s difficult to track things down. So in part I’m using LLMs to both write the analysis code to decompose these systems and figure out who is responsible for what, and also do the analysis on it directly.
Like: say you’re presented with a filesystem with a bunch of junk on it. What’s the important stuff on there? There’s some version of Python, some libraries, some large data files, etc. I can go through it by hand easily enough, but that takes time. So I just ask the LLM to summarize the directory listing. Works great, and now I can ask people specific questions like “Who installed X and what drove the requirement?”
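In case it’s useful, a minimal sketch of that workflow, assuming the OpenAI Python client (the path, prompt, and model name are all placeholders):

```python
# Sketch: hand an LLM a recursive directory listing and ask for a summary of
# what actually matters on the filesystem. Assumes the `openai` Python client
# and an API key in the environment; model name is illustrative.
import subprocess
from openai import OpenAI

listing = subprocess.run(
    ["ls", "-laR", "/path/to/mystery/filesystem"],   # placeholder path
    capture_output=True, text=True,
).stdout[:100_000]   # crude truncation to stay within the context window

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",   # illustrative
    messages=[{
        "role": "user",
        "content": "Here is a recursive directory listing. Summarize the important "
                   "contents (interpreters, libraries, large data files, anything "
                   "unusual), so I know what to ask the owners about:\n\n" + listing,
    }],
)
print(response.choices[0].message.content)
```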
Sorta funny how fast we’re going from ‘AI will revolutionize work’ to ‘it’s kinda neat for sorting files’.
Not sorting. Summarizing. Not something that I can easily even describe how to do programmatically, and yet in natural language it’s easy.
The fact that we have systems that can understand natural language at all continues to astound me. Is AI overhyped? Yes. It’s also underhyped. I don’t think anyone remotely anticipated that it would work out this way.
I continue to be disappointed that linguists have not basically dropped everything to really understand LLMs. We actually have a non-human example of language acquisition. It’s like an alien landed on Earth and started speaking to us. Whatever language is–how it works, how grammar arises, how it’s learned, and so on–is clearly not remotely what anyone thought it was.
I definitely agree here. However, regarding linguistics, I’m not really familiar enough with the field to comment, although googling unearths at least some examples of research papers and conference contributions in computational linguistics on the topic of LLMs. But more importantly, why would linguists want to ‘drop everything’? Yes, it’s intriguing to have another system of language acquisition at hand; but perhaps the most striking thing is how inefficiently and wastefully this system works, compared to the legacy system (i.e. the human brain). According to the back-of-the-envelope estimate I made above, an LLM ingests something like 80,000 years’ worth of textual data (as compared to a person reading) in its training to achieve a performance that, let’s face it, in the end is mostly fine. A human brain needs, what, on the order of months? And the power of a dim light bulb to assimilate it all.
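For what it’s worth, that envelope is easy to redo; here’s one version in Python (the 15T-token corpus, 0.75 words per token, and 250 words-per-minute reading rate are my assumptions, chosen only to show the figure lands in the right ballpark):

```python
# Back-of-envelope behind the "~80,000 years of reading" comparison.
# Assumed inputs (illustrative, not exact): a 15-trillion-token training corpus,
# ~0.75 words per token, and a reader managing 250 words per minute nonstop.

training_tokens = 15e12
words_per_token = 0.75
words = training_tokens * words_per_token        # ~1.1e13 words

reading_wpm = 250
minutes = words / reading_wpm
years = minutes / (60 * 24 * 365)

print(f"{years:,.0f} years of round-the-clock reading")   # on the order of 80,000 years
```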
One might argue that the challenge of understanding an LLM might be a useful practice task for eventually building up to tackling language acquisition in humans. And I think there’s some merit in that, although of course one risks being led down a blind alley—LLMs might, after all, yet be constitutionally unable to match human performance (and if there’s any genuine creativity involved, and if that is necessary to the whole apparatus, they definitely are). So let’s try to invert the scenario: linguists have been studying LLMs for decades, and suddenly are faced with a system that does that same task orders of magnitude more efficiently. Shouldn’t that system be the more interesting one to study?
That said, I do think that LLMs warrant study, not just from the linguistic, but also from the philosophical point of view, and have argued so previously:
LLMs are, to many of their uses, what a plane is to flying: the plane achieves the same end as the bird, but by different means. Hence, it provides a testbed for certain assumptions about flight, perhaps bearing them out or refuting them by example. […] I propose that LLMs can offer us new insights at least in three areas—the philosophy of language, the evaluation of philosophical thought experiments, and perhaps most interestingly, the identification of implicit, systemic bias.
As I said, it’s an alien that landed on Earth! Would biologists not drop everything if an actual alien landed, just because they’re different and the lessons may not be directly applicable to humans?
That difference is itself interesting! I dispute the details a bit–LLMs write like a fairly bright college student, not a toddler–but the overall point is valid. Clearly some very different things are going on. Do humans have something special going on, language-wise? Is Pinker’s language instinct true after all, in that we’re born with a “base model” that was trained on billions of samples of data[1], and as individuals we’re just fine-tuning that model to express a specific language? And some of this research can be conducted on the LLMs themselves. Is it possible to train a base model that’s totally language-independent, but that can acquire English or another specific language with only human-scale fine-tuning?
Going a little deeper, maybe the true “base model” is actually the mental world model that all animals are born with, and it’s much more efficient to build language on top of that, whereas an LLM has to reconstruct one, poorly, from scratch.
But in any case I don’t think we’re exactly disagreeing here. To use your analogy: are linguists aeronautical scientists or just ornithologists? The general problem seems more interesting to me but I don’t know what linguists think.
I’m aware that the only feedback mechanism is through evolution. The learning rate is undoubtedly much slower than even LLMs accomplish. But there are a lot of samples. ↩︎