So my understanding is DeepSeek is comparable to maybe OpenAI's o1. Does it have reasoning skills or is it more like GPT-4?
My understanding is they trained it with far less money, computation, and hardware than US AI researchers use.
But still, wasn’t the groundwork to actually build cutting edge AI still done by the US?
This is like if the US spent $100 billion researching how to make a 6th generation fighter jet, and then China spent $500 million building one after they stole the technology. Yeah they can build it cheap, but that's because China didn't have to do all the R&D. They just let the west do all the R&D, then built a knockoff.
What exactly is all the hype about? I feel like China took technology invented by the west and built a cheap knockoff. Why is DeepSeek considered so groundbreaking?
Did DeepSeek pioneer new forms of algorithmic efficiency never seen in the west? Did they find ways to create AI comparable to western models with less hardware?
Or did they just take western technology and build a knockoff for cheap?
I don’t think there’s enough information yet to really say where on the hype/real spectrum it lies. It’s definitely a good model and it definitely builds on the foundations others built. But that’s true of all models out there.
They did come up with real training efficiency gains–that much is clear from their paper. But at the same time, it seems very unlikely that it really cost as little as they say, and they're probably using NVIDIA GPUs they shouldn't have access to (due to export controls).
No one should dismiss them as a serious competitor. Nevertheless, the claims are almost certainly a tad exaggerated. We’ll probably find out soon because at least in principle, their paper allows the claims to be replicated, and it’s cheap enough that almost anyone can do it as a side project.
I'm no expert, but the comments I've been hearing online are that it's a lot cheaper because the organization that created it wasn't top-heavy with extremely highly paid executive "leaders" siphoning off the budget. Or in other words, it wasn't as scammy, and the issue is less China being clever or stealing anything than it is the US version being run by con artists and overcharging profiteers. So the fear is less of China overtaking us than "Oh, God, everyone's going to find out where the money's been going!"
Of course that is online rumor, so take it with lots of salt.
None of that makes the slightest bit of sense. When people talk about the costs of training, they’re talking dollars of GPU time. The claims about Deepseek are that they used far fewer GPU resources to train their model relative to comparable models.
Besides all that, Deepseek is basically a side project to… a hedge fund. The idea that their leaders aren’t well-paid is silly.
Correct. It’s got very little to do with money; the limitations are due to difficulties Chinese companies have with getting access to large numbers of GPUs.
Yeah, but the real story isn’t going to get spammed by Zoomer Luigi fans all over Twitter and Reddit. The “CEOs bad” storyline, on the other hand…
This is right at the edge of what I can reasonably talk about. I’ll just say there’s a great deal of uncertainty about the GPU resources Deepseek has access to.
The Luigites seem to have left for the blue site. Reddit, though… that place is an utter wasteland these days. Just warring bot farms as far as the eye can see.
Maybe, but again, the costs have nothing to do with management. It’s about GPU resources.
Aside from research that makes the models smarter outright, basically all AI research is about making them smarter per unit of GPU resources. Both on the training and inference side. Deepseek has made genuine improvements on the training side. But whether they're revolutionary or just a pretty nice boost to the state of the art remains to be seen.
According to poe.com, Deepseek is a ‘reasoning bot’.
They use all the AI tools to help answer questions, and they put DeepSeek in the same category as o1-mini.
I guess I’m still confused.
OpenAI's models reportedly cost $80-100 million to train. DeepSeek cost $6 million to train. But did DeepSeek steal some of the training data from other programs to save costs?
There's also speculation that China used NVIDIA chips it smuggled in to train its model. They may have also underestimated the cost. I have no idea though.
As far as o1 goes, I think o1-mini costs 1/5 as much as o1. However, I don't know if that means it cost less to train, or if they derived it from o1's training and just mean that running a million tokens on o1-mini costs 1/5 as much as running a million tokens on o1.
That is the dominant factor. Basically no one is charging for training costs yet. It’s a capital expenditure. But inference takes GPU time also, and bigger models that spend more time reasoning are more expensive.
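To put rough numbers on that, here's a toy Python sketch of per-token pricing. The prices are made up for illustration (only the 1/5 ratio comes from the posts above); the point is that a model that "thinks" for many more tokens can erase a per-token price advantage:

```python
# Hypothetical per-token pricing, for illustration only. These are NOT
# real OpenAI prices; only the 1/5 ratio comes from the discussion above.
PRICE_PER_MILLION_TOKENS = {
    "big-reasoning-model": 60.0,    # USD per 1M output tokens (made up)
    "mini-reasoning-model": 12.0,   # 1/5 the big model's price (made up)
}

def inference_cost(model: str, tokens: int) -> float:
    """Dollar cost of generating `tokens` output tokens on `model`."""
    return PRICE_PER_MILLION_TOKENS[model] * tokens / 1_000_000

# A long reasoning chain on the cheap model can still cost more than a
# short answer on the expensive one:
print(inference_cost("mini-reasoning-model", 10_000))  # 0.12
print(inference_cost("big-reasoning-model", 1_000))    # 0.06
```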
AI companies are hoping that improvements in GPU cost will lower the cost of reasoning. But what they’re also doing is building very expensive models and letting those train smaller models. It might sound like garbage-in-garbage-out, but it works. Ideally, you can then use the distilled model you made to make yet another model that’s even more efficient.
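If it helps, here's a minimal sketch of that distillation idea in PyTorch. This is the generic "soft label" technique, not DeepSeek's or OpenAI's actual recipe; `teacher` and `student` just stand in for any two models sharing a vocabulary:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scaling by t**2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

# Toy usage with random "logits" over a 10-token vocabulary:
teacher_logits = torch.randn(4, 10)                       # 4 positions, no gradient
student_logits = torch.randn(4, 10, requires_grad=True)   # the model being trained
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()   # gradients flow only into the student
```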
Some information on the training process for Deepseek R1:
Basically, they started with their previous base model (which was already pretty good, but fairly conventionally trained), used that to bootstrap an interim reasoning model (which was not particularly good on its own, but was good at reasoning), and then used that to bootstrap the full R1 model.
The challenge seems to be both getting training data (since long chains of reasoning aren't likely to appear on the internet) and then successfully using that data to train the model. But the interim model, even if it's not that good, can at least generate synthetic data, which you can then check by existing non-AI means. Like if you ask it to produce a program, you can check the syntax and verify the output easily. And you can generate progressively more complicated problems that require longer chains of reasoning.
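To make the "check by non-AI means" part concrete, here's a small Python sketch: run a generated program against known test cases and keep it as training data only if it passes. The one-liner here is just a stand-in for real model output:

```python
import subprocess
import sys

# A model-generated program (stand-in for real model output):
generated_program = "print(sum(int(x) for x in input().split()))"

def passes_tests(source: str, cases: list[tuple[str, str]]) -> bool:
    """Run `source` on each input and require exactly the expected output."""
    for stdin, expected in cases:
        result = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin, capture_output=True, text=True, timeout=5,
        )
        if result.returncode != 0 or result.stdout.strip() != expected:
            return False
    return True

test_cases = [("1 2 3", "6"), ("10 20", "30")]
print(passes_tests(generated_program, test_cases))  # True -> keep as training data
```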
I’m not sure there’s anything all that special here–it’s state of the art, but we don’t know what’s going on inside "Open"AI or really anyone else. This seems to be the first detailed public info we have on this style of training.
Interesting that public datasets are starting to fade from importance. All of the base models out there have basically all the available data. So the only way to go further is through actual reasoning.
It sounds like, in part, the AI companies worry they’ll lose income because someone figured out they could undercut them by using their labor without pay in order to train a new AI model.
The $6 million is compute time only and doesn't include the cost of the NVIDIA cards, infrastructure, energy, etc. Also, China almost certainly has better cards than it's supposed to, imported through backdoor channels like Singapore. They're using some combination of China-allowed cut-down cards and "black market" cards, which is still less capable overall than what OpenAI is using. However, if China was able to train it for 1/50th the cost, it makes a lot of people wonder whether building power infrastructure for US-based projects is a good idea, or whether NVIDIA can keep selling AI enterprise cards for $40k a pop as quickly as they can make them. In other words, even if the claims about cost are partially hype and bullshit, they're probably not entirely hype and bullshit, so that's a good reason for pause before shoveling more money into tech stocks.
Deepseek is supposedly very good at math, logic, and coding. I'm not good at any of those things, so I don't have any real reason to put Deepseek through its paces, but reports I've read from smarter people all suggest it's extremely capable. So a very good model that was built more cheaply and easily than the competition (even if that's somewhat exaggerated) is making a lot of investors second-guess how much money they should be sinking into what's looking like a bloated and inefficient US industry.
An awful lot of the money that went into the initial WWW boom was wasted. The money that wasn’t was handsomely rewarded.
NVIDIA may be overvalued, but maybe not as much as the current panic suggests. We certainly need more and better electrical infrastructure in any case, for robustness, AGW, and the coming wave of EVs.
I don’t understand computer science, but here is the paper on it.
My understanding from someone else is that they used reinforcement learning to train DeepSeek, and as a result it required far less training compute than models like Claude, GPT, Gemini, etc.
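Here's a tiny Python sketch of what the reward signal in that kind of reinforcement learning can look like. To be clear, this is a toy illustration of a verifiable, rule-based reward; the actual method in DeepSeek's paper (GRPO) is much more involved, and the \boxed{} convention is just one common way of marking final answers:

```python
import re

def extract_answer(completion: str) -> str | None:
    """Pull the final \\boxed{...} answer out of a model completion."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    return match.group(1) if match else None

def reward(completion: str, ground_truth: str) -> float:
    """1.0 if the final answer is verifiably correct, else 0.0."""
    return 1.0 if extract_answer(completion) == ground_truth else 0.0

completions = [
    r"... so the total is \boxed{42}",
    r"... therefore \boxed{41}",
]
# The RL step then reinforces whatever reasoning produced the high-reward sample:
print([reward(c, "42") for c in completions])  # [1.0, 0.0]
```

The appeal is that no human labeler or learned reward model sits in the loop: correctness is checked mechanically, so you can train on huge volumes of generated reasoning.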
Yeah. Stock-wise there was a sell-off of companies engaged in energy management systems, cooling, and nuclear power. That seems crazy.
But it raises a connected question: assuming the hype is justified and they really did create a system that does more with much less power consumption and fewer chips, what is the global and geopolitical impact?
Will there be more smaller players in the space? Will there be less adverse climate impact to an AI explosion? Will the centers be just as large but now able to do more? Will it increase or reduce international economic conflict?
I'm guessing cheaper means it gets used more and additional capabilities get added faster. At the end of the day, same adverse climate impacts. Some of the big companies face more competition, and that is not a bad thing.
Just from reading the articles in the Times today, it sounds like the freakout was about a more efficient and cheaper training system cutting into the sales forecasts of Nvidia and associated companies. This kind of AI is fairly new, it is hardly surprising that someone would come up with ways of making training more efficient. It might mean that you can make do with fewer compute centers. And no doubt the methods employed will be in the other systems in no time flat.
From experience with other bubbles, I sense that the market may be overextended and there is fear that the slightest nudge will cause a steep decline. Could this be the nudge? Who knows?
I think it’s a bit of a wake-up call, and the fact that it’s China is a bit of a red herring.
NVIDIA somewhat rested on their laurels; GPUs are not specifically optimized for AI, and there hasn't been enough of a push on things like energy efficiency. There was always a possibility of someone doing more with older chips or adapted chips; I think it was just assumed that this level of disruption wouldn't happen this fast.
The whole thing of China making a "knock-off"…the algorithm for DeepSeek is public and open-source. There will be lots of DeepSeek "knock-offs" being developed in the US and other countries right now, and that's largely a good thing. Progress is all about copying and building on ideas.
I fear the opposite may be the case: higher efficiency often means more than proportionally higher deployment, yielding greater resource demands (the Jevons paradox).
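A back-of-the-envelope illustration with made-up numbers, since the Jevons point is easy to see in arithmetic: a 10x efficiency gain paired with a 20x jump in usage still doubles total consumption.

```python
# Made-up numbers, purely illustrative of the Jevons paradox:
queries_before = 1_000_000
cost_per_query = 1.0          # arbitrary compute units
efficiency_gain = 10          # each query now costs 1/10 as much
usage_multiplier = 20         # cheaper inference unlocks far more demand

total_before = queries_before * cost_per_query
total_after = (queries_before * usage_multiplier) * (cost_per_query / efficiency_gain)
print(total_after / total_before)  # 2.0 -> total compute doubles
```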
They still used NVIDIA chips, so that’s a bit of a strange interpretation to make.
The problem really is with the entire software stack, which is mostly not NVIDIA. AI is just layers upon layers upon layers of frameworks and libraries and other things, and a lot of companies aren’t looking more deeply into this.
It’s a common problem across all of software, in fact, and there’s a tradeoff–do you try to save development time by just reusing pieces that are already out there, or do you make a serious effort to optimize your particular use case?
It's a hundred-billion-dollar question no matter what. Maybe OpenAI and others favored shipping fast over optimizing. They beat the others to market, but at the cost of needing the absolute top hardware. Deepseek, on the other hand, spent time optimizing for the hardware they had available, maybe because they had no other choice.
Everyone will have to move in this direction eventually, regardless. We see the same thing in console gaming. Everyone has the same hardware, so to stand out you have to optimize better. So games get better over time on the same level of hardware.