Would leading AI companies see any benefit from using the excess processing power of individuals’ computers and gaming systems?

I don’t know anything about computer science, but I remember how years ago we’d have distributed computing systems where you’d download a program and it would use your excess CPU capacity to run models to benefit various scientific programs. That way instead of a company buying a supercomputer, they just got a million people to download the program onto their laptops.

Would leading AI companies see any benefit from doing this with people’s excess capacity on their gaming consoles or personal laptops? Like if ChatGPT offered a discount, or free Plus usage, if you agreed to let them use the processing capacity of your gaming consoles and laptops when they aren’t in use to train the next models, or run the current ones.

Or would that violate privacy laws (since people’s personal AI conversations would be getting processed on strangers’ machines), or are there endless other bottlenecks besides FLOPs that would make the idea ineffective?

The tl;dr answer is “No.”

How much detail do you want as to why this wouldn’t work?

Stranger

AI training needs a lot of processing power and memory, used in a way that can’t be easily distributed across networks of physical computers. It requires big data centers full of very fast processors with a lot of memory hooked up to each other with very fast connections, running for weeks or months at a time.

That would become years or even decades if the same amount of computing were instead done over high latency systems like the internet. By then their competitors would’ve made Skynet already.

The connections within and between processor nodes in a data center happen over specialized cables that are orders of magnitude faster than the public internet.

That’s still not fast enough for some, and there are companies making entire wafer-sized training chips to make it even faster: Cerebras.

It’s not like crypto where the problems were inherently parallelizable because, well, they were designed to be. They were artificial puzzles made to be solved by distributed PCs.

AI model training needs more “all at once” memory shared between processors and that requires the fast connections between them.

Even inference (“using” a trained model by giving it a prompt and getting answers) works better with extremely high memory systems (we’re talking dozens or hundreds of gigabytes of GPU RAM), which most consumer home systems don’t have.

There are a variety of techniques to train and run inference on simpler, weaker models, but the state of the art ones require lots more than what most home users and businesses have access to. They are their own special kind of monolithic supercomputer. The hardware is also evolving very quickly, with Google and Nvidia making their own hardware and a bunch of startups exploring alternative architectures.

At a very rough estimate, a GPT-4 level LLM might take about 3e25 flops (floating-point operations) to train. The recently released top-end consumer gaming GPU, the Nvidia RTX 5090, might be able to do about 105 teraflops/second or 1e14/second. It would take that one card about 9500 years to train that model (which it couldn’t actually do, since it doesn’t have enough RAM anyway… even if it somehow lasted 9500 years inside a nuclear space bunker).

Meanwhile, their new data center “card”, which is really more like an entire server rack, costs about $4 million apiece and can do about 360 petaflops/second. That’s about 3500x the performance for only 1300x the price (what a steal!). One of those would shorten training down to less than 3 years. And companies don’t just buy one; Apple recently ordered about 250 of them for a cool billion, which could do that same training in 4 days.
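
If anyone wants to sanity-check that arithmetic, here’s a quick back-of-the-envelope sketch (all the inputs are the rough figures above, not exact specs):

```python
# Quick sanity check of the training-time figures above.
# All inputs are the rough numbers quoted in this thread, not exact specs.

SECONDS_PER_YEAR = 365 * 24 * 3600

training_flop = 3e25        # rough GPT-4-class training budget (total floating-point ops)
rtx_5090_flops = 1e14       # ~105 teraflops/sec, rounded down to 1e14 per second
dc_rack_flops = 3.6e17      # ~360 petaflops/sec for the data-center rack

years_on_5090 = training_flop / rtx_5090_flops / SECONDS_PER_YEAR
years_on_rack = training_flop / dc_rack_flops / SECONDS_PER_YEAR
days_on_250_racks = training_flop / (250 * dc_rack_flops) / (24 * 3600)

print(f"One RTX 5090:         ~{years_on_5090:,.0f} years")    # ~9,500 years
print(f"One data-center rack: ~{years_on_rack:.1f} years")     # ~2.6 years
print(f"250 racks:            ~{days_on_250_racks:.0f} days")  # ~4 days
```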

From an AI commercial company standpoint, they’re all basically just lighting investor money on fire anyway in their desperate race to AGI (or at least trying to outlast their competitors in the slow crawl towards bankruptcy).

These numbers are assuming just one of the AI companies training one model a single time. There are at least tens of these companies, along with hundreds of smaller ones that feed off the big ones, each doing many, many repeated training runs.

To train them on idle home computers would still require an enormous amount of money (because processing is expensive no matter where it comes from, and then you have to add in all the coordination between hundreds of thousands of individual machines over the public internet, which is no small feat). It would also end up taking much, much longer, if they can even get the requisite compute together. SETI@home maxed out at about 600 teraflops/sec in 2013, barely 6 of those 5090s. Even the ongoing BOINC distributed compute projects combined cap out at about 18 petaflops/sec across 145k computers, or about 124 gigaflops per machine. That’s not a lot of compute power per machine, and while it would be faster than 9500 years, it would still likely take decades. All of BOINC couldn’t match even a single Nvidia data center offering, much less hundreds of them.
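
Same kind of rough sketch for the BOINC comparison, using the figures above:

```python
# Rough comparison: all of BOINC's combined volunteer compute vs. one 3e25-flop training run.
# Figures are the approximate ones quoted above.

SECONDS_PER_YEAR = 365 * 24 * 3600

training_flop = 3e25
boinc_flops = 18e15          # ~18 petaflops/sec across all BOINC projects combined
boinc_machines = 145_000

per_machine = boinc_flops / boinc_machines
years_on_boinc = training_flop / boinc_flops / SECONDS_PER_YEAR

print(f"Average per machine: ~{per_machine / 1e9:.0f} gigaflops/sec")    # ~124
print(f"All of BOINC on one training run: ~{years_on_boinc:.0f} years")  # ~50+ years
```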

I don’t think it’s an overstatement to say that these AI projects are species-scale, or at least nation-scale, projects. Most smaller countries couldn’t afford them, much less some ragtag bunch of home users.

It’s possible that further advancements in training will decrease the required compute (as shown by the Chinese DeepSeek model, which offers pretty good performance at a lower training cost of about 3e24 flops), but that’s still a lot.

If you figure out a way to do it more cheaply, you’ll be an overnight trillionaire.

It is entirely possible to run your own AI at home. You’ll need a pretty strong desktop PC to do it well (not your laptop) but it is within reach of most people (and if you don’t mind it being slow you probably can do it on a laptop).

The big data centers you see are there to handle many thousands of requests simultaneously.

Those setups are doing inference, or “using” the AIs, not training them. They use models trained by big companies (usually Facebook’s LLaMA or DeepSeek), “quantized” (shrunk down to lower numerical precision) for home use.
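
To give a rough idea of what quantization buys you memory-wise, here’s a sketch with made-up round parameter counts (not exact specs for any particular model):

```python
# Approximate memory needed just to hold model weights at different precisions.
# Parameter counts are illustrative round numbers, not any specific model's specs;
# real use also needs extra memory for activations and the KV cache.

def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (7, 70):
    for bits in (16, 8, 4):
        print(f"{params}B parameters at {bits}-bit: ~{weight_gb(params, bits):.1f} GB")

# A 7B model quantized to 4-bit (~3.5 GB) fits on an ordinary gaming GPU;
# a 70B model at 16-bit (~140 GB) doesn't fit on any single consumer card.
```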

In between, some home computers with powerful GPUs and enough RAM can also do some level of “fine-tuning”.

But these are nowhere near the compute required to train a full model from scratch, or even to run the GPT-4o-level inference that ChatGPT etc. do.

You can also run inference on a small LLM on your phone, like in the mobile game AI Dungeon. That doesn’t mean it can do everything more recent LLMs can do, at least not to the same extent.

It’s not just that. Crawling datasets, training, fine-tuning, inference, serving requests… it all takes processing power. A lot of it. AI responses can’t really be cached like a static website can, unless it’s for the exact same prompts (and even then, that disregards user-provided context and prior “memory”).

None of that would really benefit from being served from home users over the internet. Even a regular CDN, with distributed edge nodes, runs those nodes from regional data centers with fast internet connections that home users on cable and DSL won’t have. And AI requires much more compute than a normal CDN, even just for inference, if you want both high-quality outputs and a reasonable tokens/second/user level of performance.

That said, it’s not as binary as “you either have to be a FAANG or you can’t do any of this on your own”. Smaller companies do train smaller models too. You can also rent cloud services for short-term small-scale training, like https://vast.ai/ or https://salad.com/ or https://www.runpod.io/. Those are more powerful than your average home computer but much less powerful than the bleeding-edge high-end stuff that the big AI companies use.

That was (and still is!) BOINC (Berkeley Open Infrastructure for Network Computing), and it’s still active. See the downloadable app at https://boinc.berkeley.edu/ or Wiki

It was originally created for SETI@home (the Search for Extraterrestrial Intelligence) but is now used for a bunch of different scientific projects, many in physics, math, and health research.

It’s been running in the background on my computers for 20+ years now.
But this work is donated to public research – not work for private, profit-making companies like the OP mentioned.

It’s too bad US and international politics are what they are. In a better world, this might’ve been the next Apollo Project or International Space Station, done for the betterment of an entire country or world.

Now it’s just another money grab, and the winner is going to make today’s monopolies seem like corner store chains…

OpenAI is technically a nonprofit, sort of, but not really, and it functions like a for-profit. The nonprofit part is left over from its more idealistic days, before the billions took over, and now mostly just causes internal ideological battles and financial infighting. The leadership and board have been shaken up a few times. Microsoft, its main investor, was not happy.

This is a lot of power and money being funneled to relatively small groups of people. It got ugly quickly.

I haven’t used Runpod in a while but it used to be that you could rent other people’s RTX 3090/4090 GPUs on their home computers for lighter applications such as training Stable Diffusion LoRA models. Might still be able to but I don’t use Runpod these days.

That’s ridiculous. Non-general AI has plenty of uses as well.

Sure, but that’s not what’s driving the investor frenzy right now. Glorified autocomplete is nice, but in and of itself isn’t enough to justify long term investments of the magnitude we’re currently seeing.

As the industry matures and the bubbles burst, a lot of those companies probably won’t make it and I’d guess we’d be left with just a couple of major chatbots instead of a dozen. But it’s early yet; anything could happen…

I hate to be that guy, but…FLOPS (or flops) stands for Floating-point Operations Per Second. So flops/s is floating point operations per second per second. If you are going to use flops as a quantitative metric of computing, it’s important to get the units right.

FLOP for FLoating-point OPerations is somewhat cromulent, but I’ve always been more comfortable spelling it out.

Yeah, but the ninth-generation gaming consoles offer about 10 teraflops/s each, and about 100 million of them have been sold. Obviously not everyone is going to want to contribute them to an AI program, but in theory the gaming consoles alone add up to about 10^21 FLOPS. Not a ton, but you could get around 10^27 FLOP in a month if all those consoles were working 24/7 on contributing to an AI program.

I know there are endless other bottlenecks which would be limitations, I just didn’t know if that excess computation capacity could hold any use for AI programs. As far as public vs. private, that’s a valid concern. But there’s no rule that it would have to be used to help a private company vs. a public one.

I do wonder if in a few years we will just have some kind of NATO-esque AGI and ASI program where the entire Western world comes together to get to ASI first.

It’s nice this program still exists. I used to have Folding@home, but I get the impression that DeepMind may have made programs like that less necessary with their AlphaFold.

Heh, yeah, the ambiguity bugged me too. I tried to be consistent with lowercase “flops” vs “flops/sec”, and the singular “flop” just sounds… wrong…

I propose a new SI unit, “aishenanigans/sec”.

Something I briefly touched on, but probably didn’t explain well enough (because I don’t know all the details well enough myself), is that AI training takes a considerable amount of RAM (hundreds of gigabytes or more). Consoles don’t have that, so you end up having to virtualize it across a bunch of consoles (with very high network latency) and/or swap it to disk (also with high latency).

AI training, by current methods, isn’t a workload you can piecemeal divvy up across a bunch of independent small processors like that, unlike (say) crypto mining or SETI@home chunks or rendering movie frames. From what I understand, the entire model-in-training has to reside in memory during training so that its weights can be adjusted in real time by whichever nodes are crunching numbers. That requirement gives rise to fast interconnects like NVLink, with a bandwidth of some 1.8 TB/sec, which is orders of magnitude faster than anything over the public internet. Even that’s not fast enough for everyone, and the whole-wafer prototype chips can get up to some 12.5 petabytes/sec.
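
To put some crude numbers on the interconnect point, here’s a sketch of how long it takes just to move around a terabyte of model state (the 1 TB figure and the non-NVLink speeds are ballpark assumptions on my part; the NVLink number is the one above):

```python
# Crude illustration of why interconnect speed dominates: time to move ~1 TB of
# model state (weights + optimizer state; an assumed round number) over each link.
# The NVLink figure is the one quoted above; the others are assumed ballpark rates.

model_state_bytes = 1e12  # ~1 TB, illustrative

links_bytes_per_sec = {
    "NVLink (~1.8 TB/sec)":                    1.8e12,
    "10 Gbit/s data-center Ethernet":          10e9 / 8,
    "Home broadband (~20 Mbit/s up, assumed)": 20e6 / 8,
}

for name, rate in links_bytes_per_sec.items():
    seconds = model_state_bytes / rate
    print(f"{name}: ~{seconds:,.0f} sec (~{seconds / 3600:.1f} hours)")

# NVLink moves it in under a second; a home connection takes days for a single
# transfer, and training has to exchange updates constantly, not just once.
```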

In other words, not only do you need powerful GPUs, you also need a crapton of memory, and very fast connections between all of them. Consumer devices don’t have that, even if they can contribute compute.

I just didn’t know if that excess computation capacity could hold any use for AI programs

It’s just not as easily parallelizable as other workloads, right now, as far as I know. (But I’m not a computer scientist, just a web dev, so someone smarter will have to explain in more detail.)

That’d be great, but if that happens, it would probably be in the form of pooling dollars rather than PlayStations. There is too much overhead in trying to harness the moderate compute that consumer GPUs can put out. At AI scale, even the data-center-scale systems aren’t big and fast enough, and they want to make more, bigger, faster ones powered by nuclear reactors. It’s a whole new generation (or several) of supercomputers.

The “video cards” (or whatever you want to call what evolved from them at the highest end) use much higher-bandwidth RAM than the mere DDR5 used as main system memory, and they (the NVIDIA ones at least) are designed to work as up to 8 of them on a single motherboard, combining their processing and memory like one giant “video card”. So if you have 8 of the 80 GB cards, it acts as if it were a single 640 GB pool of video RAM. (I see the GH100 has versions with up to 141 GB of RAM, so I suppose you could get 1,128 GB of virtual combined memory.)

If you had to run the model in a TB of DDR5 instead of GDDR6 or HBM, you’d take a big performance hit (I don’t know how much). If you didn’t have enough physical memory and had to rely on swapping segments in and out of storage, the performance hit would likely be measured in orders of magnitude. If you were doing it over IP packets spread across the world, you could probably add more orders of magnitude of loss.
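
A tiny sketch of the pooled-memory math, plus an order-of-magnitude feel for that memory hierarchy (the bandwidth figures in the comments are rough assumptions, not exact specs):

```python
# Pooled GPU memory for the 8-card server described above.
cards = 8
for vram_gb in (80, 141):
    print(f"{cards} x {vram_gb} GB cards = {cards * vram_gb} GB of combined GPU memory")
# -> 640 GB and 1,128 GB, matching the figures above.

# Ballpark bandwidth per tier (assumed round numbers, for scale only):
#   HBM on a data-center GPU:  ~3,000 GB/s
#   DDR5 main memory:          ~50-100 GB/s
#   NVMe SSD (swap):           ~5-7 GB/s
#   Home internet:             ~0.01-0.1 GB/s
# Each step down loses roughly an order of magnitude (or more), which is where
# those "orders of magnitude" performance hits come from.
```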

A system with an 8-card configuration, a good CPU, and plenty of main memory probably runs a good $400,000 or close enough. That’s what you would need to run an instance of (say) GPT-4o at home. Not quite in my budget.

Fair enough, but what about this hypothetical situation.

They get all the gaming consoles connected, but they turn the clock speed up to 11. I know most gaming consoles only allow you to turn the clock speed up to 10, but instead we turn the clock speed up to 11.

Surely that will lead to AGI.

Sadly, AGI needs at least 12, ideally 13. Maybe with the PS6?

There are companies that allow users to rent out their GPUs (About Vast.ai | Vast.ai) and there’s research into distributed computing w/ consumer hardware ([2312.08361] Distributed Inference and Fine-tuning of Large Language Models Over The Internet or [2309.01172] FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs). I’m sure I’ve seen papers about heterogeneous hardware as well.

I don’t think leading AI companies would do that. Meta bought something like half a million H100s; there wouldn’t be a need. And it’s pretty easy to rent a V100 or A100 from AWS. But if you wanted to for some reason? Yeah, sure.

Frankly, the distributed element that would have the most impact is the curation and tagging of training data…