I mean, physically.
AI is a popular subject lately, and I’m just curious about what an AI actually is. I get that it’s software on a computer, but beyond that… is it just one computer, or a bunch of networked computers? Are the computers regular servers or are they some sort of sci-fi supercomputers? And what about the software? How big is it? How many gigs does an AI take on a hard drive? And do AIs have to always be online, or can they operate off the internet?
Inquiring minds, etc.
Not to start us off with a lame answer but it varies greatly. Assuming that, for purposes of this thread, “AI” is referring to things like image generation and Large Language Models (i.e. chatbots) and not the “Bread Brain” in your toaster, you still have the spread from systems able to run on a mid-level home PC to systems using dedicated data centers.
In a general sense, AI models run on GPUs (Graphics Processing Units; the chips in video cards) rather than CPUs. Dedicated AI cards aren't actually video cards, but they use a similar architecture. GPUs excel at running a huge number of operations in parallel, which makes them much more efficient for this kind of work than CPUs are. "VRAM" refers to the video memory on the graphics card, just as RAM is the main system memory.
On the low end, you can run image generation software like Stable Diffusion effectively with as little as 4GB of video memory (less is possible, but not really worth it). That's a cheap, low-end GPU. You can run a half-decent language model on 8-10GB, but it'll be slow. On my own system, I can run a fairly robust image generation model on 24GB of VRAM, and it takes maybe 10 seconds to generate an image, more or less depending on what I'm doing. I can run a 24B LLM (a large language model with 24 billion parameters) on the same 24GB card and get responses within a few seconds. I can run a model with 70B parameters and get responses that look as though someone is typing them out, because of how long my card takes to generate them. My 24B models take around 19GB of drive space and the 70B model takes 47GB. My video card (RTX 4090) was top of the consumer line until recently, when its 24GB was topped by the RTX 5090 with 32GB. I can run all of these offline, since they run completely locally on my PC.
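If you want a rough rule of thumb for those file sizes, it's basically parameter count times bits per parameter. The ~19GB and ~47GB files above work out to roughly 5-6 bits per parameter, i.e. quantized models. Here's a minimal Python sketch, assuming no file overhead and treating the bit widths as illustrative:

```python
# Back-of-the-envelope estimate of model size from parameter count and
# precision. Assumes size ≈ parameters × bits-per-parameter with no overhead.

def model_size_gb(params_billions: float, bits_per_param: float) -> float:
    return params_billions * 1e9 * bits_per_param / 8 / 1e9  # bytes -> GB

for params, bits in [(24, 16), (24, 6), (70, 6), (70, 4)]:
    print(f"{params}B parameters at {bits}-bit: ~{model_size_gb(params, bits):.0f} GB")
```

That's why quantized versions of a model are so much easier to fit on a consumer card than the full-precision originals.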
I tell you that to lend a sense of perspective for the big guns. ChatGPT is estimated to have 1.76 trillion parameters and take up at least 500GB of drive space. The major AI companies run these systems in data centers filled with specialty cards such as the Nvidia H200, which costs around $35,000 and features 141GB of memory designed to run faster than what's in even a top-of-the-line consumer card. These data centers run hundreds of thousands of these (or similar) cards. Musk said he was opening a data center with 200,000 cards, and there are plans for centers using 300,000+ cards in the near future. There's a reason Nvidia tops the "most valuable companies" lists these days (nearly $4 trillion in market value). These data centers run in tandem to fulfill the millions of requests coming in from systems like ChatGPT and more dedicated AI applications. For even one person to run ChatGPT would likely require hundreds of thousands of dollars in hardware due to the massive model size. Fortunately, two people running it doesn't double that requirement, but millions of people using it at once certainly requires a lot of processing power.
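Just to put that in numbers, treating the 1.76 trillion figure as the rough estimate it is and assuming 16-bit weights with no overhead:

```python
# Rough math: how many H200s just to HOLD a 1.76-trillion-parameter model in
# memory, ignoring activations, batching, and redundancy. The parameter count
# is an estimate and 16-bit weights are an assumption.
params = 1.76e12
bytes_per_param = 2        # 16-bit weights
h200_memory_gb = 141

weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb / 1000:.1f} TB of weights")
print(f"at least {weights_gb / h200_memory_gb:.0f} H200s just to store them")
```

And that's just to hold one copy of the weights; serving millions of simultaneous requests is what pushes the card counts up from there.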
There’s not a simple answer to this. Most of what gets labeled as AI nowadays is software built around either a Large Language Model (LLM) or an image generation model (diffusion or auto-regressive). LLMs might also incorporate vision models, in which case they’re called vision-language models (VLMs) or multimodal language models (MLLMs).
Both types are basically prediction models: you take in some numbers (an image is just an array of numbers), perform some mathematical operations on them, and output other numbers. The kinds of operations involved run very efficiently on GPUs, which can be the same consumer cards used to play video games or cards in industrial servers.
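To make "take in numbers, do some math, output numbers" concrete, here's a toy single layer in Python/NumPy. The sizes and weights are made up; a real model stacks many such layers with billions of learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal(512)           # input: just an array of numbers
W = rng.standard_normal((512, 512))    # the "parameters" (learned in a real model)
b = rng.standard_normal(512)

y = np.maximum(0, x @ W + b)           # matrix multiply + bias + nonlinearity
print(y.shape)                         # (512,) -- numbers out
```

A GPU runs that kind of multiply across thousands of cores at once, which is the whole reason GPUs get used for this instead of CPUs.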
Sizes vary a lot. Frontier language models are mostly hundreds of billions of parameters, while image models tend to be at least an order of magnitude smaller. You can run most models locally if you have top-of-the-line hardware and "quantize" your models (store the parameters at reduced precision to save memory), but mostly they're run on servers.
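A toy illustration of the idea behind quantization (not how any particular tool actually does it):

```python
import numpy as np

weights = np.random.default_rng(1).standard_normal(1_000_000).astype(np.float32)

# Map each 32-bit float to an 8-bit integer plus one shared scale factor.
scale = np.abs(weights).max() / 127
quantized = np.round(weights / scale).astype(np.int8)

print(f"float32: {weights.nbytes / 1e6:.1f} MB")    # ~4.0 MB
print(f"int8:    {quantized.nbytes / 1e6:.1f} MB")  # ~1.0 MB
restored = quantized.astype(np.float32) * scale
print(f"worst-case error: {np.abs(weights - restored).max():.3f}")
```

You trade a little accuracy for a model that's a fraction of the size, which can be the difference between fitting in your VRAM or not.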
The AI you interact with likely isn't one of these raw models on its own. There are often layers of filtering on the inputs, extra inputs added to steer the output, and/or routing of requests to different models. Exactly what's done is likely held close to the chest by each of the companies that build these systems.
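Purely as an illustration of what that kind of wrapper might look like (every name and rule below is invented; the real pipelines are proprietary):

```python
SYSTEM_INSTRUCTIONS = "You are a helpful assistant. "

def violates_content_policy(text: str) -> bool:
    # Stand-in for input filtering; real filters are far more involved.
    return "forbidden topic" in text.lower()

def pick_model(text: str) -> str:
    # Stand-in for routing: short requests to a cheap model, long ones to a big one.
    return "small-fast-model" if len(text) < 200 else "large-capable-model"

def run_model(model_name: str, prompt: str) -> str:
    # Stand-in for actually calling the LLM.
    return f"[{model_name}] response to: {prompt!r}"

def handle_request(user_message: str) -> str:
    if violates_content_policy(user_message):       # filter the input
        return "Sorry, I can't help with that."
    prompt = SYSTEM_INSTRUCTIONS + user_message     # add instructions the user never sees
    model_name = pick_model(user_message)           # route to one of several models
    return run_model(model_name, prompt)            # output filtering would also go here

print(handle_request("What is an AI, physically?"))
```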
The TLDR is that AI is a lot of matrix multiplication with parameters trained on a lot of data, run on GPUs.
This question almost isn’t meaningful these days. Even for years before ChatGPT took off, a lot of computing was already being done in “the Cloud”. Basically, this means that there’s some company somewhere with a bunch of computers, and you contract with them to use some amount of computing resources. The resources you use might all be on one actual device, or they might be spread across multiple devices, but the point for you, the consumer, is that you don’t have to care which: the hand-offs are seamless enough that whatever it actually is, you can treat it as if it were all one device.