The business of AI (mostly LLMs)

I’ve been thinking for a long while about starting this conversation, because although I follow news in this area pretty closely, it evolves so quickly that it’s hard to keep up (and I do try).

To be clear, what I want to talk about is the business models of purveyors of LLMs (OpenAI, Anthropic, Google, Microsoft, DeepSeek, others), the business of chips (mainly Nvidia, but more recently Google as well), and any peripheral entities (like Mistral, Cohere, Blue Owl, or whatever), including data centers and hyperscalers.

I’m less interested in what’s bad about AI (I started that thread already) or what’s good (someone else started that thread) than in the financial mechanics and consequences of LLMs.

So, feel free to throw in any topics along those lines you want to talk about. Is everyone familiar with the big Nvidia dependency circle graph? If not, I might post that later.

I have a bunch of possible topics that fall into the finance category: the very different possible IPOs for Anthropic and OpenAI, the weird ebb and flow of capacity (Anthropic leasing an entire data center’s capacity from xAI!?), whether it’s economically better to pan for gold or sell shovels, and a bunch more.

But for now, I want to talk about one thing: the end of the gravy train. The costs of both inference (running models for you) and training are reasonably well documented or inferable, and they’re massive. When it comes to charging for that, APIs have long been usage-based, but corporate AI desk products were usually sold as subscriptions or seat licenses with generous usage limits. As AI agents drive up compute costs, companies like Anthropic and OpenAI have been signaling a move toward more direct usage-based pricing. GitHub Copilot made that transition just today, but AFAIK most enterprise AI products have not yet fully switched to token-based billing. Feedback on the Copilot change has been pretty strong, though that shouldn’t be all that surprising.

Any thoughts on that one? For those of you who use LLMs at work (say, for coding), are things going to change, or already changing, where you work?

I’m not sure how it will shake out for coding, but I think token-based billing will remain the standard for non-interactive use-cases (e.g. APIs).

Part of the challenge is that AI providers don’t know how much their enterprise customers are going to use, and part of it is that the customers don’t know how many tokens the agents will use.

I see things online about users running up giant token counts, including Peter Steinberger (OpenClaw) spending $1.3M in a month at OpenAI. The safe bet in the short term is to charge a monthly fee with usage caps, then try to upsell if enough users hit their caps.
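To put rough numbers on the cap-and-upsell idea, here is a minimal Python sketch of one user’s monthly margin under a flat fee with a cap, the same flat fee without a cap, and straight usage-based billing. Every price, cost, and cap in it is a hypothetical placeholder, not any provider’s real rate, and it assumes the provider’s serving cost is a fixed amount per token.

```python
from dataclasses import dataclass
from typing import Optional

# All figures are hypothetical placeholders, not any vendor's real prices.
COST_PER_MILLION = 2.00       # provider's cost to serve 1M tokens
PRICE_PER_MILLION = 3.00      # usage-based list price per 1M tokens
FLAT_FEE = 40.00              # monthly subscription fee
CAP_TOKENS = 15_000_000       # monthly usage cap on the subscription


@dataclass
class Outcome:
    revenue: float
    cost: float

    @property
    def margin(self) -> float:
        return self.revenue - self.cost


def subscription(tokens_wanted: int, cap: Optional[int] = CAP_TOKENS) -> Outcome:
    """Flat fee; with a cap, usage beyond it is refused rather than billed."""
    served = tokens_wanted if cap is None else min(tokens_wanted, cap)
    return Outcome(FLAT_FEE, served / 1e6 * COST_PER_MILLION)


def usage_based(tokens_wanted: int) -> Outcome:
    """Every token is served and billed at the per-token price."""
    millions = tokens_wanted / 1e6
    return Outcome(millions * PRICE_PER_MILLION, millions * COST_PER_MILLION)


for tokens in (2_000_000, 15_000_000, 500_000_000):
    capped = subscription(tokens).margin
    uncapped = subscription(tokens, cap=None).margin
    metered = usage_based(tokens).margin
    print(f"{tokens:>12,} tokens: capped {capped:9.2f}  uncapped {uncapped:9.2f}  usage {metered:9.2f}")
```

With these placeholders, light users are very profitable on a subscription ($36 margin versus $2 if metered), the capped plan holds a 500M-token user to a $10 margin by refusing service past 15M tokens, and the same user costs the provider $960 if there is no cap. Light users subsidizing heavy ones, with the cap as a backstop, is why caps plus upsells look like the safe short-term bet.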

In my business unit, we use Gemini Code Assist, but I’ve seen other BUs using other providers. There are different ways to integrate and use it, so token usage must vary; AI autocomplete should use less than vibe coding.

For my experimentation at home, I’ve been using ChatGPT Plus with Codex to vibe code. There is a rolling 5-hour usage window that is easy to max out. You can play games with context size and model size to stretch it, but that undermines the whole experience.
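How the rolling window is enforced isn’t described here, and OpenAI’s actual mechanism isn’t public. A common way to build this kind of limit is a sliding-window log, sketched below in Python: each request is recorded with its timestamp and token count, and only requests from the last five hours count against the budget. The budget value is a hypothetical placeholder.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60 * 60   # the 5-hour rolling window described above
BUDGET = 1_000_000             # hypothetical per-window token budget; the real limit is not public


class RollingWindowLimiter:
    """Sliding-window log: remember each request's time and size, and only
    count requests whose timestamps fall inside the last `window` seconds."""

    def __init__(self, budget: int = BUDGET, window: float = WINDOW_SECONDS) -> None:
        self.budget = budget
        self.window = window
        self.events = deque()  # (timestamp, tokens) pairs, oldest first
        self.used = 0

    def _expire(self, now: float) -> None:
        # Drop requests that have slid out of the window and refund their tokens.
        while self.events and self.events[0][0] <= now - self.window:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_consume(self, tokens: int, now: float) -> bool:
        """Record the request and return True if it fits in the budget, else False."""
        self._expire(now)
        if self.used + tokens > self.budget:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True


limiter = RollingWindowLimiter(budget=100, window=10)  # tiny numbers for demonstration
print(limiter.try_consume(60, now=0))   # True
print(limiter.try_consume(60, now=5))   # False: 120 would exceed the budget of 100
print(limiter.try_consume(60, now=10))  # True: the request at t=0 has aged out
```

It also shows why the window is easy to max out: a burst of heavy agentic requests fills the budget, and nothing frees up until those specific requests age out five hours later.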

I am curious how pricing will play out with upcoming IPOs. With a traditional Web 2.0 IPO, the user base was king, since processing overhead was negligible. The assumption was that users were worth $X per year in advertising. It’s obviously very different for an AI provider.

Will we see a delicate dance of raising prices just enough to not lose too many users? Or a push to yearly subscriptions to eliminate churn (like mobile carriers in the US)?

Training a model is still expensive, but running a model once it’s trained is pretty cheap. You can download a complete model and run it in real time on a single consumer-grade computer. Granted, a large company looking to use an AI system for customer service (set aside the question of how wise that is) is probably going to need more than that to handle its needs, but it’s still a heck of a lot cheaper than using humans (even at third-world wages).

But you’re not running the model once. The user in my link processed 603B tokens in one month. That’s an extreme case, but there are ~1.5M software developers in the US, so even a small fraction of that usage per developer adds up quickly. Compared to Llama 4 training on 30T tokens, neither training nor inference is negligible.
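The comparison can be made explicit with a little arithmetic. This Python sketch uses the thread’s figures (603B tokens for the extreme user, ~1.5M US developers, 30T Llama 4 training tokens) plus one hypothetical assumption: an average of 50M tokens per developer per month, which is a placeholder, not a measured figure. It also applies a common rule of thumb from the scaling-law literature (roughly 6N FLOPs per training token versus 2N per inference token for an N-parameter model), which assumes the same model for both and ignores attention overhead and caching.

```python
# Figures cited in the thread.
TRAINING_TOKENS = 30e12          # Llama 4 training set, in tokens
EXTREME_USER_TOKENS = 603e9      # one user's tokens in one month
US_SW_DEVS = 1.5e6               # approximate number of US software developers

# Hypothetical assumption, NOT from the thread: average tokens per developer per month.
AVG_TOKENS_PER_DEV = 50e6

# Rule of thumb: ~6N FLOPs per training token vs ~2N per inference token,
# so one training token costs roughly 3 inference tokens' worth of compute.
TRAIN_TO_INFER_COST = 3.0


def share_of_training(inference_tokens: float, flop_adjusted: bool = False) -> float:
    """Inference tokens as a multiple of the training run (same model assumed)."""
    ratio = inference_tokens / TRAINING_TOKENS
    return ratio / TRAIN_TO_INFER_COST if flop_adjusted else ratio


fleet_tokens = AVG_TOKENS_PER_DEV * US_SW_DEVS
print(f"extreme user: {share_of_training(EXTREME_USER_TOKENS):.1%} of training tokens")
print(f"all US devs:  {share_of_training(fleet_tokens):.2f}x training tokens per month, "
      f"~{share_of_training(fleet_tokens, flop_adjusted=True):.2f}x in FLOPs")
```

Under those assumptions, the extreme user alone is about 2% of the training token count in a single month, and the hypothetical fleet of developers would consume about 2.5 times the training set’s tokens every month (roughly 0.83 times in FLOP terms). The exact figures matter less than the shape: training is a one-time bill, while inference recurs monthly and scales with adoption.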

This is a peril of API-mediated inference (like in the agentic example you linked): you just write some calls into your code and the meter starts running. Beyond that, the costs of local inference could be low, true, but it’s not clear to me how much enterprise inference is local, how much is hosted (e.g. in the cloud), and how much runs through APIs. I’d guess most frontier-model use is hosted or goes through APIs.

A bit of good news: some companies are becoming aware of tokenmaxxing costs and are scaling back internal requirements on AI usage. My former employer has those kinds of requirements, so it will be interesting to see how quickly they adjust.

The point is, if you can do it cheaply on a local machine, then you can do it cheaply. Putting the machine that’s doing it in “the cloud” doesn’t change the amount of compute that’s needed. Cloud companies can charge whatever they want, but if the cloud company charges more than a local machine would cost, clients will go back to using their own local machines. Or, more likely, some other cloud company will undercut their price and get the clients instead.
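One way to read the “local machine sets the ceiling” argument is as a cost-per-token calculation for owned hardware. Here is a minimal Python sketch; the hardware price, lifetime, upkeep, and throughput figures are all hypothetical placeholders, and it assumes the local machine can run a model comparable to what the cloud offers and ignores staff time.

```python
# All figures are hypothetical placeholders for illustration.
HARDWARE_COST = 3_600.00          # up-front price of a local inference machine
LIFETIME_MONTHS = 36              # period over which the hardware is depreciated
UPKEEP_PER_MONTH = 50.00          # power, cooling, maintenance
SECONDS_PER_MONTH = 30 * 24 * 3600


def local_cost_per_million(tokens_per_second: float, utilization: float) -> float:
    """Effective local cost per 1M tokens generated.

    utilization is the fraction of the month the machine is actually busy
    (0 < utilization <= 1); idle time still costs money.
    """
    monthly_cost = HARDWARE_COST / LIFETIME_MONTHS + UPKEEP_PER_MONTH
    tokens_per_month = tokens_per_second * utilization * SECONDS_PER_MONTH
    return monthly_cost / tokens_per_month * 1e6


def cloud_is_competitive(cloud_price_per_million: float,
                         tokens_per_second: float, utilization: float) -> bool:
    """A cloud price at or below the local cost per token keeps the customer."""
    return cloud_price_per_million <= local_cost_per_million(tokens_per_second, utilization)


for util in (1.0, 0.25):
    print(f"utilization {util:.0%}: ${local_cost_per_million(100, util):.2f} per 1M tokens")
```

With these placeholders, a fully busy machine works out to about $0.58 per million tokens, but one that sits idle three-quarters of the time costs about $2.31. That utilization effect is one reason a shared cloud machine can often undercut a mostly idle local one, even though the local cost still acts as a ceiling on what the cloud can charge.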

And that price (local or cloud) is still going to be much cheaper than hiring any human to do the task, so if the task is something that needs to be done, companies will still use AI for it.

As for the cost of training, there may eventually come a point where the cost of training a new, better model is more than the incremental value of the new model over the old one. At that point, development of new models will mostly stop, and we’ll be left with just whatever models already exist at that point. But even if we hit that point tomorrow, what we have right now is already enough to turn many industries upside-down.

I think there are two distinct use cases, too. There are automated systems that use AI. They might be internal, like scanning trouble tickets or doing code reviews, or external, providing AI-enhanced services to your customers. These cases are easy to monitor and somewhat predictable. You might have to add rate limiting, but you generally know your load.

Then there are tool or agentic use cases, like code-assist tools, where you don’t control the tool. You can’t predict whether a tool update will greatly increase costs. Plus, as you point out, there’s tokenmaxxing (which I hadn’t heard of before). I wonder how many tokenmaxxers are doing it as a protest?

This is an interesting article about Microsoft scaling back AI use due to cost. I think this will become more common when the real costs are passed on to customers.

I wonder if the costs they’re currently passing on to customers are the actual cost of the tokens, or if this is still a discounted rate.

Depends on how you amortize the training and other up-front costs.

And it depends on whether they’re building a profit into the price as well.

Right? GPUs (or TPUs if you’re Google), servers, networking, data centers, power infrastructure, cooling systems, storage. R&D (including people and RLHF). I suspect public financial reporting for those will make Hollywood look transparent, but I also suspect their internal price accounting will be very different from that either way.
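The amortization question can be framed as a per-token price floor: the marginal cost of serving a token, plus up-front costs (training, hardware, R&D) spread over every token the model serves before it’s retired, plus any markup. A minimal Python sketch follows; the dollar amounts and token volumes in the example are hypothetical placeholders, not figures from any provider.

```python
def price_floor_per_million(
    marginal_cost_per_million: float,
    upfront_cost: float,
    lifetime_tokens: float,
    markup: float = 0.0,
) -> float:
    """Price per 1M tokens needed to recover serving cost, up-front cost, and a markup.

    upfront_cost is spread evenly over lifetime_tokens; markup is a fraction (0.2 = 20%).
    """
    amortized_per_million = upfront_cost / lifetime_tokens * 1e6
    return (marginal_cost_per_million + amortized_per_million) * (1 + markup)


# Hypothetical: $1.00 per 1M tokens to serve and a $500M up-front bill.
for lifetime in (1e14, 1e15):
    print(f"{lifetime:.0e} lifetime tokens: "
          f"${price_floor_per_million(1.00, 500e6, lifetime):.2f} at cost, "
          f"${price_floor_per_million(1.00, 500e6, lifetime, markup=0.2):.2f} with 20% markup")
```

The same up-front bill adds $5.00 or $0.50 per million tokens depending on whether the model serves 100 trillion or a quadrillion tokens before it’s replaced, which is why “is this the actual cost?” depends so heavily on accounting choices.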

It will be interesting to see if employees are under less pressure to use AI at work as companies begin to pay for the cost of AI.

For enterprise work like coding, some might be OK with the results from a local, open-source model and an open-source agent, but I doubt many would be. It’s going to be hard for an open solution to keep up with the main providers.

A company would have to be big enough to have an IT department up to the task, but small enough that it would accept a weaker solution.

Considering how much IT is outsourced today, it seems unlikely. How many places run their own email server anymore?

Yeah, there are definite benefits, due to both specialization and economies of scale, to outsourcing everything to cloud companies, which is why so much IT work is outsourced to them these days. That’s also why I think the more likely outcome of one cloud company overcharging is for another cloud company to undercut them.

I didn’t mention the open-source models because I think the industry will actually move to them on a large scale; I mentioned them just because they’re a clear, unobfuscated view of what the real costs of running a model actually are. The models that most people will end up using won’t be those same models, but they probably will have about the same marginal cost as them.