The business of AI (mostly LLMs)

I’ve been thinking a long while about starting this conversation, because though I follow news in this area pretty closely, it evolves so quickly that it’s hard for me to keep up (and I do try).

To be clear, what I want to talk about is the business models of purveyors of LLMs (OpenAI, Anthropic, Google, Microsoft, DeepSeek, others), the business of chips (mainly Nvidia, but also certainly Google recently), and any peripheral entities (like Mistral or Cohere or Blue Owl or whatever), plus data centers and hyperscalers.

I’m less interested in what’s bad about AI (started that thread already) or what’s good (someone else started that thread already) than in the financial mechanics and consequences of LLMs.

So, feel free to throw in any topics along those lines you want to talk about. Is everyone familiar with the big Nvidia dependency circle graph? If not, I might post that later.

I have a bunch of possible topics that fall into the finance category: the very different possible IPOs for Anthropic and OpenAI, the weird ebb and flow of capacity (Anthropic leasing an entire data center’s capacity from xAI!?), whether it’s economically better to pan for gold or sell shovels, and a bunch more.

But for now, I want to talk about one thing: The end of the gravy train. The costs of both inference (running models for you) and training are somewhat well documented or inferable, and they’re pretty massive. When it comes to charging for that, APIs have long been usage-based, but corporate AI desk products were usually sold as subscriptions or seat licenses with generous usage limits. As AI agents drive up compute costs, companies like Anthropic and OpenAI have been signaling a move toward more direct usage-based pricing. GitHub Copilot just today made that transition, but AFAIK most enterprise AI products have not yet fully switched to token-based billing. Feedback on the change to GitHub Copilot has been pretty strong, though it shouldn’t be all that surprising.

Any thoughts on that one? Workplace users of LLMs for say coding, are things going to change/already changing where you work?

I’m not sure how it will shake out for coding, but I think token-based billing will remain the standard for non-interactive use-cases (e.g. APIs).

Part of the challenge is that AI providers don’t know how much their enterprise customers are going to use it and part of it is the customers don’t know how many tokens the agents will use.

I see things online about users running up giant token counts, including Peter Steinberger (OpenClaw) spending $1.3M in a month at OpenAI. The safe bet in the short term is to charge a monthly fee with usage caps, then try to upsell if enough users hit their caps.

In my business unit, we use Gemini Code Assist, but I’ve seen other BUs using other providers. There are different ways to integrate and use it, so the token usage must vary; AI auto-complete should use less than vibe coding.

For my experimentation at home, I’ve been using ChatGPT Plus with Codex to vibe code. There is a rolling 5-hour window that is easy to max out. You can play games with context size and model size, but that undermines the whole experience.

I am curious how pricing will play out with upcoming IPOs. With a traditional web 2.0 IPO, user base was king since processing overhead was negligible. The assumption was users were worth $X per year in advertising. It’s obviously very different for an AI provider.

Will we see a delicate dance of raising prices just enough to not lose too many users? Or a push to yearly subscriptions to eliminate churn (like mobile carriers in the US)?

Training a model is still expensive, but running a model once it’s trained is pretty cheap. You can download a complete model and run it real-time on a single consumer-grade computer. Granted, a large company looking to use an AI system for customer service (set aside the question of how wise that is) is probably going to need more than that to handle its needs, but it’s still a heck of a lot cheaper than using humans (even at third-world wages).

But you’re not running the model once. The user in my link processed 603B tokens in one month. That’s an extreme case, but there are ~1.5M SW devs in the US. Compared to Llama 4 training on 30T tokens, neither training nor inference is negligible.
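To put rough numbers on that, here is a back-of-the-envelope sketch that uses only figures quoted in this thread. It assumes the $1.3M spend and the 603B tokens describe the same user and the same month, and it compares raw token counts only. Token counts are a rough proxy for compute, not a measure of it, so treat the results as order-of-magnitude at best. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope arithmetic using only figures quoted in this thread.
# Assumption: the $1.3M spend and the 603B tokens describe the same month.
MONTHLY_SPEND_USD = 1.3e6   # reported spend in one month
MONTHLY_TOKENS = 603e9      # tokens processed by that user in one month
TRAINING_TOKENS = 30e12     # training-set size quoted for Llama 4
US_DEVS = 1.5e6             # rough number of US software developers


def implied_price_per_million(spend_usd: float, tokens: float) -> float:
    """Blended price per one million tokens implied by a bill."""
    return spend_usd / tokens * 1e6


def tokens_per_dev_to_match_training(training_tokens: float, devs: float) -> float:
    """Tokens each developer would have to process to equal the training set."""
    return training_tokens / devs


price = implied_price_per_million(MONTHLY_SPEND_USD, MONTHLY_TOKENS)
share = MONTHLY_TOKENS / TRAINING_TOKENS
per_dev = tokens_per_dev_to_match_training(TRAINING_TOKENS, US_DEVS)

print(f"Implied blended price: ${price:.2f} per million tokens")
print(f"One heavy user-month as a share of the training set: {share:.1%}")
print(f"Tokens per developer to match the training set: {per_dev:,.0f}")
```

Under those assumptions the implied blended price is a bit over $2 per million tokens, one extreme user-month is about 2% of the training token count, and 1.5M developers would match the training token count at 20M tokens each.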

This is a peril of API-mediated inference (like in the agentic example you linked), where you just write some calls into your code and the meter starts running. Beyond that, the costs of local inference could be low, true, but it’s not clear to me how much enterprise inference is local, how much is hosted (e.g. in the cloud), and how much is run by APIs. But I’d guess most frontier models are hosted or APIs.

A bit of good news: some companies are becoming aware of tokenmaxxing costs and are scaling back internal requirements on AI usage. My former employer has those kinds of requirements; it will be interesting to see how quickly they adjust.

The point is, if you can do it cheaply on a local machine, then you can do it cheaply. Putting the machine that’s doing it in “the cloud” doesn’t change the amount of compute that’s needed. Cloud companies can charge whatever they want, but if the cloud company charges more than a local machine would cost, clients will go back to using their own local machines. Or, more likely, some other cloud company will undercut their price and get the clients instead.

And that price (local or cloud) is still going to be much cheaper than hiring any human to do the task, so if the task is something that needs to be done, companies will still use AI for it.

As for the cost of training, there may eventually come a point where the cost of training a new, better model is more than the incremental value of the new model over the old one. At that point, development of new models will mostly stop, and we’ll be left with just whatever models already exist at that point. But even if we hit that point tomorrow, what we have right now is already enough to turn many industries upside-down.

I think there are two distinct use-cases too. There are automated systems that use AI. They might be internal – scanning trouble tickets or doing code reviews; they might be external – providing AI-enhanced services to your customers. These cases are easy to monitor and somewhat predictable. You might have to include rate-limiting, but you generally know your load.

Then there are tool / agentic use-cases like code assist tools where you don’t control the tool. You can’t predict if a tool update will greatly increase costs. Plus, as you point out, there’s tokenmaxxing (which I hadn’t heard of before). I wonder how many tokenmaxxers are doing it as a protest?

This is an interesting article about Microsoft scaling back AI use due to cost. I think this will become more common when the real costs are passed on to customers.

I wonder if the current costs they’re passing on to the customers are the actual cost of the tokens, or if this is still a discounted rate.

Depends on how you amortize the training and other up-front costs.

And it depends on if they’re figuring in a profit in the price as well.
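One way to see how much those two choices matter is a toy pricing formula: marginal cost per token, plus up-front cost spread over however many tokens the model serves in its lifetime, marked up by a margin. This is only a sketch of the reasoning in the last few posts, not anyone's actual accounting. Every number below is hypothetical and picked only to make the arithmetic easy, and the function name is made up.

```python
def price_per_million_tokens(
    marginal_cost_per_million: float,
    upfront_cost: float,
    lifetime_tokens: float,
    profit_margin: float,
) -> float:
    """Marginal cost plus amortized up-front cost, marked up by a margin."""
    amortized_per_million = upfront_cost / lifetime_tokens * 1e6
    return (marginal_cost_per_million + amortized_per_million) * (1 + profit_margin)


# All inputs are hypothetical: $1.00 marginal cost per million tokens and
# $1B of up-front cost (training, R&D) to recover.
at_cost = price_per_million_tokens(1.00, 1e9, 1e15, 0.0)       # long model lifetime
with_margin = price_per_million_tokens(1.00, 1e9, 1e15, 0.25)  # same, plus 25% margin
short_life = price_per_million_tokens(1.00, 1e9, 1e14, 0.0)    # 10x fewer tokens served

print(f"At cost, long lifetime:   ${at_cost:.2f} per million tokens")
print(f"With 25% margin:          ${with_margin:.2f} per million tokens")
print(f"At cost, short lifetime:  ${short_life:.2f} per million tokens")
```

With these made-up inputs, serving ten times fewer tokens before the model is replaced moves the break-even price far more than the margin does, which is why the amortization assumption is the one to ask about.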

Right? GPUs (or TPUs if you’re Google), servers, networking, data centers, power infrastructure, cooling systems, storage. R&D (including people and RLHF). I suspect public financial reporting for those will make Hollywood look transparent, but I also suspect their internal price accounting will be very different from that either way.

It will be interesting to see if employees are under less pressure to use AI at work as companies begin to pay for the cost of AI.

For enterprise work, like coding, some might be ok with the results from a local, open-source model and an open-source agent, but I wouldn’t think many. It’s going to be hard for an open solution to keep up with the main providers.

A company would have to be big enough to have an IT department up to the task, but small enough they want to use a weaker solution.

Considering how much IT is outsourced today, it seems unlikely. How many places run their own email server anymore?

Yeah, there are definite benefits, due to both specialization and economies of scale, to outsourcing everything to cloud companies. Which is why so much IT work is outsourced, these days, to those companies. That’s also why I think that the more likely outcome of one cloud company overcharging is for another cloud company to undercut them.

I mentioned the open-source models not because I think that industry will actually move to them on a large scale, but because they’re a clear, unobfuscated view of what the real costs of running a model actually are. The models that most people will end up using won’t be those same models, but they probably will have about the same marginal cost as them.

Big news this week! Kind of. Anthropic has confidentially filed for a possible IPO.

Anthropic confidentially files IPO prospectus with SEC, landmark deal

Anthropic said it confidentially filed its IPO prospectus with the Securities and Exchange Commission, setting up a potentially historic share sale for investors ready to jump into artificial intelligence.

“This gives us the option to go public after the SEC completes its review,” Anthropic said in a statement on Monday. “The proposed initial public offering will depend on market conditions and other factors.”

The company has experienced explosive growth this year, announcing in May that its revenue run rate has ballooned to $47 billion, up from $10 billion in annual revenue last year.

Speaking as someone who has prepared investor relations data, I’d note that “run rate” is often the last month x 12 rather than the last 12 months. Either way, it’s a lot of money!
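For anyone who hasn't seen the two definitions side by side, here is a minimal sketch. The monthly revenue series is entirely hypothetical (it is not Anthropic's data), and the function names are made up. The point is only that for a fast-growing company, last month x 12 can sit well above the last 12 months added up.

```python
def annualized_run_rate(monthly_revenue: list[float]) -> float:
    """Run rate as often quoted: the most recent month multiplied by 12."""
    return monthly_revenue[-1] * 12


def trailing_twelve_months(monthly_revenue: list[float]) -> float:
    """Revenue actually booked over the most recent 12 months."""
    if len(monthly_revenue) < 12:
        raise ValueError("need at least 12 months of data")
    return sum(monthly_revenue[-12:])


# Hypothetical series, in $ billions: starts at 1.0 and grows 0.25 per month.
monthly = [1.0 + 0.25 * i for i in range(12)]

run_rate = annualized_run_rate(monthly)
ttm = trailing_twelve_months(monthly)

print(f"Run rate (last month x 12): {run_rate}")  # 45.0
print(f"Trailing 12 months:         {ttm}")       # 28.5
```

Same company, same books, and the headline number differs by more than half depending on which definition is used.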

Very interesting. I wonder what the market’s appetite is for 3 giant IPOs (SpaceX, Anthropic, OpenAI)? It feels like the dotcom bubble all over again. Also wondering how the US federal government might interfere.

Also interesting that today Anthropic called for a pause in global AI development:

Getting a real pause to work would mean multiple major AI companies in multiple countries – most notably the United States and China – all agreeing to stop at the same time, under rules everyone could actually verify, Anthropic said.

Very much needed, but not going to happen.

This is a good point. Anthropic had a positive Operating Profit in Q2, but that excludes model development costs (as well as stock costs, discounted compute, etc.). Net Profit is years away (maybe 2028).

While most AI companies allow some free use, it is very minimal. For serious work, laying off a large percentage of your workforce on the assumption that the rest will use AI and save loads of money is not working well. AI can get expensive very fast under a heavy workload.

Here is a recent, prominent example:

AI is getting expensive. As a result, for some workers, the days of asking endless questions to chatbots may be numbered. To combat costs, companies have started to set limits on the amount of AI their employees can use, including Uber.

After blowing through its AI budget earlier this year, the company just announced that in order to manage costs, it has set usage caps on various AI-powered tools used by its staff, Bloomberg reported.

Oops…

Sorry, I don’t mean to keep posting, but I forgot to include this bit about S&P:

The S&P won’t fast-track SpaceX for the S&P 500, a decision that is expected to apply to Anthropic and OpenAI as well.

An exception for SpaceX could have also allowed leading AI companies such as OpenAI and Anthropic to gain entry not long after their own expected initial public offerings (IPOs). That possibility has now been shuttered.

and this is interesting (and a bit of a relief):

The news will likely come as a relief to people concerned about passive investor money and people’s retirement savings plans having greater exposure to the market risks associated with SpaceX’s big bet on AI and speculative orbital data center plans. AI companies are generally facing more challenges in funding and building expensive AI data centers, even as they shift more of the subsidized costs of running AI services onto shocked customers through usage-based pricing.

Let’s hope so. That is soooo much worse than is obvious:

Full Title: The Insidious Loophole Behind the Largest IPO in History — and the SEC Chairman Who (Gleefully) Won’t Ask a Single Question About It

It sounds like a windfall: your retirement account is about to own a piece of SpaceX — the most hyped IPO in history — automatically, before you could even place the trade yourself. But here’s what no one’s telling you. Your retirement account will be forced to buy this money-losing company at any price, no matter how overvalued, because the rules were quietly rewritten to guarantee it. You’re not the insider getting rich on the way up. You’re the mark, you’re the bagholder – you’re the useful idiot who will serve as exit liquidity.