How I Cut Our Claude Invoice by ~40% While Increasing AI Usage

It's time for CFOs to start paying attention to their AI bill...

May 28, 2026

👋 Join thousands of other finance leaders learning more about AI. I am sharing all the things I (and others) are doing with AI and how you can build an AI-first team/company.

CFOs Are Reviewing AI Model Costs

The average business is spending 13x more on AI tokens than in January 2025, according to Ramp’s spend data 🤯. AI is quickly becoming the largest vendor expense on your P&L and it may soon (if not already) crush your gross margins as well.

Internal AI model adoption is being pushed harder than ever (tokenmaxxing) and everyone is releasing more AI products with worse gross margins. Growth on AI spend has gone exponential in the last few months.

The size of our AI model (chatgpt/claude) bills have gone from nothing to pretty massive in a short period of time. And now every CFO should be paying close attention.

In this post, I will share some of the things that companies can pretty easily do to slash their AI bills.

AI Model Pricing

The two charts below show the different AI model pricing for the two way communication that a user has with AI:

Input tokens: Your prompt, convo history, attached files, tool definitions, etc
Output tokens: What is returned (text, file, etc), tool calls made, code returned, etc

The big take away is that the “smartest” AI model is 5x+ more expensive than the cheapest model.

You should also understand that there are three ways that AI model companies will bill enterprise licenses:

Seat-based pricing: flat fee paid annually upfront for the privilege of simply accessing the platform (~$18/user/month)
Claude/ChatGPT usage: billed monthly in arrears based on actual usage (on top of the seat-based pricing) from the rates noted in the charts above
API calls: Paid upfront on a credit card. This is for your product that you sell or for internally built apps that are hosted.

How We Cut AI Model Costs By 40%

TLDR: You don’t use a heart surgeon to take your temperature. Similarly, you shouldn’t have your CFO reviewing T&E expense reports.

This is the primary problem at most companies.

They are using the most expensive model (or even the 2nd most expensive model) when they could be using a cheaper one that provides results that meet their needs. There is HUGE potential savings by simply using the right model.

Let’s say you have a $100K/month bill for Claude:
You currently use Opus 4.7 (the most expensive model) for everything, but if the company was to only use the model it actually needs then it could look something like the below:
Opus is rarely needed for every task. So if users shifted to using Sonnet and Haiku when appropriate then the company would save nearly 50% ($576K per year)…

We want everyone using AI and adopting into all their workflows, but they don’t need the most expensive AI model for every task. When the Anthropic bill was small, no one cared. But as the costs have grown, there is significant ROI in making sure you are using the least expensive model needed to get the job done.

Most folks on an enterprise usage plan are paying premium rates (the heart surgeon) for basic responses (checking your temperature).

How Do You Actually Do it?

There are two different paths based on how AI is used and priced:

Enterprise subscriptions/usage
API calls

#1 is where almost all your internal AI usage comes from.

Enterprise Subscriptions/Usage (internal AI usage)

You have fewer controls here because users can choose which model they use, but there are a few things you should do that will save a lot of money.

Below are three things that every company should do:

1. AI Admin Controls

The Claude/ChatGPT admin should set the default model to the middle tier (e.g. Sonnet for Claude). If you do that, then 90%+ of users will just use that model. While they usually won’t be using the most expensive model, they will also almost never downgrade to the cheapest model. But just doing this may save you up to 40% (price difference between Opus and Sonnet).

Benioff said they will spend $300M on Claude this year. And I have also heard from various people that everyone just defaults to Opus for everything. They could probably save $100M+ by using cheaper models when appropriate…

User Usage Limits: Obviously you should put controls on users so they don’t create something that just blows up AI usage. Admins can put dollar limits by user (e.g. engineers get $5K per month while finance gets $1K per month)

The below was just reported by Axios 😱. Maybe not entirely true (or missing some details), but you MUST have spend controls on your AI stuff so this doesn’t happen to you.

2. Admin Manual Tracking

Admins get dashboard reporting that show what each user is spending and what models they are using. If someone is burning a ton of tokens using the most expensive model on easy tasks, then the admin should have a conversation with those users and encourage them to use something else.

Don’t decrease AI usage…but choose your AI models more wisely.

3. Employee Education

Make sure employees understand the cost difference between these models. Many just don’t know. Forward them this article :)

There are also several other things (besides different AI model costs) that you can teach employees so they spend significantly less on tokens. For example:

Use a new chat window for new tasks. Each time you prompt in the same chat window it re-reads the entire chat history. This gets expensive quick (especially if you are on a premium model). I have seen people just default to the same chat window every time they go to AI…
Be upfront with everything you need in one prompt. Back-and-forth exchanges compound quickly. Every reply re-reads the full history. Give Claude the full context and ask for exactly what you want in one shot.
Update the AI model in your automations, agents, skills, etc. A lot of scheduled tasks are structured and repetitive (often fine for a cheaper model). Users can define which AI model to use in agents/skills/automations. Test out a lower tier model and see if it works fine.

API Calls (internal apps and your AI products)

API calls happen when you aren’t in the ChatGPT/Claude app. It is when you build an internal app or customers use your product and it calls an AI model for something.

There are more options here to control what AI models are used in the response, which is great because you will want to control it to improve AI gross margins.

Model Routing: directs AI requests based on a defined criteria to the right AI model

The goal is to use the cheapest passing model for each task. “What is the minimum requirements we are OK with to check someone’s temperature and can they do just as good a job as the heart surgeon?”

I talked to one CFO that saved a few percentage points on their AI gross margins after they switched from the most expensive AI model embedded in their product to a model routing system that switched a lot of tasks to a cheaper model.

While a well-structured model routing process is going to have the highest ROI, there are other ways to also cut down costs on AI API calls. Some examples I have done:

Analyze your actual usage: Anthropic’s console shows token usage by API key. You can also tag calls by feature or workflow and see exactly what is driving costs. Do this so you can figure out how to make the expensive stuff more efficient
Batch API: If a task doesn’t need a real-time response, you can use Anthropic’s Batch API and save 50%. A lot of stuff doesn’t need to be real-time
Prompt caching: If you’re sending the same large context on every API call (a system prompt, a reference document, a knowledge base), then you’re paying full input price every single time. Prompt caching stores that content so repeat calls cost ~90% less on the cached portion. Make sure your engineering team built this…

Final Thoughts

AI spend has become very large in the last few months. It’s time for CFOs to pay attention. We want to drive AI adoption. It will likely be the difference between companies that survive and those that die in the AI world.

But…

It doesn’t mean we should just light money on fire and throw all of our investors’ money to Anthropic/OpenAI. It’s time to add more process and educate employees so you can cut your largest vendor bill in half.

Footnotes:

Email me and tell me what you are building. Or reach out with questions!
Subscribe and forward this newsletter to your team.

AI Stuff From The Week:

The below tweet blew up because I think it feels so real for all finance leaders right now… This was me reviewing the April usage bill lol.

And it’s why I spent the last two weeks pushing all the things I wrote in this article!

Hearing many cases of people double-clicking into the ROI of their AI spend. It’s not a question if AI is valuable (it obviously is), but rather is how much they are spending and how they are spending it have decent ROI.

Phaetrix

Jun 6

The interesting shift is that AI is moving from a technology discussion to a capital allocation discussion.

When usage is small, everyone optimizes for capability.

When the bill becomes material, they start optimizing for return on investment.

That is usually when an industry starts to mature.

Shaun Hanson

May 29

Great article, Alex! I wish Anthropic had a built-in model router that analyzed the prompt and determined which model was the best fit, but they have no real incentive to do that. Maybe someone needs to vibe code one 😉

2 replies by OnlyCFO and others

2 more comments...

CFOpilot

Discussion about this post

Ready for more?