DeepSeek V4-Pro price cut deepens AI model price war

For software teams that buy AI by the token, DeepSeek has just changed the default spreadsheet. Its V4-Pro pricing page says the flagship model’s promotional rates will become the standing rate from 31 May 2026, leaving input tokens at $US0.435 per million and output at $US0.87. Reuters and Bloomberg each reported that the 75 per cent cut is permanent.

For procurement teams, that matters because enterprise AI spending is increasingly a routing decision rather than a single-model bet. Once a lower-cost model is good enough for summarisation, internal assistants and large batches of coding or support work, buyers can keep premium models for the hard calls and send the rest elsewhere. CNBC’s analysis describes exactly that split, while Google’s Gemini 3.5 Flash pitch is built around the same idea: lower enough costs that volume moves first.

Investors, meanwhile, are looking at a less comfortable question. OpenAI’s pricing page still lists $US5 per million input tokens for GPT-5.5, Anthropic lists $US15 for Claude Opus 4.1 input, and Google’s Gemini API pricing still assumes buyers will pay more for the models with the strongest tooling and brand trust. Those advantages are real. The spread is still so wide that pricing power now needs to be defended with measurable results, not just benchmark bragging.

A week earlier, the competitive picture was already moving this way. Fast Company argued that cheaper and locally hosted models are becoming powerful enough for most users, and Semafor’s fresh report shows DeepSeek is willing to lock that logic into public list pricing. The distinction matters. A temporary promotion asks the market to test a cheaper tier. A permanent list price tells buyers they can design around it.

Cheap enough becomes the default

Inside engineering teams, the first workloads to move are usually the ones that scale fastest. CNBC says enterprises are already reserving frontier models for call-outs while cheaper systems handle default jobs, and Engadget’s coverage of Gemini 3.5 Flash shows how the market is being sold: good coding and agentic performance without flagship pricing. That does not mean developers will swap everything overnight. Reliability, latency, tool use and retraining costs still matter. But it does answer the user-side question of what moves first. Repetitive, high-volume work is where the savings show up fastest.

For most teams, that means building a router, formally or informally. Customer-support drafts, document extraction and internal search can run on the cheaper path. High-stakes legal text, board papers or complex multi-step agents may still go to OpenAI, Anthropic or Google’s top tiers. That blended architecture is exactly what cheaper challengers need. They do not have to win every call. They only have to win the default lane.

Engineer monitoring server racks with a laptop, reflecting the procurement teams now routing AI workloads by cost and capability.

Google’s own sales language shows why this fight has become urgent for the incumbents.

“many companies are already blowing through their annual token budgets, and it’s only May”

— Sundar Pichai, quoted by VentureBeat

For Australian buyers, that pressure is even blunter because most frontier APIs are priced in US dollars. A procurement lead deciding between a model that is roughly 11 times cheaper than GPT-5.5 on input tokens and more than 30 times cheaper than Claude Opus 4.1 does not need to believe DeepSeek is better at everything. They only need to decide that it is good enough often enough.

The math also changes vendor negotiations. If a CIO can point to a quarter-price alternative for baseline workloads, incumbent suppliers lose some freedom to charge for general capability and have to bundle support, compliance or integration into the premium. The margin defence shifts up the stack.

Here the sceptic’s case bites. GeekWire’s recent argument that many AI labs carry giant valuations before proving revenue density reads differently when a rival makes a flagship model permanently cheaper rather than temporarily discounted. Premium labs can still point to security reviews, enterprise support, platform lock-in and model quality at the edge. The question has shifted from whether those strengths exist to how much extra they are worth on an invoice.

Margins and chips are now the same story

Viewed from the regulator and infrastructure angle, DeepSeek’s move carries another message. Price cuts in this market are not only about competitive aggression. They can also reflect the reality that chip access, export controls and model optimisation are now bound together. Reuters’ reporting said DeepSeek did not disclose whether the move was tied to domestic Huawei Ascend supply, while Bloomberg’s coverage and a Techmeme-linked analysis of DeepSeek’s HBM optimisation work both point to a Chinese AI stack trying to do more with tighter hardware constraints.

Close-up of server infrastructure, underscoring how chip supply and data-centre efficiency sit underneath AI model pricing.

DeepSeek framed the cut as a new normal, not a clearance sale.

“officially adjusted to 1/4 of the original price after the 75% discount promotion ends on 2026/05/31 15:59 UTC”

— DeepSeek API Docs

Wired’s reporting that Anthropic agreed to pay SpaceX $US1.25 billion a month for data-centre access is a reminder that premium pricing has to fund enormous infrastructure commitments. That helps explain why frontier labs resist pure commodity pricing, but it also sharpens the investor concern raised by CNBC and GeekWire. If cheaper rivals keep improving, the market may start rewarding whoever best converts capability into durable contracts.

The wording matters because it turns a short campaign into an anchor price. Once a vendor makes the cheaper rate the public list price, every rival has to explain why its own premium deserves to persist. Some will respond by cutting. Some will bundle models into broader developer platforms. Some will lean harder on governance, uptime and integration. All of them now have to sell an economic story as well as a technical one.

The likeliest outcome is a market with more routing, more tiering and heavier pressure on margins. Developers get more leverage. Cloud buyers get a stronger hand in negotiations. Premium vendors still have a case, but a narrower one. DeepSeek’s cut does not settle the model race. It makes the cost of staying expensive much harder to ignore.

DeepSeek V4-Pro price cut deepens AI model price war

Cheap enough becomes the default

Margins and chips are now the same story

Related

Claude Opus 5 nears Fable at half price, Anthropic says

Why Palo Alto says AI token prices must fall before AI scales

OpenAI IPO filing puts AI pricing under scrutiny