AI & Automation

DeepSeek V4 Drops at $0.14 per Million Tokens — And Open-Source AI Just Changed the Cost Equation Again

2026.04.25 · 57 views

1.6 trillion parameters, a 1-million-token context, Huawei Ascend chips, and a price list that will make OpenAI uncomfortable

Exactly one year after the original DeepSeek-R1 moment that wiped a trillion dollars off American tech stocks in a single day, the Chinese lab is back with a preview of its V4 flagship — and the numbers are, once again, unsettling for the incumbents.


On April 24, DeepSeek released two models. V4-Pro clocks in at 1.6 trillion parameters. V4-Flash is a leaner 284-billion-parameter sibling. Both ship with a 1-million-token context window, and both are open-source under a permissive license. DeepSeek claims the Pro variant outperforms every open-source peer on reasoning benchmarks, beats GPT-5.2 and Gemini 3.0 Pro on several tasks, and trades blows with GPT-5.4 on coding competitions.


The real shock, though, is the price.


The cost equation


V4-Flash charges $0.14 per million input tokens and $0.28 per million output tokens. V4-Pro is $0.145 per million input tokens and $3.48 per million output tokens. For context, OpenAI's flagship GPT-5.4 is currently priced roughly 10 to 15 times higher on the output side. If you run AI automation pipelines at enterprise scale, the math is brutal. A workflow that pushes 100 million input tokens and generates 20 million output tokens per day used to cost thousands. On V4-Flash, it costs about $20 per day: $14 for the input plus $5.60 for the output. That is not an incremental optimization — it is a different category of economics.
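The arithmetic above is easy to reproduce for your own workloads. Here is a minimal cost estimator using the V4-Flash prices quoted in this article; the token volumes are the hypothetical pipeline from the paragraph, so plug in your own numbers.

```python
def daily_cost(input_tokens: int, output_tokens: int,
               in_price_per_m: float, out_price_per_m: float) -> float:
    """Return the daily API cost in dollars for a given token workload."""
    return ((input_tokens / 1_000_000) * in_price_per_m
            + (output_tokens / 1_000_000) * out_price_per_m)

# The example workload from the text: 100M input, 20M output tokens per day,
# priced at V4-Flash rates ($0.14/M input, $0.28/M output).
flash = daily_cost(100_000_000, 20_000_000, 0.14, 0.28)
print(f"V4-Flash: ${flash:.2f}/day")  # V4-Flash: $19.60/day
```

Run the same workload through your current provider's rate card and the delta the article describes falls out immediately.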


What is actually new under the hood


DeepSeek introduced a technique it calls Hybrid Attention Architecture. In classic transformers, attention cost scales quadratically with sequence length, which is why 1-million-token contexts have historically been impractical. Hybrid Attention mixes full, sparse, and linear-attention layers in a tuned ratio, keeping quality close to full attention while collapsing compute costs at long sequences. For developers building agentic systems — the kind of long-horizon workflows where a model must remember an entire codebase, a multi-hour conversation, or a full legal document — the 1M context is the feature that actually matters. You do not chunk. You do not RAG. You load context and ask.
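To see why mixing attention types changes the cost curve, consider a toy model of per-layer attention cost. DeepSeek has not published V4's exact layer mix, so the 2:4:2 full/sparse/linear pattern and the 4,096-token sparse window below are hypothetical illustrations, not the real architecture.

```python
def layer_schedule(n_layers: int,
                   pattern=("full", "sparse", "sparse", "linear")):
    """Assign an attention type to each layer by cycling a repeating pattern.

    The pattern here is an assumed illustration; the real V4 ratio is not public.
    """
    return [pattern[i % len(pattern)] for i in range(n_layers)]

def attention_cost(kind: str, seq_len: int, window: int = 4096) -> int:
    """Rough per-layer attention cost in token-pair operations."""
    if kind == "full":
        return seq_len * seq_len          # quadratic in sequence length
    if kind == "sparse":
        return seq_len * min(window, seq_len)  # e.g. sliding-window attention
    return seq_len                        # linear attention

sched = layer_schedule(8)
hybrid = sum(attention_cost(k, 1_000_000) for k in sched)
full_only = 8 * attention_cost("full", 1_000_000)
print(f"hybrid cost as a fraction of all-full-attention: {hybrid / full_only:.3f}")
```

Even in this crude model, at a 1-million-token sequence the hybrid stack costs a quarter of the all-full-attention baseline, and the fraction shrinks further as fewer layers use full attention — which is the economic intuition behind the 1M context being practical at all.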


The Huawei angle


Here is where the story becomes geopolitical. DeepSeek trained V4 on Huawei's Ascend 950 chips, connected via the new Supernode interconnect that fuses large clusters into effectively one big accelerator. For years, the industry assumed frontier models required NVIDIA H100s or B200s. V4 is the clearest public counterexample yet. US export controls intended to slow Chinese AI progress are starting to look more like a motivator than a barrier. When you cannot buy NVIDIA, you build Ascend. When you cannot access CUDA, you optimize for a Chinese software stack. The result is a parallel AI supply chain that the West has almost no visibility into.


What this means for the development industry


I have been watching enterprise teams quietly pilot DeepSeek for months. Most will not admit it publicly because of data-governance concerns, but the cost delta is too large to ignore. What V4 does is remove the last remaining excuse — "it is not as smart as GPT" — for most use cases. Three trends accelerate as a result. First, multi-agent architectures become economically viable at smaller companies; when each agent call costs pennies instead of dollars, you can afford three agents debating the right approach before committing to code. Second, long-context reasoning becomes the default; entire repos get loaded into prompts instead of chunked retrieval. Third, the moat around frontier labs shrinks to a ring around safety, alignment, and tooling — because raw capability is commoditizing faster than anyone predicted.
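The "three agents debating before committing" pattern is simpler than it sounds. Here is a hypothetical sketch: each agent proposes an approach, then a judge agent picks one. The `call_model` callable is a stand-in for whatever chat-completion API you use; the stub below only exists so the example runs without network access.

```python
from typing import Callable, List

def debate(task: str, agents: List[str],
           call_model: Callable[[str, str], str]) -> str:
    """Collect one proposal per agent, then ask a judge agent to choose by index."""
    proposals = [call_model(name, f"Propose an approach for: {task}")
                 for name in agents]
    ballot = "\n".join(f"{i}: {p}" for i, p in enumerate(proposals))
    verdict = call_model("judge", f"Pick the best proposal by index:\n{ballot}")
    return proposals[int(verdict.strip())]

# Stub model for demonstration only; a real system would call an LLM API here.
def stub_model(agent: str, prompt: str) -> str:
    return "1" if agent == "judge" else f"{agent}'s plan"

best = debate("refactor the auth module", ["a1", "a2", "a3"], stub_model)
print(best)  # a2's plan
```

At dollar-per-call pricing this pattern is a luxury; at V4-Flash pricing, four cheap calls per decision is a rounding error.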


My take


When DeepSeek-R1 landed in January 2025, many treated it as a one-off. A year later, V4 makes it clear — DeepSeek is a sustained R&D engine, not a flash in the pan. The gap between open-source and closed-source has not merely narrowed; on some axes, it has inverted. Open-source models now offer longer contexts, cheaper inference, and fewer licensing restrictions. If you are building AI products in 2026 and you have not benchmarked your workloads on DeepSeek V4, you are making business decisions with stale information. That does not mean ditching OpenAI or Anthropic — it means treating the model layer as a pluggable commodity and building your moat somewhere else: in your data, your workflows, your distribution. The era of betting the next three years on a single model provider is over.
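"Treating the model layer as a pluggable commodity" reduces, in practice, to never letting application code name a vendor. A minimal sketch, assuming nothing about any real SDK (the class names, registry, and canned replies below are all illustrative):

```python
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class DeepSeekV4Flash:
    def complete(self, prompt: str) -> str:
        return f"[deepseek-v4-flash] reply to: {prompt}"  # real API call goes here

class GPT54:
    def complete(self, prompt: str) -> str:
        return f"[gpt-5.4] reply to: {prompt}"            # real API call goes here

# Swapping providers becomes a configuration change, not a code change.
REGISTRY: dict[str, ChatModel] = {"deepseek": DeepSeekV4Flash(), "openai": GPT54()}

def answer(question: str, backend: str = "deepseek") -> str:
    """Application code depends only on the ChatModel protocol."""
    return REGISTRY[backend].complete(question)

print(answer("Summarize the release notes."))
```

The point is not the dozen lines themselves but the discipline they enforce: when the next V4-class price drop lands, re-benchmarking means editing a config key, not a codebase.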

