DeepSeek V4 Will Run Entirely on Huawei Chips. The R2 Failure That Made It Possible.

Total parameters: ~1T
Active per token: 37B
Price per million input tokens: $0.14
NVIDIA GPUs required: 0

Reuters confirmed on April 4, 2026, that DeepSeek’s next flagship model will run entirely on Huawei’s Ascend chips. Not NVIDIA. Not AMD. Huawei. The roughly one-trillion-parameter V4 is the first frontier AI model built from the ground up for Chinese silicon, and it arrives after months of quiet engineering that most coverage has ignored: a failed training run on Huawei hardware, a forced retreat to NVIDIA, and a second attempt that appears to have worked.

Alibaba, ByteDance, and Tencent have pre-ordered hundreds of thousands of Huawei Ascend 950PR chips to serve V4 through their cloud platforms. The demand pushed chip prices up 20% in weeks. DeepSeek deliberately withheld early model access from NVIDIA and AMD, giving that window exclusively to Chinese chip manufacturers. The release, expected in the last two weeks of April, will test whether the U.S. semiconductor export strategy can survive contact with architectural cleverness.

The Architecture That Makes Weaker Hardware Viable

DeepSeek V4 uses the same Mixture-of-Experts (MoE) design that made V3 surprisingly efficient, but scaled dramatically. The model contains approximately one trillion total parameters, organized into 256 expert sub-networks plus one shared expert. On any given token, only about 37 billion parameters activate. The routing mechanism selects the top eight experts per token, which means V4 processes each input like a 37B model while drawing on the knowledge encoded across one trillion parameters.
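The routing step described above can be sketched in a few lines. This is a toy illustration, not DeepSeek’s code: the expert count and top-eight selection come from the article, while the softmax weighting over the selected experts, the scalar "experts," and the names `route_token` and `moe_layer` are assumptions made for the example.

```python
import math
import random

NUM_ROUTED_EXPERTS = 256  # article figure
TOP_K = 8                 # article figure


def route_token(gate_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    The softmax choice is an assumption for this sketch.
    """
    chosen = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(gate_logits[i] for i in chosen)
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_layer(token, gate_logits, routed_experts, shared_expert):
    """Combine the always-on shared expert with the weighted top-k routed experts."""
    output = shared_expert(token)
    for index, weight in route_token(gate_logits):
        output += weight * routed_experts[index](token)
    return output


# Toy usage: each "expert" is a scalar function of a scalar token (hypothetical).
rng = random.Random(0)
routed_experts = [lambda x, s=rng.uniform(-1, 1): s * x
                  for _ in range(NUM_ROUTED_EXPERTS)]
shared_expert = lambda x: 0.5 * x
gate_logits = [rng.gauss(0, 1) for _ in range(NUM_ROUTED_EXPERTS)]

selected = route_token(gate_logits)
result = moe_layer(2.0, gate_logits, routed_experts, shared_expert)
print(f"{len(selected)} of {NUM_ROUTED_EXPERTS} routed experts ran for this token")
```

Only the eight selected experts and the shared expert are evaluated; the other 248 are never called for this token, which is where the compute savings come from.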

This sparsity is what makes Huawei hardware viable. A dense one-trillion-parameter model would require compute that the Ascend 910C cannot deliver competitively. But when 96% of the model sits idle on any given forward pass, the performance gap between Ascend and NVIDIA’s H100 shrinks from disqualifying to manageable. DeepSeek’s engineers are compensating for slower individual chips through software optimization rather than brute-force hardware performance.
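The 96% figure follows directly from the two parameter counts. A quick check, using the article’s numbers and one added assumption (the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token):

```python
TOTAL_PARAMS = 1_000_000_000_000  # "approximately one trillion" (article figure)
ACTIVE_PARAMS = 37_000_000_000    # "about 37 billion" active per token (article figure)

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
idle_fraction = 1 - active_fraction

# Assumption: roughly 2 FLOPs per active parameter per token for a forward pass.
FLOPS_PER_PARAM = 2
sparse_flops_per_token = FLOPS_PER_PARAM * ACTIVE_PARAMS
dense_flops_per_token = FLOPS_PER_PARAM * TOTAL_PARAMS
compute_ratio = dense_flops_per_token / sparse_flops_per_token

print(f"active: {active_fraction:.1%}, idle: {idle_fraction:.1%}")
print(f"dense / sparse compute per token: {compute_ratio:.1f}x")
```

Sparsity cuts the arithmetic per token, not the storage: all one trillion parameters still have to sit in memory somewhere across the cluster, so memory capacity and interconnect bandwidth remain real constraints.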

Beyond the MoE scaling, V4 introduces Engram, a conditional memory system described in a January 2026 paper. Traditional transformers compress all learned knowledge into neural network weights and re-derive relationships through attention computation on every pass. Engram breaks that assumption. It adds a lookup-based memory layer that stores static factual knowledge separately. The model calls on expensive neural processing only for novel reasoning. Consider the phrase “New York City.” A standard transformer has to learn that those three tokens form a specific entity, then rebuild that relationship every single time. Engram stores it once and retrieves it for free. DeepSeek’s internal tests show this pushed Needle-in-a-Haystack retrieval accuracy from 84% to 97% across the full one-million-token context window.
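To make the lookup idea concrete, here is a deliberately simplified sketch. It is not taken from the Engram paper: the `NgramMemory` class, the dictionary keyed by token n-grams, the stored vectors, and the either/or fallback to a placeholder "neural path" are all hypothetical. A real system would more plausibly blend retrieved memory with the network’s hidden state than switch between the two.

```python
class NgramMemory:
    """Toy lookup memory: maps a tuple of tokens to a stored vector."""

    def __init__(self, max_ngram=3):
        self.max_ngram = max_ngram
        self.table = {}

    def store(self, tokens, vector):
        self.table[tuple(tokens)] = vector

    def lookup(self, context):
        """Return (ngram, vector) for the longest stored suffix of context, or None."""
        for n in range(min(self.max_ngram, len(context)), 0, -1):
            key = tuple(context[-n:])
            if key in self.table:
                return key, self.table[key]
        return None


def expensive_neural_path(context):
    """Placeholder for attention and feed-forward compute (hypothetical)."""
    return [float(len(token)) for token in context[-3:]]


def represent(context, memory):
    """Use the memory on a hit; fall back to the neural path on a miss."""
    hit = memory.lookup(context)
    if hit is not None:
        return "memory", hit[1]
    return "neural", expensive_neural_path(context)


memory = NgramMemory()
memory.store(["New", "York", "City"], [0.1, 0.2, 0.3])

known = represent(["I", "live", "in", "New", "York", "City"], memory)
novel = represent(["a", "brand", "new", "phrase"], memory)
print(known[0], novel[0])
```

The stored entity costs one dictionary lookup per occurrence, while the unseen phrase still goes through the costly path. That asymmetry is the intuition the article describes.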

The context window itself jumped from 128K to one million tokens. At that scale, the KV cache memory problem dominates inference cost. DeepSeek’s Multi-Head Latent Attention (MLA), introduced in V2 and refined through V3, compresses key-value information into smaller representations. Combined with Engram, V4 can process roughly 800 pages of text in a single pass without the memory explosion that would make a dense architecture impossible on Ascend hardware.
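A back-of-the-envelope calculation shows why the KV cache dominates at this scale and why compressing it matters. The layer count, head count, head size, latent size, and 16-bit storage below are illustrative assumptions, not confirmed V4 values; only the one-million-token context comes from the article.

```python
BYTES_PER_VALUE = 2         # 16-bit storage (assumption)
CONTEXT_TOKENS = 1_000_000  # article: one-million-token window

# Illustrative model shape, assumed for this sketch; not confirmed V4 values.
NUM_LAYERS = 61
NUM_HEADS = 128
HEAD_DIM = 128
LATENT_DIM = 512 + 64       # compressed KV latent plus a small positional part


def full_kv_bytes_per_token():
    # Standard attention caches one key and one value vector per head, per layer.
    return 2 * NUM_HEADS * HEAD_DIM * BYTES_PER_VALUE * NUM_LAYERS


def latent_kv_bytes_per_token():
    # Latent attention caches a single compressed vector per layer instead.
    return LATENT_DIM * BYTES_PER_VALUE * NUM_LAYERS


full_total = full_kv_bytes_per_token() * CONTEXT_TOKENS
latent_total = latent_kv_bytes_per_token() * CONTEXT_TOKENS
reduction = full_total / latent_total

print(f"full KV cache:   {full_total / 1e12:.2f} TB")
print(f"latent KV cache: {latent_total / 1e9:.1f} GB")
print(f"reduction:       {reduction:.1f}x")
```

Under these assumptions, an uncompressed cache for a single one-million-token sequence runs to about 4 TB, while the compressed form is about 70 GB. The exact figures depend entirely on the assumed shape, but the order-of-magnitude gap is the point.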

V4 also adds native multimodal input, accepting text, images, and code within the same context window. No image or video quality benchmarks exist yet. The multimodal capability appears secondary to V4’s real design target: long-context coding and software engineering.

The R2 Failure That Preceded This

The V4-on-Huawei story reads differently when you know that DeepSeek already tried this with R2 and it did not work.

According to the Financial Times, DeepSeek initially attempted to train its R2 reasoning model on Huawei’s Ascend 910C chips. The training runs failed repeatedly. The problems were not individual hardware defects. They were systemic gaps in Huawei’s software stack. CANN (Compute Architecture for Neural Networks), Huawei’s answer to NVIDIA’s CUDA, lacked the maturity required for distributed training across thousands of interconnected chips. Inter-chip communication latency caused synchronization failures. Memory consistency errors corrupted training progress. Completed training steps were lost and had to be rerun.

Huawei dispatched senior engineers to DeepSeek’s training center to troubleshoot on-site. The problems persisted. DeepSeek ultimately abandoned Huawei hardware for R2 training and reverted to NVIDIA GPUs, relegating Ascend chips to inference-only duties. The delays pushed R2’s timeline back by months.

What changed between R2’s failure and V4’s planned launch? DeepSeek spent Q1 2026 collaborating with Huawei and Cambricon Technologies to rewrite core model code for CANN compatibility. The engineers did not just port existing code. They reimplemented components of MLA and the expert routing system to account for the performance characteristics of Ascend hardware. This is not an optimization pass. It is a full re-architecture that treats Huawei silicon as the primary target rather than a fallback.

The Ascend 950PR, the chip at the center of V4’s deployment, reportedly delivers approximately 2.8 times the compute of NVIDIA’s H20 (the restricted chip China can still import), though it falls short of the H200. DeepSeek’s bet is that the 950PR combined with V4’s sparse architecture and custom software will close the remaining gap.

What the Export Controls Were Supposed to Prevent

The strategic logic of U.S. semiconductor export restrictions assumed that cutting China off from NVIDIA’s top-tier GPUs would slow frontier AI development. The assumption had a specific dependency chain: frontier models require frontier hardware, and frontier hardware requires TSMC fabrication that Huawei cannot access for its most advanced designs.

DeepSeek V4 breaks two links in that chain simultaneously. Because only about 37 billion of its roughly one trillion parameters activate per token, the MoE architecture needs approximately 96% less raw compute per token than a dense model of the same size, making frontier-class models trainable on hardware that would be insufficient for dense architectures. And the deliberate exclusion of NVIDIA from early optimization access signals that DeepSeek is building its entire software stack around a supply chain that U.S. policy cannot reach.

IDC estimates that Chinese chipmakers captured 41% of China’s AI accelerator market in 2025. Alibaba, ByteDance, and Tencent ordering hundreds of thousands of Ascend 950PR chips converts that market share into infrastructure. If V4 delivers on its benchmark claims (80%+ SWE-bench Verified, 90% HumanEval, competitive with Claude Opus 4.6 and GPT-5.4), the result is a complete parallel AI stack: Chinese models trained on Chinese chips, optimized for Chinese cloud infrastructure, available at roughly 20 to 50 times lower cost than Western alternatives.

NVIDIA halted China-bound H200 production in early March 2026 and shifted TSMC capacity allocation to its next-generation Vera Rubin architecture. The move acknowledges that China revenue, which peaked at $5.5 billion annualized before export restrictions, is structurally gone. The substitute demand from U.S. hyperscalers is already capacity-constrained. When DeepSeek released R1 in January 2025, the ensuing sell-off erased roughly $589 billion from NVIDIA’s market cap in a single trading session. V4 on Huawei hardware extends that pressure from a stock-market shock to a structural question about NVIDIA’s long-term addressable market.

What Has Not Been Verified

DeepSeek’s benchmark claims for V4 come from internal tests only. No independent evaluation has confirmed the 80%+ SWE-bench or 90% HumanEval numbers. DeepSeek’s V3 benchmarks largely held up under third-party scrutiny, but V4’s architecture is different enough that prior credibility does not transfer automatically.

The multimodal capabilities have no public benchmarks at all. DeepSeek’s image and video generation quality is unknown. The Financial Times described V4 as having “picture, video and text-generating functions,” but no reviewer has tested them.

The Ascend 950PR’s real-world training and inference performance at scale remains undisclosed. Huawei’s claim of 2.8x the H20 is a spec-sheet number. As the TurboQuant episode demonstrated, spec-sheet numbers and production performance can diverge sharply when software hits real hardware. The R2 training failure on earlier Ascend hardware is a concrete reminder that CANN’s maturity remains the binding constraint.

V4 has been delayed twice already. The February and March release windows both passed. V4-Lite appeared on DeepSeek’s website on March 9 with reported 30% faster inference and 94% context recall at 128K tokens (up from 45%), which suggests incremental rollout rather than a single launch event. The “last two weeks of April” timeline is the best current estimate, but treat it with appropriate uncertainty.

What Happens When V4 Drops

If V4 matches its claimed performance while running exclusively on Chinese silicon, the consequences ripple in multiple directions at once.

The open-source cost floor drops again. V4 will almost certainly ship under Apache 2.0 or MIT, consistent with DeepSeek’s prior models. Projected API pricing of $0.14 per million input tokens is roughly 100x cheaper than Claude Opus 4.6. For developers outside both the U.S. and China, this creates a genuine choice that did not exist 18 months ago: open-weight, downloadable, consumer-hardware-friendly models that compete with closed frontier systems on actual benchmarks.
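The price multiples quoted in this piece are easy to sanity-check. The sketch below uses the projected $0.14 figure from the article; the monthly workload and the helper names are hypothetical, and the "implied" prices are simply what a competitor would have to charge for each multiple to hold, not quoted list prices.

```python
V4_INPUT_PRICE = 0.14  # USD per million input tokens (projected, per the article)


def input_cost_usd(tokens, price_per_million):
    """Cost of sending `tokens` input tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


def implied_comparison_price(multiple, base_price=V4_INPUT_PRICE):
    """Per-million price at which base_price would be `multiple` times cheaper."""
    return base_price * multiple


monthly_tokens = 5_000_000_000  # hypothetical workload: 5B input tokens per month
v4_monthly = input_cost_usd(monthly_tokens, V4_INPUT_PRICE)
print(f"V4 input cost: ${v4_monthly:,.2f} per month")

for multiple in (20, 50, 100):
    price = implied_comparison_price(multiple)
    monthly = input_cost_usd(monthly_tokens, price)
    print(f"{multiple:>3}x implies ${price:.2f}/M input, ${monthly:,.2f} per month")
```

Input and output tokens are priced differently, so any real comparison should use each provider’s current rate card and the workload’s actual input-to-output mix.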

Meanwhile, the global AI hardware market splits in two. A U.S.-centric stack built on NVIDIA, CUDA, and the big three cloud providers increasingly serves different models and different customers than a China-centric stack built on Huawei Ascend, CANN, and Alibaba Cloud. Developers building for both markets will need to test on both hardware ecosystems. Nobody wins from that fragmentation except the companies selling shovels on each side.

And the $297 billion that flowed into AI in Q1 2026 looks different if the price of frontier inference drops by another order of magnitude. Companies paying $15 to $30 per million output tokens for GPT-5.4 should benchmark V4 before their next contract renewal. The question has moved past whether Chinese open-source models can compete on quality. The question now is whether Western closed models can justify their pricing when the open alternative runs on hardware that no export control can touch.

The AI chip race is no longer about who makes the fastest chip. It is about who can make a fast-enough chip and pair it with architecture clever enough to close the gap. DeepSeek’s bet is that sparsity beats silicon. The next two weeks will show whether that bet holds.
