Let's cut straight to the chase. When people ask "what chips were used to train DeepSeek?", they're not just looking for a shopping list. They want to understand the strategic decisions, the astronomical costs, and the performance trade-offs behind one of the world's leading large language models. The answer is a mix of industry standards, geopolitical necessities, and a glimpse into the future of AI compute. The primary hardware backbone for training models like DeepSeek-V2 and its predecessors reportedly relied heavily on NVIDIA's H100 and A100 GPUs. But that's only half the story. To ensure supply chain resilience and potentially lower long-term costs, DeepSeek AI also integrated a significant number of Huawei's Ascend 910B AI accelerators. This hybrid approach reveals more about the state of AI hardware than any spec sheet alone.
The Exact Chip Lineup: NVIDIA's Dominance and Huawei's Challenge
You can't talk about modern AI training without mentioning NVIDIA. It's like talking about cars without mentioning engines. For DeepSeek's training clusters, the workhorse was the NVIDIA H100 Tensor Core GPU. Think of this as the gold standard, the V12 engine of AI. Before the H100 was widely available, its predecessor, the NVIDIA A100, did the heavy lifting for earlier model iterations.
But here's where it gets interesting. Sitting right beside those NVIDIA cards in the server racks were Huawei Ascend 910B processors. This isn't a backup plan; it's a deliberate, parallel strategy. The 910B is China's most advanced commercially available AI chip, developed by Huawei's HiSilicon unit specifically to compete in this space. While exact ratios are a closely guarded secret, industry reports from analysts like SemiAnalysis suggest that for large-scale training runs, DeepSeek AI likely operated a heterogeneous cluster, using both types of chips, possibly for different stages of training or to hedge against supply constraints.
The Big Picture: This two-vendor strategy isn't about loyalty; it's about pragmatism. Relying solely on NVIDIA exposes you to geopolitical export controls and sky-high prices. Using only domestic alternatives (at this stage) might mean sacrificing some efficiency or ecosystem maturity. DeepSeek's chip selection is a masterclass in balancing performance, risk, and cost.
Why This Specific Mix of Chips? Strategy Over Specs
If NVIDIA's chips are so good, why bother with Huawei's? This is the core of the "why" behind the "what." It boils down to three non-negotiable factors: supply chain security, cost control, and software sovereignty.
1. The Geopolitical Insurance Policy
US export controls on advanced AI chips to China are a real and present danger for Chinese AI companies. A training run for a model like DeepSeek can take months. Imagine being 80% through, and your next shipment of H100s gets held up indefinitely. It's a multi-million dollar nightmare. Integrating the Ascend 910B into their software stack and infrastructure acts as a crucial insurance policy. It ensures that even in a worst-case scenario, training can continue, albeit potentially at a slower pace. This isn't paranoia; it's basic business continuity planning for a multi-billion dollar industry.
2. The Long-Term Cost Equation
Everyone focuses on the sticker price of an H100 (around $30,000-$40,000). Fewer people consider the total cost of ownership. The Ascend 910B is competitively priced, but the real potential savings come from breaking NVIDIA's near-monopoly. Having a viable second source gives DeepSeek AI negotiating power. More competition in the supplier base is the only thing that can eventually bring down the extortionate cost of AI compute. They're investing in the Huawei ecosystem now to cultivate that competition for the future.
3. Avoiding Complete Software Lock-In
NVIDIA's CUDA platform is brilliant. It's also a walled garden. By building expertise with Huawei's CANN (Compute Architecture for Neural Networks) software stack, DeepSeek's engineers ensure they aren't completely captive to one company's roadmap. This diversification of technical skill is a strategic asset. It allows them to optimize for different hardware and potentially pioneer techniques that work best on hybrid systems.
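To make the lock-in argument concrete, here's a minimal, hypothetical sketch of the kind of vendor-abstraction layer teams build so that training code isn't hard-wired to one company's stack. Everything here is illustrative: `CudaBackend`, `CannBackend`, and `get_backend` are invented names, not real DeepSeek, CUDA, or CANN APIs.

```python
# Hypothetical sketch of a vendor-abstraction layer. The backend classes and
# their dispatch targets are illustrative, not actual CUDA or CANN bindings.
from abc import ABC, abstractmethod

class Backend(ABC):
    """Common interface so training code never calls vendor APIs directly."""
    @abstractmethod
    def matmul(self, a, b):
        ...

class CudaBackend(Backend):
    def matmul(self, a, b):
        # In real code this would dispatch to cuBLAS via CUDA.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

class CannBackend(Backend):
    def matmul(self, a, b):
        # In real code this would dispatch to Huawei's CANN operator library.
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def get_backend(name: str) -> Backend:
    return {"cuda": CudaBackend, "cann": CannBackend}[name]()

# The same training code runs unchanged on either vendor's stack:
backend = get_backend("cuda")
result = backend.matmul([[1, 2]], [[3], [4]])
print(result)  # [[11]]
```

The point isn't the toy math; it's that every line of model code above the `Backend` interface is portable, which is exactly the kind of optionality the CANN investment buys.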
The Staggering Cost Breakdown: Where the Money Really Goes
Let's talk numbers. Saying "training is expensive" is an understatement. It's like saying Mount Everest is a tall hill. The chip purchase is just the entry fee.
Training a frontier model like DeepSeek-V2 likely required tens of thousands of GPUs running continuously for months. A cluster of 10,000 H100s costs roughly $300-$400 million just to buy, and that's only the start. You need to power them, cool them, and house them. The electricity bill alone is mind-boggling. A single H100 server can draw over 10 kilowatts. Scale that up, and you're looking at power consumption rivaling a small town.
| Cost Component | Estimate for a Large-Scale Run | Key Insight |
|---|---|---|
| Hardware Acquisition (H100/A100 + 910B) | $200M - $500M+ | Capital expenditure, but chips can be reused for future models or rented out. |
| Data Center & Power (3-6 months) | $50M - $150M | Often the hidden giant. Cooling is as critical as electricity. |
| Engineering & Optimization Labor | Tens of Millions | Cost of top ML engineers and systems architects to keep the cluster running at peak efficiency. |
| Software & Licensing | Significant but variable | Costs for cluster management software, NVIDIA software licenses, etc. |
This is why the choice of chip matters so much. A 10-15% difference in training efficiency (how fast a chip completes a task) translates directly into millions of dollars saved or wasted on electricity and engineering time. The decision between an H100 and a 910B isn't just about which is faster on a benchmark; it's a complex calculation of upfront cost, operational cost, reliability, and strategic risk.
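A quick back-of-envelope calculation shows why that 10-15% matters. The figures below combine numbers from this article (10,000 GPUs, roughly 10 kW per server) with assumed values that are clearly labeled: server size, electricity price, and run length are placeholders, and this counts IT power only, not cooling or facility overhead.

```python
# Back-of-envelope electricity cost for a 10,000-GPU training run.
# Assumptions (not sourced figures): 8 GPUs per server, $0.10/kWh, 4-month run.
gpus = 10_000
gpus_per_server = 8          # assumption: typical 8-GPU H100 server
kw_per_server = 10.2         # article: "over 10 kilowatts" per server
price_per_kwh = 0.10         # assumption: industrial electricity rate
months = 4                   # assumption: mid-range of a 3-6 month run
hours = months * 30 * 24

servers = gpus / gpus_per_server
energy_kwh = servers * kw_per_server * hours
power_bill = energy_kwh * price_per_kwh
print(f"IT electricity alone: ~${power_bill / 1e6:.1f}M")

# A 12.5% efficiency gain (midpoint of the 10-15% range) shortens the run
# and trims the bill roughly proportionally:
savings = power_bill * 0.125
print(f"Saved by a 12.5% faster chip: ~${savings / 1e6:.2f}M")
```

That's millions of dollars from electricity alone; fold in cooling (which can roughly double IT power), hardware depreciation, and engineering time, and the efficiency delta compounds quickly.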
Performance vs. Hype: The On-Paper Reality
Benchmarks are useful, but they're run in clean labs, not in massive, noisy clusters running for months. Here's a more grounded comparison based on public specs and industry consensus.
NVIDIA H100 (SXM5 variant): This is the undisputed king for FP8 and FP16 precision training, which is the bread and butter of LLMs. Its Tensor Cores and dedicated Transformer Engine are specifically designed to crush this workload. Its memory bandwidth (over 3 TB/s with HBM3) is insane. The biggest advantage, though, is the software. CUDA, libraries like cuDNN, and the entire ecosystem are mature, which means less time debugging and more time training.
Huawei Ascend 910B: On paper, its raw FP16 performance is competitive. Where it historically lagged was in the software stack and support for lower precision formats like FP8, which are critical for speed and efficiency in modern training. The CANN software has improved dramatically, but it still requires more manual tuning and expertise than NVIDIA's relatively plug-and-play ecosystem. Its real-world performance in a massive cluster is highly dependent on how well the DeepSeek team has optimized their code for it.
A common misconception is to view this as a simple head-to-head race. In practice, within a hybrid cluster, different chips might be assigned different tasks. For example, more stable, later stages of training or certain types of data processing might be routed to the Ascend chips, while the most computationally intensive core training loops run on the H100s. This orchestration is itself a major engineering feat.
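The orchestration idea above can be sketched in a few lines. This is a toy illustration of routing workload stages to accelerator pools in a heterogeneous cluster; the pool sizes, stage names, and routing table are all hypothetical, not DeepSeek's actual scheduler.

```python
# Toy dispatcher for a heterogeneous cluster (entirely hypothetical).
# Pool sizes are assumed, not reported figures.
POOLS = {"h100": 8_000, "ascend_910b": 2_000}

# Route the most compute-intensive stages to the H100 pool; steadier or
# preprocessing-style work to the Ascend pool, as the paragraph describes.
ROUTING = {
    "core_pretraining": "h100",
    "fine_tuning": "h100",
    "data_preprocessing": "ascend_910b",
    "evaluation": "ascend_910b",
}

def assign(stage: str) -> str:
    pool = ROUTING[stage]
    assert POOLS[pool] > 0, f"no capacity in pool {pool}"
    return pool

print(assign("core_pretraining"))  # h100
print(assign("evaluation"))        # ascend_910b
```

The hard part in reality isn't the routing table; it's keeping data pipelines, checkpoints, and numerics consistent across two different hardware stacks.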
Future Trends: What's Next for AI Training Chips?
Looking at DeepSeek's hardware choices today tells us where the puck is going tomorrow.
1. The Rise of Custom Silicon (ASICs): The ultimate endgame for giants like Google (TPU), Amazon (Trainium), and maybe even DeepSeek in the future, is designing their own chips. This offers the maximum performance-per-watt and cost control. The barrier is the immense R&D cost. For now, using a mix of top-tier commercial chips from NVIDIA and Huawei is a smarter move for DeepSeek, but the temptation to go custom will grow with each billion-dollar training run.
2. Memory, Not Just Compute, is the Bottleneck: Future chips will focus as much on memory bandwidth and capacity as on raw FLOPs. Training larger models means shuffling unimaginable amounts of data. Chips with faster memory (like HBM3e) or innovative architectures that reduce data movement will have a huge edge. This is an area where all vendors are racing.
3. Sustainability Becomes a Selling Point: The environmental cost of AI training is under scrutiny. The next generation of chips will compete on performance-per-watt as much as pure performance. A chip that is 20% slower but uses 40% less power could be the more economical and politically palatable choice for running a 6-month training job.
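The "20% slower but 40% less power" claim is easy to sanity-check, because energy per job is just power times runtime. The arithmetic below uses normalized units; the percentages come from the paragraph above, not from any real chip.

```python
# Checking the "20% slower but 40% less power" trade-off in normalized units.
baseline_time, baseline_power = 1.0, 1.0
slow_time = baseline_time / 0.8    # 20% slower throughput -> 1.25x runtime
slow_power = baseline_power * 0.6  # 40% less power draw

baseline_energy = baseline_time * baseline_power  # 1.0
slow_energy = slow_time * slow_power              # 1.25 * 0.6 = 0.75
print(f"Energy vs. baseline: {slow_energy:.2f}x")  # 0.75x -> 25% less energy per job
```

So the slower chip finishes the same job with 25% less energy; whether the longer wall-clock time is worth it depends on how urgently you need the model.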
Your Burning Questions Answered
Could DeepSeek have been trained using only Huawei Ascend chips?
Technically, yes. The compute capability exists. Practically, it would have been a massive gamble. The training would almost certainly have taken longer, consumed more energy, and required a heroic level of low-level software optimization. The hybrid approach de-risked the project. They used the proven, efficient NVIDIA hardware to guarantee progress while simultaneously building competency and validating the Huawei stack as a viable partner. It was a "belt and suspenders" strategy.
What's the single biggest mistake people make when estimating AI training chip needs?
They focus solely on peak theoretical FLOPs. That number is almost meaningless. The real metrics are utilization and scaling efficiency. Can you keep 10,000 chips fed with data and working in harmony without bottlenecks? A cluster running at 50% utilization on a slightly slower chip is often cheaper and faster than a cluster running at 30% utilization on the "fastest" chip. The interconnect between chips (like NVIDIA's NVLink) is often more important than the chip itself for large-scale training.
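The utilization point is worth putting in numbers. Effective throughput is roughly peak FLOPs times utilization times chip count; the specific values below are illustrative, not measured figures for any real chip.

```python
# Effective cluster throughput: peak performance only matters if you can use it.
# All values are illustrative (petaFLOPs per chip, utilization fractions).
def effective_pflops(peak_pflops: float, utilization: float, chips: int) -> float:
    return peak_pflops * utilization * chips

fast_low_util = effective_pflops(peak_pflops=2.0, utilization=0.30, chips=10_000)
slow_high_util = effective_pflops(peak_pflops=1.5, utilization=0.50, chips=10_000)
print(fast_low_util, slow_high_util)  # 6000.0 7500.0 -> the "slower" chip wins
```

A chip with 25% less peak performance delivers 25% more useful compute here, purely because the cluster keeps it busier. This is why interconnects and data pipelines matter as much as the silicon.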
How do chip choices affect the final DeepSeek model I interact with?
Not directly in its intelligence, but potentially in its availability and cost. The staggering expense of training on this hardware is a primary reason why advanced AI models are built by well-funded companies or governments. It creates a huge barrier to entry. The chip strategy influences how quickly new versions can be developed and how much it costs to run the model (inference), which could eventually trickle down to the price you pay for access.
Is the reliance on NVIDIA a temporary phase in AI development?
It's a phase, but "temporary" might mean another 5-10 years. NVIDIA's lead in software (CUDA) is a moat that's harder to cross than its hardware lead. Every AI researcher and engineer learned on CUDA. Every major library is built for it. This ecosystem lock-in is powerful. However, the economic and geopolitical pressures are creating strong incentives for alternatives. The market will diversify, but NVIDIA will remain the dominant player for the foreseeable future of training. Companies like DeepSeek are actively funding that diversification with their purchasing decisions.
So, what chips were used to train DeepSeek? The answer is a strategic portfolio: NVIDIA H100/A100 for proven performance and efficiency, and Huawei Ascend 910B for supply chain resilience and future leverage. This wasn't a random selection; it was a calculated move in the high-stakes game of AI supremacy. The choice of silicon is no longer just an engineering decision—it's a core business and geopolitical strategy that determines who can afford to play in the era of giant AI models.