On January 27, 2025, Nvidia lost $589 billion in market capitalization in a single trading session — the largest one-day destruction of shareholder value in stock market history. The trigger was not an earnings miss or a product failure. It was the release of DeepSeek R1, an open-source reasoning model from a Hangzhou-based AI lab that most people outside China had never heard of.
The math was brutal for Nvidia shareholders. DeepSeek had built a model competitive with OpenAI's o1, which reportedly required tens of thousands of Nvidia's most expensive GPUs to train, on top of a base model whose final training run cost roughly $6 million in compute. If frontier AI could be built this cheaply, how many $30,000 H100s did the world actually need?
President Trump called DeepSeek a "wake-up call." The world's 500 wealthiest people lost a combined $108 billion that day. And a 40-year-old former quantitative hedge fund manager named Liang Wenfeng became the most discussed figure in artificial intelligence overnight.
This is the story of how DeepSeek went from a side project of a Chinese quant fund to the company that fundamentally altered the economics of the global AI industry. It covers the founder's background, the technical innovations that made it possible, the strategic choices that amplified its impact, and what it all means for the AI competition between China and the United States.
From Quant Trading to Frontier AI
DeepSeek's origin story is unlike any other major AI lab, and understanding it explains much about the company's unusual approach to AI development.
Liang Wenfeng graduated from Zhejiang University — one of China's top engineering institutions — where he studied machine vision. In 2015, he and two university classmates co-founded High-Flyer (幻方量化), a quantitative hedge fund that used machine learning for algorithmic stock trading. High-Flyer grew to manage roughly $8 billion in assets at its peak and reportedly achieved a 56% annual return in a strong year — making it one of China's most successful quant funds.
The fund's competitive edge was computing infrastructure. High-Flyer had built massive GPU clusters for its trading models, giving Liang deep expertise in large-scale training and inference at a time when most of the AI world was still running experiments on modest hardware. The firm was one of the first Chinese quant funds to invest heavily in GPU computing, building data centers that rivaled those of major tech companies. This infrastructure bet gave High-Flyer a speed advantage in executing trades — and gave Liang an intuitive understanding of how to squeeze maximum performance from limited computing resources.
According to Bloomberg reporting, Liang's insight was that the same infrastructure and talent could be redirected from predicting stock prices to building frontier AI models. The transition from quant trading to AI research was not as jarring as it sounds. Both disciplines involve training large neural networks on massive datasets, optimizing for performance under hardware constraints, and iterating rapidly based on measurable outcomes. The quantitative mindset — rigorously empirical, data-driven, skeptical of narratives — translates directly to AI research.
DeepSeek was formally founded in May 2023 as an independent entity, spun out of High-Flyer's AI research division. The company's stated mission — "making AGI a reality" — echoes every other AI lab, but its funding structure was different. Rather than raising venture capital to burn on compute, DeepSeek was initially bankrolled by High-Flyer's trading profits. This gave Liang something most AI founders lack: patient capital with no quarterly board meetings demanding revenue.
That changed in May 2026, when DeepSeek raised its first external funding — $3 to 4 billion at a $45 to 50 billion valuation, with Tencent among the backers. The valuation had surged from roughly $20 billion just weeks earlier, reflecting the market's reassessment of DeepSeek's strategic importance after the V4 launch.
The Technical Breakthrough: Doing More With Less
DeepSeek's core technical contribution is not a better model. It is a better way to build models. Two architectural innovations define the DeepSeek approach, and both are responses to a specific constraint: limited access to the most advanced AI chips.
Mixture-of-Experts (MoE)
DeepSeek V3, released in December 2024, uses a Mixture-of-Experts architecture with 671 billion total parameters but only 37 billion active per token. In plain terms: the model contains 671 billion learned weights, but for any given token it uses only about 5.5% of them. A routing mechanism dynamically selects which experts are relevant to each input.
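To make the routing idea concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not DeepSeek's actual router: the softmax gate, the eight toy "experts," and the names `route_token` and `moe_forward` are illustrative assumptions. Only the 37 billion and 671 billion parameter counts come from the text.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, top_k):
    """Pick the top_k experts for one token and renormalize their weights.

    router_scores: one raw score per expert (hypothetical router output).
    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]

def moe_forward(token, experts, router_scores, top_k):
    """Weighted sum of the outputs of only the selected experts."""
    return sum(w * experts[i](token) for i, w in route_token(router_scores, top_k))

# Toy setup (assumed): 8 scalar "experts", only 2 of which run per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, scale=s: scale * x for s in range(1, NUM_EXPERTS + 1)]

rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(scores, TOP_K)
output = moe_forward(1.0, experts, scores, TOP_K)

# Share of parameters active per token, using the figures quoted in the text.
active_fraction = 37e9 / 671e9
print(f"experts used: {[i for i, _ in selected]} of {NUM_EXPERTS}")
print(f"active share of V3 parameters: {active_fraction:.1%}")
```

The point of the sketch is that the experts that are not selected never execute, so per-token compute tracks the active parameters rather than the total.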
The cost savings are dramatic. Traditional dense models activate every parameter for every query, which means compute scales linearly with model size. DeepSeek's MoE design reduces training costs by roughly 90% compared to a dense model of equivalent total parameters. The full V3 training run consumed only 2.788 million H800 GPU hours, at a reported cost of $5.576 million. A subsequent peer-reviewed Nature publication co-authored by Liang revealed that the R1 reasoning model's final training run cost just $294,000 using 512 H800 chips — though this figure excludes the foundational V3 model that R1 was built on top of.
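The cost figures above can be sanity-checked with a few lines of arithmetic. The sketch below derives the GPU rental rate implied by the two V3 numbers, and the dense-model cost implied by the "roughly 90%" claim. The cluster size used for the wall-clock estimate is an illustrative assumption, not a figure from this article.

```python
# Figures quoted in the text for the full V3 training run.
GPU_HOURS = 2.788e6          # H800 GPU hours
REPORTED_COST_USD = 5.576e6  # reported compute cost
CLAIMED_MOE_SAVING = 0.90    # "roughly 90%" versus an equivalent dense model

# Illustrative assumption, not taken from the article.
ASSUMED_CLUSTER_GPUS = 2048  # hypothetical number of GPUs running in parallel

def implied_hourly_rate(cost_usd, gpu_hours):
    """Rental price per GPU hour implied by a total cost and GPU-hour count."""
    return cost_usd / gpu_hours

def wall_clock_days(gpu_hours, num_gpus):
    """Elapsed days if the GPU hours are spread evenly across num_gpus."""
    return gpu_hours / num_gpus / 24

def dense_equivalent_cost(moe_cost_usd, saving):
    """Cost of the dense baseline implied by a fractional saving."""
    return moe_cost_usd / (1.0 - saving)

rate = implied_hourly_rate(REPORTED_COST_USD, GPU_HOURS)
days = wall_clock_days(GPU_HOURS, ASSUMED_CLUSTER_GPUS)
dense_cost = dense_equivalent_cost(REPORTED_COST_USD, CLAIMED_MOE_SAVING)

print(f"implied rate: ${rate:.2f} per GPU hour")
print(f"wall clock on {ASSUMED_CLUSTER_GPUS} GPUs: {days:.1f} days")
print(f"implied dense-model cost: ${dense_cost / 1e6:.1f}M")
```

The implied rate works out to $2 per GPU hour, which shows that the headline number is a rental-price estimate for one run rather than a full accounting of hardware ownership.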
Multi-Head Latent Attention (MLA)
The second innovation addresses memory bandwidth — the bottleneck that constrains inference speed and cost. Traditional transformer models use Multi-Head Attention, which requires storing large key-value caches for every token in a sequence. This becomes expensive at scale, especially for long-context tasks.
DeepSeek's Multi-Head Latent Attention compresses the key-value cache into a low-rank latent representation, dramatically reducing memory requirements without significant performance loss. Combined with FP8 mixed-precision training (using 8-bit floating point instead of the standard 16-bit or 32-bit), this allowed DeepSeek to train larger models on less capable hardware.
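A rough back-of-the-envelope sketch shows why compressing the key-value cache matters. The layer count, head count, head size, latent width, and sequence length below are hypothetical values chosen only to illustrate the ratio; they are not DeepSeek's published configuration. The sketch also ignores any extra positional components a real latent-attention cache may store.

```python
def kv_cache_bytes(num_layers, seq_len, values_per_token, bytes_per_value):
    """Cache size for one sequence: every layer stores values for every token."""
    return num_layers * seq_len * values_per_token * bytes_per_value

def weight_gigabytes(num_params, bytes_per_param):
    """Memory needed just to hold the model weights, in gigabytes."""
    return num_params * bytes_per_param / 1e9

# Hypothetical model shape (assumed for illustration).
NUM_LAYERS = 60
NUM_HEADS = 128
HEAD_DIM = 128
LATENT_DIM = 512   # width of the compressed latent vector
SEQ_LEN = 32_768
BYTES_16BIT = 2

# Standard multi-head attention caches one key and one value vector per head.
mha_values = 2 * NUM_HEADS * HEAD_DIM
# A latent cache stores a single shared low-rank vector per token instead.
latent_values = LATENT_DIM

mha_bytes = kv_cache_bytes(NUM_LAYERS, SEQ_LEN, mha_values, BYTES_16BIT)
latent_bytes = kv_cache_bytes(NUM_LAYERS, SEQ_LEN, latent_values, BYTES_16BIT)

print(f"standard cache: {mha_bytes / 2**30:.1f} GiB")
print(f"latent cache:   {latent_bytes / 2**30:.3f} GiB")
print(f"reduction:      {mha_bytes // latent_bytes}x")

# Precision halves weight memory too: 671B parameters at 16-bit versus 8-bit.
print(f"16-bit weights: {weight_gigabytes(671e9, 2):.0f} GB")
print(f"8-bit weights:  {weight_gigabytes(671e9, 1):.0f} GB")
```

Under these assumed dimensions the cache shrinks by the ratio of `2 * NUM_HEADS * HEAD_DIM` to `LATENT_DIM`, so the saving grows with the number of heads and shrinks as the latent gets wider.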
The constraint produced the innovation. US export controls had restricted DeepSeek's access to Nvidia's H100 GPUs, limiting the company to the H800 — a version with reduced interconnect bandwidth. Rather than treating this as a fatal disadvantage, DeepSeek designed architectures that compensated for hardware limitations through algorithmic efficiency. As one analysis noted, the software was shaped around the hardware's constraints.
The result was a model that did not just match GPT-4 on benchmarks — it matched it using a fundamentally different approach to compute. Where Western labs threw more GPUs at the problem, DeepSeek found ways to use fewer GPUs more intelligently. The V3 technical paper, published on arXiv, became one of the most-cited AI papers of 2025 precisely because it demonstrated that the brute-force scaling paradigm was not the only path to frontier performance.
The Model Lineup
DeepSeek has released models at a pace that matches or exceeds any Western lab. Here is a timeline of its major releases:
| Model | Release | Parameters | Key Innovation | Training Cost |
|---|---|---|---|---|
| DeepSeek V2 | May 2024 | 236B MoE | First MLA architecture | Not disclosed |
| DeepSeek V3 | Dec 2024 | 671B MoE (37B active) | MoE + MLA + FP8 | ~$5.6M |
| DeepSeek R1 | Jan 2025 | 671B MoE | Open-source reasoning, RL-trained | ~$0.29M final run, on top of V3's ~$5.6M |
| DeepSeek V3.2 | Early 2026 | Improved V3 | Efficiency gains | Not disclosed |
| DeepSeek V4 Flash | Apr 2026 | 284B MoE | Lightweight, fast inference | Not disclosed |
| DeepSeek V4 Pro | Apr 2026 | 1.6T MoE (49B active) | Huawei Ascend support, 1M context | Not disclosed |
Interestingly, the market reaction to V4 was notably muted compared to the V3/R1 "black swan." As Omdia's chief analyst observed: "This announcement followed a rather predictable path." The AI market had normalized Chinese innovation — the expectation that capable models would emerge from China was now "baked into valuations."
Performance: Where DeepSeek Stands
Benchmarks are imperfect but useful. Here is how DeepSeek's models compare against the leading Western alternatives on widely cited tests:
| Benchmark | DeepSeek (R1 unless noted) | OpenAI o1 | GPT-4 | Notes |
|---|---|---|---|---|
| MATH-500 | 97.3% | 96.4% | — | Mathematical reasoning |
| HumanEval (Code) | 82.6% | — | 80.5% | Code generation |
| GPQA Diamond | Comparable | Comparable | — | Graduate-level science |
| SWE-bench Verified | 81.0% (V4 Pro) | — | — | Real-world coding tasks |
DeepSeek V4 Pro's BenchLM composite reasoning score of 87 exceeds Gemini 3.1 Pro's 77.1 and leads all Chinese models. It ranks #4 out of 115 models in coding and programming benchmarks with an average score of 90.5.
The picture that emerges is near-parity on mathematical reasoning and coding tasks, with Western models maintaining advantages in creative writing, multimodal capabilities, and complex multi-turn dialogue. For the high-volume, high-value applications that dominate enterprise AI — code generation, data analysis, mathematical computation — DeepSeek is genuinely competitive.
The Open Source Strategy
DeepSeek's most consequential decision was not technical. It was strategic: releasing every major model as open weights.
R1's open-source release on Hugging Face and GitHub in January 2025 was not charity. It accomplished three things simultaneously:
First, it established DeepSeek as the standard-bearer for open AI development. While OpenAI (despite its name), Anthropic, and Google have progressively closed their models, DeepSeek went the opposite direction. The developer community responded immediately. Hugging Face even launched an "Open-R1" project to fully reproduce DeepSeek's training pipeline in the open — a remarkable vote of confidence from the Western open-source community.
Second, it commoditized the reasoning model category. By releasing a model competitive with OpenAI's o1 for free, DeepSeek undermined the pricing power of closed-source providers. If comparable reasoning capability is available at zero marginal cost, the justification for paying premium API rates weakens considerably. This aligns with what our analysis of the china-ai-token-price-war documented: Chinese models charge 10-35x less than American equivalents, and open-source releases accelerate this price compression.
Third, it exposed the cost structure of frontier AI. DeepSeek's transparency about its training costs (under $6 million in compute for the V3 base model that R1 builds on) shattered the narrative that frontier AI required hundreds of millions in investment. The immediate market consequence was the $589 billion Nvidia selloff: if models this capable can be built this cheaply, how many of Nvidia's GPUs do you actually need?
Hugging Face published a dedicated analysis titled "One Year Since the DeepSeek Moment," examining how the release reshaped the global open-source AI ecosystem. The impact was measurable: China's open-source models — primarily Qwen and DeepSeek — now account for roughly 30% of global AI usage and about 15% of the global AI market, up from roughly 1% just a year ago.
The Economics: Cost Comparison
The pricing gap between DeepSeek and Western providers is not incremental. It is structural.
| Provider | Model | Input ($/M tokens) | Output ($/M tokens) | Cached ($/M tokens) |
|---|---|---|---|---|
| DeepSeek | V4 Flash | $0.14 | $0.28 | $0.028 |
| DeepSeek | V4 Pro | $0.435 | $0.87 | $0.028 |
| OpenAI | GPT-5.4 | ~$10.00 | ~$15.00 | $0.50-$2.00 |
| Anthropic | Claude-class | ~$3.00 | ~$15.00 | — |
The output token gap is the most striking: DeepSeek V4 Pro charges $0.87 per million output tokens versus GPT-5.4's $15.00 — a 17x difference. For a company processing 100 million output tokens per month, that is the difference between $87 and $1,500. For Chinese enterprises operating on thin margins in competitive industries — manufacturers running quality inspection models, logistics companies optimizing delivery routes, small businesses deploying customer service chatbots — this pricing makes AI adoption economically viable where it would not be at Western prices.
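The arithmetic behind that comparison is easy to reproduce. The sketch below hard-codes the per-million-token prices from the table above (the GPT-5.4 figures are marked approximate there) and computes a monthly bill. The function name `monthly_cost` and the workload sizes are illustrative assumptions.

```python
# USD per million tokens, copied from the pricing table in the text.
PRICES_PER_MILLION = {
    "DeepSeek V4 Flash": {"input": 0.14, "output": 0.28},
    "DeepSeek V4 Pro": {"input": 0.435, "output": 0.87},
    "GPT-5.4": {"input": 10.00, "output": 15.00},  # approximate in the source
}

def monthly_cost(model, input_tokens=0, output_tokens=0):
    """Monthly bill in USD for a given token volume, ignoring cache discounts."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000

# The workload used in the text: 100 million output tokens per month.
OUTPUT_TOKENS = 100_000_000
pro_bill = monthly_cost("DeepSeek V4 Pro", output_tokens=OUTPUT_TOKENS)
gpt_bill = monthly_cost("GPT-5.4", output_tokens=OUTPUT_TOKENS)

print(f"DeepSeek V4 Pro: ${pro_bill:,.2f}")
print(f"GPT-5.4:         ${gpt_bill:,.2f}")
print(f"ratio:           {gpt_bill / pro_bill:.1f}x")
```

Passing a non-zero `input_tokens` value shows how the gap shifts for prompt-heavy workloads, where the input price matters more than the output price.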
There is a legitimate critique of the headline $5.6 million training cost. As researchers at INSAIT have noted, the figure does not account for the years of prior research, the infrastructure built during High-Flyer's operations, the cost of failed experiments, or the salaries of DeepSeek's research team. The true all-in cost of developing V3 was almost certainly orders of magnitude higher. But this critique, while valid, misses the point. The market reacted to the marginal training cost because that is what determines whether the next model can be built cheaply. If DeepSeek can replicate the process for $6 million of compute per training run, the sunk costs of prior research do not change the economics of future model development.
DeepSeek's V4 Pro is currently offered at a 75% discount, extended through May 31, 2026: $0.435/$0.87 per million input/output tokens, down from $1.75/$3.50 at launch. This aggressive pricing is consistent with the broader Chinese AI strategy documented in our china-ai-token-price-war analysis: use low inference prices as a loss-leader to pull developers onto cloud platforms.
The Geopolitical Dimension
DeepSeek's story cannot be separated from the US-China technology competition. The company's trajectory directly challenges the theory behind American export controls.
Since October 2022, the United States has progressively tightened restrictions on advanced AI chip exports to China. The stated goal: deny China access to the compute infrastructure required for frontier model training, slowing AI development. The theory is straightforward — if capability scales with compute, and you restrict compute, you restrict capability.
DeepSeek has exposed a flaw in this theory. R1 was trained on Nvidia H800s, the reduced-interconnect variant that Nvidia built to comply with the original export rules. DeepSeek compensated with algorithmic innovations, particularly MLA and FP8 mixed-precision training, that extracted more capability from constrained hardware. V4 took the next step by training on domestic Huawei Ascend chips entirely.
As Ankura China Advisors noted in Reuters: "The 'wow factor' was last year — that's already priced in. What matters now is whether China can continue advancing on AI development, and potentially do so with its own chips — the geopolitical implications would be significant."
The irony is sharp. The export controls that were supposed to slow Chinese AI development may have accelerated it by forcing Chinese labs to innovate around hardware constraints. DeepSeek's efficiency gains — MoE, MLA, FP8 training — were born from the necessity of doing more with less. Without the H100 restrictions, there would have been less pressure to develop these techniques. The US government, in effect, subsidized DeepSeek's innovation by denying it access to the easy path of brute-force scaling.
The White House memo released on April 23, 2026, accusing China of "industrial-scale theft" of AI intellectual property, underscores the shifting US posture. If Chinese AI labs were simply copying Western techniques on smuggled Nvidia chips, export controls would be effective. The fact that the policy debate has shifted to accusations of IP theft suggests that computational containment is not working as intended.
The deeper question is whether DeepSeek's efficiency innovations are replicable or represent a unique advantage. Our analysis of open-source-chinese-ai suggests the answer is both: the MoE and MLA techniques are published and reproducible (Hugging Face's Open-R1 project proves this), but DeepSeek's particular expertise in squeezing performance from constrained hardware reflects a deep institutional competence built through years of quant trading on similar constraints.
DeepSeek in the Chinese AI Ecosystem
DeepSeek operates within an AI ecosystem that has reached industrial scale. China's daily AI token consumption hit 140 trillion in March 2026 — a 1,000-fold increase from early 2024. ByteDance's Doubao AI assistant alone processes 120 trillion tokens daily.
But DeepSeek's position within this ecosystem is changing. The muted reaction to V4 reflected not just market normalization but also intensified domestic competition. Moonshot AI's Kimi K2.6 and Alibaba's Qwen have narrowed the gap with DeepSeek. As our chinese-ai-models-compared analysis documents, the Chinese AI landscape is now defined by multiple capable providers rather than a single leader.
The competitive dynamics are fierce. On OpenRouter, the open inference marketplace, Chinese models now account for over 60% of tracked token volume. Five of the top 10 models by traffic are Chinese. Tencent's Hy.3 topped the list with 3.74 trillion tokens per week, followed by Kimi K2.6 at 1.78 trillion.
DeepSeek's differentiation lies in its open-source commitment and its demonstrated ability to achieve frontier performance on domestic chips. These are not easily replicated advantages. Alibaba, by contrast, released three proprietary models in April 2026 accessible only through its cloud platform — the opposite strategic bet.
What DeepSeek Reveals About the AI Race
Three structural implications emerge from DeepSeek's story:
Efficiency can substitute for scale. The dominant assumption in Western AI development has been that capability scales with compute budget. OpenAI, Google, and Anthropic have each pursued larger models trained on larger clusters, with training runs costing hundreds of millions. DeepSeek demonstrated that algorithmic innovation — MoE, MLA, FP8 training — can deliver comparable results at 1/50th to 1/100th the cost. This does not mean scale is irrelevant. It means scale is not the only path to frontier performance.
Open source is a competitive weapon, not an act of charity. DeepSeek's open-source strategy mirrors the "commoditize your complement" playbook that Microsoft used against Netscape and Google used against Apple with Android. By releasing frontier models for free, DeepSeek erodes Western pricing power, builds a global developer community, and creates switching costs that no marketing budget can buy. Every product built on DeepSeek is a product not built on OpenAI.
Export controls accelerate domestic innovation. The US chip restrictions were designed to slow Chinese AI development. Instead, they forced Chinese labs to develop more efficient architectures and to accelerate domestic chip deployment. DeepSeek V4 running on Huawei Ascend chips is not just a technical achievement — it is proof that computational containment, at least in its current form, may be counterproductive. The constraint DeepSeek faced produced the innovation that gave it an advantage.
None of this means China has achieved AI superiority. The United States still leads in frontier model capability, semiconductor manufacturing, and the cloud infrastructure that supports large-scale AI deployment. US AI companies generated $22 billion in revenue in 2025 versus China's $1.8 billion — a 12:1 ratio. Anthropic's Mythos model, available exclusively through Project Glasswing to JPMorgan, Amazon, and Microsoft, operates at capability levels Chinese models have not matched.
But the gap is narrowing faster than most analysts predicted. And DeepSeek is the primary reason why.
The Path Ahead
DeepSeek's next challenges are different from the ones it has already solved.
Scaling without Nvidia. V4 proved that frontier models can be trained on Huawei Ascend chips. But the Ascend 910C delivers roughly 60-70% of the H100's training performance in synthetic benchmarks. Closing this gap — or continuing to compensate architecturally — will determine whether DeepSeek can maintain its trajectory as models grow larger.
Commercializing without compromising. The $3-4 billion funding round at a $45-50 billion valuation comes with expectations. DeepSeek's API pricing is aggressive but not the most aggressive in the market. Finding a path to profitability without closing its models, the route Alibaba has chosen, will test Liang's commitment to openness.
Competing with domestic rivals. DeepSeek's brief position as China's clear AI leader is over. Kimi, Qwen, Doubao, and others have caught up. As Omdia noted, the AI market has normalized Chinese innovation. Being first is no longer DeepSeek's primary advantage — being the most efficient and the most open is.
Hallucination and reliability gaps. The quality gap between Chinese and American models remains real. According to Semi Fundamental analysis, Chinese frontier models hallucinate at rates 3 to 5 times higher than US frontier models — 3-5% versus under 1% on factual tasks. For consumer chatbots and content generation, this is tolerable. For enterprise applications in healthcare, legal, or financial services, it is a dealbreaker. DeepSeek has not publicly addressed this gap, and closing it may require fundamentally different approaches to training data curation and output verification.
The distillation clock. Several analyses have noted that Chinese AI labs benefited from training on outputs from Western frontier models — a technique called distillation. This shortcut has a limited shelf life. As Western providers implement countermeasures — rate limiting, output watermarking, API restrictions on bulk querying — the distillation advantage erodes. Semi Fundamental estimates this approach has roughly a 6-month shelf life before countermeasures make it ineffective. DeepSeek's continued competitiveness will depend on whether its native research capabilities can sustain progress without this crutch.
The question that matters most is not whether DeepSeek can beat OpenAI. It is whether the competitive dynamics of the AI industry — where winner-take-all economics and massive compute requirements favored a handful of well-funded Western companies — still hold when your competitor can build a frontier model for $6 million and run it on chips you cannot block.
Frequently Asked Questions
What is DeepSeek?
DeepSeek is a Chinese AI research lab founded in 2023 by Liang Wenfeng, a former quantitative hedge fund manager. Based in Hangzhou, the company has released a series of open-source large language models, including V3, R1, and V4, that rival models from OpenAI and Google on math and coding benchmarks at a fraction of the training cost. DeepSeek was valued at $45-50 billion in its first external funding round in May 2026.
How does DeepSeek compare to OpenAI?
On mathematical reasoning (MATH-500), DeepSeek R1 scores 97.3% versus OpenAI o1's 96.4%. On code generation (HumanEval), DeepSeek achieves 82.6% versus GPT-4's 80.5%. DeepSeek V4 Pro scores about 81% on SWE-bench Verified (real-world coding) while charging roughly 94% less per output token than GPT-5.4 ($0.87 versus about $15.00 per million). OpenAI retains advantages in creative writing, multimodal tasks, and complex multi-turn dialogue. The API pricing gap is 17-35x in DeepSeek's favor.
How did DeepSeek train models so cheaply?
Two key innovations: a Mixture-of-Experts (MoE) architecture that activates only 37 billion of 671 billion total parameters per token, and Multi-Head Latent Attention (MLA), which compresses the key-value cache to cut memory requirements. Combined with FP8 mixed-precision training, these techniques reduced V3's training cost to $5.6 million, roughly 90% less than a dense model of equivalent size. Limited access to Nvidia's best chips drove these efficiency innovations.
Is DeepSeek open source?
Yes, in the open-weights sense. All of DeepSeek's major models (V3, R1, V3.2, V4 Flash, and V4 Pro) have been released as open weights on Hugging Face and GitHub. Developers can download, fine-tune, and deploy these models locally without paying API fees. The open-source strategy has made DeepSeek the standard-bearer for open AI development, with Hugging Face launching a dedicated project to reproduce DeepSeek's training pipeline.
Who owns DeepSeek?
DeepSeek was founded by Liang Wenfeng, who previously co-founded High-Flyer, one of China's largest quantitative hedge funds. The company was initially funded by High-Flyer's trading profits. In May 2026, DeepSeek raised $3-4 billion in its first external funding round at a $45-50 billion valuation, with Tencent among the backers. Liang remains CEO.
Why did DeepSeek cause Nvidia to lose $589 billion?
DeepSeek released R1, an open-source reasoning model competitive with OpenAI's o1, on January 20, 2025; the base model it builds on was trained for roughly $6 million in compute. This challenged the market assumption that frontier AI requires massive compute investment, and therefore massive GPU purchases from Nvidia. On January 27, 2025, Nvidia shares fell ~17%, erasing $589 billion in market cap, the largest single-day loss for any company in stock market history. The selloff reflected investor concern that if AI could be built cheaply, demand for Nvidia's expensive GPUs would decline.
By China Made & Tech Team. Independent publication covering Chinese manufacturing and technology innovation for global audiences.
Related Entries
- china-ai-robotics-guide — China's AI and Robotics Industry: From Labs to Factory Floors
- chinese-ai-models-compared — Chinese AI Models Compared: DeepSeek vs Qwen vs Yi vs Baichuan
- deepseek-v4-launch-analysis — DeepSeek V4 Changes Everything
- china-ai-token-usage-scale — China's 140 Trillion Daily AI Tokens
- china-ai-token-price-war — China Sells AI at $0.10 While OpenAI Charges $3
- open-source-chinese-ai — Open Source Chinese AI: What Developers Need to Know
- china-ai-chip-design — China's AI Chip Design: Huawei Ascend, Biren, and Moore Threads