KINJA
AI & Machine Learning

DeepSeek Trained an AI for $6 Million That Took $600 Billion Off Nvidia's Stock Price. Here's What It Actually Is.

A Chinese hedge fund built a reasoning model matching OpenAI at 1/100th the cost. Fifteen months later, DeepSeek has 89% market share in China.

Alex Chen · 11 min read

Key Takeaway

A Chinese hedge fund spun off an AI lab, built a reasoning model that matches OpenAI's best work at roughly 1/100th the cost, and accidentally triggered the largest single-day stock market wipeout in financial history. Fifteen months later, DeepSeek is bigger, more controversial, and harder to ignore than ever.

On January 27, 2025, a Monday morning that Wall Street now calls "DeepSeek Monday," Nvidia's stock dropped nearly 18%. Roughly $600 billion in market value evaporated in a single trading session, the largest single-day loss for any company in stock market history. Microsoft fell. Alphabet fell. Broadcom fell. By the end of the week, over $1 trillion had been shaved from American tech stocks.

The cause wasn't a recession, a regulatory crackdown, or a bad earnings report. It was a chatbot. Specifically, it was a reasoning model called DeepSeek-R1, built by a company most Americans had never heard of, headquartered in Hangzhou, China, funded by a hedge fund, and trained for roughly $6 million. That last number is the one that panicked investors. If DeepSeek could match the performance of models that cost hundreds of millions to train, the entire thesis underpinning the AI hardware boom was in trouble.

Fifteen months later, DeepSeek has an 89% market share in China, has captured significant traction across the developing world, and is preparing to release its V4 model. The DeepSeek app hit #1 on the US iOS App Store when it launched, surpassing ChatGPT, and racked up over 23 million downloads in under three weeks.

The company behind the model

DeepSeek isn't a Silicon Valley startup. It's a research lab spun off from High-Flyer, a Chinese quantitative hedge fund. In April 2023, High-Flyer announced it was launching an artificial general intelligence research lab. Two months later, that lab became its own company: DeepSeek.

The co-founder and CEO is Liang Wenfeng, who also co-founded High-Flyer. As of May 2024, Liang personally held an 84% stake in DeepSeek through two shell companies. Chinese state media has called him China's Sam Altman. The comparison is both apt and misleading: like Altman, Liang runs a company that produces frontier AI models. Unlike Altman, Liang has explicitly said DeepSeek focuses on research and has no immediate plans for commercialization.

The hardware situation draws the most geopolitical attention. DeepSeek built its models using Nvidia H800 GPUs, chips designed specifically for the Chinese market after the US banned exports of the more powerful H100 and A100 chips in late 2022. In a September 2025 Nature paper, DeepSeek acknowledged it also owns A100 chips used for early-stage experiments. US officials have claimed DeepSeek had access to H100s acquired after export controls took effect. The full picture remains murky.

How DeepSeek builds AI differently

The $6 million training cost for DeepSeek-V3 is the number that broke investors' brains. For context, OpenAI's CEO Sam Altman said in 2023 that training GPT-4 cost "much more" than $100 million. The idea that a Chinese lab could produce a model of similar capability for roughly 1/100th of the cost felt impossible.

The secret isn't magic; it's architecture. DeepSeek uses a technique called Mixture-of-Experts (MoE), where the model has 671 billion total parameters but only activates about 37 billion for any given task. Think of it like a hospital with hundreds of specialists: when a patient arrives with a broken arm, you don't need the cardiologist, the neurologist, and the dermatologist. You send them to the orthopedist. MoE works the same way, routing each input to the small subset of parameters most relevant to that specific problem.
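The routing idea is easy to sketch. This is a minimal, illustrative top-k gating loop, not DeepSeek's actual implementation: the expert count, dimensions, and the use of plain linear maps as "experts" are simplifications for clarity (real experts are full MLP blocks, and production routers add load-balancing losses).

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model, top_k = 8, 16, 2

# Router: a linear layer scoring how relevant each expert is to the input.
W_router = rng.standard_normal((d_model, n_experts))
# Each "expert" here is just a linear map; real experts are MLPs.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    """Route x to its top-k experts and mix their outputs by router weight."""
    scores = x @ W_router                # one relevance score per expert
    top = np.argsort(scores)[-top_k:]    # indices of the k highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()             # softmax over only the chosen experts
    # Only top_k of the n_experts parameter sets are ever touched for this input.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top)), top

x = rng.standard_normal(d_model)
out, used = moe_forward(x)
print(f"activated {len(used)} of {n_experts} experts")
```

This is why the 671B/37B split matters: the parameters exist, but any single forward pass pays compute only for the small routed subset.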

DeepSeek paired MoE with Multi-head Latent Attention (MLA), which reduces the memory needed to process long sequences of text. The result is a model that's dramatically cheaper to train and faster to run.
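The memory win comes from what gets cached during generation. Standard attention stores full per-head keys and values for every past token; MLA stores one compressed latent vector per token and reconstructs keys and values with up-projection matrices. The sketch below uses made-up dimensions purely to show the bookkeeping, not DeepSeek's actual sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, head_dim, d_latent, seq_len = 32, 128, 512, 1000
d_kv = n_heads * head_dim  # full key/value width across all heads

# Standard attention caches full K and V for every past token.
standard_cache_floats = seq_len * 2 * d_kv

# MLA caches one shared latent per token and reconstructs K and V
# on the fly with learned up-projection matrices.
W_up_k = rng.standard_normal((d_latent, d_kv))
W_up_v = rng.standard_normal((d_latent, d_kv))
latent_cache = rng.standard_normal((seq_len, d_latent))
mla_cache_floats = latent_cache.size

k = latent_cache @ W_up_k  # (seq_len, d_kv) reconstructed keys
v = latent_cache @ W_up_v  # reconstructed values

print(f"cache shrink: {standard_cache_floats / mla_cache_floats:.0f}x")
# → cache shrink: 16x (with these toy dimensions)
```

Smaller caches mean longer contexts fit in the same GPU memory, which is most of why the model is cheaper to serve.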

DeepSeek-R1, the reasoning model, was trained for $294,000, using 512 H800 chips for about 80 hours (per a peer-reviewed Nature paper). On performance, R1 scored 91.6% on the MATH-500 benchmark, surpassing OpenAI's o1 at 85.5%. Independent labs verified these results.

What DeepSeek offers right now

DeepSeek-V3 is the general-purpose workhorse. It handles conversation, coding, analysis, and creative tasks competitively with GPT-4-class models.

DeepSeek-R1 is the reasoning specialist. It excels at step-by-step problem solving, mathematical proofs, and algorithmic logic.

DeepSeek-VL2 handles multimodal tasks (text and images). It's competitive on vision-language benchmarks but remains the least mature of the three offerings.

DeepSeek-V4 is the upcoming flagship, expected spring 2026. Reportedly ~1 trillion parameters with only ~37 billion active per token, a 1 million token context window, and native multimodal capabilities including text, image, and video generation.

Pricing is DeepSeek's other killer feature. API access runs roughly $0.28 per million input tokens and $0.42 per million output tokens, about 1/27th the cost of comparable access from OpenAI or Anthropic.
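To make those per-million-token rates concrete, here's the arithmetic for a hypothetical workload (the request counts and token sizes are invented for illustration; only the rates come from the article):

```python
# DeepSeek API list prices quoted above (USD per million tokens).
INPUT_PER_M, OUTPUT_PER_M = 0.28, 0.42

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for a batch of input and output tokens."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A chatbot serving 10,000 requests, each ~1,500 tokens in / 500 tokens out:
cost = request_cost(10_000 * 1_500, 10_000 * 500)
print(f"${cost:.2f}")  # → $6.30
```

At roughly 27x those rates, the same workload on a comparable proprietary API would run on the order of $170.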

All models are released under open-weight licenses (MIT for R1, Apache 2.0 expected for V4). This is the single biggest structural difference between DeepSeek and its American competitors. OpenAI's models are fully proprietary. Anthropic's models are fully proprietary. DeepSeek publishes the weights for free.

The problems you should actually worry about

Censorship is built in. DeepSeek's models refuse to engage with topics sensitive to the Chinese government. Ask about Tiananmen Square, Taiwan's political status, or the Uyghur situation, and the model will deflect or give a response aligned with CCP positions. The open-weight releases allow developers to remove some guardrails, but the base model carries ideological alignment baked into its training.

Security has been a consistent concern. Palo Alto Networks found it's relatively easy to bypass DeepSeek's safety guardrails. Enkrypt AI reported R1 is four times more likely to produce malware or insecure code than OpenAI's o1. Stanford's Freeman Spogli Institute noted that without a thorough code audit, hidden telemetry can't be ruled out.

The distillation controversy is unresolved. In February 2026, Anthropic accused DeepSeek of using thousands of fraudulent accounts to generate millions of conversations with Claude to train its own models. OpenAI made similar accusations. DeepSeek defended distillation as a standard technique. The AI community's response was mixed: many noted that Anthropic, OpenAI, and most of their peers are themselves defendants in copyright and training data lawsuits.

Data privacy is a genuine risk for casual users. If you use the DeepSeek app directly, your conversations are processed on servers subject to Chinese law. For developers running open-weight models on their own infrastructure, this concern largely disappears.

Who should use DeepSeek (and who shouldn't)

Developers and businesses who want to run AI models on their own servers, at dramatically lower cost, should absolutely evaluate DeepSeek's open-weight models. The ability to download R1 or V3, host it locally, and pay nothing for inference is a genuine competitive advantage.

Researchers and students working on math, coding, or reasoning tasks get exceptional performance for free. R1's mathematical reasoning is legitimately best-in-class on several benchmarks.

Casual users should think carefully. The censorship limitations and data privacy situation with the app make American and European alternatives more trustworthy for general-purpose chatbot use.

Anyone handling sensitive business data should not use the DeepSeek app or API directly. Run the open-weight models locally, or don't use DeepSeek at all.

Why DeepSeek matters beyond DeepSeek

DeepSeek proved you don't need $100 million and 10,000 of the most expensive GPUs on earth to build a frontier AI model. You need smart architecture, efficient training techniques, and willingness to publish your work.

That realization forced every major AI lab to rethink its strategy. Meta doubled down on open-source with Llama 4. OpenAI eventually published an open-source model. The entire industry shifted from "pre-training scale" toward "inference-time scale." DeepSeek didn't just build a cheaper model. It invalidated the assumption that expensive models were inherently better.

For ordinary people, the story boils down to three things. AI is getting dramatically cheaper. Open-weight models are here to stay. And the geopolitics of AI are real and messy. Use DeepSeek where it makes sense. Understand the tradeoffs where it doesn't. And pay attention when V4 drops.

Written by Alex Chen

Technology journalist who has spent over a decade covering AI, cybersecurity, and software development. Former contributor to major tech publications. Writes about the tools, systems, and policies shaping the technology landscape, from machine learning breakthroughs to defense applications of emerging tech.
