At the end of January, the little-known Chinese startup DeepSeek found itself in the global media spotlight. A modest investment of $5.6 million in the development of a new model dealt a devastating blow to the market: American tech giants collectively lost nearly $1 trillion in capitalization.
The emergence of an accessible alternative to ChatGPT, billed as a "Silicon Valley killer," caused a real stir in the industry.
The Rise of DeepSeek
DeepSeek began its independent journey in May 2023 in Hangzhou, the capital of Zhejiang province. This city is considered China's largest e-commerce hub, home to the headquarters of giants like Alibaba Group, Geely, Hikvision, and Ant Group.
Behind the project is Liang Wenfeng — a businessman and co-founder of the hedge fund High-Flyer, which manages assets worth $8 billion. Founded in 2015, the company has long shown interest in machine learning, investing significant resources in creating its own computing infrastructure as well as research in artificial intelligence. DeepSeek emerged from this structure.
In 2020, High-Flyer introduced the Fire-Flyer I supercomputer, costing 200 million yuan ($27.6 million) and specializing in deep learning for AI. A year later, Fire-Flyer II was launched: a system costing 1 billion yuan ($138 million) and equipped with over 10,000 Nvidia A100 GPUs.
DeepSeek's debut model, released in November 2023, immediately demonstrated performance on par with GPT-4 and was made freely available for research and commercial use.
By May 2024, DeepSeek-V2 was launched, with the company's competitive pricing policy forcing even giants like ByteDance, Tencent, Baidu, and Alibaba to lower their prices for AI solutions. As a result, DeepSeek managed to maintain profitability while its competitors incurred losses.
In December 2024, the DeepSeek-V3 model was introduced, outperforming the latest developments from OpenAI and Anthropic in tests. Based on this model, the company created DeepSeek-R1 and its derivatives, which formed the basis of the much-talked-about service.
The main advantage of the new model is its unprecedentedly low cost of use: for processing one million tokens, DeepSeek charges only $2.19, while OpenAI charges $60 for a similar volume.
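To get a feel for what that gap means in practice, here is a rough cost comparison at the two per-million-token prices quoted above. The monthly token volume is an arbitrary assumption chosen purely for illustration:

```python
# Rough cost comparison at the per-million-token prices quoted above.
# The monthly token volume is a hypothetical figure for illustration only.
DEEPSEEK_PRICE_PER_M = 2.19   # USD per 1M tokens (DeepSeek)
OPENAI_PRICE_PER_M = 60.00    # USD per 1M tokens (OpenAI)

monthly_tokens = 500_000_000  # hypothetical workload: 500M tokens per month

deepseek_cost = monthly_tokens / 1_000_000 * DEEPSEEK_PRICE_PER_M
openai_cost = monthly_tokens / 1_000_000 * OPENAI_PRICE_PER_M

print(f"DeepSeek: ${deepseek_cost:,.2f}")                # DeepSeek: $1,095.00
print(f"OpenAI:   ${openai_cost:,.2f}")                  # OpenAI:   $30,000.00
print(f"Ratio:    {openai_cost / deepseek_cost:.1f}x")   # Ratio:    27.4x
```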
Behind the Breakthrough: The Structure of DeepSeek-R1
According to a published study, DeepSeek-R1 is based on reinforcement learning methods and "cold start" techniques. This has allowed it to achieve exceptional performance in areas such as mathematical calculations, programming, and logical reasoning.
A key feature of the model is its Chain of Thought approach, which breaks complex tasks down into sequential steps, mimicking human reasoning. The system analyzes a task, divides it into stages, and checks each step for errors before forming a final answer.
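Readers who want to see this behavior directly can query R1 through DeepSeek's OpenAI-compatible API and read the intermediate reasoning separately from the final answer. The sketch below is based on DeepSeek's public API documentation; the endpoint URL, model name, and the reasoning_content field should be treated as assumptions to verify rather than guarantees:

```python
# Minimal sketch: asking DeepSeek-R1 a multi-step question and inspecting
# its chain of thought. Endpoint, model name, and the reasoning_content
# field follow DeepSeek's published docs; verify before relying on them.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # placeholder
    base_url="https://api.deepseek.com",   # OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the R1 reasoning model
    messages=[{
        "role": "user",
        "content": "A train travels 120 km in 1.5 hours, then 80 km in 1 hour. "
                   "What is its average speed for the whole trip?",
    }],
)

message = response.choices[0].message
print("Reasoning steps:\n", message.reasoning_content)  # intermediate chain of thought
print("Final answer:\n", message.content)               # the answer itself
```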
The technical implementation impresses with its efficiency. DeepSeek-R1 was trained on a system of 2,048 Nvidia H800 accelerators, consuming approximately 2.788 million GPU hours (roughly 1,360 hours, or just under two months, of wall-clock time on that cluster). Process optimization is achieved through FP8 mixed precision and Multi-Token Prediction, significantly reducing hardware requirements.
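The memory side of the FP8 argument is easy to see in a few lines. The snippet below is a toy illustration, not DeepSeek's training setup: it casts a tensor to PyTorch's float8 format and compares per-element storage with standard 32-bit floats.

```python
# Toy illustration of why FP8 matters for memory: each value occupies
# 1 byte instead of 4. Real mixed-precision training keeps some tensors
# (e.g. master weights, optimizer state) in higher precision.
import torch

w_fp32 = torch.randn(1024, 1024)
w_fp8 = w_fp32.to(torch.float8_e4m3fn)  # requires a recent PyTorch release

print(w_fp32.element_size())  # -> 4 bytes per value
print(w_fp8.element_size())   # -> 1 byte per value
```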
The model architecture includes 671 billion parameters; however, only 37 billion are activated during a single forward pass. The use of a Mixture of Experts architecture ensures scalability without a proportional increase in computational cost.
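To make the "only a fraction of parameters per pass" idea concrete, here is a deliberately simplified Mixture-of-Experts layer: a router scores the experts for each token and only the top-k of them are actually evaluated. The sizes and routing rule are illustrative assumptions, not DeepSeek's actual configuration:

```python
# Toy Mixture-of-Experts layer: only the top-k experts run per token.
# Sizes and routing are illustrative; DeepSeek's real MoE is far larger
# and uses additional tricks (shared experts, load balancing, etc.).
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x)                                      # (tokens, num_experts)
        weights, idx = scores.softmax(dim=-1).topk(self.top_k, dim=-1)  # keep top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = ToyMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64]) -- only 2 of 8 experts ran per token
```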
An innovative method called Group Relative Policy Optimization (GRPO) also deserves special attention: it allows models to be trained without a separate critic network, significantly improving the efficiency of the process. As Jim Fan, senior research manager at Nvidia, noted, this resembles Google DeepMind's AlphaZero breakthrough, which learned to play Go and chess "without prior imitation of human grandmaster moves."
He stated that this is "the most important takeaway from the research paper."
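A minimal sketch of the group-relative idea is shown below: for each prompt, several answers are sampled and scored with a reward, and each answer's advantage is simply its reward standardized against the group mean, so no learned value function (critic) is needed. This is a simplified reading of the published description, not DeepSeek's training code:

```python
# Sketch of GRPO's critic-free advantage estimate: sample a group of
# completions per prompt, score them, and use the group-normalized reward
# as the advantage that weights the policy-gradient update.
import statistics

def group_relative_advantages(rewards):
    """rewards: scores for G sampled answers to the same prompt."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Example: 4 sampled answers to one math problem, rewarded 1 if correct else 0.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))  # [1.0, -1.0, -1.0, 1.0]
# These advantages then scale the log-probability gradients of the
# corresponding answers, taking the place of a critic's value estimates.
```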
A New Approach to Training Language Models
DeepSeek's approach to training is particularly interesting. Unlike other leading LLMs, R1 did not rely on the traditional supervised fine-tuning stage on human-labeled data. Researchers found a way for the model to develop its own reasoning abilities almost from scratch.
The model also represents a new paradigm in AI development: rather than simply scaling up computing power for training, emphasis is placed on how much time and compute the model spends reasoning about an answer before generating it. This scaling of computation at test time distinguishes the new class of "reasoning models," such as DeepSeek-R1 and OpenAI's o1, from their predecessors.
Telegram CEO's Reaction to the Success of DeepSeek's Model
In a congratulatory message for the Chinese New Year, Telegram founder Pavel Durov highlighted the success of the buzzworthy AI model DeepSeek and identified the reasons behind such a breakthrough.
According to him, China's education system surpasses that of the West. It encourages fierce competition among students, a principle "borrowed from the highly efficient Soviet model."
In most Western schools, public announcements of grades and student rankings are prohibited to prevent pressure and ridicule. Durov believes such measures demotivate the best students.
As a result, many gifted children find competitive games more engaging than studying, since games openly display each player's ranking.
Praising students regardless of their performance may seem like a good thing, but reality will shatter this illusion after graduation.
Critical Perspective on DeepSeek's Breakthrough
DeepSeek's success raises many questions within the professional community. Scale AI CEO Alexandr Wang claims that the company possesses 50,000 Nvidia H100 chips, which would directly contradict U.S. export restrictions.
Given that the price of smuggled H100s in China soared to $23,000–30,000 apiece after the restrictions were imposed, such a cluster would cost between $1 billion and $1.5 billion.
Analysts at Bernstein question the claimed $5.6 million training cost of the V3 model and note a lack of data regarding R1's development expenses. According to Peel Hunt expert Damindu Jayavira, the public figures reflect only GPU-hour costs while ignoring other significant expenses.
Political aspects also raise concerns. Founder Liang Wenfeng's participation in a closed symposium chaired by Chinese Premier Li Qiang may indicate a strategic role for the company in overcoming export restrictions and achieving technological independence for China.
It should also be noted that the API version of R1 has built-in censorship mechanisms, especially around topics that are politically sensitive in China. The model refuses to discuss the events at Tiananmen Square, human rights issues in China, or the status of Taiwan, replacing generated responses with standard evasive phrases. Concerns about data privacy are also significant.
According to DeepSeek's policy, users' personal information is stored on servers in China, which could create problems similar to those faced by TikTok. These concerns are especially acute in the American market, where regulators have already shown increased scrutiny of Chinese tech companies over personal data protection.
The Future of Language Models After DeepSeek
Despite the controversies surrounding it, DeepSeek's achievements should not be underestimated. Testing results show that R1 indeed surpasses its American counterparts on many benchmarks. As Alexandr Wang noted, this is "a wake-up call for America," demanding accelerated innovation and tighter export controls on critical components.
While OpenAI still maintains industry leadership for now, DeepSeek's emergence significantly alters the balance of power in the markets for AI models and infrastructure. If the official figures are accurate, the Chinese company has managed to create a competitive solution at a substantially lower cost through innovative approaches and optimization, calling into question the strategy of simply increasing computational power that many market players have adopted.
Interest in DeepSeek's technologies is growing: Meta has already set up four "war rooms" to analyze the Chinese models, aiming to apply the acquired knowledge to the development of its open-source Llama ecosystem.
Some experts see DeepSeek's success not so much as a threat to U.S. technological dominance but rather as a sign of an emerging multipolar world in AI development. As Miles Brundage, a former OpenAI policy researcher, stated:
It seems we are witnessing the beginning of a new era in artificial intelligence development where efficiency and optimization may prove more important than sheer computational power.