Over the past couple of months, a breakthrough has shaken the world of artificial intelligence. The homegrown model DeepSeek has made waves globally, captivating experts, developers, and users alike. What sets DeepSeek apart is performance that stands toe-to-toe with industry giants such as OpenAI's models, combined with remarkably low inference costs. This competitive edge has drawn notable endorsements from key figures in the tech industry: Sam Altman, CEO of OpenAI, has acknowledged DeepSeek as an impressive model, especially given its pricing, while NVIDIA CEO Jensen Huang, Turing Award laureate Yann LeCun, and AI pioneer Andrew Ng have all expressed their admiration, further solidifying its credibility.
The surge in interest surrounding DeepSeek has been meteoric; it has rapidly become the fastest application to reach over 30 million daily active users. The open-source nature of DeepSeek has further fueled the demand for local deployments, prompting various cloud providers and AI hardware manufacturers, including chip companies, to develop corresponding solutions and prepare for the model's implementation.
When it comes to using DeepSeek, users typically have three main avenues. The first is the service offered directly through DeepSeek's official website or app. However, the immense user traffic has strained server capacity, and users often encounter the message "Server is busy, please try again later," which severely diminishes the experience.
The second option is to use the services provided by cloud vendors such as Baidu Cloud, AWS, Alibaba Cloud, Tencent Cloud, and Huawei Cloud, all of which have integrated DeepSeek into their offerings. For instance, on February 3rd, Baidu AI Cloud officially launched fully operational versions of DeepSeek's R1 and V3 models at ultra-low call prices, along with a limited-time free trial. On February 16th, Baidu announced that DeepSeek and its latest deep-search features had been incorporated into Baidu Search and the Wenxin Intelligent Entity platform, allowing users to access these capabilities free of charge.
The final option is local deployment of DeepSeek. This route differs significantly from the previous two: running DeepSeek on-premises offers the strongest privacy protection, since data never leaves the local environment. In terms of performance, local deployment delivers millisecond-level responses, often outperforming network-based services. It also brings greater control over the system and lower total cost over the system's lifecycle, which explains why enterprises, governments, and even individuals are keen to deploy DeepSeek locally.
However, for developers wishing to deploy DeepSeek locally, selecting the appropriate hardware poses a significant challenge. The competitive landscape has seen numerous AI hardware suppliers, including chip manufacturers, racing to introduce products compatible with DeepSeek. Local chip companies have been particularly aggressive in this domain, eager to offer devices that support the model.

It's important to note that the machines currently on offer vary widely in capability. Some, constrained by memory and compute, can only run "distilled" versions with far fewer parameters, while others can deploy the full DeepSeek R1 model across multi-machine setups. Single-machine solutions capable of hosting the full DeepSeek R1 are therefore especially sought after, as they offer lower costs, better data security, and faster setup. Achieving this, however, demands chips with large memory capacity and strong compute.
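To see why memory capacity is the gating factor, a rough back-of-envelope estimate helps. The sketch below (with the 70B distilled size and the precisions as illustrative assumptions) computes the memory needed just to hold the model weights; KV cache and activations add further overhead on top.

```python
# Back-of-envelope memory estimate for hosting the full DeepSeek R1
# (671B parameters) versus a distilled variant, at two weight
# precisions. Weights only -- KV cache and activations come extra,
# so these figures are lower bounds.

GIB = 1024**3

def weight_memory_gib(params_billions: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return params_billions * 1e9 * bytes_per_param / GIB

full_r1 = 671    # full DeepSeek R1, in billions of parameters
distilled = 70   # e.g. a hypothetical 70B distilled variant

for name, size in [("R1-671B", full_r1), ("distill-70B", distilled)]:
    for label, bpp in [("FP16", 2.0), ("INT8", 1.0)]:
        print(f"{name} @ {label}: {weight_memory_gib(size, bpp):.0f} GiB")
```

Even at 8-bit precision, the full model needs roughly 625 GiB just for weights, which is why a single machine must pool very large memory across its accelerator cards.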
Kunlun Chip stands out as one of the few Chinese companies capable of supporting the full deployment of DeepSeek R1 on a single machine. Founded from Baidu's Intelligent Chip and Architecture division, Kunlun Chip has a deep and rich background in AI acceleration, having been actively engaged in the field for over a decade.
On February 20, 2025, Kunlun Chip officially announced that its Kunlun P800 had become the first domestic AI chip to support full single-machine deployment of the 671B-parameter DeepSeek V3/R1 models, a landmark breakthrough for the domestic AI chip industry.
Kunlun P800, representing cutting-edge domestic AI chip technology, unlocks the full capabilities of DeepSeek R1 through single-machine deployment. Additionally, it uniquely supports 8-bit inference, delivering precision-loss-free inference services, thus ensuring high efficiency alongside computational accuracy.
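As an illustration of what 8-bit inference involves, here is a minimal sketch of symmetric per-tensor INT8 weight quantization. This is a generic textbook scheme, not Kunlun's actual implementation; production stacks typically use per-channel scales and calibrated activation ranges to keep precision loss negligible.

```python
import numpy as np

# Minimal symmetric per-tensor INT8 quantization round-trip.
# Weights are mapped to the range [-127, 127] with a single scale,
# then mapped back; the reconstruction error is bounded by half the
# scale per element.

def quantize_int8(w: np.ndarray):
    """Quantize a float tensor to int8 with one shared scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 values back to float32."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print("max abs error:", np.abs(w - w_hat).max())  # small relative to the scale
```

Halving each weight from 16 to 8 bits halves the memory footprint and memory traffic, which is where the efficiency gain of 8-bit inference comes from.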
According to official sources, the P800 comes in 8-card and 16-card configurations. The 8-card configuration alone achieves a throughput of 2,437 tokens per second, placing it at the forefront of the industry in performance, power consumption, and deployment flexibility, and catering to both lightweight and cost-sensitive demands. The 16-card version reaches a maximum throughput of 4,825 tokens per second.
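Taking the reported figures at face value, a quick calculation shows how close the 16-card configuration comes to linear scaling over the 8-card one:

```python
# Per-card throughput for the two reported P800 configurations, and
# the implied scaling efficiency when doubling the card count
# (1.0 would be perfectly linear scaling).

cfg = {8: 2437, 16: 4825}  # cards -> total tokens/sec, from the announcement

per_card = {n: t / n for n, t in cfg.items()}
scaling = cfg[16] / (2 * cfg[8])

for n in (8, 16):
    print(f"{n} cards: {per_card[n]:.0f} tokens/sec per card")
print(f"8 -> 16 card scaling efficiency: {scaling:.1%}")
```

Per-card throughput drops only from about 305 to about 302 tokens per second, roughly 99% scaling efficiency, suggesting inter-card communication is not yet the bottleneck at this scale.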
This performance puts Kunlun Chip at the front of its domestic peers, a position largely attributable to the P800's large-memory design.
Leveraging this chip technology, Baidu Cloud brought its third-generation ten-thousand-card Kunlun cluster online in early February. The cluster overcame hardware scalability limits through a new cooling strategy and distributed training optimizations. Baidu Cloud had previously launched the Baixiao heterogeneous computing platform in 2021 to improve the management and deployment efficiency of large compute clusters; it has since been upgraded to version 4.0, whose fault-tolerance and stability mechanisms keep the ten-thousand-card Kunlun cluster at an effective training rate of 98%. To meet inter-machine communication bandwidth needs, Baidu has also built a massive High-Performance Network (HPN) that optimizes topology and mitigates communication bottlenecks.
With demand for DeepSeek R1/V3 rising, Baidu Cloud's Qianfan large-model platform offers exceptionally cost-effective API calling services, with prices cut to just 30% of DeepSeek's official list price, billed as the lowest on the internet.
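For developers, such cloud-hosted DeepSeek services are typically exposed through OpenAI-compatible chat-completions endpoints. The sketch below assembles such a request using only the Python standard library; the base URL, model name, and API key are placeholders, so substitute the values from your provider's console before actually sending anything.

```python
import json
import urllib.request

BASE_URL = "https://example.com/v1"  # placeholder: your provider's endpoint
API_KEY = "YOUR_API_KEY"             # placeholder credential

def build_request(prompt: str, model: str = "deepseek-r1") -> urllib.request.Request:
    """Assemble an OpenAI-style chat-completions request.

    Sending it requires a real endpoint and key; the model name also
    varies by provider.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )

req = build_request("Explain mixture-of-experts briefly.")
print(req.full_url)
# resp = urllib.request.urlopen(req)  # uncomment with a real endpoint and key
```

Because the wire format is OpenAI-compatible, the same request shape works against an official API, a cloud vendor's hosted endpoint, or a locally deployed serving stack by changing only the base URL and key.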
For businesses seeking local deployment, Baidu has developed integrated products that utilize Kunlun P800 within both Baixiao and Qianfan systems. These products enable quick one-click deployment of the entire range of DeepSeek R1/V3 models in a single-machine environment, offering an out-of-the-box user experience. Among these, Baixiao DeepSeek integrated machines are particularly notable for meeting high-performance training and inference requirements.
These machines deliver high throughput and rapid data processing, supporting up to 500 concurrent users while keeping average response times under 50 milliseconds. Operational costs can be cut by up to 80%, making them one of the most budget-friendly options in China, and the time from unpacking to powering on to service activation can be as short as half a day.
Reflecting on DeepSeek's remarkable journey over the past month, its success clearly hinges on a meticulous balance of performance and cost, coupled with a flourishing open-source ecosystem. As the first domestic AI chip to support single-machine deployment of the full DeepSeek R1, the Kunlun P800 sets new industry standards with solutions that combine processing power, ample memory, and affordability, delivered with immediate usability through the Baixiao and Qianfan integrated systems.
Moreover, DeepSeek’s ascendance has opened up new possibilities for domestic chips such as Kunlun. Looking to the future, the synergy between local hardware and software will pave a more controllable developmental path for local large models. This intricate dance of innovation and collaboration promises a bright horizon ahead.