In the last couple of months, a breakthrough has occurred in the world of artificial intelligence. The homegrown model DeepSeek has made waves globally, captivating the attention of experts, developers, and users alike. What sets DeepSeek apart is its performance, which stands toe-to-toe with industry giants like OpenAI's models, and its remarkably low inference costs. This competitive edge has drawn notable endorsements from key figures in the tech industry, with Sam Altman, CEO of OpenAI, acknowledging DeepSeek as an impressive model, especially when considering its pricing. Additionally, tech leaders such as NVIDIA's CEO Jensen Huang, Turing Award laureate Yann LeCun, and AI pioneer Andrew Ng have expressed their admiration for DeepSeek, further solidifying its credibility.
The surge in interest surrounding DeepSeek has been meteoric; it has rapidly become the fastest application to reach over 30 million daily active users. The open-source nature of DeepSeek has further fueled demand for local deployments, prompting various cloud providers and AI hardware manufacturers, including chip companies, to develop corresponding solutions and prepare for the model's implementation.
When it comes to leveraging DeepSeek, users typically have three main avenues. First, they can use the services offered directly through DeepSeek's official website or its app. However, the immense user traffic has led to challenges with server capacity, and users often encounter messages stating, "Server is busy, please try again later," severely diminishing the user experience.
The second option involves utilizing the services provided by cloud suppliers like Baidu Cloud, AWS, Alibaba, Tencent Cloud, and Huawei Cloud. These prestigious domestic and international cloud vendors have integrated DeepSeek into their offerings. For instance, as of February 3rd, Baidu Smart Cloud officially launched fully operational versions of DeepSeek's R1 and V3 models at ultra-low call prices, along with a limited-time free trial.
On February 16th, Baidu announced the incorporation of DeepSeek and its latest deep search features into Baidu Search and the Wenxin Intelligent Entity platform, allowing users to access these capabilities free of charge.
The final option involves local deployment of DeepSeek. This method differs significantly from the previous two, as deploying DeepSeek on-premises offers the utmost privacy protection. In terms of performance, a localized deployment delivers low-latency, millisecond-level responses, often outperforming many network-based services. Furthermore, it brings advantages such as greater ease of use and control over the system, as well as lower costs over the system's lifecycle, which explains why various enterprises, governments, and even individuals are keen on deploying DeepSeek locally.
However, for developers wishing to deploy DeepSeek locally, selecting the appropriate hardware poses a significant challenge. The competitive landscape has seen numerous AI hardware suppliers, including chip manufacturers, racing to introduce products compatible with DeepSeek. Domestic chip companies have been particularly aggressive in this domain, eager to offer devices that support the model.
It's important to note that the machines currently on offer vary widely in capability. Some can only run "distilled" versions with fewer parameters due to hardware limitations, while others can deploy the full-fledged DeepSeek R1 model through multi-machine setups. Hence, single-machine solutions capable of supporting the full DeepSeek R1 are especially sought after, since single-machine deployments offer lower costs, stronger data security, and faster setup. However, running the full DeepSeek R1 this way demands chips with very large memory capacity and high processing power.
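The memory demand is easy to see with a back-of-envelope calculation. The sketch below assumes the commonly cited 671B parameter count for the full DeepSeek R1 and the usual one byte per parameter at 8-bit precision (two bytes at FP16); KV-cache and activation memory would come on top of these figures.

```python
# Back-of-envelope estimate of the memory needed just to hold the
# full DeepSeek R1 weights on one machine. Figures are illustrative.

PARAMS_B = 671          # widely reported DeepSeek R1 parameter count, in billions
BYTES_PER_PARAM = {     # bytes per parameter at common precisions
    "fp16": 2,
    "int8": 1,
}

def weight_memory_gb(params_b: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_b * 1e9 * BYTES_PER_PARAM[precision] / 1e9

for p in ("fp16", "int8"):
    print(f"{p}: ~{weight_memory_gb(PARAMS_B, p):.0f} GB of weights")
```

Even at 8 bits, the weights alone approach 700 GB, which is why only accelerators with unusually large aggregate memory can host the full model in a single box.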
Kunlun Chip stands out as one of the few Chinese companies capable of supporting the full deployment of DeepSeek R1 on a single machine. Spun off from Baidu's intelligent chip and architecture division, Kunlun Chip has a deep and rich background in AI acceleration, having been actively engaged in the field for over a decade.
On February 20, 2025, Kunlun Chip Technologies officially announced that its Kunlun P800 would become the first domestic AI chip to support full deployment of the DeepSeek V3/R1 671B models.
This landmark achievement marks a significant breakthrough in the domestic AI chip realm.
The Kunlun P800, representing cutting-edge domestic AI chip technology, unlocks the full capabilities of DeepSeek R1 through single-machine deployment. Additionally, it uniquely supports 8-bit inference, delivering inference services without precision loss and thus ensuring high efficiency alongside computational accuracy.
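To make the 8-bit idea concrete: quantized inference stores weights as small integers plus a scale factor. The toy sketch below shows symmetric per-tensor INT8 quantization, a generic textbook scheme (not necessarily the P800's actual scheme), where the worst-case rounding error is bounded by half a quantization step.

```python
def quantize_int8(xs):
    """Symmetric per-tensor INT8 quantization: each x is approximated by scale * q."""
    scale = max(abs(x) for x in xs) / 127.0   # map the largest magnitude to 127
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]

weights = [0.8, -1.5, 0.02, 3.0, -0.33]      # toy stand-ins for model weights
q, s = quantize_int8(weights)
restored = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max abs error: {max_err:.5f}  (bounded by scale/2 = {s / 2:.5f})")
```

Halving the bytes per weight doubles the effective memory bandwidth and capacity, which is why 8-bit support matters so much for fitting a 671B model into one machine.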
According to official sources, the P800 is offered in 8-card and 16-card configurations. The 8-card configuration alone achieves a throughput of 2,437 tokens per second, placing it at the forefront of the industry in performance, power consumption, and deployment flexibility, and catering to both lightweight and cost-efficiency demands. The 16-card version reaches a maximum throughput of 4,825 tokens per second.
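Those quoted figures imply near-linear scaling from 8 to 16 cards, which a quick calculation makes explicit (throughput numbers taken from the paragraph above):

```python
# Per-card throughput and scaling efficiency implied by the quoted figures.
configs = {8: 2437, 16: 4825}   # cards -> total tokens/second, from the article

for cards, tps in configs.items():
    print(f"{cards:2d} cards: {tps} tok/s total, {tps / cards:.1f} tok/s per card")

# Ideal scaling from 8 to 16 cards would exactly double the throughput:
efficiency = configs[16] / (2 * configs[8])
print(f"8 -> 16 card scaling efficiency: {efficiency:.1%}")
```

Per-card throughput stays essentially flat (about 305 vs. 302 tokens per second), i.e. roughly 99% scaling efficiency, suggesting the interconnect is not yet the bottleneck at this scale.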
Given this performance, Kunlun Chip stands tall among its domestic peers, largely thanks to its large-memory chip design.
Leveraging this leading chip technology, Baidu Cloud brought its third-generation ten-thousand-card Kunlun Chip cluster online in early February. This cluster overcame hardware scalability limitations by deploying a new cooling strategy and optimizing the model through distributed training. Baidu Cloud had previously launched the Baige heterogeneous computing platform in 2021 to improve the management and deployment efficiency of large compute clusters; the platform has since been upgraded to version 4.0. The current version provides fault-tolerance and stability mechanisms for the ten-thousand-card Kunlun cluster, ensuring an effective training rate of 98%. To meet inter-machine communication bandwidth needs, Baidu has also invested in building a massive High-Performance Network (HPN) to optimize topology and mitigate communication bottlenecks.
With the rising demand for DeepSeek R1/V3, Baidu Cloud’s Qianfan large model platform delivers exceptionally cost-effective API calling services, with prices slashed down to a mere 30% of the official list price from DeepSeek, claiming the title of the lowest price on the internet.
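Cloud services of this kind are typically consumed through an OpenAI-style chat-completions API. The snippet below only constructs an illustrative request body; the endpoint URL and model name are placeholders for illustration, not values taken from Baidu's Qianfan documentation.

```python
import json

# Placeholder endpoint and model name -- illustrative only, not
# confirmed values from any provider's documentation.
ENDPOINT = "https://example-cloud/v1/chat/completions"

def build_request(prompt: str, model: str = "deepseek-r1") -> str:
    """Serialize an OpenAI-style chat-completion request body as JSON."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
    })

body = build_request("Explain mixture-of-experts in one sentence.")
print(body)
```

Because the request shape is the de-facto industry standard, switching between the official DeepSeek API and a cheaper cloud-hosted mirror is usually just a matter of changing the endpoint and credentials.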
For businesses seeking local deployment, Baidu has developed integrated products that pair the Kunlun P800 with both the Baige and Qianfan systems.