
Baidu the latest to join open-source movement with Ernie 4.5 models publicly available
Baidu open-sourced 10 variants from its Ernie 4.5 multimodal model family, from the 0.3 billion parameter lightweight models to the heavyweight 424 billion parameter ones, according to a statement.
Beijing-based Baidu, one of the earliest tech firms in China to develop large language models (LLMs) following the release of ChatGPT in November 2022, has made a U-turn by making its models open-source. A year ago, founder and CEO Robin Li Yanhong was publicly saying its Ernie series, like OpenAI's ChatGPT models, would be more powerful than open-source ones.
However, the release of open-source models by Chinese start-up DeepSeek, which took the AI world by storm at the start of this year, triggered an accelerated shift to open-source by China's Big Tech firms. For example, the Qwen models developed by Alibaba Group Holding are the world's most popular open-source models among developers. Alibaba owns the South China Morning Post.
The logo of Baidu's Ernie Bot is displayed near a screen showing the Baidu logo, in this illustration picture taken June 28, 2023. Photo: Reuters
Citing a range of benchmark tests that evaluate an AI system's general and domain knowledge, coding and maths skills, as well as reasoning capabilities, Baidu said that its 300B Ernie 4.5 model outperformed DeepSeek's V3, which was twice the size of the Ernie model.
The benchmark results showcase the progress Baidu has made in improving its models in recent months, after the company announced earlier this year it would shift to an open source approach. The move followed Hangzhou-based DeepSeek's emergence into the global spotlight with its open-source V3 and R1 models that were built cost-efficiently for high-performance tasks.

Related Articles


South China Morning Post
5 days ago
As AI chatbots gain popularity in China, so does the business of inserting ads into results
As ChatGPT-like chatbots gain traction, generative engine optimisation (GEO) has emerged as an increasingly significant approach for brands seeking visibility in answers generated by artificial intelligence. Unlike traditional search engine optimisation (SEO) that is meant to improve rankings in results, GEO aims to ensure content is cited, summarised or recommended by AI models, which generate 'answers with a word limit', said Yuan Yong, branding director of Big Fish Marketing in Shenzhen. Yuan, who has been working with SEO since 2014, only ventured into GEO this year after Chinese AI start-up DeepSeek burst onto the scene, prompting many local companies to rethink their strategies. Optimising AI-generated content involves improving the information provided on a company's website, thereby increasing its exposure to news outlets, online portals and Wikipedia-like platforms, Yuan said. Yuan cited the example of online education platform Nuoyun, which received a recommendation from DeepSeek as one of the best such resources in China after the Big Fish team 'polished' the content on its website, as well as posting 40 articles on other websites. But that does not mean the new-found visibility stayed permanent, as continued attention would be required to fine-tune the content.


South China Morning Post
6 days ago
OpenAI poised to release GPT-5 model days after launching 2 new open systems
San Francisco-based OpenAI hinted as much in an all-caps post on X that read: 'LIVE5TREAM THURSDAY 10 AM PT.' While there appeared to be a typo in the first word of that announcement, many of those who posted comments got the hint that the misplaced '5' could be about GPT-5. In a subsequent post on his personal X account, CEO Sam Altman did not address the speculation on GPT-5's release, but said the live-streaming session would be 'longer than usual, around an hour'. 'We have a lot to show and hope you can find the time to watch!' he added. The potential launch this week of GPT-5 – the latest iteration of the technology behind generative AI chatbot ChatGPT – would reflect a strong effort by OpenAI to finally release the product upgrade in the global market of large language models (LLMs), considering that GPT-4 was released in March 2023. In the meantime, open-source models from Chinese start-ups like DeepSeek and Moonshot AI, along with those from mainland Big Tech firms led by Alibaba Group Holding, have seen increased adoption in the industry on the back of their low-cost appeal and innovative features. Alibaba owns the South China Morning Post.


AllAfrica
7 days ago
US senators sound alarm on DeepSeek's security risks
Seven Republican US senators have called for an investigation into DeepSeek's data security threats, citing growing concerns that the artificial intelligence (AI) model could leak personal data or generate harmful content. In a letter submitted to US Commerce Secretary Howard Lutnick, the lawmakers urged the government to evaluate the risks of Chinese AI models collecting and sending data to servers in China. After DeepSeek released its R1 model in late January, Wiz Research found a publicly accessible database belonging to DeepSeek. It said the database contained a significant volume of chat history, backend data and sensitive information, including log streams, API secrets and operational details. The senators also said that R1 probably did not undergo comprehensive red-teaming and safety tests to prevent the generation of harmful content. 'A Wall Street Journal reporter was able to get R1 to write text for a social media campaign intended to encourage self-harm amongst teenage girls, as well as to provide instructions for carrying out a bioweapon attack,' they said. They asked the US Commerce Department to explain how it will use resources like the Center for AI Standards and Innovation (CAISI) to work with relevant agencies to protect US businesses and citizens; investigate the national security risks posed by Chinese open-source AI models; and identify any evidence of these models providing US data to the Chinese People's Liberation Army (PLA) or associated companies. In March, Chinese media reported that the PLA was using DeepSeek in its hospitals, the People's Armed Police (PAP), and national defense mobilization units. Ren Hao, a senior software engineer at 301 Hospital, stated that the hospital deployed DeepSeek-R1 on Huawei's Ascend hardware to create a local knowledge database. The PLA's Central Theatre Command General Hospital also said it used DeepSeek's R1-70B AI model to assist doctors by suggesting treatment plans.
Apart from these, Chinese academics said the home-made large language model (LLM) can be deployed for military use. Fu Yanfang, a researcher at Xi'an Technological University's School of Computer Science and Engineering, said in May that her team used DeepSeek's AI models to generate military simulation scenarios. She said a commander has to spend 48 hours planning for a military scenario, but a self-developed AI-based simulator can generate 10,000 military scenarios in just 48 seconds. 'LLMs and combat simulation scenarios had redefined the future of war design,' she said, adding that DeepSeek's LLM can easily deconstruct and reconstruct complex battlefield situations through training on massive data sets. A white paper published by Chongqing Landship Information Technology, an autonomous driving solution provider, also said that DeepSeek has excellent potential for military use, particularly in command, communications, and intelligence, surveillance, and reconnaissance (ISR) applications. 'China can deploy DeepSeek V3 in Gongji-11 drones to fight against F-16V fighter jets in the Taiwan Strait,' Wen Chang, a senior researcher at Techxcope, a Beijing-based think tank, said in an article published in March. 'This would be a fairer game than deploying China's sixth-generation (Chengdu J-36) or fifth-generation (Chengdu J-20) fighter jets to combat the F-16V, which is not a stealth fighter.' 'Although the F-16V also has an AI system, it still needs the pilot to make most decisions. In this sense, Gongji-11 has an advantage as it can fly 24 hours a day,' he added. On February 6, two US representatives, Democrat Josh Gottheimer and Republican Darin LaHood, introduced the bipartisan 'No DeepSeek on Government Devices Act,' which would prohibit federal employees from using DeepSeek on government-issued devices. Only New York, Texas, and Virginia have banned DeepSeek on government devices. The US Navy has also prohibited the use of the AI model.
In March, Reuters reported that the US Commerce Department's bureaus informed their staff members that DeepSeek is banned on their government devices. It's unclear whether the Trump administration would seek to ban the deployment of DeepSeek entirely in the US. US President Donald Trump said on February 8 that the release of DeepSeek may be beneficial for the US, as AI technologies will be significantly less expensive than initially thought. The seven senators requested that Lutnick report any findings on how Chinese AI models may have illegally accessed US technology, such as export-controlled semiconductors. The US banned the shipments of the A100 and H100 to China in October 2022 and the A800 and H800 to the country in October 2023. In late January, Lutnick testified before the US Senate in a hearing that DeepSeek could create its AI models 'dirt cheap' because it was able to purchase a large quantity of Nvidia chips and access data from Meta's open platform. DeepSeek claimed that the training cost of R1 was only US$5.58 million, which is 1.1% of Meta's US$500 million for Llama 3.1. It claimed it trained the model using the distilled data from Alibaba's Qwen and Meta's Llama. Alexandr Wang, chief executive of the US-based Scale AI, told CNBC that DeepSeek and its parent, High Flyer, could have accumulated 50,000 units of Nvidia's high-end AI chips, such as the H100. An unnamed senior State Department official told Reuters in late June that DeepSeek used Southeast Asian shell companies to obtain high-end Nvidia chips. In February, Singapore charged three men with fraud for allegedly helping ship Nvidia's high-end chips to DeepSeek in China in 2024. The trio, including two Singaporeans and one Chinese national, was accused of shipping servers with the A100 and H100 to Malaysia and potentially elsewhere.
In late March, Malaysia's Minister of Investment, Trade, and Industry, Tengku Zafrul Aziz, said the US had asked the Malaysian government to monitor every shipment of Nvidia chips arriving in Malaysia. On July 14, Malaysia announced that companies must apply for permits to re-export high-performance American AI chips. On Wednesday, the US Department of Justice said two Chinese nationals, both 28, had been arrested in California for violating the US Export Control Reform Act by exporting sensitive technology from the US to China, including graphics processing units (GPUs) – specialized computer parts used for modern computing – without first obtaining the required license or authorization from the US.