
Latest news with #LiveBench

Hey, Google: Prerecorded AI Presentations Are the Coward's Way Out

Yahoo

13-05-2025


Just about every big tech event these days includes artificial intelligence updates, often with a slate of live demos -- and sometimes, these demos fail. But some companies are dodging these pitfalls by prerecording their keynote presentations. And I call these moves cowardice.

At last year's Made by Google event, Gemini failed twice during a live demonstration. Though moments like this are undoubtedly embarrassing for companies, they add a layer of authenticity you don't get with a prerecorded keynote event. But unfortunately, Google chose the prerecorded route for Tuesday's Android Show: I/O Edition. The format felt way too staged and polished for my liking, and it stripped away the feeling of reality that comes with live, warts-and-all demos.

During the Android Show: I/O Edition, we saw a demonstration of Gemini sharing makeup tips, helping someone find a time to grab lunch in their busy schedule, and giving a summary of Jane Austen's Pride and Prejudice. Because these were prerecorded interactions, Gemini handled the requests with aplomb -- no hiccups or issues in sight.

But tests show that AI models routinely get things wrong. According to the AI testing site LiveBench, Google's Gemini 2.5 Pro Preview is generally correct about 79% of the time. That's not bad, but it's not great either. And despite that score, this model of Gemini is still one of the best AI models the site tested, losing out to only two other models: OpenAI's o3 High and o4 Medium models.

Sure, nothing is perfect, and devices and software have bugs. But if you give me a calculator and promise it works all the time, but in reality it's wrong 20% of the time, that feels like a major discrepancy. Since Gemini outperformed most other AI models LiveBench tested, there's a good chance I'd still use Gemini, even if the live demo stalled. But because Google opted for a superpolished demonstration, I have a hard time knowing what to believe.

Look, I understand why a company would want its product to work properly at its own event. But showing AI tools making mistakes feels more honest than acting like the tool is perfect. These capabilities are flawed, and that's fine, but be honest with people about those flaws and show your new features in action. Don't sell me smoke and mirrors.

Alibaba's Qwen3 topples DeepSeek's R1 as world's highest-ranked open-source AI model

South China Morning Post

06-05-2025


Data from LiveBench, an independent platform that benchmarks large language models (LLMs) – the technology underpinning generative AI services like ChatGPT – showed that Qwen3 surpassed R1 in tests that gauge open-source AI models' capabilities including coding, maths, data analysis and language instruction. Alibaba owns the South China Morning Post.

Hangzhou-based Alibaba's cloud computing unit last week released the Qwen3 family, which consists of eight enhanced models that range from 600 million to 235 billion parameters. In machine learning, parameters are the variables present in an AI system during training, which help establish how data prompts yield the desired output.

Before the latest tests, DeepSeek's R1 had held the world's top open-source AI model spot on the LiveBench platform since its debut in January. Qwen3's ascent in the LiveBench rankings reflects the accelerated pace of development in China's AI sector and Alibaba's growing leadership position in the global open-source community.

The open-source approach gives public access to a program's source code, allowing third-party software developers to modify or share its design, fix broken links or scale up its capabilities. Open-source technologies have been a huge contributor to China's tech industry over the past few decades.
