

ChatGPT beats Grok in AI chess final, Gemini finishes third, Elon Musk says…

Hindustan Times

17 hours ago



OpenAI's ChatGPT o3 model defeated Elon Musk's xAI model Grok 4 in the final of a Kaggle-hosted tournament that set out to find the strongest chess-playing large language model (LLM). The event, held over three days, pitted general-purpose LLMs from several companies against each other rather than specialised chess engines.

Tournament format and participants

Eight models took part, including entries from OpenAI, xAI, Google, Anthropic and the Chinese developers DeepSeek and Moonshot AI. The contest used standard chess rules but tested multi-purpose LLMs, systems that are not specifically optimised for chess play. BBC coverage of the event noted that Google's Gemini finished third after beating another OpenAI entry.

Final and key moments

Grok 4 led early in the competition but faltered in the final match against o3. Commentators and observers highlighted multiple tactical errors by Grok 4, including repeated queen losses, which swung the match in o3's favour. Writer Pedro Pinhata said: 'Up until the semi-finals, it seemed like nothing would be able to stop Grok 4,' but added that Grok's play 'collapsed under pressure' on the last day. Grandmaster Hikaru Nakamura, who commentated live, noted: 'Grok made so many mistakes in these games, but OpenAI did not.'

Responses and wider context

Elon Musk downplayed the defeat, saying Grok's earlier strong results were a 'side effect' and that xAI had 'spent almost no effort on chess.' The result adds a public dimension to the rivalry between xAI and OpenAI; Musk co-founded OpenAI before departing and later launching xAI. Games have long been used to measure AI progress, with past milestones including specialised systems such as DeepMind's AlphaGo, which defeated top human players in the game of Go. This Kaggle tournament differs by testing general LLMs on strategic, sequential tasks rather than relying on dedicated game engines.

What it means

The outcome shows how variable LLM performance can be on structured, adversarial tasks like chess. While o3's performance suggests some LLMs can sustain strategic play under tournament conditions, Grok 4's collapse illustrates that results remain inconsistent. Organisers and commentators are likely to keep using chess and similar tasks to probe reasoning, planning and robustness in large language models as the field evolves.
