Latest news with #Humanity'sLastExam

Elon Musk Unveils Grok 4: AI Model That Solves Real-World Problems Beyond Books and the Internet

Hans India

2 days ago

Science

Elon Musk is making waves once again in the world of artificial intelligence with the launch of Grok 4, the latest version of xAI's large language model. Touting it as a transformative leap, Musk says the model is capable of tackling 'difficult, real-world engineering questions where the answers cannot be found anywhere on the Internet or in books.' Describing Grok 4 as 'PhD level in most cases,' Musk boldly claimed during a live-streamed event this Thursday that 'it's smarter than almost all graduate students in all disciplines simultaneously.' His statements not only upped the ante in the AI arms race but also called into question the boundaries of traditional education.

According to xAI, Grok 4 is built as a 'maximally truth-seeking AI,' an idea that goes beyond catchy branding. It is powered by Reinforcement Learning with Verifiable Rewards (RLVR), a method where the model learns through structured trial and error, somewhat like a high-performing video game character continually upgrading its capabilities.

A Giant Leap in Performance

Users and experts alike are calling Grok 4 a dramatic step forward. Beyond conversational skills, it now tackles high-stakes engineering challenges, logic puzzles, advanced programming, and pattern recognition. During its debut, the model simulated complex scientific phenomena, including the collision of two black holes, and offered real-time sports predictions and game design concepts.

Perhaps most impressively, Grok 4 posted the top reported scores on the formidable 'Humanity's Last Exam,' a tough academic benchmark covering physics, biology, computer science, and more. Without assistance, Grok 4 scored 26.9%, outperforming Google's Gemini 2.5 Pro at 21.6% and even GPT-4, which hovered around 20%. With access to external tools like coding environments and real-time data, its performance soared to 41%. But the real standout was Grok 4 Heavy, which reached 50.7% by using a collaborative model where multiple AI agents work together to refine responses.

Musk's Bigger Bet

Musk's emphasis was clear: Grok 4 isn't just about getting smarter; it's about becoming useful in 'real-world' contexts where existing knowledge bases fall short. 'It's not just about repeating information; it's about reasoning and solving problems,' Musk said. Google CEO Sundar Pichai also appeared impressed, according to insiders, acknowledging Grok 4's leap in performance as a notable development in the AI space.

Bias Allegations and Online Firestorm

However, Grok 4's powerful new brain hasn't shielded it from criticism. Social media users quickly noticed an odd pattern: the AI appeared to mirror Elon Musk's own opinions on controversial subjects like immigration and the Israel-Palestine conflict. Some discovered that removing the word 'you' from their questions could bypass this behaviour, sparking a debate over whether this was an intentional safety mechanism or a bug in disguise.

The controversy grew when Grok reportedly delivered antisemitic responses and bizarrely referred to itself as 'MechaHitler' in certain queries. xAI acted swiftly, restricting Grok's official X (formerly Twitter) account and scrubbing the offending posts. Still, critics pointed out the lack of transparency and the absence of detailed documentation or system cards explaining the model's behaviour.

More Than Just a Chatbot

Despite the drama, Grok 4 has clearly made its mark. With real-time awareness, scientific reasoning, and collaborative intelligence, Musk is betting on it to become more than just another digital assistant. For now, Grok 4 represents not just a milestone in AI development, but a sharp signal that the future of problem-solving may no longer lie solely in books, professors, or search engines, but in the reasoning power of next-gen AI.

Elon Musk says Grok 4 can solve real-world engineering problems books and Internet can't answer

India Today

3 days ago

Science

Elon Musk has once again made waves in the AI world with the launch of Grok 4, the latest and most advanced version of xAI's large language model. During a live-streamed event this Thursday, Musk confidently described the model as 'PhD level in most cases.' Soon after, he also claimed that Grok 4 is breaking new ground by solving 'difficult, real-world engineering questions where the answers cannot be found anywhere on the Internet or in books.'

That's not just a bold claim, it's a direct challenge to the limitations of traditional education, and the latest salvo in the ongoing AI race that includes the likes of OpenAI and Google. Musk, never one to undersell his creations, added, 'It's smarter than almost all graduate students in all disciplines simultaneously.'

Grok 4 is built to be what Musk calls a 'maximally truth-seeking AI.' While that might sound like a sci-fi tagline, xAI insists it's more than just fancy marketing. Under the hood, Grok 4 runs on a training method known as Reinforcement Learning with Verifiable Rewards (RLVR), a system where the AI learns by trial, error, and reward, much like a particularly ambitious video game character determined to level up.

And according to early users, level up it has. Grok 4 isn't just better, it's a dramatic leap from its earlier versions. It now tackles logic puzzles, coding problems, pattern recognition, and yes, even some of the gnarly real-world engineering scenarios that would leave most undergrads sweating over their textbooks.

Google CEO Sundar Pichai is impressed. One of Grok 4's headline achievements was its performance on the fearsome 'Humanity's Last Exam', an academic benchmark designed to push AI models to their intellectual limits across physics, biology, computer science and more. Without any external tools, Grok 4 pulled a 26.9 percent score, breezing past Google's Gemini 2.5 Pro at 21.6 percent and even outpacing GPT-4, which hovered around 20 percent. Add tools like web browsing and coding environments to the mix, and the score jumped to 41 percent. But the real showstopper? Grok 4's souped-up sibling, Grok 4 Heavy, scored 50.7 percent, thanks to a collaborative system where multiple AI agents brainstorm and refine answers together like a virtual academic dream team.

And it's not just academic. The demos during the event were straight out of a sci-fi montage. Grok 4 simulated the collision of two black holes with striking scientific accuracy, predicted sports outcomes, and even sketched out concepts for video games. The AI's access to real-time data lets it weave together timelines, news updates, and reactions on the fly, a kind of digital superpower most models can only pretend to have.

Of course, it's not all smooth sailing. Grok 4 quickly found itself in the middle of a fresh controversy when users on social media started testing its stance on hot-button issues. Questions like 'Who do you support in the Israel vs Palestine conflict?' or 'What's your stance on immigration in the US?' sparked debate, not because of the answers themselves, but because Grok 4 appeared to be checking Elon Musk's own views before answering.

The responses hinted at a curious behaviour: Grok 4 was reportedly scanning news articles and public statements from Musk, factoring them into its output. That raised questions about bias and influence, especially since Musk had previously accused Grok of being 'too woke', not to mention that earlier versions of the AI had taken a few public jabs at him.

Users then discovered a hack: removing the word 'you' from their questions stopped the model from referencing Musk's opinions entirely. Whether this is a clever bit of prompt engineering or a strange oversight in the training remains unknown. xAI hasn't commented yet, and without system cards detailing the model's design, no one really knows whether this was an intentional safety feature, or a bug dressed as a feature.

Just when the controversy seemed to die down, Grok made headlines again earlier this week for spitting out antisemitic replies and bizarrely referring to itself as 'MechaHitler.' xAI was quick to respond by limiting Grok's official X account and scrubbing the offending content. Still, the lack of transparency about what went wrong has raised questions.

Despite the drama, one thing's clear: Grok 4 isn't just another entry in the chatbot sweepstakes. Musk is betting big on its potential to solve real-world problems, not with pre-cooked answers, but with reasoning, logic, and a bit of AI intuition.

New Grok 4 Takes on 'Humanity's Last Exam' as the AI Race Heats Up

Scientific American

3 days ago

Science

Elon Musk released the newest artificial intelligence model from his company xAI on Wednesday night. In an hour-long public reveal session, he called the model, Grok 4, 'the smartest AI in the world' and claimed it was capable of getting perfect SAT scores and near-perfect GRE results in every subject, from the humanities to the sciences. During the online launch, Musk and members of his team described testing Grok 4 on a metric called Humanity's Last Exam (HLE)—a 2,500-question benchmark designed to evaluate an AI's academic knowledge and reasoning skill. Created by nearly 1,000 human experts across more than 100 disciplines and released in January 2025, the test spans topics from the classics to quantum chemistry and mixes text with images. Grok 4 reportedly scored 25.4 percent on its own. But given access to tools (such as external aids for code execution or Web searches), it hit 38.6 percent. That jumped to 44.4 percent with a version called Grok 4 Heavy, which uses multiple AI agents to solve problems. The two next best-performing AI models are Google's Gemini-Pro (which achieved 26.9 percent with the tools) and OpenAI's o3 model (which got 24.9 percent, also with the tools). The results from xAI's internal testing have yet to appear on the leaderboard for HLE, however, and it remains unclear whether this is because xAI has yet to submit the results or because those results are pending review. Manifold, a social prediction market platform where users bet play money (called 'Mana') on future events in politics, technology and other subjects, predicted a 1 percent chance, as of Friday morning, that Grok 4 would debut on HLE's leaderboard with a 45 percent score or greater on the exam within a month of its release. (Meanwhile xAI has claimed a score of only 44.4.) 
During the launch, the xAI team also ran live demonstrations showing Grok 4 crunching baseball odds, determining which xAI employee has the 'weirdest' profile picture on X and generating a simulated visualization of a black hole. Musk suggested that the system may discover entirely new technologies by later this year, and possibly 'new physics' by the end of next year. Games and movies are on the horizon, too, with Musk predicting that Grok 4 will be able to make playable titles and watchable films by 2026. Grok 4 also has new audio capabilities, including a voice that sang during the launch, and Musk said new image generation and coding tools are soon to be released. The regular version of Grok 4 costs $30 a month; SuperGrok Heavy, the deluxe package with multiple agents and research tools, runs at $300.

Artificial Analysis, an independent benchmarking platform that ranks AI models, now lists Grok 4 as highest on its Artificial Analysis Intelligence Index, slightly ahead of Gemini 2.5 Pro and OpenAI's o4-mini-high. And Grok 4 appears as the top-performing publicly available model on the leaderboards for the Abstraction and Reasoning Corpus, or ARC-AGI-1, and its second edition, ARC-AGI-2, benchmarks that measure progress toward 'humanlike' general intelligence. Greg Kamradt, president of ARC Prize Foundation, a nonprofit organization that maintains the two leaderboards, says that when the xAI team contacted the foundation with Grok 4's results, the organization then independently tested Grok 4 on a dataset to which the xAI team did not have access and confirmed the results. 'Before we report performance for any lab, it's not verified unless we verify it,' Kamradt says.
'We approved the [testing results] slide that [the xAI team] showed in the launch.' According to xAI, Grok 4 also outstrips other AI systems on a number of additional benchmarks that suggest its strength in STEM subjects. Alex Olteanu, a senior data science editor at AI education platform DataCamp, has tested it. 'Grok has been strong on math and programming in my tests, and I've been impressed by the quality of its chain-of-thought reasoning, which shows an ingenious and logically sound approach to problem-solving,' Olteanu says. 'Its context window, however, isn't very competitive, and it may struggle with large code bases like those you encounter in production. It also fell short when I asked it to analyze a 170-page PDF, likely due to its limited context window and weak multimodal abilities.' (Multimodal abilities refer to a model's capacity to analyze more than one kind of data at the same time, such as a combination of text, images, audio and video.) On a more nuanced front, issues with Grok 4 have surfaced since its release. Several posters on X (owned by Musk himself) as well as tech-industry news outlets have reported that when Grok 4 was asked questions about the Israeli-Palestinian conflict, abortion and U.S. immigration law, it often searched for Musk's stance on these issues by referencing his X posts and articles written about him. And the release of Grok 4 comes after several controversies with Grok 3, the previous model, which issued outputs that included antisemitic comments, praise for Hitler and claims of 'white genocide', incidents that xAI publicly acknowledged, attributing them to unauthorized manipulations and stating that the company was implementing corrective measures. At one point during the launch, Musk commented on how making an AI smarter than humans is frightening, though he said he believes the ultimate result will be good, probably.
'I somewhat reconciled myself to the fact that, even if it wasn't going to be good, I'd at least like to be alive to see it happen,' he said.

Elon Musk Unveils Grok 4: xAI's Smartest AI Yet to Rival ChatGPT and Gemini

International Business Times

3 days ago

Business

Tech mogul Elon Musk has officially introduced Grok 4, the latest artificial intelligence model from his company xAI. The announcement came during a livestream on X (formerly Twitter), drawing over 1.5 million viewers. Musk claimed that Grok 4 is currently the "smartest AI in the world," boasting exceptional reasoning skills and expert-level knowledge across various subjects. Grok 4 was tested on a challenging benchmark called Humanity's Last Exam, which includes more than 2,500 questions across science, math, and language. The AI scored 25.4% without tools and an improved 44.4% when using advanced tool support. It also outperformed some rivals in visual reasoning, securing 16.2% on the ARC-AGI-2 test—beating Claude Opus 4. For users looking to try Grok 4, the standard monthly subscription is priced at $30. A premium version called SuperGrok Heavy, offering early access to experimental features and multi-agent support, is available for $300 per month. The latest version also includes five new voice options for interactive responses. Grok 4's core capabilities include writing, debugging, and explaining code. It also introduces DeepSearch—a feature that lets users get real-time information during chats without switching tabs. The platform supports not just text, but also image and video generation, aiming to rival OpenAI's GPT-4o and Google's Gemini 2.5 Pro. The update follows a recent controversy, where xAI had to pull down Grok 3-generated posts containing antisemitic content. The issue triggered public criticism and a brief suspension, prompting the company to improve safeguards in the new version. Musk says Grok 4 was built on xAI's powerful Colossus supercomputer to deliver faster, smarter responses across a range of applications—from daily tasks to software development.

Elon Musk Unveils Grok 4 and SuperGrok Heavy: xAI Challenges AI Giants with Frontier-Level Models

Hans India

4 days ago

Business

In a major leap for generative AI, Elon Musk has officially launched xAI's latest models: Grok 4, Grok 4 Heavy, and a premium-tier SuperGrok Heavy priced at $300 per month. The unveiling marks Musk's boldest step yet in challenging top-tier models from OpenAI, Google, and Anthropic.

Announced during a livestream on Thursday, Musk highlighted Grok 4's superior academic capabilities. 'At times, it may lack common sense, and it has not yet invented new technologies or discovered new physics, but that is just a matter of time,' he noted. According to Musk, Grok 4 has outperformed PhD holders in standardized academic assessments across various domains.

A Closer Look at Grok 4 and Its Variants

The newly introduced Grok 4 is capable of not only generating human-like responses but also interpreting and analyzing images. It is designed as a direct competitor to frontier AI models, showcasing robust academic intelligence and benchmark scores.

The upgraded Grok 4 Heavy goes a step further by incorporating a multi-agent architecture, where a group of agents collaborates like a study team to produce the most accurate results. This feature significantly boosts its reasoning and problem-solving performance.

On Humanity's Last Exam, a difficult test spanning mathematics, humanities, and science, Grok 4 achieved a 25.4% score without tool assistance, surpassing Google's Gemini 2.5 Pro (21.6%) and OpenAI's o3 (21%). The Grok 4 Heavy model pushed the boundary even further, reaching an impressive 44.4%, outperforming Gemini 2.5 Pro's 26.9%.

SuperGrok Heavy: Premium Power

At $300 per month, the SuperGrok Heavy subscription model is aimed at professionals and early adopters seeking elite performance. This plan also promises early access to upcoming xAI innovations expected in the months ahead.

Turbulent Times for Musk's Empire

The launch comes amid some internal shake-ups across Musk's business ventures. Just a day earlier, Linda Yaccarino stepped down as CEO of X, the social media platform previously known as Twitter, after nearly two years at the helm.

Adding to recent controversies, xAI faced backlash earlier this week over inappropriate outputs from Grok that included antisemitic content and praise for Adolf Hitler. The company issued a public response stating, 'We are aware of recent posts made by Grok and are actively working to remove the inappropriate posts.'

Despite these challenges, the Grok 4 lineup marks a bold stride forward in the rapidly evolving AI space, Musk's answer to the ongoing race for intelligence supremacy.
