People are using Super Mario to benchmark AI now
Thought Pokémon was a tough benchmark for AI? One group of researchers argues that Super Mario Bros. is even tougher.
Hao AI Lab, a research org at the University of California San Diego, on Friday threw AI into live Super Mario Bros. games. Anthropic's Claude 3.7 performed the best, followed by Claude 3.5. Google's Gemini 1.5 Pro and OpenAI's GPT-4o struggled.
It wasn't quite the same version of Super Mario Bros. as the original 1985 release, to be clear. The game ran in an emulator and integrated with a framework, GamingAgent, to give the AIs control over Mario.
GamingAgent, which Hao developed in-house, fed the AI in-game screenshots along with basic instructions, like, "If an obstacle or enemy is near, move/jump left to dodge." The AI then generated inputs in the form of Python code to control Mario.
Still, Hao says that the game forced each model to "learn" to plan complex maneuvers and develop gameplay strategies. Interestingly, the lab found that reasoning models like OpenAI's o1, which "think" through problems step by step to arrive at solutions, performed worse than "non-reasoning" models, despite being generally stronger on most benchmarks.
One of the main reasons reasoning models have trouble playing real-time games like this is that they take a while — seconds, usually — to decide on actions, according to the researchers. In Super Mario Bros., timing is everything. A second can mean the difference between a jump safely cleared and a plummet to your death.
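To make the timing argument concrete, here is a minimal sketch of the arithmetic. Every number and name in it is a hypothetical chosen for illustration; none comes from Hao AI Lab or GamingAgent. It assumes Mario runs at a constant speed toward a hazard and asks whether a model's decision arrives before he gets there.

```python
def reacts_in_time(distance_px: float, speed_px_per_s: float, latency_s: float) -> bool:
    """Return True if the model's decision arrives before Mario reaches the hazard.

    All parameters are illustrative assumptions, not measurements from the study.
    """
    time_to_hazard_s = distance_px / speed_px_per_s
    return latency_s < time_to_hazard_s


# Hypothetical scenario: a pit 96 pixels ahead, Mario moving at 80 pixels per second,
# so he reaches the edge in 1.2 seconds.
DISTANCE_PX = 96.0
SPEED_PX_PER_S = 80.0

# Hypothetical decision latencies for a quick model and a slower, step-by-step one.
for name, latency_s in [("fast model", 0.5), ("slow model", 3.0)]:
    verdict = "jumps in time" if reacts_in_time(DISTANCE_PX, SPEED_PX_PER_S, latency_s) else "falls in the pit"
    print(f"{name} ({latency_s:.1f}s latency): {verdict}")
```

Under these assumed numbers, a model that answers in half a second clears the pit, while one that deliberates for three seconds does not, however good its plan might have been.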
Games have been used to benchmark AI for decades. But some experts have questioned the wisdom of drawing connections between AI's gaming skills and technological advancement. Unlike the real world, games tend to be abstract and relatively simple, and they provide a theoretically infinite amount of data to train AI.
The recent flashy gaming benchmarks point to what Andrej Karpathy, a research scientist and founding member at OpenAI, called an "evaluation crisis."
"I don't really know what [AI] metrics to look at right now," he wrote in a post on X. "TLDR my reaction is I don't really know how good these models are right now."
At least we can watch AI play Mario.