Latest news with #GPQA


Techday NZ
17-07-2025
- Business
Record investment & policy headline Stanford's AI report
The pace of progress in artificial intelligence has accelerated to historic highs, with breakthroughs in technical capabilities, adoption across sectors, and global governance, according to the latest Artificial Intelligence Index Report 2025 from Stanford University's Institute for Human-Centered Artificial Intelligence (HAI). The eighth edition of the report describes 2024 as a pivotal year, marked by "unprecedented" leaps in AI performance, new records in private investment, and intensifying government involvement.

"The 2025 Index is our most comprehensive to date and arrives at an important moment, as AI's influence across society, the economy, and global governance continues to intensify," write co-directors Yolanda Gil and Raymond Perrault in their introduction. "AI is no longer just a story of what's possible - it's a story of what's happening now and how we are collectively shaping the future of humanity."

Record growth in performance and usage

AI models continue to beat previous benchmark scores at a rapid rate. In the past year alone, performance rose by 18.8 percentage points on the MMMU benchmark, 48.9 points on GPQA, and 67.3 points on SWE-bench, which tests advanced coding tasks. The report finds that "AI systems made major strides in generating high-quality video, and in some settings, language model agents even outperformed humans in programming tasks with limited time budgets."

AI is also increasingly present in everyday life, particularly in healthcare and transportation. In 2023, the US Food and Drug Administration approved 223 AI-enabled medical devices, up from just six in 2015. Meanwhile, autonomous vehicle usage has scaled up: "Waymo, one of the largest US operators, provides over 150,000 autonomous rides each week, while Baidu's affordable Apollo Go robotaxi fleet now serves numerous cities across China."

Investment and industry adoption surge

Private investment in AI hit new highs in 2024.
According to the report, "US private AI investment grew to $109.1 billion - nearly 12 times China's $9.3 billion and 24 times the UK's $4.5 billion. Generative AI saw particularly strong momentum, attracting $33.9 billion globally in private investment - an 18.7% increase from 2023."

Business adoption of AI is also accelerating: "78% of organisations reported using AI in 2024, up from 55% the year before." The report cites research showing that "AI boosts productivity and, in most cases, helps narrow skill gaps across the workforce." The sector has experienced "dramatic expansion over the past decade, with total investment growing more than thirteenfold since 2014."

Global leadership and competition

While the US remains the leader in producing top AI models, China is rapidly closing the performance gap. In 2024, US-based institutions produced 40 notable AI models, compared to China's 15. However, "Chinese models have rapidly closed the quality gap: performance differences on major benchmarks such as MMLU and HumanEval shrank from double digits in 2023 to near parity in 2024," the report finds.

China also leads in the number of AI research publications and patents, accounting for 69.7% of all AI patent grants in 2023. "Between 2010 and 2023, the number of AI patents has grown steadily and significantly, ballooning from 3,833 to 122,511. In just the last year, the number of AI patents has risen 29.6%," the authors note.

Policy, regulation and public attitudes

Governments are stepping up both investment and regulation. In 2024, US federal agencies introduced 59 AI-related regulations, more than double the number in 2023. Canada, China, France, India, and Saudi Arabia all announced major national AI investment packages, ranging from $1.25 billion to $100 billion. "Legislative mentions of AI rose 21.3% across 75 countries since 2023, marking a ninefold increase since 2016," the report states.

Despite the optimism, trust and bias remain challenges.
The report finds "fewer people believe AI companies will safeguard their data, and concerns about fairness and bias persist. Misinformation continues to pose risks, particularly in elections and the proliferation of deepfakes." In response, governments and international organisations are "advancing new regulatory frameworks aimed at promoting transparency, accountability, and fairness."

A global survey in 2024 found notable regional divides in public optimism about AI. In China, Indonesia, and Thailand, more than 75% of respondents viewed AI as more beneficial than harmful, compared to just 40% in Canada and 39% in the United States. Still, optimism is rising: "Since 2022, optimism has grown significantly in several previously sceptical countries, including Germany (+10%), France (+10%), Canada (+8%), Great Britain (+8%), and the United States (+4%)."

The path forward

Looking ahead, the AI Index calls for continued vigilance, collaboration and data-driven policymaking. "In a world where AI is discussed everywhere - from boardrooms to kitchen tables - this mission has never been more essential," write the co-directors. "Longitudinal tracking remains at the heart of our mission. In a domain advancing at breakneck speed, the Index provides essential context - helping us understand where AI stands today, how it got here, and where it may be headed next."
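
The comparisons quoted from the report ("nearly 12 times", "24 times", and the rise in patents from 3,833 to 122,511) can be checked with simple arithmetic. The sketch below uses only figures cited in the article; the variable names are illustrative and do not come from the report.

```python
# Private AI investment in 2024, in billions of US dollars, as quoted above.
us_investment = 109.1
china_investment = 9.3
uk_investment = 4.5

# How many times larger US investment is than each comparison country.
us_to_china = us_investment / china_investment
us_to_uk = us_investment / uk_investment

# AI patent counts quoted for 2010 and 2023.
patents_2010 = 3_833
patents_2023 = 122_511
patent_growth = patents_2023 / patents_2010

print(f"US vs China investment: {us_to_china:.1f}x")
print(f"US vs UK investment: {us_to_uk:.1f}x")
print(f"Patent growth 2010-2023: {patent_growth:.1f}x")
```

The ratios come out at roughly 11.7 and 24.2, consistent with the report's rounded wording, and the patent count grew about 32-fold over the period.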


Techday NZ
14-05-2025
- Business
OpenAI forum explores AI's economic impact and direction
At a recent forum hosted by OpenAI, Chief Product Officer Kevin Weil and Stanford professor Erik Brynjolfsson explored the challenges, opportunities, and economic implications of artificial intelligence, offering candid reflections on AI's role in productivity, policy, and how it complements or competes with human labour.

Brynjolfsson, a leading voice on the economics of technological change, acknowledged the ongoing debate about whether AI is delivering tangible gains. "Right now, if you look at the official productivity statistics last quarter, it was 1.2 percent, which is not that impressive," he said. "In the 90s, it was more than twice as high. In the early 2000s, it was more than twice as high."

He argued that the current underwhelming figures are partly a result of how value is measured. "GDP measures a lot of things, but it doesn't do a good job of measuring things that have zero price," he said, citing digital goods like ChatGPT and Wikipedia, which generate value without costing users money.

The other key issue, Brynjolfsson suggested, is structural. "These general purpose technologies... require re-skilling, changing your business processes, figuring out better ways of using the technology," he explained. This delay in payoff is what he and others call the "productivity J-curve". However, he was cautiously optimistic: "I think it's happening a lot quicker this time."

Weil compared previous technological transitions, such as electricity and the internet, to the adoption of AI, noting that AI tools like ChatGPT require far less specialised knowledge. "You don't need to learn a new arcane coding language," he said. "It does... maybe you have to learn a little bit of prompting."

The conversation turned to the potential for AI to disrupt existing business structures by empowering new entrants. "Can they make the cycle go faster because they're actually able to punch above their weight class?" Weil asked.
Brynjolfsson concurred but noted that America's rate of business dynamism is decreasing. "There are actually fewer startups... nationwide. And there's less movement between companies, there's less geographic mobility."

To measure AI's value beyond traditional economic indicators, Brynjolfsson described a new approach: "We've introduced a tool called GDP-B. The B stands for measuring the benefits rather than the costs." Using online choice experiments, his team estimates the consumer surplus of digital goods by asking participants how much compensation they would require to forgo a digital service for a time. "It's meant to be a representative market basket of what's in the economy," he said.

Both speakers also questioned how society currently benchmarks intelligence in AI. Weil noted that evaluations like GPQA aim to assess AI models by comparing them to talented graduate students. "But that's not necessarily the right way to think about some of these models," he said. Brynjolfsson took the critique further: "With all due respect to my fellow humans, we are not the most general kind of intelligence." He advocated for benchmarks that measure intelligence beyond human-like capabilities. "There are all sorts of other kinds of intelligence... And it's not just an intellectual debate. It has to do with the direction of technology."

The discussion also touched on the risks of over-centralising AI. Brynjolfsson warned of a future where a single AI system might dominate information and decision-making: "Maybe that will be more efficient if you have enough processing power. But... the humans wouldn't have a lot of bargaining power." Weil countered by highlighting the fragmented nature of data access. "No public model... will have access to all of the data that's relevant to solve the totality of problems... The vast majority of the world's data is private." This, he argued, makes it likely that multiple models will always coexist.
In discussing trust in AI, Brynjolfsson offered a candid anecdote: "There was an article... where they had three treatments: the human-only, the AI-only, and the doctor plus the AI. And... the doctor plus the AI did worse than the AI alone." He attributed this to current systems being insufficiently interpretable. "They have to be able to trust and know... if the AI system just says, 'cut off the patient's left leg,' and the doctor's like, 'why?'... it's got to explain all the reasoning."

As the event closed, both speakers emphasised the importance of supporting innovation through infrastructure like OpenAI's API. "Every time we drop the price and offer more intelligence, people can solve more problems," said Weil. Brynjolfsson echoed the point: "Some people derisively call these things wrappers... Actually, I think that's where a ton of the value is going to be coming... customised for a particular vertical."

In sum, the discussion underscored that while AI holds the potential to dramatically shift productivity and economic structures, its full impact will depend on how it is adopted, measured, and integrated with human capabilities.
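
The choice-experiment idea behind GDP-B, as described in the discussion, can be illustrated with a deliberately simplified sketch. It assumes each participant directly states the monthly compensation they would need to give up a service, and takes the median as a rough per-user value. The responses, service names, and function name are all hypothetical, and this is not the estimation procedure Brynjolfsson's team actually uses, which the article does not detail.

```python
from statistics import median

# Hypothetical survey responses: the monthly compensation (in dollars) each
# participant says they would need to forgo a service. Invented numbers.
responses = {
    "search engine": [500, 1200, 800, 1500, 1000],
    "online encyclopedia": [5, 20, 10, 0, 15],
}


def estimate_value_per_user(stated_amounts):
    """Return the median stated compensation as a rough per-user value.

    The median is used here because it is less sensitive than the mean to a
    few extreme answers; that choice is an assumption of this sketch.
    """
    return median(stated_amounts)


# One estimate per service, keyed by service name.
estimates = {
    good: estimate_value_per_user(amounts)
    for good, amounts in responses.items()
}

for good, value in estimates.items():
    print(f"{good}: median required compensation ${value:.2f} per month")
```

Neither service charges users in this toy example, so both would register as zero in a price-based measure; the stated compensation is what gives them a non-zero value.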