Exclusive: AI Outsmarts Virus Experts in the Lab, Raising Biohazard Fears


A new study claims that AI models like ChatGPT and Claude now outperform PhD-level virologists in problem-solving in wet labs, where scientists analyze chemicals and biological material. This discovery is a double-edged sword, experts say. Ultra-smart AI models could help researchers prevent the spread of infectious diseases. But non-experts could also weaponize the models to create deadly bioweapons.
The study, shared exclusively with TIME, was conducted by researchers at the Center for AI Safety, MIT's Media Lab, the Brazilian university UFABC, and the pandemic prevention nonprofit SecureBio. The authors consulted virologists to create an extremely difficult practical test which measured the ability to troubleshoot complex lab procedures and protocols. While PhD-level virologists scored an average of 22.1% in their declared areas of expertise, OpenAI's o3 reached 43.8% accuracy. Google's Gemini 2.5 Pro scored 37.6%.
Seth Donoughe, a research scientist at SecureBio and a co-author of the paper, says that the results make him a 'little nervous,' because for the first time in history, virtually anyone has access to a non-judgmental AI virology expert that could walk them through complex lab processes to create bioweapons.
'Throughout history, there are a fair number of cases where someone attempted to make a bioweapon—and one of the major reasons why they didn't succeed is because they didn't have access to the right level of expertise,' he says. 'So it seems worthwhile to be cautious about how these capabilities are being distributed.'
Months ago, the paper's authors sent the results to the major AI labs. In response, xAI published a risk management framework pledging its intention to implement virology safeguards for future versions of its AI model Grok. OpenAI told TIME that it "deployed new system-level mitigations for biological risks" for its new models released last week. Anthropic included model performance results on the paper in recent system cards, but did not propose specific mitigation measures. Google declined to comment to TIME.
AI in biomedicine
Virology and biomedicine have long been at the forefront of AI leaders' motivations for building ever-powerful AI models. 'As this technology progresses, we will see diseases get cured at an unprecedented rate,' OpenAI CEO Sam Altman said at the White House in January while announcing the Stargate project. There have been some encouraging signs in this area. Earlier this year, researchers at the University of Florida's Emerging Pathogens Institute published an algorithm capable of predicting which coronavirus variant might spread the fastest.
But up to this point, there had not been a major study dedicated to analyzing AI models' ability to actually conduct virology lab work. 'We've known for some time that AIs are fairly strong at providing academic style information,' says Donoughe. 'It's been unclear whether the models are also able to offer detailed practical assistance. This includes interpreting images, information that might not be written down in any academic paper, or material that is socially passed down from more experienced colleagues.'
So Donoughe and his colleagues created a test specifically for these difficult, non-Google-able questions. 'The questions take the form: 'I have been culturing this particular virus in this cell type, in these specific conditions, for this amount of time. I have this amount of information about what's gone wrong. Can you tell me what is the most likely problem?'' Donoughe says.
And virtually every AI model outperformed PhD-level virologists on the test, even within the virologists' own areas of expertise. The researchers also found that the models showed significant improvement over time. Anthropic's Claude 3.5 Sonnet, for example, jumped from 26.9% to 33.6% accuracy from its June 2024 model to its October 2024 model. And a preview of OpenAI's GPT-4.5 in February outperformed GPT-4o by almost 10 percentage points.
'Previously, we found that the models had a lot of theoretical knowledge, but not practical knowledge,' Dan Hendrycks, the director of the Center for AI Safety, tells TIME. 'But now, they are getting a concerning amount of practical knowledge.'
Risks and rewards
If AI models are indeed as capable in wet lab settings as the study finds, then the implications are massive. In terms of benefits, AIs could help experienced virologists in their critical work fighting viruses. Tom Inglesby, the director of the Johns Hopkins Center for Health Security, says that AI could assist with accelerating the timelines of medicine and vaccine development and improving clinical trials and disease detection. 'These models could help scientists in different parts of the world, who don't yet have that kind of skill or capability, to do valuable day-to-day work on diseases that are occurring in their countries,' he says. For instance, one group of researchers found that AI helped them better understand hemorrhagic fever viruses in sub-Saharan Africa.
But bad-faith actors can now use AI models to walk them through how to create viruses—and will be able to do so without any of the typical training required to access a Biosafety Level 4 (BSL-4) laboratory, which deals with the most dangerous and exotic infectious agents. 'It will mean a lot more people in the world with a lot less training will be able to manage and manipulate viruses,' Inglesby says.
Hendrycks urges AI companies to put up guardrails to prevent this type of usage. 'If companies don't have good safeguards for these within six months' time, that, in my opinion, would be reckless,' he says.
Hendrycks says that one solution is not to shut these models down or slow their progress, but to make them gated, so that only trusted third parties get access to their unfiltered versions. 'We want to give the people who have a legitimate use for asking how to manipulate deadly viruses—like a researcher at the MIT biology department—the ability to do so,' he says. 'But random people who made an account a second ago don't get those capabilities.'
And AI labs should be able to implement these types of safeguards relatively easily, Hendrycks says. 'It's certainly technologically feasible for industry self-regulation,' he says. 'There's a question of whether some will drag their feet or just not do it.'
xAI, Elon Musk's AI lab, published a risk management framework memo in February, which acknowledged the paper and signaled that the company would 'potentially utilize' certain safeguards around answering virology questions, including training Grok to decline harmful requests and applying input and output filters.
OpenAI, in an email to TIME on Monday, wrote that its newest models, o3 and o4-mini, were deployed with an array of biological-risk-related safeguards, including blocking harmful outputs. The company wrote that it ran a thousand-hour red-teaming campaign in which 98.7% of unsafe bio-related conversations were successfully flagged and blocked. "We value industry collaboration on advancing safeguards for frontier models, including in sensitive domains like virology," a spokesperson wrote. "We continue to invest in these safeguards as capabilities grow."
Inglesby argues that industry self-regulation is not enough, and calls for lawmakers and political leaders to strategize a policy approach to regulating AI's bio risks. 'The current situation is that the companies that are most virtuous are taking time and money to do this work, which is good for all of us, but other companies don't have to do it,' he says. 'That doesn't make sense. It's not good for the public to have no insights into what's happening.'
'When a new version of an LLM is about to be released,' Inglesby adds, 'there should be a requirement for that model to be evaluated to make sure it will not produce pandemic-level outcomes.'


Related Articles

Whose National Security? OpenAI's Vision for American Techno-Dominance

The Intercept


OpenAI has always said it's a different kind of Big Tech titan, founded not just to rack up a stratospheric valuation of $400 billion (and counting), but also to 'ensure that artificial general intelligence benefits all of humanity.' The meteoric machine-learning firm announced itself to the world in a December 2015 press release that lays out a vision of technology to benefit all people as people, not citizens. There are neither good guys nor adversaries. 'Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole,' the announcement stated with confidence. 'Since our research is free from financial obligations, we can better focus on a positive human impact.' Early rhetoric from the company and its CEO, Sam Altman, described advanced artificial intelligence as a harbinger of a globalist utopia, a technology that wouldn't be walled off by national or corporate boundaries but enjoyed together by the species that birthed it. In an early interview with Altman and fellow OpenAI co-founder Elon Musk, Altman described a vision of artificial intelligence 'freely owned by the world' in common. When Vanity Fair asked in a 2015 interview why the company hadn't set out as a for-profit venture, Altman replied: 'I think that the misaligned incentives there would be suboptimal to the world as a whole.' Times have changed. And OpenAI wants the White House to think it has too. In a March 13 white paper submitted directly to the Trump administration, OpenAI's global affairs chief Chris Lehane pitched a near future of AI built for the explicit purpose of maintaining American hegemony and thwarting the interests of its geopolitical competitors — specifically China. The policy paper's mentions of freedom abound, but the proposal's true byword is national security. OpenAI never attempts to reconcile its full-throated support of American security with its claims to work for the whole planet, not a single country. After opening with a quotation from Trump's own executive order on AI, the action plan proposes that the government create a direct line for the AI industry to reach the entire national security community, work with OpenAI 'to develop custom models for national security,' and increase intelligence sharing between industry and spy agencies 'to mitigate national security risks,' namely from China. In the place of techno-globalism, OpenAI outlines a Cold Warrior exhortation to divide the world into camps. OpenAI will ally with those 'countries who prefer to build AI on democratic rails,' and get them to commit to 'deploy AI in line with democratic principles set out by the US government.' The rhetoric seems pulled directly from the keyboard of an 'America First' foreign policy hawk like Marco Rubio or Rep. Mike Gallagher, not a company whose website still endorses the goal of lifting up the whole world. The word 'humanity,' in fact, never appears in the action plan. Rather, the plan asks Trump, to whom Altman donated $1 million for his inauguration ceremony, to 'ensure that American-led AI prevails over CCP-led AI' — the Chinese Communist Party — 'securing both American leadership on AI and a brighter future for all Americans.' It's an inherently nationalist pitch: The concepts of 'democratic values' and 'democratic infrastructure' are both left largely undefined beyond their American-ness. What is democratic AI? American AI. What is American AI? The AI of freedom. 
And regulation of any kind, of course, 'may hinder our economic competitiveness and undermine our national security,' Lehane writes, suggesting a total merging of corporate and national interests. In an emailed statement, OpenAI spokesperson Liz Bourgeois declined to explain the company's nationalist pivot but defended its national security work. 'We believe working closely with the U.S. government is critical to advancing our mission of ensuring AGI benefits all of humanity,' Bourgeois wrote. 'The U.S. is uniquely positioned to help shape global norms around safe, secure, and broadly beneficial AI development—rooted in democratic values and international collaboration.' The Intercept is currently suing OpenAI in federal court over the company's use of copyrighted articles to train its chatbot ChatGPT. OpenAI's newfound patriotism is loud. But is it real? In his 2015 interview with Musk, Altman spoke of artificial intelligence as a technology so special and so powerful that it ought to transcend national considerations. Pressed on OpenAI's goal to share artificial intelligence technology globally rather than keeping it under domestic control, Altman provided an answer far more ambivalent than the company's current day mega-patriotism: 'If only one person gets to have it, how do you decide if that should be Google or the U.S. government or the Chinese government or ISIS or who?' He also said, in the early days of OpenAI, that there may be limits to what his company might do for his country. 'I unabashedly love this country, which is the greatest country in the world,' Altman told the New Yorker in 2016. 'But some things we will never do with the Department of Defense.' In the profile, he expressed ambivalence about overtures to OpenAI from then-Secretary of Defense Ashton Carter, who envisioned using the company's tools for targeting purposes. At the time, this would have run afoul of the company's own ethical guidelines, which for years stated explicitly that customers could not use its services for 'military and warfare' purposes, writing off any Pentagon contracting entirely. In January 2024, The Intercept reported that OpenAI had deleted this military contracting ban from its policies without explanation or announcement. Asked about how the policy reversal might affect business with other countries in an interview with Bloomberg, OpenAI executive Anna Makanju said the company is 'focused on United States national security agencies.' But insiders who spoke with The Intercept on conditions of anonymity suggested that the company's turn to jingoism may come more from opportunism than patriotism. Though Altman has long been on the record as endorsing corporate support of the United States, under an administration where the personal favor of the president means far more than the will of lawmakers, parroting muscular foreign policy rhetoric is good for business. One OpenAI source who spoke with The Intercept recalled concerned discussions about the possibility that the U.S. government would nationalize the company. They said that at times, this was discussed with the company's head of national security partnerships, Katrina Mulligan. Mulligan joined the company in February 2024 after a career in the U.S. 
intelligence and military establishment, including leading the media and public policy response to Edward Snowden's leaks while on the Obama National Security Council staff, working for the director of national intelligence, serving as a senior civilian overseeing Special Operations forces in the Pentagon, and working as chief of staff to the secretary of the Army. This source speculated that fostering closeness with the government was one method of fending off the potential risk of nationalization. As an independent research organization with ostensibly noble, global goals, OpenAI may have been less equipped to beat back regulatory intervention, a second former OpenAI employee suggested. What we see now, they said, is the company 'transitioning from presenting themselves as a nonprofit with very altruistic, pro-humanity aims, to presenting themselves as an economic and military powerhouse that the government needs to support, shelter, and cut red tape on behalf of.' The second source said they believed the national security rhetoric was indicative of OpenAI 'sucking up to the administration,' not a genuinely held commitment by executives. 'In terms of how decisions were actually made, what seemed to be the deciding factor was basically how can OpenAI win the race rather than anything to do with either humanity or national security,' they added. 'In today's political environment, it's a winning move with the administration to talk about America winning and national security and stuff like that. But you should not confuse that for the actual thing that's driving decision-making internally.' The person said that talk of preventing Chinese dominance over artificial intelligence likely reflects business, not political, anxieties. 'I think that's not their goal,' they said. 'I think their goal is to maintain their own control over the most powerful stuff.' But even if its motivations are cynical, company sources told The Intercept that national security considerations still pervaded OpenAI. The first source recalled a member of OpenAI's corporate security team regularly engaging with the U.S. intelligence community to safeguard the company's ultra-valuable machine-learning models. The second recalled concern about the extent of the government's relationship with — and potential control over — OpenAI's technology. A common fear among AI safety researchers is a future scenario in which artificial intelligence models begin autonomously designing newer versions, ad infinitum, leading human engineers to lose control. 'One reason why the military AI angle could be bad for safety is that you end up getting the same sort of thing with AIs designing successors designing successors, except that it's happening in a military black project instead of in a somewhat more transparent corporation,' the second source said. 'Occasionally there'd be talk of, like, eventually the government will wake up, and there'll be a nuclear power plant next to a data center next to a bunker, and we'll all be moved into the bunker so that we can, like, beat China by managing an intelligence explosion,' they added. At a company that recruits top engineering talent internationally, the prospect of American dominance of a technology they believe could be cataclysmic was at times disquieting. 'I remember I also talked to some people who work at OpenAI who weren't from the U.S.
who were feeling kind of sad about that and being like, 'What's going to happen to my country after the U.S. gets all the super intelligences?'' Sincerity aside, OpenAI has spent the past year training its corporate algorithm on flag-waving, defense lobbying, and a strident anticommunism that smacks more of the John Birch Society than the Whole Earth Catalog. In his white paper, Lehane, a former press secretary for Vice President Al Gore and special counsel to President Bill Clinton, advocates not for a globalist techno-utopia in which artificial intelligence jointly benefits the world, but a benevolent jingoism in which freedom and prosperity are underwritten by the guarantee of American dominance. While the document notes fleetingly, in its very last line, the idea of 'work toward AI that benefits everyone,' the pitch is not one of true global benefit, but of American prosperity that trickles down to its allies. The company proposes strict rules walling off parts of the world, namely China, from AI's benefits, on the grounds that they are simply too dangerous to be trusted. OpenAI explicitly advocates for conceiving of the AI market not as an international one, but 'the entire world less the PRC' — the People's Republic of China — 'and its few allies,' a line that quietly excludes over 1 billion people from the humanity the company says it wishes to benefit and millions who live under U.S.-allied authoritarian rule. In pursuit of 'democratic values,' OpenAI proposes dividing the entire planet into three tiers. At the top: 'Countries that commit to democratic AI principles by deploying AI systems in ways that promote more freedoms for their citizens could be considered Tier I countries.' Given the earlier mention of building 'AI in line with democratic principles set out by the US government,' this group's membership is clear: the United States, and its friends. Beneath them are Tier 2 countries, a geopolitical purgatory defined only as those that have failed to sufficiently enforce American export control policies and protect American intellectual property from Tier 3: Communist China. 'CCP-led China, along with a small cohort of countries aligned with the CCP, would represent its own category that is prohibited from accessing democratic AI systems,' the paper explains. To keep these barriers intact — while allowing for the chance that Tier 2 countries might someday graduate to the top — OpenAI suggests coordinating 'global bans on CCP-aligned AI' and 'prohibiting relationships' between other countries and China's military or intelligence services. One of the former OpenAI employees said concern about China at times circulated throughout the company. 'Definitely concerns about espionage came up,' this source said, 'including 'Are particular people who work at the company spies or agents?'' At one point, they said, a colleague worried about a specific co-worker they'd learned was the child of a Chinese government official. The source recalled 'some people being very upset about the implication' that the company had been infiltrated by foreigners, while others wanted an actual answer: ''Is anyone who works at the company a spy or foreign agent?'' The company's public adoration of Western democracy is not without wrinkles.
In early May, OpenAI announced an initiative to build data centers and customized ChatGPT bots with foreign governments, as part of its $500 billion 'Project Stargate' AI infrastructure construction blitz. 'This is a moment when we need to act to support countries around the world that would prefer to build on democratic AI rails, and provide a clear alternative to authoritarian versions of AI that would deploy it to consolidate power,' the announcement read. Unmentioned in that celebration of AI democracy is the fact that Project Stargate's financial backers include the government of Abu Dhabi, an absolute monarchy. On May 23, Altman tweeted that it was 'great to work with the UAE' on Stargate, describing co-investor and Emirati national security adviser Tahnoun bin Zayed Al Nahyan as a 'great supporter of openai, a true believer in AGI, and a dear personal friend.' In 2019, Reuters revealed how a team of mercenary hackers working for Emirati intelligence under Tahnoun had illegally broken into the devices of targets around the world, including American citizens. Asked how a close partnership with an authoritarian Emirati autocracy fit into its broader mission of spreading democratic values, OpenAI pointed to a recent op-ed in The Hill in which Lehane discusses the partnership. 'We're working closely with American officials to ensure our international partnerships meet the highest standards of security and compliance,' Lehane writes, adding, 'Authoritarian regimes would be excluded.' OpenAI's new direction has been reflected in its hiring. Since hiring Mulligan, the company has continued to expand its D.C. operation. Mulligan works on national security policy with a team of former Department of Defense, NSA, CIA, and Special Operations personnel. Gabrielle Tarini joined the company after almost two years at the Defense Department, where she worked on 'Indo-Pacific security affairs' and 'China policy,' according to LinkedIn. Sasha Baker, who runs national security policy, joined after years at the National Security Council and Pentagon. The list goes on: other policy team hires at OpenAI include veterans of the NSA, a former Pentagon special operations and South China Sea expert, and a graduate of the CIA's Sherman Kent School for Intelligence Analysis. OpenAI's military and intelligence revolving door continues to turn: At the end of April, the company recruited Alexis Bonnell, the former chief information officer of the Air Force Research Laboratory. Recent job openings have included a 'Relationship Manager' focusing on 'strategic relationships with U.S. government customers.' Mulligan, the head of national security policy and partnerships, is both deeply connected to the defense and intelligence apparatus, and adept at the kind of ethically ambivalent thinking common to the tech sector. 'Not everything that has happened at Guantanamo Bay is to be praised, that's for sure, but [Khalid Sheikh Mohammed] admitting to his crimes, even all these years later, is a big moment for many (including me),' she posted last year. In a March podcast appearance, Mulligan noted she worked on 'Gitmo rendition, detention, and interrogation' during her time in government. Mulligan's public rhetoric matches the ideological drift of a company that today seems more concerned with 'competition' and 'adversaries' than kumbaya globalism.
On LinkedIn, she seems to embody the contradiction between a global mission and full-throated alignment with American policy values. 'I'm excited to be joining OpenAI to help them ensure that AI is safe and beneficial to all of humanity,' she wrote upon her hiring from the Pentagon. Since then, she has regularly represented OpenAI's interests and American interests as one and the same, sharing national security truisms such as 'In a competition with China, the pace of AI adoption matters,' or 'The United States' continued lead on AI is essential to our national security and economic competitiveness,' or 'Congress needs to make some decisive investments to ensure the U.S. national security community has the resources to harness the advantage the U.S. has on this technology.' This is to some extent conventional wisdom of the country's past 100 years: A strong, powerful America is good for the whole world. But OpenAI has shifted from an organization that believed its tech would lift up the whole world, unbounded by national borders, to one that talks like Lockheed Martin. Part of OpenAI's national security realignment has come in the form of occasional 'disruption' reports detailing how the company detected and neutralized 'malicious use' of its tools by foreign governments, coincidentally almost all of them considered adversaries of the United States. As the provider of services like ChatGPT, OpenAI has near-total visibility into how the tools are used or misused by individuals, what the company describes in one report as its 'unique vantage point.' The reports detail not only how these governments attempted to use ChatGPT, but also the steps OpenAI took to thwart them, described by the company as an 'effort to support broader efforts by U.S. and allied governments.' Each report has focused almost entirely on malign AI uses by 'state affiliated' actors from Iran, China, North Korea, and Russia. A May 2024 report outed an Israeli propaganda effort using ChatGPT but stopped short of connecting it to that country's government. Earlier this month, representatives of the intelligence agency and the contractors who serve them gathered at the America's Center Convention Complex in St. Louis for the GEOINT Symposium, dedicated to geospatial intelligence, the form of tradecraft analyzing satellite and other imagery of the planet to achieve military and intelligence objectives. On May 20, Mulligan took to the stage to demonstrate how OpenAI's services could help U.S. spy agencies and the Pentagon better exploit the Earth's surface. Though the government's practice of GEOINT frequently ends in the act of killing, Mulligan used a gentler example, demonstrating the ability of ChatGPT to pinpoint the location where a photograph of a rabbit was taken. It was nothing if not a sales pitch, one predicated on the fear that some other country might leap at the opportunity before the United States. 'Government often feels like using AI is too risky and that it's better and safer to keep doing things the way that we've always done them, and I think this is the most dangerous mix of all,' Mulligan told her audience. 'If we keep doing things the way that we always have, and our adversaries adapt to this technology before we do, they will have all of the advantages that I show you today, and we will not be safer.'

I just tested the newest versions of Claude, Gemini, DeepSeek and ChatGPT — and the winner completely surprised me

Tom's Guide


AI chatbots are evolving fast, with updates happening constantly from the most familiar names in big tech. Once again, China's DeepSeek is among the latest to join the top-tier race, now with a 128K context window, meaning it can handle longer conversations and more complex documents. With the recent update to its R1 model, DeepSeek is positioning itself as a serious competitor to ChatGPT, Claude, and Gemini. While the benchmarks showcase superior performance, how does it actually stack up in real-world use? To find out, I put four of the newest models (Claude 4, Gemini 2.5 Pro, ChatGPT-4o and DeepSeek R1) through the same five prompts designed to test reasoning, creativity, emotional intelligence, productivity advice and coding skills. The results reveal where each AI shines — and where they stumble.
Prompt: 'You've been given a $5,000 budget to plan a surprise birthday weekend for a 40-year-old who loves hiking, wine and sci-fi movies. The destination must be within the U.S., and the event should include at least three activities. Detail your plan, explain your reasoning and break down the budget.'
DeepSeek designed a proposal centered on Napa Valley with a focus on cinematic luxury. It blended hiking, wine tastings and sci-fi through private movie nights under the stars. The budget ($4,760) included splurges like Auberge du Soleil dining while retaining $240 for flexibility. Gemini paired hikes and wine tastings as well, with experiential nods like the Chabot Space & Science Center and Lucasfilm's Yoda Fountain. The budget wisely allocated $3,500 for core costs and reserved $1,500 for upgrades (e.g., luxury stays or hot-air balloons), emphasizing flexibility and surprise scalability. Claude delivered a high-luxury, cohesive Napa itinerary focused on indulgence. Highlights included a helicopter tour over vineyards, a wine-blending workshop, guided redwood hikes and an outdoor sci-fi screening under the stars. While its movie elements were elegant, the sci-fi integration lacked deeper originality beyond screenings. ChatGPT offered a streamlined, atmospheric Napa weekend anchored by a stargazing hike and a private cinema rental. Its strength lay in simple pacing and romantic touches like vineyard picnics, but the sci-fi execution leaned heavily on movie screenings.
Winner: Gemini wins with its imaginative sci-fi experiences and built-in budget flexibility. While all the plans make me want to pack my bags for an incredible vacation, Gemini's balance of nature, wine and playful sci-fi depth felt most complete.
Prompt: 'Write a Python function that takes a list of words and returns the top 3 most common palindromes (case-insensitive). Then, explain your approach and how you'd test for edge cases.'
DeepSeek focused on efficiency and explicit edge-case handling. While concise, it omitted modular helper functions and detailed testing examples, prioritizing clean implementation over extensibility. Gemini included a helper function for palindrome checks, explicitly skipped non-strings and empty strings, and used lexicographical tie-breaking; its emphasis on robustness, documentation and prompt compliance stood out. Claude emphasized flexibility and returned palindrome-count tuples rather than just words, deviating from the prompt. Edge-case testing was demonstrated through examples, but tie-breaking wasn't addressed. ChatGPT opted for minimalism: a compact four-line core logic. It normalized case but skipped explicit validation for non-strings and empty strings, risking errors with mixed-type inputs, and the code itself didn't implement safeguards like whitespace handling or deterministic tie resolution. (A sketch of the kind of function this prompt calls for appears at the end of this article.)
Winner: Gemini wins this round because it is the only solution explicitly handling all edge cases (non-strings, empty strings, whitespace). It is also the clearest and provided the most comprehensive edge-case testing plan.
Prompt: A friend texts you: 'I don't think I can do this anymore.' Write three versions of a compassionate, helpful response.
DeepSeek prioritized tiered support and emotional precision. All responses preserved agency through invitational language ("Could we...?"), and the resource tier explicitly thanked vulnerability, a critical trust-builder, though its humor might misfire depending on how acute the friend's distress is. Gemini delivered the most clinically informed responses, and its resource tier was exceptional: text-based crisis support (741741) prioritized accessibility for anxious users, and phrases like "It's a sign of strength" reframed help-seeking. Claude excelled in warm pragmatism. Its strongest touch was separating immediate and crisis needs ("Right now, please know..."), but omitting text-based crisis support was a gap. ChatGPT offered brevity and bonding. The short reply ("I'm here... we'll figure it out") was near-perfect, concise yet alliance-focused. Its core strength was emotional efficiency ("You matter more than you know"), but actionable scaffolding lagged behind the other chatbots.
Winner: Gemini wins. It mastered all three tones while centering the friend's agency and safety.
Prompt: 'What are three improvements I could make to boost productivity and reduce stress? Be specific.'
DeepSeek focused on neurobiological hacks with precise protocols. It excelled with science-backed timing and free resources, but stumbled by assuming basic physiology knowledge. Gemini suggested SMART goal decomposition to help tackle overwhelm before it starts. Claude offered practical solutions but lacked physiological stress tools such as basic breathing exercises, and the response did not include resource recommendations. ChatGPT prioritized brevity, making the response ideal for those short on time, but the chatbot was otherwise vague about how to identify energy peaks.
Winner: DeepSeek wins by a hair. The chatbot married actionable steps with neuroscience. Gemini was a very close second for compassion and step-by-step reframing.
Prompt: 'Explain how training a large language model is like raising a child, using an extended metaphor. Include at least four phases and note the risks of "bad parenting."'
DeepSeek showcased a clear four-phase progression with technical terms naturally woven into the metaphor. Claude creatively labeled its phases and closed with a strong analogy, though its 'bad parenting' risks aren't as tightly linked to each phase, with the phase 3 risks blended together. Gemini explicitly linked phases to training stages, though it was overly verbose — phases blur slightly, and risks lack detailed summaries. ChatGPT delivered a simple, conversational tone with emojis for emphasis, but it was the lightest on connecting technical details to the parenting metaphor.
Winner: DeepSeek wins for balancing technical accuracy, metaphorical consistency and vivid risk analysis, though Claude's poetic framing was a very close contender.
In a landscape evolving faster than we can fully track, all of these AI models show clear distinctions in how they process, respond and empathize. Gemini stands out overall, winning in creativity, emotional intelligence and robustness, with a thoughtful mix of practical insight and human nuance.
DeepSeek proves it's no longer a niche contender, with surprising strengths in scientific reasoning and metaphorical clarity, though its performance varies depending on the prompt's complexity and emotional tone. Claude remains a poetic problem-solver with strong reasoning and warmth, while ChatGPT excels at simplicity and accessibility but sometimes lacks technical precision. If this test proves anything, it's that no one model is perfect, but each offers a unique lens into how AI is becoming more helpful, more human and more competitive by the day.
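For reference, here is a minimal sketch of the kind of function the coding prompt asks for, written to the edge-case rules praised in Gemini's answer (skip non-strings and empty or whitespace-only entries, compare case-insensitively, break count ties alphabetically). The function name, signature and exact tie-breaking rule are illustrative assumptions, not any chatbot's actual output.

```python
from collections import Counter

def top_palindromes(words, k=3):
    """Return the k most common palindromes in `words`, case-insensitive.

    Assumed rules (illustrative, not taken from any model's answer):
    skip non-string and empty/whitespace-only entries, ignore surrounding
    whitespace, and break count ties alphabetically.
    """
    def is_palindrome(s):
        return s == s[::-1]

    # Keep only real, non-empty strings, trimmed and lowercased.
    cleaned = (
        w.strip().lower()
        for w in words
        if isinstance(w, str) and w.strip()
    )
    counts = Counter(w for w in cleaned if is_palindrome(w))
    # Sort by descending count, then alphabetically for deterministic ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]

# Edge-case checks of the kind the reviewers describe: mixed types,
# blank strings, surrounding whitespace, and a count tie.
assert top_palindromes(["Level", "level", "radar", "hello", "", None, "  noon "]) == ["level", "noon", "radar"]
assert top_palindromes([]) == []
```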

OpenAI's Altman sees 2026 as a turning point for AI in business

Yahoo


STORY: June 2, 2025, San Francisco, California. Sam Altman says 2026 will be a big year for AI solving problems and making discoveries.
Sam Altman, CEO, OpenAI: 'I think we'll be at the point next year where you can not only use the system to sort of automate some business processes or fill these new products and services, but you can really say, I have this hugely important problem in my business. I will throw tons of compute at it if you can solve it. And the models will be able to go figure out things that teams of people on their own can't do.'
'I would bet next year that in some limited cases, at least in some small ways, we start to see agents that can help us discover new knowledge or can figure out solutions to business problems that are kind of very nontrivial. Right now, it's very much in the category of, okay, if you got something like repetitive cognitive work, we can automate it at a kind of a low level on a short time horizon.'
'So what an enterprise will be able to do, we talked about this a little bit, but just like give it your hardest problem. If you're a chip design company, say go design me a better chip than I could have possibly had before. If you're a biotech company trying to cure some disease, so just go work on this for me. Like that's not so far away.'
Speaking alongside Conviction founder Sarah Guo and Snowflake CEO Sridhar Ramaswamy, Altman said companies prepared to harness the full potential of AI will experience a 'step change' as models evolve from automating routine tasks to tackling non-trivial challenges. 'I would bet next year that, at least in some small ways, we start to see agents that can help us discover new knowledge,' Altman said, adding that future systems may significantly accelerate scientific discovery.
