Mint Explainer: Is OpenAI exaggerating the powers of its new ChatGPT Agent?

21 hours ago

Leslie D'Monte

OpenAI has flagged the agent as high-risk under its safety framework. Is this just marketing hype or a sign that AI is genuinely becoming more powerful and autonomous?

Photo: OpenAI CEO Sam Altman (AFP)
On Thursday, OpenAI launched its autonomous ChatGPT Agent, a tool that's capable of finding and buying things online, managing your calendar, and booking you an appointment with a doctor. It's essentially a digital assistant that doesn't just provide information but completes actual tasks.
That being said, OpenAI has flagged the agent as high-risk under its safety framework, warning it could potentially be used to create dangerous biological or chemical substances. Is this just marketing hype, timed to build momentum for the launch of GPT-5, or a sign that AI agents are genuinely becoming more powerful and autonomous, akin to the agents who protect the computer-generated world of The Matrix?

What is ChatGPT Agent?
Say you want to rearrange your calendar, find a doctor and schedule an appointment, or research competitors and deliver a report. ChatGPT Agent can now do it for you.
The agent can browse websites, run code, analyse data, and even create slide decks or spreadsheets, all based on your instructions. It combines the strengths of OpenAI's earlier tools, Operator (which could navigate the web) and deep research (which could analyse and summarise information), into a single system. You stay in control throughout: ChatGPT asks for permission before doing anything important, and you can stop or take over at any time. This new capability is available to Pro, Plus, and Team users through the tools dropdown.

How does it work?
ChatGPT Agent uses a powerful set of tools to complete tasks, including a visual browser to interact with websites like a human, a text-based browser for reasoning-heavy searches, a terminal for code execution, and direct application programming interface (API) access.
It can also connect to apps such as Gmail or GitHub to fetch relevant information. You can log in to websites within the agent's browser, allowing it to dig deeper into personalised content. All of this runs on its own virtual computer, which keeps track of context even across multiple tools.
The agent can switch between browsers, download and edit files, and adapt its methods to complete tasks quickly and accurately. It's built for back-and-forth collaboration: you can step in anytime to guide or change the task, and ChatGPT can ask for more input when needed. If a task takes time, you'll get updates and a notification on your phone once it's done.

Has OpenAI tested its performance?
OpenAI said on Humanity's Last Exam (HLE), which tests expert-level reasoning across subjects, ChatGPT Agent achieved a new high score of 41.6, rising to 44.4 when multiple attempts were run in parallel and the most confident response was selected. On FrontierMath, the toughest known math benchmark, the agent scored 27.4% using tools such as a code-executing terminal—far ahead of previous models.
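The jump from 41.6 to 44.4 comes from running several attempts in parallel and keeping the most confident one. OpenAI has not published how that selection works, so the following is only a minimal sketch of the general idea. The `Attempt` type, the `confidence` field, and `toy_solver` are hypothetical stand-ins for illustration, not OpenAI's method or API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple


class Attempt(NamedTuple):
    answer: str
    confidence: float  # hypothetical self-reported score between 0 and 1


def most_confident(solve: Callable[[str, int], Attempt], question: str, n: int = 8) -> Attempt:
    """Run n independent attempts in parallel and keep the most confident one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        attempts = list(pool.map(lambda seed: solve(question, seed), range(n)))
    return max(attempts, key=lambda attempt: attempt.confidence)


def toy_solver(question: str, seed: int) -> Attempt:
    # Stand-in for a model call: deterministic fake answers and confidences.
    confidence = ((seed * 37) % 10) / 10
    return Attempt(answer=f"candidate-{seed}", confidence=confidence)


best = most_confident(toy_solver, "What is 2 + 2?", n=8)
print(best.answer, best.confidence)
```

A real system would replace `toy_solver` with an actual model call, and the quality of the result would depend on how well the confidence score tracks correctness.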
In real-world tasks, ChatGPT Agent performs at or above human levels in about half of the cases, based on OpenAI's internal evaluations. These tasks include building financial models, analysing competitors, and identifying suitable sites for green hydrogen projects.
ChatGPT Agent also outperforms others on specialised tests such as DSBench for data science, and SpreadsheetBench for spreadsheet editing (45.5% vs Copilot Excel's 20.0%). On BrowseComp and WebArena, which test browsing skills, the agent achieves the highest scores to date, according to OpenAI.

What are some of the things it can do?
Consider the case of travel planning. The agent won't just suggest ideas but will navigate booking websites, fill out forms, and even make reservations once you give it permission.
You can also ask it to read your emails, find meeting invitations, and automatically schedule appointments in your calendar, or even draft and send follow-up emails. This level of coordination typically required juggling between apps, but the agent manages it in a single conversational flow.
Another example involves shopping and price comparison. You can tell the agent to 'order the best-reviewed smartphone under ₹15,000', and it can search online stores, compare prices and reviews, and proceed to checkout on a preferred platform. Customer support and task automation are other examples, where the agent is used to troubleshoot an issue, log into support portals, and even file return or refund requests.

How are AI agents typically built?
Unlike basic chatbots, AI agents are autonomous systems that can plan, reason, and complete complex, multi-step tasks with minimal input, such as coding, data analysis, or generating reports.
They are built by combining ways to take in information, think, and take action. Developers begin by deciding what the agent should do, following which the agent collects data, such as text or images, from its environment. AI agents use large language models (LLMs) like GPT-4 as their core 'brain', which allows them to understand and respond to natural language instructions.
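The cycle of taking in information, thinking, and acting can be pictured as a short loop. This is a minimal sketch under stated assumptions: `toy_model` is a hypothetical stand-in for an LLM call, and the `TOOLS` registry uses plain functions in place of a real browser, terminal, or API. It does not describe how ChatGPT Agent is implemented.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: plain functions stand in for a browser, terminal, or API.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(part) for part in expr.split("+"))),
    "echo": lambda text: text,
}

Step = Tuple[str, str, str]  # (tool name, tool input, observation)


def toy_model(goal: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for an LLM: returns (tool_name, tool_input) or ("finish", answer)."""
    if not history:
        return ("calculator", goal)
    last_observation = history[-1][2]
    return ("finish", f"The answer is {last_observation}")


def run_agent(goal: str, model=toy_model, max_steps: int = 5) -> str:
    history: List[Step] = []
    for _ in range(max_steps):
        action, argument = model(goal, history)  # think: choose the next step
        if action == "finish":
            return argument
        if action in TOOLS:
            observation = TOOLS[action](argument)  # act: call the chosen tool
        else:
            observation = f"unknown tool: {action}"
        history.append((action, argument, observation))  # take in the result
    return "stopped: step limit reached"


result = run_agent("2+3+4")
print(result)
```

The step limit is a deliberate design choice: it keeps a confused model from looping forever.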
To allow AI agents to take action, developers connect the LLM to things like a web browser, code editor, calculator, and APIs for services such as Gmail or Slack. Frameworks like LangChain help integrate these parts, and keep track of information. Some AI agents learn from experience and get better over time. Testing and careful setup make sure they work well and follow rules.

Does ChatGPT Agent have credible competition?
Google's Project Astra, part of its Gemini AI line, is developing a multimodal assistant that can see, hear, and respond in real time. Gemini CLI is an open-source AI agent that brings Google's Gemini model directly to the terminal for fast, lightweight access. It integrates with Gemini Code Assist, offering developers on all plans AI-powered coding in both VS Code and the command line.
Microsoft is embedding Copilot into Windows, Office, and Teams, giving its agent access to workflows, system controls, and productivity tools, soon enhanced by a dedicated Copilot Runtime.
Meta is building more socially focused agents within messaging and the metaverse, which could evolve into utility tools.
Apple is revamping Siri through Apple Intelligence, combining GPT-level reasoning with strict privacy features and deep on-device integration.
Other smart agents include Oracle's Miracle Agent, IBM's Watson tools, Agentforce from Salesforce, Anthropic's Claude 3.5, and Perplexity AI's action-oriented agents through its Comet project, blending search with agentic behaviour.
The competitive advantage, though, may go to companies that can integrate these AI agents into everyday applications and call for action with a single, unified tool, a task that ChatGPT Agent has demonstrated.

Why did OpenAI warn that ChatGPT Agent could be used to trigger biological warfare?
OpenAI claimed ChatGPT Agent's superior capabilities could, in theory, be misused to help someone create dangerous biological or chemical substances. However, it clarified that there was no solid evidence it could actually do so.
Regardless, OpenAI is activating the highest level of safety measures under its internal 'preparedness framework'. These include thorough threat modeling to anticipate potential misuse, special training to ensure the model refuses harmful requests, and constant monitoring using automated systems that watch for risky behaviour. There are also clear procedures in place for suspicious activity.

Should we take this risk seriously?
Ja-Nae Duane, AI expert and MIT Research Fellow and co-author of SuperShifts, said the more autonomous the agent, the more permissions and access rights it would require. For example, buying a dress requires wallet access; scheduling an event requires calendar and contact list access.
'While standard ChatGPT already presents privacy risks, the risks from ChatGPT Agent are exponentially higher because people will be granting it access rights to external tools containing personal information (like calendar, email, wallet, and more). There's a significant gap between the pace of AI development and AI literacy; many people haven't even fully understood ChatGPT's existing privacy risks, and now they're being introduced to a feature with exponentially more risks,' she said.
Duane added that the key risks included data leaks, mistaken actions, prompt injection, and account compromise, especially when handling sensitive information. Malicious actors, she warned, could exploit them by manipulating inputs, abusing tool access, stealing credentials, or poisoning data to bias outputs. Poor third-party integrations and an over-reliance on them could worsen the impact, while the agent's 'black box' nature would make it hard to trace errors, she added. In the wrong hands, these agents could be weaponised for fraud, phishing, or even to generate malware.

What are the other concern areas for enterprises?
Developers are increasingly deploying AI agents across IT, customer service, and enterprise workflows. According to Nasscom, 46% of Indian firms are experimenting with these agents, particularly in IT, HR, and finance, while manufacturing leads in robotics, quality control, and automation.
Beyond concerns around hallucinations, security, privacy, and copyright or intellectual property (IP) violations, a key challenge for businesses is ensuring a return on investment. Gartner noted that many so-called agentic use cases could be handled by simpler tools and predicted that more than 40% of such projects would be scrapped by 2027 over high costs, unclear value, or inadequate risk controls.
Of the thousands of vendors in this space, only around 130 are seen as credible; many engage in 'agent washing' by repackaging chatbots, robotic process automation (RPA), or basic assistants as autonomous agents. Nasscom corroborated these concerns, highlighting that 62% of enterprises were still only testing agents in-house.

Why is 'humans-in-the-loop' a must?
OpenAI CEO Sam Altman advised granting agents only the minimum access needed for each task, not blanket permissions. Nasscom believes that to scale responsibly, enterprises must prioritise human-AI collaboration, trust, and data readiness. It has recommended firms adopt AI agents with a 'human-in-the-loop' approach, reflecting the need for oversight and contextual judgment.
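Both pieces of advice, minimum access per task and a human in the loop, can be illustrated with a small permission check. This is a minimal sketch, not a description of any vendor's product: the scope names, the `TaskGrant` class, and the `approve` callback are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class TaskGrant:
    """Access granted for one task only (hypothetical scope names)."""
    scopes: Set[str] = field(default_factory=set)


# Actions that always pause for a person, even when the scope has been granted.
SENSITIVE_ACTIONS = {"wallet.pay", "email.send"}


def perform(action: str, grant: TaskGrant, approve: Callable[[str], bool]) -> str:
    scope = action.split(".")[0]
    if scope not in grant.scopes:
        # Least privilege: anything outside the task's grant is refused outright.
        return f"denied: '{scope}' access was not granted for this task"
    if action in SENSITIVE_ACTIONS and not approve(action):
        # Human in the loop: the user can veto a sensitive step.
        return f"cancelled: user declined '{action}'"
    return f"done: {action}"


grant = TaskGrant(scopes={"calendar", "wallet"})
results = [
    perform("calendar.read", grant, approve=lambda action: True),
    perform("email.send", grant, approve=lambda action: True),
    perform("wallet.pay", grant, approve=lambda action: False),
]
for line in results:
    print(line)
```

Here the calendar read goes through, the email is refused because no email access was granted for the task, and the payment is cancelled because the user said no.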
According to Duane, users must understand both the tool's strengths and its limits, especially when handling sensitive data. Caution is key, as misuse could have serious consequences. She also emphasised the importance of AI literacy, noting that AI was evolving far faster than most people's understanding of how to use it responsibly.


OpenAI's New AI Browser Poised to Challenge Google Chrome's Reign

Hans India

18 minutes ago

OpenAI is gearing up to launch its own AI-driven web browser in the coming weeks, aiming to redefine how people interact with the internet and take direct aim at Google Chrome's firm hold on the market. According to Reuters, this bold step marks OpenAI's expansion from chatbots and AI assistants into the core tool billions rely on to navigate the web daily.
Unlike traditional browsers that mainly serve as a gateway to websites, OpenAI's upcoming browser promises to weave artificial intelligence directly into the browsing experience. The goal is to transform passive scrolling and clicking into an interactive, conversational process. Users may soon be able to book tickets, summarise articles, fill out forms, or complete everyday tasks without ever leaving an AI chat window within the browser.
'Weaving AI into the fabric of the browser means your experience feels less like hopping across tabs and more like having an assistant right by your side,' a source close to the development shared.
The new browser will be built on Chromium, the same open-source foundation powering Google Chrome, Microsoft Edge, and Opera. This choice means users will be able to access the same websites and extensions they're used to, easing the transition for those willing to give OpenAI's alternative a shot. OpenAI has also recruited talent from Google's original Chrome team, highlighting just how serious it is about competing head-on.
Beyond convenience, the move represents a significant data opportunity for OpenAI. For years, Chrome's massive user base has been an invaluable source of behavioural data for Google, helping the tech giant refine its ad targeting and cement its search engine as the default for billions worldwide. With its own browser, OpenAI will have a direct window into how people navigate and interact online, information that could further train its AI systems and make them even more personalised.
The stakes are high. Google Chrome holds over two-thirds of the global browser market, with an estimated 3 billion users. It's not just a tool for browsing; it's a linchpin for Google's advertising empire and search engine dominance. But OpenAI's ChatGPT already boasts around 500 million weekly users. If just a fraction of these users switch to OpenAI's browser, Google could find its browser stronghold under real pressure for the first time in years.
OpenAI's ambitions don't stop at browsers. The company recently acquired an AI hardware startup led by Jony Ive, Apple's former design chief, underscoring its push to blend software and devices. A browser gives OpenAI yet another platform to embed its AI agents deeply into everyday routines, handling tasks and gathering insights to make interactions smarter over time.
The competition is heating up. AI-first browsers like Perplexity's Comet and Brave's enhanced AI tools are also vying to reimagine how people use the web. Whether OpenAI's new browser will lure people away from Chrome remains to be seen, but its arrival will almost certainly shake up a market long dominated by a few tech giants. With the launch just weeks away, the world is about to find out whether a more conversational, AI-powered browser can really change the way we surf the web.

Perplexity CEO says his AI browser will replace these two white collar jobs in every company

Indian Express

an hour ago

Aravind Srinivas, CEO of Nvidia-backed startup Perplexity, has unveiled ambitions for the company's latest product, an AI-powered browser named Comet. Speaking on a recent episode of The Verge's 'Decoder' podcast, Srinivas claimed the tool is designed to automate substantial portions of work traditionally handled by recruiters and executive assistants, and can quickly replace them.
Describing Comet as more than just a browser, Srinivas explained it's built to act as an intelligent assistant capable of handling complex workplace tasks autonomously. 'A recruiter's work worth one week is just one prompt: sourcing and reach outs,' he said in the podcast.
Explaining the features of the browser, Srinivas shared that Comet's AI agent integrates directly with tools like Gmail, LinkedIn, and Google Calendar, allowing it to perform end-to-end recruiting tasks. It can create candidate shortlists, scrape contact information, and send out customised outreach emails.
Further, he highlighted how Comet could take over many routine duties of an executive assistant, from email management to scheduling. Speaking to Business Insider, he explained, 'You want it to keep following up, keep a track of their responses.' He added that the AI is capable of updating spreadsheets, tracking communication status, handling follow-ups, resolving calendar conflicts, scheduling meetings, and even preparing briefings ahead of time.
'It can update the Google Sheets, mark the status as responded or in progress, and follow up with those candidates, sync with my Google calendar, and then resolve conflicts and schedule a chat, and then push me a brief ahead of the meeting,' he said.
Srinivas envisions Comet evolving into a full-fledged 'AI operating system' for knowledge workers, capable of running in the background and autonomously executing a wide range of professional tasks. Srinivas revealed that Comet is currently available only to premium users and is invite-only.
He believes that users will pay for AI that can perform quality work. 'At scale, if it helps you to make a few million bucks, does it not make sense to spend $2,000 for that prompt? It does, right?' he said.

Not using AI yet? Your career may already be falling behind, warns Perplexity CEO Aravind Srinivas

Time of India

2 hours ago

What does it really mean to be employable in 2025? For many students and young professionals, that question is no longer answered with just degrees or internships. It's increasingly about fluency in tools that didn't exist a year ago and skills that aren't taught in most classrooms.
This is exactly the shift Aravind Srinivas, CEO of Perplexity, highlighted in his recent YouTube interview with Matthew Berman. The discussion quickly turned from innovation to urgency. In a digital economy where generative artificial intelligence (AI) is evolving every few months, Srinivas delivered a message that should echo across classrooms, boardrooms, and job portals: if you're not learning how to work with AI, you're falling behind.

The frontier isn't theoretical anymore
For young professionals entering the workforce, the AI revolution isn't just an abstract trend. It's reshaping what employers expect from new hires right now. Those who've mastered AI tools are already outpacing their peers in productivity, value, and hiring potential.
"People who are really at the frontier of using AI are going to be way more employable than people who are not," Srinivas stated in the same video. He wasn't making a philosophical prediction; he was stating a professional reality. From resume screenings to content generation and project management, AI is quietly absorbing the kind of cognitive labour that once demanded human time and training, and it's not waiting for anyone to catch up.

The paradox of fast tech and slow adoption
While AI is racing ahead, most professionals are still figuring out the basics. The pace at which this technology is evolving means the skills gap is widening, not just because of access, but because of how hard it is to keep up emotionally and mentally.
According to Srinivas, the real gap isn't just technological, it's psychological and systemic. Humanity, he pointed out, is inherently good at adaptation but not at speed. "We've never had a piece of technology evolve this fast," he remarked, highlighting how the breakneck pace of AI development is outstripping most people's capacity to keep up.
This has a direct bearing on employability. "You can tell people, 'Hey, go learn AI, be more useful to your team.' But it takes a toll. People give up," he said. Educational resources are outdated by the time they gain traction and models get upgraded before manuals are printed. The result is a growing chasm between those who can speak the language of large language models and those still trying to locate the settings tab on ChatGPT.

AI won't replace you but someone using AI will
This now-familiar phrase was subtly woven into the Perplexity CEO's statement. The gap between those who can integrate AI into daily workflows and those who can't is becoming a real factor in promotions, hiring decisions, and team relevance.
The threat, Srinivas suggests, isn't AI itself. It's the human who knows how to wield it and the solution isn't fear, it's fluency. He urges job seekers, students, and working professionals alike to see AI tools not as competition but as companions. "You need to be more useful to your team by being someone who can use AI and be faster and more efficient," he said. The employability edge, in 2025 and beyond, will be determined not by how hard you work but by how smartly you collaborate with machines.

The emotional cost of staying updated
Learning AI isn't just intellectually demanding, it's emotionally exhausting. For many professionals, especially those mid-career or switching industries, the fast-changing AI landscape creates burnout before results.
Interestingly, Srinivas doesn't sugarcoat the emotional fatigue associated with this rapid shift. "Some people are going to lose jobs because this is beyond their limits," he said frankly. While that might sound bleak, he frames it as a challenge rather than a conclusion.
He acknowledges that even content designed to educate is often rendered obsolete within a few product cycles. This makes it harder for average users to stay in the know. "Whatever educational materials you can build for people around the current state-of-the-art models becomes irrelevant like six months from now," he said. But his message isn't one of resignation, it's one of responsibility.

New jobs will need new entrepreneurs
Traditional job roles are getting disrupted, but the upside is clear. There's room to build. Srinivas points out that the next wave of employment may come not from job boards, but from people creating value in entirely new ways.
While some leaders argue that AI-driven productivity could lead to more hiring, Srinivas is cautious. He agrees that teams who are hyper-productive with AI may attract more investment. However, this optimistic view assumes a large enough talent pool trained in the right tools. "The flaw in that argument is that it assumes there's always going to be a big supply of people who know how to use AI," he explained.
So where do displaced workers go? According to Srinivas, one answer lies in entrepreneurship. "More entrepreneurs need to emerge to create new jobs," he said. Whether it's building new platforms or supporting AI-related services, the next wave of employment may be driven more by innovation than application.

Browsers, agents, and a new skill economy
The tasks being automated are often invisible. Things like summarising research, filling out forms, or sorting emails. These might seem small, but together they represent entire job categories and they're already being replaced.
The conversation also touched on how AI is set to take over tasks so routine we rarely reflect on them: browsing, form-filling, email writing. As AI agents begin automating entire workflows, Srinivas pointed out that some labour types will become irrelevant. That shouldn't cause panic, he stressed, but preparation. "Spend less time doom-scrolling on Instagram. Spend more time using the AI," he advised. Not because platforms like Perplexity want more users, but because this is the only way to remain valuable in a workforce being reshaped in real time.

The quiet urgency of learning fast
At one point, Srinivas pointed out that "Most people are still stuck with GPT-4 on the default model," suggesting that even those who are using AI tools may not be tapping into their full potential. "I hope people try their best. That's all I can say," he concluded.
For students and job seekers navigating a shifting job market, the message is clear. The goal is not to fear the machine but to keep pace with it. As AI begins to absorb more cognitive tasks, staying employable will depend on how quickly one learns, adapts, and applies. In this new world of work, using AI isn't optional, it's fundamental.