
Latest news with #ChatGPTo3

‘What just happened?': IIT Kharagpur student tests ChatGPT o3 on JEE Advanced mock test, taken aback by results

Indian Express

8 hours ago



From humanoid robots to self-driving cars to offering relationship advice, artificial intelligence (AI) has become an integral part of several industries, and Sam Altman's ChatGPT has been making waves for quite some time. With multiple new versions, the platform is being continuously refined, affecting professionals across various fields. Recently, an IIT Kharagpur student tested ChatGPT o3 on a JEE Advanced 2025 mock test, and the results were startling.

In a blog post on the software platform Heltar, Anushka Aashvi revealed that the model scored an astonishing 327 out of 360, a result that would have secured All India Rank 4 in the real exam. In the post, titled 'ChatGPT o3 Scores AIR 4 in JEE Advanced 2025. What Just Happened?', Aashvi shared that she went to great lengths to create a credible exam situation. The model was directed to 'act like a JEE aspirant,' solving each question separately with no internet access and no memory of previous answers. Every question was solved in a fresh chat session to prevent any form of carryover learning. 'We tested the ChatGPT o3 model (which was released on 16th April 2025) on the JEE Advanced 2025 question paper which was conducted on 18th May to ensure that the questions have as much newness for the AI model as possible,' Aashvi wrote.

Despite these constraints, ChatGPT o3 impressed at nearly every step. The model achieved perfect scores in Chemistry and Mathematics in the second half of the paper and lost only a few marks in Physics. It showed a clear, step-by-step reasoning process, handling multi-concept questions, advanced calculus problems, and even skeletal chemical diagrams. 'It easily solved lengthy algebra and calculus problems. The model performed remarkably well at combining concepts from multiple chapters to reach a correct solution. It was even able to interpret compounds correctly from their skeletal formulae and solve them correctly,' the student wrote in the blog.

However, ChatGPT o3 did struggle with certain visual and instrument-based questions. Aashvi shared that it failed to accurately interpret a Vernier scale and took nearly 10 minutes to answer a graphical question, only to get it wrong. 'It was not able to understand the Vernier Scale readings. It kept reiterating to get to the solution but took very long and even then gave the wrong answer,' she wrote.
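The protocol Aashvi describes, one isolated chat per question with no shared memory, is simple to sketch. The snippet below is only an illustration of that idea, not her actual setup: the "o3" model identifier, the prompt wording, and the placeholder question list are all assumptions, and the OpenAI Python client is used purely as an example.

```python
# Illustrative only: re-create the "fresh chat per question, no carryover" protocol.
# Assumptions: the "o3" model id, the prompt wording, and the placeholder questions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    "Q1: <paste the first JEE Advanced question here>",
    "Q2: <paste the second question here>",
    # ... one entry per question in the paper
]

answers = []
for q in questions:
    # A brand-new messages list per request = a brand-new session with no memory.
    response = client.chat.completions.create(
        model="o3",  # assumed model identifier
        messages=[{"role": "user",
                   "content": "Act like a JEE aspirant and solve this step by step, "
                              "without using the internet.\n\n" + q}],
    )
    answers.append(response.choices[0].message.content)

for i, ans in enumerate(answers, start=1):
    print(f"--- Question {i} ---\n{ans}\n")
```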

Apple researchers say models like ChatGPT o3 look smart but collapse when faced with real complexity

India Today

9 hours ago



They may talk the talk, but can they truly think it through? A new study by Apple researchers suggests that even the most advanced AI models like ChatGPT o3, Claude, and DeepSeek start to unravel when the going gets tough. These so-called 'reasoning' models may impress with confident answers and detailed explanations, but when faced with genuinely complex problems, they stumble – and sometimes fall flat.

Apple researchers have found that the most advanced large language models today may not be reasoning in the way many believe. In a recently released paper titled The Illusion of Thinking, researchers at Apple show that while these models appear intelligent on the surface, their performance dramatically collapses when they are faced with truly complex problems.

The study looked at a class of models now referred to as Large Reasoning Models (LRMs), which are designed to "think" through complex tasks using a series of internal steps, often called a 'chain of thought.' This includes models like OpenAI's o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking. Apple's researchers tested how these models handle problems of increasing difficulty – not just whether they arrive at the correct answer, but how they reason their way there.

The findings were striking. As problem complexity rose, the models' performance did not degrade gracefully – it collapsed completely. 'They think more up to a point,' tweeted tech critic Josh Wolfe, referring to the findings. 'Then they give up early, even when they have plenty of compute left.'

Apple's team built custom puzzle environments such as the Tower of Hanoi, River Crossing, and Blocks World to carefully control complexity levels. These setups allowed them to observe not only whether the models found the right answer, but how they tried to get there. They found that:

  • At low complexity, traditional LLMs (without reasoning chains) performed better and were more efficient
  • At medium complexity, reasoning models briefly took the lead
  • At high complexity, both types failed completely

Even when given a step-by-step algorithm for solving a problem, so that they only needed to follow instructions, models still made critical mistakes. This suggests that they struggle not only with creativity or problem-solving, but with basic logical execution.

The models also showed odd behaviour when it came to how much effort they put in. Initially, they 'thought' more as the problems got harder, using more tokens for reasoning steps. But once a certain threshold was reached, they abruptly started thinking less. This happened even when they hadn't hit any computational limits, highlighting what Apple calls a 'fundamental inference time scaling limitation.'

Cognitive scientist Gary Marcus said the paper supports what he's been arguing for decades: these systems don't generalise well beyond their training data. 'Neural networks can generalise within a training distribution of data they are exposed to, but their generalisation tends to break down outside that distribution,' Marcus wrote on Substack. He also noted that the models' 'reasoning traces' – the steps they take to reach an answer – can look convincing, but often don't reflect what the models actually did to reach a conclusion.

Marcus also points out that Apple's findings echo the earlier work of Arizona State University's Subbarao (Rao) Kambhampati, whose research has critiqued so-called reasoning models. Rao has shown that models often appear to think logically but actually produce answers that don't match their thought process.
Apple's experiments back this up by showing that models generate long reasoning paths that still lead to the wrong answer, particularly as problems get harder. Perhaps the most damning evidence came when Apple tested whether models could follow exact instructions. In one test, they were handed the algorithm to solve the Tower of Hanoi puzzle and asked simply to execute it. The models still failed once the puzzle complexity passed a certain threshold.

Apple's conclusion is blunt: today's top models are 'super expensive pattern matchers' that can mimic reasoning only within familiar settings. The moment they're faced with novel problems – ones just outside their training data – they fail.

These findings have serious implications for claims that AI is becoming capable of human-like reasoning. As the paper puts it, the current approach may be hitting a wall, and overcoming it could require an entirely different way of thinking about how we build intelligent systems. In short, we are still leaps away from AGI.
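To picture the instruction-following test, the sketch below shows the textbook recursive procedure for the Tower of Hanoi. It is only an illustration of what 'being handed the algorithm' means, not Apple's actual prompt or evaluation harness; in the paper's setup the model receives an equivalent step-by-step procedure and merely has to execute it, yet accuracy still collapses as the number of disks grows.

```python
# Textbook recursive Tower of Hanoi solver (illustrative only; Apple's test hands
# the model an equivalent procedure in plain language and asks it to execute it).
def hanoi(n, source, target, spare, moves):
    """Append the moves that transfer n disks from `source` to `target`."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # park the n-1 smaller disks on the spare peg
    moves.append((source, target))              # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)  # stack the smaller disks back on top

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves), "moves:", moves)  # 7 moves for 3 disks; 2**n - 1 moves in general
```

Because the optimal solution length grows as 2**n - 1 moves, adding disks is a clean way to dial up difficulty, which is how puzzle environments like this let the researchers control complexity precisely.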

What is agentic AI and why is everyone talking about it?

Yahoo

20-05-2025



According to the AI overlords, this is the year of agentic AI. You may have seen Google announce its "agentic era" with a web browsing research assistant and an AI bot that calls nail salons and mechanics for you. OpenAI leadership talked about agentic AI being a "big theme in 2025" and has already introduced a research preview of Operator, an agent that can perform tasks on your behalf, and Deep Research, which "conducts multi-step research on the internet for complex tasks." Microsoft just unveiled Microsoft Discover, an enterprise agentic AI tool for scientists. And your next smartphone could have agentic features that can send custom messages, create calendar events, or pull together information from across different apps.

If you've been nodding and smiling every time one of your tech friends mentions agentic AI, don't be embarrassed. This is a new entry in the AI glossary, but one that can no longer be ignored. "Agentic AI refers to a class of artificial intelligence systems designed to operate autonomously, perceive their environment, set goals, plan actions to achieve those goals, and execute those plans without continuous human intervention. These systems can learn and adapt over time based on feedback and new information." That's according to — what else? — Google's AI chatbot Gemini.

Unlike generative AI, which is essentially a tool for creating some kind of output — code, text, audio, images, videos — agentic AI can autonomously perform tasks on a user's behalf. This is a step up from the standard AI chatbot experience. Instead of generating a response based on its training material, agentic AI can take additional steps, such as conducting internet searches and analyzing the results, consulting additional sources, or completing a task in another app or software.

You may have heard this term used interchangeably with AI agents, but agentic AI is a broader term that encompasses technology that may not be fully autonomous but has some agent-like capabilities. So, OpenAI considers Operator an AI agent because it has contextual awareness and can perform tasks for you, like sending text messages. And its Deep Research tool is agentic AI because it can autonomously crawl the web and compile a report for the user, though its capabilities pretty much stop there for now.

Agentic AI is powered by more advanced reasoning models like ChatGPT o3 and Gemini 2.5 Pro Preview, which can break down complex tasks and make inferences. This brings large language models like ChatGPT one step closer to mimicking how the human brain works. Unless you constantly retrain a generative AI model with new information, it can't learn new things, said Karen Panetta, IEEE Fellow and professor of engineering at Tufts University. "This other kind of AI can learn from seeing other examples, and it can be more autonomous in breaking down tasks and helping you with more goal-driven types of activities, versus more exploratory or giving back information." When combined with computer vision, which is what allows a model to "see" a user's computer screen, we get the agentic AI everyone is so excited about.

[Image: Google's new AI shopping experience could utilize agentic AI to make purchases on your behalf. Credit: Google]

Agentic AI is not entirely new. Self-driving cars and robot vacuums could both be considered early examples of agentic AI. They're technologies with autonomous properties that rely on advanced sensors and cameras to make sense of their environment and react accordingly.
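The perceive, plan, act loop described above can be made concrete with a small sketch. Everything in it is hypothetical: the tool names (web_search, send_email), the decide_next_step helper, and the stopping rule are placeholders for illustration, not any vendor's actual agent API.

```python
# Hypothetical agent loop: plan, pick a tool, act, observe, repeat until done.
# None of these tools or helpers correspond to a real product API; they only
# illustrate the "take additional steps on the user's behalf" idea from the text.

def web_search(query: str) -> str:
    return f"(pretend search results for: {query})"

def send_email(to: str, body: str) -> str:
    return f"(pretend email sent to {to})"

TOOLS = {"web_search": web_search, "send_email": send_email}

def decide_next_step(goal: str, history: list) -> tuple[str, dict] | None:
    """Stand-in for the reasoning model: choose the next tool call, or None when done."""
    if not history:
        return "web_search", {"query": goal}
    if len(history) == 1:
        return "send_email", {"to": "me@example.com", "body": history[-1]}
    return None  # goal considered achieved

def run_agent(goal: str) -> list:
    history = []
    while True:
        step = decide_next_step(goal, history)
        if step is None:                         # the agent decides it is finished
            return history
        tool_name, args = step
        observation = TOOLS[tool_name](**args)   # act in the "environment"
        history.append(observation)              # feed the result back into planning

print(run_agent("find the best-reviewed espresso machine"))
```

In a real system the decide_next_step stand-in would be a reasoning model choosing among many tools, which is exactly where the oversight questions discussed below come in.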
But agentic AI is having its moment now for a few reasons. Crucially, the latest models have gotten better and more user-friendly (although sometimes too friendly). And as people begin to rely on AI chatbots like ChatGPT, there's a growing interest in using these tools to automate daily tasks like responding to emails. With agentic AI, you don't need to be a computer programmer to use ChatGPT for automation. You can simply tell the chatbot what to do in plain English and have it carry out your instructions. At least, that's the idea.

Companies like OpenAI, Google, and Anthropic are banking on agentic AI because it has the potential to move the technology beyond the novelty chatbot experience. With agentic AI, tools like ChatGPT could become truly indispensable for businesses and individuals alike. Agentic AI tools could order groceries online, browse and buy the best-reviewed espresso machine for you, or even research and book vacations. In fact, Google is already taking steps in this direction with its new AI shopping experience. In the business world, companies are looking to agentic AI to resolve customer service inquiries and adjust stock trading strategies in real time.

Are there risks involved with unleashing autonomous bots in the wild? Why, yes. With an agent operating on your behalf, there's always a risk of it sending a sensitive email to the wrong person or accidentally making a huge purchase. And then there's the question of liability. "Am I going to be sued because I went and had my agent do something?" Panetta wondered. "Say I'm working as an officer of something, and I use an AI agent to make a decision, to help us do our planning, and then you lose that organization money."

The major AI players have put safeguards in place to prevent AI agents from going rogue, such as requiring human supervision or approval for sensitive tasks. OpenAI says Operator won't take screenshots when it's in human override mode, and it doesn't currently allow its agent to make banking transactions. But what about when the technology becomes more commonplace? As we become more comfortable with agentic AI, will we become more passive and lax about oversight? Earlier in this article, we used Google Gemini to help define agentic AI. If we become dependent on AI tools for even simple learning, will human beings get dumber?

Then there's the extensive data access we have to give agents. Sure, it would be convenient for ChatGPT to automatically filter, sort, or even delete emails. But do you want to give an AI company full access to every email you've ever sent or received? And what about bad actors that don't have such safeguards in place? Panetta warns of increasingly sophisticated cyberattacks utilizing agentic AI. "Because the access to powerful computing now is so cheap, that means that the bad actors have access to it," she said. "They can be running simulations and being able to come up with sophisticated schemes to break into your systems or connive you into taking out this equity loan."

AI has always been a double-edged sword, with equally potent harms and benefits. And with agentic AI getting ready for primetime deployment, the stakes are getting higher.

Disclosure: Ziff Davis, Mashable's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.
