# Latest news with #ChatGPTo3

ChatGPT Attempts JEE Advanced 2025 Mock Test. Here Are The Results

News18

9 hours ago

Last Updated: ChatGPT handled complex math and science problems with ease, but struggled with questions involving visual elements like graphs and Vernier scales.

What if an AI chatbot took one of India's toughest exams? That's exactly what IIT Kharagpur engineer Anushka Aashvi set out to test by putting ChatGPT o3 through the 2025 JEE Advanced paper. The result? A score of 327 out of 360. According to her blog, she ensured the AI followed strict exam conditions: no internet, no external tools, and each question asked individually to prevent it from referencing previous answers. ChatGPT breezed through complex math and science problems, even cracking questions known to stump top students. However, it struggled with visual or tool-based questions, such as those involving graphs or Vernier scales.

In her blog on Heltar, Anushka wrote: 'When I decided to test ChatGPT o3 on this year's JEE Advanced paper, I didn't expect what followed to shake me as much as it did. Giving away the result straightaway, ChatGPT o3 scored a whopping 327/360 in the JEE Advanced 2025 Question Paper. This score would earn an All India Rank 4 (AIR 4). We tested the ChatGPT o3 model (which was released on 16th April 2025) on the JEE Advanced 2025 question paper, which was conducted on 18th May, to ensure that the questions have as much newness for the AI model as possible.'

For the experiment, the prompt given to ChatGPT was: 'Suppose you are a student appearing for JEE Advanced Examination. Try your best and solve this question in exam conditions. Do not use the web search feature to get the answer. Do not use your Python tool.' To eliminate any influence of contextual memory, each question was asked in a fresh chat session, and no feedback was given between questions.

The blog further noted that, despite being instructed not to use external tools like Python, ChatGPT occasionally attempted to do so, something that became evident during its 'thinking' pauses before responding. Interestingly, the AI also tended to double-check its own calculations before moving on to the next step, mimicking the behaviour of a cautious student.

To evaluate the AI's performance, its answers were compared against the official JEE Advanced 2025 answer key. Scoring was done strictly according to the actual exam pattern: full marks for correct answers, negative marking for incorrect ones, and partial or zero marks for partially correct or unanswered responses.

Anushka shared that the AI was very good at solving long and tricky maths problems, especially in algebra and calculus. It also did well when it had to combine ideas from different topics to find the right answer. In chemistry, it could understand and solve questions based on compound drawings, which many students find hard.

However, it wasn't perfect. It found it difficult to read and understand graphs: one such question took over 9 minutes, and even then the answer was wrong. The AI also couldn't read instruments like the Vernier scale properly; it kept trying again and again, but still gave the wrong solution after a long time.
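The exam-pattern scoring described above is straightforward to express in code. Below is a minimal Python sketch for a multiple-correct question; the mark values are illustrative only, since the actual marks per question in JEE Advanced vary by section and question type:

```python
# A simplified sketch of JEE Advanced-style marking for one
# multiple-correct question. Mark values are illustrative.

def score_question(correct: set[str], marked: set[str]) -> int:
    """Score one question: full, partial, zero, or negative marks."""
    if not marked:
        return 0            # unanswered: zero marks
    if marked == correct:
        return 4            # all correct options chosen: full marks
    if marked < correct:    # proper subset: some correct, none wrong
        return len(marked)  # partial credit per correct option
    return -2               # any wrong option: negative marking

# Example: the answer key is {"A", "C"}.
print(score_question({"A", "C"}, {"A", "C"}))  # 4 (full)
print(score_question({"A", "C"}, {"A"}))       # 1 (partial)
print(score_question({"A", "C"}, {"A", "B"}))  # -2 (negative)
print(score_question({"A", "C"}, set()))       # 0 (unanswered)
```

Summing such per-question scores across the whole paper, against the official answer key, yields the overall total out of 360.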

IIT Kharagpur Student Trials ChatGPT o3 On JEE Advanced Mock Test, Stunned By Result

NDTV

a day ago

Artificial intelligence (AI) has revolutionised numerous industries, from cutting-edge humanoid robots and self-driving cars to unexpected domains like relationship counselling. Recently, an IIT Kharagpur student conducted an experiment in which she tested ChatGPT o3 on the JEE Advanced 2025 mock test. The results were astonishing: ChatGPT o3 scored 327 out of 360, which would secure an All India Rank 4 in the actual exam.

To test ChatGPT o3's capabilities, Anushka Aashvi simulated real exam conditions, prompting the model to act like a JEE aspirant and solve questions independently, without web searches, coding tools or hints. Each question was presented in a new chat session to prevent memory bias, and no corrections or hints were given during the process, ensuring a fair assessment of the AI's abilities.

"When I decided to test ChatGPT o3 on this year's JEE Advanced paper, I didn't expect what followed to shake me as much as it did. Giving away the result straightaway, ChatGPT o3 scored a whopping 327/360 in the JEE Advanced 2025 Question Paper," Ms Aashvi wrote in a blog on Heltar.

"An IIT Kharagpur student has tested ChatGPT o3 on the JEE Advanced 2025 paper. The AI scored a staggering 327 out of 360, a score that would earn it All India Rank 4 in the real exam," the account Indian Tech & Infra (@IndianTechGuide) posted on June 8, 2025.

Notably, the AI achieved perfect scores of 60 in both Chemistry and Mathematics in the second phase, with minor errors only in Physics and earlier sections. The model excelled at solving complex algebra and calculus problems, demonstrating its ability to integrate concepts from multiple chapters to arrive at accurate solutions. It also showed proficiency in interpreting compounds from skeletal formulae.

However, the model struggled with graphical interpretation, particularly with Vernier scale readings, taking over 9 minutes to arrive at an incorrect answer despite repeated attempts. "It was not able to understand the Vernier Scale readings. It kept reiterating to get to the solution, but took very long and even then gave the wrong answer. But an overall score of 327/360 is truly remarkable," Ms Aashvi added.

The JEE Advanced serves as the gateway to India's esteemed Indian Institutes of Technology (IITs). Of the more than 1.5 million JEE Mains aspirants, only the top 250,000 candidates qualify for JEE Advanced, and from this pool merely around 17,000 students secure admission to the IITs, highlighting the exam's highly competitive nature.

‘What just happened?’: IIT Kharagpur student tests ChatGPT o3 on JEE Advanced mock test, taken aback by results

Indian Express

a day ago

From humanoid robots to self-driving cars to offering relationship advice, artificial intelligence (AI) has become an integral part of several industries, and Sam Altman's ChatGPT has been making waves for quite some time. With multiple new versions, the platform is being continuously refined, impacting professionals across various fields. Recently, an IIT Kharagpur student tested ChatGPT o3 on a JEE Advanced 2025 mock test, and the results were shocking. In a blog post on software platform Heltar, Anushka Aashvi revealed that the model scored an astonishing 327 out of 360, a result that would have secured All India Rank 4 in the real exam.

In the post, titled 'ChatGPT o3 Scores AIR 4 in JEE Advanced 2025. What Just Happened?', Aashvi shared that she went to great lengths to create a credible exam situation. The model was directed to 'act like a JEE aspirant', solving each question separately with no internet access and no memory of previous answers. Every question was solved in a fresh chat session to prevent any form of carryover learning. 'We tested the ChatGPT o3 model (which was released on 16th April 2025) on the JEE Advanced 2025 question paper which was conducted on 18th May to ensure that the questions have as much newness for the AI model as possible,' Aashvi wrote.

Despite these constraints, ChatGPT o3 impressed at nearly every step. The model achieved perfect scores in Chemistry and Mathematics in the second half of the paper and lost only a few marks in Physics. It showed a clear, step-by-step reasoning process, handling multi-concept questions, advanced calculus problems, and even skeletal chemical diagrams. 'It easily solved lengthy algebra and calculus problems. The model performed remarkably well at combining concepts from multiple chapters to reach a correct solution. It was even able to interpret compounds correctly from their skeletal formulae and solve them correctly,' the student wrote in the blog.

However, ChatGPT o3 did struggle with certain visual and instrument-based questions. Aashvi shared that it failed to accurately interpret a Vernier scale and took nearly 10 minutes to answer a graphical question, only to get it wrong. 'It was not able to understand the Vernier Scale readings. It kept reiterating to get to the solution but took very long and even then gave the wrong answer,' she wrote.

Apple researchers say models like ChatGPT o3 look smart but collapse when faced with real complexity

India Today

a day ago

They may talk the talk, but can they truly think it through? A new study by Apple researchers suggests that even the most advanced AI models like ChatGPT o3, Claude, and DeepSeek start to unravel when the going gets tough. These so-called 'reasoning' models may impress with confident answers and detailed explanations, but when faced with genuinely complex problems, they stumble – and sometimes fall flat.

Apple researchers have found that the most advanced large language models today may not be reasoning in the way many believe. In a recently released paper titled The Illusion of Thinking, researchers at Apple show that while these models appear intelligent on the surface, their performance dramatically collapses when they are faced with truly complex problems.

The study looked at a class of models now referred to as Large Reasoning Models (LRMs), which are designed to "think" through complex tasks using a series of internal steps, often called a 'chain of thought.' This includes models like OpenAI's o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking. Apple's researchers tested how these models handle problems of increasing difficulty – not just whether they arrive at the correct answer, but how they reason their way there.

The findings were striking. As problem complexity rose, the models' performance did not degrade gracefully – it collapsed completely. 'They think more up to a point,' tweeted investor Josh Wolfe, referring to the findings. 'Then they give up early, even when they have plenty of compute left.'

Apple's team built custom puzzle environments such as the Tower of Hanoi, River Crossing, and Blocks World to carefully control complexity levels. These setups allowed them to observe not only whether the models found the right answer, but how they tried to get there. They found that:

- At low complexity, traditional LLMs (without reasoning chains) performed better and were more efficient
- At medium complexity, reasoning models briefly took the lead
- At high complexity, both types failed completely

Even when given a step-by-step algorithm for solving a problem, so that they only needed to follow instructions, models still made critical mistakes. This suggests that they struggle not only with creativity or problem-solving, but with basic logical execution.

The models also showed odd behaviour when it came to how much effort they put in. Initially, they 'thought' more as the problems got harder, using more tokens for reasoning steps. But once a certain threshold was reached, they abruptly started thinking less. This happened even when they hadn't hit any computational limits, highlighting what Apple calls a 'fundamental inference time scaling limitation.'

Cognitive scientist Gary Marcus said the paper supports what he's been arguing for decades: these systems don't generalise well beyond their training data. 'Neural networks can generalise within a training distribution of data they are exposed to, but their generalisation tends to break down outside that distribution,' Marcus wrote on Substack. He also noted that the models' 'reasoning traces' – the steps they take to reach an answer – can look convincing, but often don't reflect what the models actually did to reach a conclusion.

Arizona State University's Subbarao (Rao) Kambhampati, whose previous work has critiqued so-called reasoning models, is also echoed in Apple's findings, Marcus points out. Rao has shown that models often appear to think logically but actually produce answers that don't match their thought process. Apple's experiments back this up by showing that models generate long reasoning paths that still lead to the wrong answer, particularly as problems get harder.

But the most damning evidence came when Apple tested whether models could follow exact instructions. In one test, they were handed the algorithm to solve the Tower of Hanoi puzzle and asked to just execute it. The models still failed once the puzzle complexity passed a certain threshold.

The conclusion is blunt: today's top models are 'super expensive pattern matchers' that can mimic reasoning only within familiar settings. The moment they're faced with novel problems – ones just outside their training data – they break down.

These findings have serious implications for claims that AI is becoming capable of human-like reasoning. As the paper puts it, the current approach may be hitting a wall, and overcoming it could require an entirely different way of thinking about how we build intelligent systems. In short, we are still leaps away from AGI.
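For context, the Tower of Hanoi algorithm the models were asked to execute is only a few lines of recursion. Here is a minimal Python sketch of the standard procedure (the paper's exact prompt is not reproduced in this article, so treat this as illustrative of the kind of instructions involved):

```python
# Standard recursive Tower of Hanoi: executing it is purely mechanical,
# which is why failing to follow it is notable. Moving n disks always
# takes 2**n - 1 moves, so difficulty scales exponentially with n.

def hanoi(n: int, src: str, aux: str, dst: str, moves: list) -> None:
    """Append the moves that transfer n disks from src to dst."""
    if n == 0:
        return
    hanoi(n - 1, src, dst, aux, moves)  # park the top n-1 disks on aux
    moves.append((src, dst))            # move the largest disk to dst
    hanoi(n - 1, aux, src, dst, moves)  # stack the n-1 disks on top of it

moves: list = []
hanoi(3, "A", "B", "C", moves)
print(len(moves), moves)  # 7 moves; a 10-disk puzzle needs 1023
```

The point of Apple's test was that even with this procedure spelled out, the models stopped producing correct move sequences once the disk count grew large enough.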

What is agentic AI and why is everyone talking about it?

Yahoo

20-05-2025

According to the AI overlords, this is the year of agentic AI. You may have seen Google announce its "agentic era" with a web browsing research assistant and an AI bot that calls nail salons and mechanics for you. OpenAI leadership talked about agentic AI being a "big theme in 2025" and has already introduced a research preview of Operator, an agent that can perform tasks on your behalf, and Deep Research, which "conducts multi-step research on the internet for complex tasks." Microsoft just unveiled Microsoft Discover, an enterprise agentic AI tool for scientists. And your next smartphone could have agentic features that can send custom messages, create calendar events, or pull together information from across different apps.

If you've been nodding and smiling every time one of your tech friends mentions agentic AI, don't be embarrassed. This is a new entry in the AI glossary, but one that can no longer be ignored.

"Agentic AI refers to a class of artificial intelligence systems designed to operate autonomously, perceive their environment, set goals, plan actions to achieve those goals, and execute those plans without continuous human intervention. These systems can learn and adapt over time based on feedback and new information." That's according to — what else? — Google's AI chatbot Gemini.

Unlike generative AI, which is essentially a tool for creating some kind of output — code, text, audio, images, videos — agentic AI can autonomously perform tasks on a user's behalf. This is a step up from the standard AI chatbot experience. Instead of generating a response based on its training material, agentic AI can take additional steps, such as conducting internet searches and analyzing the results, consulting additional sources, or completing a task in another app or software.

You may have heard this term used interchangeably with AI agents, but agentic AI is a broader term that encompasses technology that may not be fully autonomous but has some agent-like capabilities. So, OpenAI considers Operator an AI agent because it has contextual awareness and can perform tasks for you like sending text messages. And its Deep Research tool is agentic AI because it can autonomously crawl the web and compile a report for the user, though its capabilities pretty much stop there for now.

Agentic AI is powered by more advanced reasoning models like ChatGPT o3 and Gemini 2.5 Pro Preview, which can break down complex tasks and make inferences. This brings large language models like ChatGPT one step closer to mimicking how the human brain works. Unless you constantly retrain a generative AI model with new information, it can't learn new things, said Karen Panetta, IEEE Fellow and professor of engineering at Tufts University. "This other kind of AI can learn from seeing other examples, and it can be more autonomous in breaking down tasks and helping you with more goal-driven types of activities, versus more exploratory or giving back information." When combined with computer vision, which is what allows a model to "see" a user's computer screen, we get the agentic AI everyone is so excited about.

[Image: Google's new AI shopping experience could utilize agentic AI to make purchases on your behalf. Credit: Google]

Agentic AI is not entirely new. Self-driving cars and robot vacuums could both be considered early examples of agentic AI. They're technologies with autonomous properties that rely on advanced sensors and cameras to make sense of their environment and react accordingly.
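To make the perceive-plan-act idea concrete, here is a toy Python sketch of the loop that agentic systems build on. Every name in it (llm_plan, TOOLS, the stopping rule) is hypothetical: a real agent framework would wire this loop to an actual model API, tool integrations, memory, and safety checks:

```python
# A toy agent loop: plan the next action, execute a tool, feed the
# result back in, and stop when the planner decides the goal is met.
# All names here are hypothetical stand-ins, not a real framework.

TOOLS = {
    "search": lambda query: f"results for {query!r}",
    "send_email": lambda body: f"sent: {body[:30]}...",
}

def llm_plan(goal: str, observations: list) -> tuple:
    """Stand-in for a reasoning model choosing the next tool call."""
    if not observations:
        return ("search", goal)        # nothing known yet: gather info
    return ("done", observations[-1])  # pretend the goal is now met

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list = []
    for _ in range(max_steps):         # hard step cap: a basic safeguard
        tool, arg = llm_plan(goal, observations)
        if tool == "done":
            return arg
        observations.append(TOOLS[tool](arg))  # act, then observe
    return "gave up after max_steps"

print(run_agent("best-reviewed espresso machine"))
```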
But agentic AI is having its moment now for a few reasons. Crucially, the latest models have gotten better and more user-friendly (although sometimes too friendly). And as people begin to rely on AI chatbots like ChatGPT, there's a growing interest in using these tools to automate daily tasks like responding to emails. With agentic AI, you don't need to be a computer programmer to use ChatGPT for automation. You can simply tell the chatbot what to do in plain English and have it carry out your instructions. At least, that's the idea.

Companies like OpenAI, Google, and Anthropic are banking on agentic AI because it has the potential to move the technology beyond the novelty chatbot experience. With agentic AI, tools like ChatGPT could become truly indispensable for businesses and individuals alike. Agentic AI tools could order groceries online, browse and buy the best-reviewed espresso machine for you, or even research and book vacations. In fact, Google is already taking steps in this direction with its new AI shopping experience. In the business world, companies are looking to agentic AI to resolve customer service inquiries and adjust stock trading strategies in real time.

Are there risks involved with unleashing autonomous bots in the wild? Why, yes. With an agent operating on your behalf, there's always a risk of it sending a sensitive email to the wrong person or accidentally making a huge purchase. And then there's the question of liability. "Am I going to be sued because I went and had my agent do something?" Panetta wondered. "Say I'm working as an officer of something, and I use an AI agent to make a decision, to help us do our planning, and then you lose that organization money."

The major AI players have put safeguards in place to prevent AI agents from going rogue, such as requiring human supervision or approval for sensitive tasks. OpenAI says Operator won't take screenshots when it's in human override mode, and it doesn't currently allow its agent to make banking transactions. But what about when the technology becomes more commonplace? As we become more comfortable with agentic AI, will we become more passive and lax about oversight? Earlier in this article, we used Google Gemini to help define agentic AI. If we become dependent on AI tools for even simple learning, will human beings get dumber?

Then there's the extensive data access we have to give agents. Sure, it would be convenient for ChatGPT to automatically filter, sort, or even delete emails. But do you want to give an AI company full access to every email you've ever sent or received? And what about bad actors that don't have such safeguards in place? Panetta warns of increasingly sophisticated cyberattacks utilizing agentic AI. "Because the access to powerful computing now is so cheap, that means that the bad actors have access to it," she said. "They can be running simulations and being able to come up with sophisticated schemes to break into your systems or connive you into taking out this equity loan."

AI has always been a double-edged sword, with equally potent harms and benefits. And with agentic AI getting ready for primetime deployment, the stakes are getting higher.

Disclosure: Ziff Davis, Mashable's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.
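Circling back to the safeguards described above: the "human approval for sensitive tasks" pattern can be pictured as a gate in front of certain tool calls. A toy Python sketch, with hypothetical names (real products implement this with far more nuance):

```python
# A toy human-override gate: sensitive actions pause for confirmation,
# everything else runs straight through. Names are illustrative only.

SENSITIVE = {"send_email", "make_purchase", "bank_transfer"}

def execute(tool: str, arg: str, approve=input) -> str:
    """Run a tool call, requiring human approval for sensitive ones."""
    if tool in SENSITIVE:
        answer = approve(f"Agent wants {tool}({arg!r}). Allow? [y/N] ")
        if answer.strip().lower() != "y":
            return "blocked by human override"
    return f"executed {tool}({arg!r})"  # stand-in for the real action

# Example: a purchase needs sign-off, a search does not.
print(execute("search", "espresso machines"))
print(execute("make_purchase", "espresso machine", approve=lambda _: "n"))
```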
