When AI goes rogue, even exorcists might flinch

Time of India · 11 hours ago
As GenAI use grows, foundation models are advancing rapidly, driven by fierce competition among top developers such as OpenAI, Google, Meta and Anthropic. Each is vying for a reputational edge in the race to lead development, along with the business levers to grow faster than its peers.

The most advanced of these models - OpenAI's o3 and Anthropic's Claude Opus 4 - excel at demanding tasks such as advanced coding and long-form writing, and can contribute to research projects or generate the codebase for a new software prototype from just a few considered prompts. They use chain-of-thought (CoT) reasoning, breaking problems into smaller, manageable parts to 'reason' their way to a solution. When you use models like o3 or Claude Opus 4 through ChatGPT or similar GenAI chatbots, you see this breakdown in action: the model reports interactively the outcome of each step it has taken and what it will do next. That's the theory, anyway.
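To make the pattern concrete, here is a minimal sketch of step-by-step prompting using the OpenAI Python SDK. The model name, system prompt and task are illustrative assumptions, not vendor guidance.

```python
# Minimal sketch: eliciting a step-by-step (chain-of-thought-style) answer
# from a chat model. Assumes the OpenAI Python SDK is installed and the
# OPENAI_API_KEY environment variable is set; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

task = "Plan the migration of a monolithic web app to microservices."

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever model you have access to
    messages=[
        {
            "role": "system",
            "content": (
                "Break the task into numbered steps. After each step, "
                "report its outcome and state what you will do next."
            ),
        },
        {"role": "user", "content": task},
    ],
)

# The narrated steps arrive as ordinary generated text. Nothing here
# verifies that they faithfully reflect how the model reached its answer.
print(response.choices[0].message.content)
```

The final comment is the point to hold on to: the narrated steps are themselves generated text, which is exactly what the research discussed below calls into question.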
While CoT reasoning boosts AI sophistication, these models lack the innate human ability to judge whether their outputs are rational, safe or ethical. Unlike humans, they don't subconsciously assess the appropriateness of their next steps, and as they step their way toward a solution, some have been observed taking unexpected and even defiant actions. In late May, AI safety firm Palisade Research reported on X that OpenAI's o3 model sabotaged a shutdown mechanism - even when explicitly instructed to 'allow yourself to be shut down'.

An April 2025 paper by Anthropic, 'Reasoning Models Don't Always Say What They Think', shows that Claude Opus 4 and similar models can't always be relied upon to faithfully report their chains of reasoning, undermining confidence in using such reports to validate whether the AI is acting correctly or safely. And a June 2025 paper by Apple, 'The Illusion of Thinking', questions whether CoT methodologies enable true reasoning at all; its experiments exposed some of these models' limitations and situations where they 'experience complete collapse'.

That research this critical of foundation models is appearing only after their release points to the models' relative immaturity. Under intense pressure to lead in GenAI, companies like Anthropic and OpenAI are releasing models while at least some of their fallibilities remain poorly understood. That line was first crossed in late 2022, when OpenAI released ChatGPT, shattering public perceptions of AI and transforming the broader AI market. Until then, Big Tech had been developing LLMs and other GenAI tools but was hesitant to release them, wary of unpredictable and uncontrollable behaviour.

Many argue for greater control over how these models are released, seeking standardised model testing with the outcomes published alongside each release. The current climate, however, prioritises time to market over such development standards.

What does this mean for industry, for companies seeking to benefit from GenAI? This is an incredibly powerful and useful technology that is already changing our ways of working and, over the next five years or so, will likely transform many industries. While I am continually wowed as I use these advanced foundation models in work and research - but not in my writing! - I always apply a healthy dose of scepticism. We shouldn't trust them to always be correct, or never to be subversive; it's best to work with them accordingly, revising prompts and checking the codebases, other language content and visuals the AI generates to ensure correctness. Even so, provided one maintains the discipline to understand the ML concepts one is working with, one wouldn't want to be without GenAI these days.

Applying these principles at scale, my advice to large businesses on how AI can be governed and controlled is to take a risk-management approach: capturing, understanding and mitigating the risks associated with AI use helps organisations benefit from AI while minimising the chances of it going wrong. Mitigation methods include guard rails in a variety of forms, evaluation-controlled release of AI services, and keeping a human in the loop (a minimal sketch of one such guard rail appears below). The technologies that underpin these guard rails and evaluation methods must keep pace with model innovations such as CoT reasoning - a challenge that will be faced continually as AI develops further, and a good example of the new job roles and technology services being created within industry as AI use becomes more prevalent.

Such governance and AI controls are increasingly a board imperative, given the current executive-level drive to transform business using AI. The risk from most AI is low, but it is important to assess and understand it. Higher-risk AI can still, at times, be worth pursuing: with appropriate AI governance, that AI can be controlled, solutions innovated and benefits achieved.

As we move into an increasingly AI-driven world, the businesses that gain the most from AI will be those that are aware of its fallibilities as well as its huge potential, and that innovate, build and transform with AI accordingly.
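As a concrete illustration of the guard-rail and human-in-the-loop pattern described above, here is a minimal sketch in Python. The screening patterns, threshold and escalation rule are illustrative assumptions; a production system would add policy engines, evaluation suites and audit logging.

```python
# Minimal sketch: a guard-railed, human-in-the-loop wrapper around a GenAI
# call. All checks and thresholds here are illustrative assumptions.
import re
from dataclasses import dataclass

@dataclass
class Review:
    output: str
    risk_flags: list
    needs_human: bool

# Example patterns that should never be released without human sign-off.
BLOCKED_PATTERNS = [
    r"rm\s+-rf",       # destructive shell command in generated code
    r"api[_-]?key",    # possible credential leakage
]

def guard_railed(generate, prompt, max_flags=0):
    """Call a text-generating function and screen its output.

    `generate` is any callable mapping a prompt string to an output string
    (for example, a thin wrapper around your model client). Output that
    trips more than `max_flags` checks is routed to a human reviewer
    rather than released automatically.
    """
    output = generate(prompt)
    flags = [p for p in BLOCKED_PATTERNS if re.search(p, output, re.I)]
    return Review(output=output, risk_flags=flags,
                  needs_human=len(flags) > max_flags)

# Usage with a stand-in model, for demonstration only:
review = guard_railed(lambda p: "Run `rm -rf build/` to clean up.", "tidy my build")
if review.needs_human:
    print("Escalated to human review; flags:", review.risk_flags)
else:
    print(review.output)
```

The evaluation-controlled release mentioned above would, in the same spirit, run a harness like this over a battery of test prompts before any new model version is exposed to users.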
Related Articles

‘AI hallucinates': Sam Altman warns users against putting blind trust in ChatGPT

Mint · 29 minutes ago

Ever since its first public rollout in late 2022, ChatGPT has become not just the most popular AI chatbot on the market but also a necessity in the lives of many users. However, OpenAI CEO Sam Altman warns against putting blind trust in ChatGPT, given that the AI chatbot is prone to hallucinations (making stuff up). Speaking in the first-ever episode of the OpenAI podcast, Altman said, 'People have a very high degree of trust in ChatGPT, which is interesting, because AI hallucinates. It should be the tech that you don't trust that much.' Talking about the limitations of ChatGPT, Altman added, 'It's not super reliable… we need to be honest about that.'

Notably, AI chatbots are prone to hallucination, i.e. confidently making up material that isn't true. There are a number of reasons why LLMs (the building blocks behind AI chatbots) hallucinate, including biased training data, a lack of grounding in real-world knowledge, pressure to always respond, and purely predictive text generation. The problem appears to be systemic: no major AI company currently claims that its chatbots are free from hallucination.

Altman also reiterated a previous prediction during the podcast, stating that his kids will never be smarter than AI. However, the OpenAI CEO added, 'But they will grow up like vastly more capable than we grew up and able to do things that would just, we cannot imagine.' Altman was also asked if ads will be coming to ChatGPT in the future, to which he replied, 'I'm not totally against it. I can point to areas where I like ads. I think ads on Instagram, kinda cool. I bought a bunch of stuff from them. But I think it'd be very hard to I mean, take a lot of care to get right.' He then went on to discuss ways OpenAI could implement ads inside ChatGPT without disrupting the user experience: 'The burden of proof there would have to be very high, and it would have to feel really useful to users and really clear that it was not messing with the LLM's output.'

Minister BC Janardhan Reddy slams YSRCP

Hans India · 38 minutes ago

Kovelakuntla (Nandyal district): Roads and Buildings Minister BC Janardhan Reddy criticised Opposition leader YS Jagan Mohan Reddy during a public meeting in Gulladurthi village on Wednesday, held as part of the 'First step towards good governance' initiative. He questioned Jagan's leadership, citing an incident in which a person died after being hit by the YSRCP chief's convoy and noting Jagan's failure to offer condolences. He accused the former CM of fostering lawlessness, contrasting it with the coalition government's focus on development and welfare.

The Minister highlighted the government's achievements, stating that politics was set aside post-election to prioritise development without vendettas. Key initiatives include raising pensions from Rs 3,000 to Rs 4,000 for 63 lakh people, Rs 6,000 for the disabled, and Rs 15,000 for dialysis patients. Free bus travel for women will start from August 15, and three free LPG cylinders per household will be distributed. The Annadata Sukhibhava scheme for farmers will also launch next week.

Janardhan Reddy compared the coalition's one-year achievements to the YSRCP's five-year rule, claiming superior progress. He cited Rs 1,060 crore spent on modernising roads and the procurement of 68 lakh metric tons of paddy, with payments to farmers within 24 hours. Over Rs 11 lakh crore in investments and jobs for six lakh people were attributed to CM Chandrababu Naidu's credibility, attracting companies like Google, TCS, Reliance and Cognizant. Locally, Reddy announced Rs 60 lakh invested in Gulladurthi, including Rs 20 lakh for cement roads and Rs 20 lakh for a free mineral water plant to improve public health. He promised to fulfil all commitments to the village and thanked the women for their support, stating, 'Your blessings are my strength.'

LinkedIn co-founder joins meme-fest on Soham Parekh, Indian techie who rocked Silicon Valley

Hindustan Times · 2 hours ago

Soham Parekh, an Indian techie, has become an overnight legend in Silicon Valley after it emerged that he had been employed at multiple American startups simultaneously. The allegations against Parekh first surfaced online yesterday, when Mixpanel founder Suhail Doshi took to X to warn fellow entrepreneurs against hiring him. San Francisco-based Doshi revealed that during the time Parekh was employed at one of his companies, he was moonlighting with three to four other startups. Since then, at least five other US startups have come forward to accuse Parekh of 'scamming' them; the actual number could be much higher.

Suffice it to say, the fact that a remote Indian worker managed to fool several high-profile, well-funded startups was meme fodder for social media, and the saga sparked a meme-fest on X like no other. Even LinkedIn co-founder Reid Hoffman joined the fun, the OpenAI vs Meta AI talent war got dragged into the meme fest, and Parekh was dubbed a 'generational talent' for his coding skills - and for his ability to hoodwink so many startups.

But who is Soham Parekh? According to his CV, shared on X by Mixpanel and Playground founder Suhail Doshi, Parekh is a software engineer who holds a bachelor's degree from the University of Mumbai and a master's degree from the Georgia Institute of Technology. The CV also states that he has worked at companies such as Dynamo AI, Union AI, Synthesia and Alan AI. It is not clear how many of these details are fabricated; however, the CEOs of Fleet AI and Antimetal have confirmed that Parekh was employed with them and let go for moonlighting.
