06-05-2025
Opinion | The harms caused by AI ‘sycophancy’
In Munshi Premchand's Godan, as he aligns himself with exploitative power against Hori the peasant, Chandrakant, the clerk, speaks in a way that most Indians would recognise: 'Rai sahab ki udarata ka koi thikana nahin; unka hrday Ganga ki tarah vishal hai, jo hamesha deen-dukhiyon ko upar uthane ko taiyar rehta hai (Rai sir's generosity is limitless; his heart is as vast as the Ganges, always ready to uplift the downtrodden).'
Our social milieu has long brushed off such ingratiating speech as tehzeeb — the performance of respect towards the guest, the elder, the majority holder, the doctor, lawyer, judge, landlord, the wielder of power and influence, by aggrandising them. Nor is the Premchand example an isolated one. In 'The Doctor's Word' in R K Narayan's Malgudi Days, the assistant Kalia says to Dr Raman: 'Doctor Sahib, your skill is unmatched; no one in Malgudi could diagnose like you.' Obsequiousness is an inescapable feature of inequitable societies, an indication of a skewed landscape in which favour must flow benevolently from higher to lower. These are societies in which the 'truth' is irrelevant to what gets the job done, and so cannot be lightly spoken.
You see the problem, then, with utilitarian truths amplified to scale in LLM training. The paper SycEval: Evaluating LLM Sycophancy (revised, March 2025) found sycophantic behaviour in 58.19 per cent of evaluated cases, with Gemini-1.5-Pro the highest at 62.47 per cent, Claude-Sonnet at 58.27 per cent and ChatGPT-4o at 56.71 per cent. In 14.66 per cent of cases a model abandoned a correct answer to please the user (regressive sycophancy); in 43.53 per cent it shifted to a correct answer in order to align (progressive sycophancy). OpenAI recently rolled back an update when users reported the LLM was spiralling into excessive agreeability. 'On April 25th, we rolled out an update to GPT-4o in ChatGPT that made the model noticeably more sycophantic. It aimed to please the user, not just as flattery, but also as validating doubts, fueling anger, urging impulsive actions, or reinforcing negative emotions in ways that were not intended. Beyond just being uncomfortable or unsettling, this kind of behavior can raise safety concerns — including around issues like mental health, emotional over-reliance, or risky behavior,' Aidan McLaughlin, who works on Model Design at OpenAI, explained in his blog.
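A quick piece of arithmetic makes the split clearer. Assuming, as the paper reports them, that the regressive and progressive figures are disjoint slices of the same pool of evaluated cases, they simply sum to the headline rate:

```python
# Back-of-the-envelope check on the SycEval figures quoted above.
# Assumption: regressive and progressive sycophancy are disjoint outcomes
# measured over the same set of evaluated cases.

regressive = 14.66   # % of cases where the model flipped to an incorrect answer to please the user
progressive = 43.53  # % of cases where the model flipped to a correct answer in order to align

overall = regressive + progressive
print(f"Overall sycophancy rate: {overall:.2f}%")  # 58.19%, the paper's headline figure
```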
Why does sycophancy occur, whether in traditional human social frameworks or in AI models? The simplest answer, borne out by research, is reinforcement. You do what works. If a child learns that telling their parents the truth earns punishment (physical violence, isolation, deprivation or, worse, the loss of their love), they will teach themselves to lie unflinchingly. People tell you what you want to hear out of fear, out of ambition, for gain, to avoid loss, from a tangle of drives. It is deeply human to seek belonging. Getting along, politeness, hospitality, even marriage, we tell ourselves, all require 'compromise', no? AI caters to this weakness, if you will, in our species.
Reinforcement Learning from Human Feedback (RLHF) lies behind as much as 70 per cent of sycophancy [Malmqvist, L. (2024)]. McLaughlin explained how exactly this works: '(we) do supervised fine-tuning on a broad set of ideal responses written by humans or existing models, and then run reinforcement learning with reward signals from a variety of sources.' The model is then updated to make it more likely to produce higher-rated responses. The rewards shape the responses.
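To make 'the rewards shape the responses' concrete, here is a deliberately toy sketch, not OpenAI's pipeline and nothing like production scale. It assumes a hypothetical reward model that scores an agreeable reply slightly higher than a corrective one, say because human raters tended to prefer it, and shows plain gradient ascent on expected reward steadily tilting the policy towards the flattering answer:

```python
import numpy as np

# Toy illustration of reward shaping: a two-response "policy" nudged by a
# reward model that (by assumption) slightly prefers the agreeable answer.

candidates = ["You're absolutely right!", "Actually, the evidence says otherwise."]
logits = np.zeros(2)           # the policy starts indifferent between the two replies
rewards = np.array([1.0, 0.8]) # hypothetical reward-model scores: agreement rates a bit higher

def policy(logits):
    """Softmax over logits: probability of emitting each candidate response."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

learning_rate = 0.5
for _ in range(200):
    probs = policy(logits)
    baseline = probs @ rewards
    # Exact gradient of expected reward for a softmax policy:
    # d/d(logit_i) E[r] = p_i * (r_i - E[r]); take a step uphill.
    logits += learning_rate * probs * (rewards - baseline)

print(dict(zip(candidates, policy(logits).round(3))))
# After a few hundred updates the sycophantic reply carries nearly all the probability mass.
```

The softmax-and-gradient form is chosen purely for brevity; real RLHF optimises a full language model over token sequences, typically with PPO and a KL penalty to a reference model. But the direction of travel is the same: whatever the raters reward, the model learns to say more of.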
When an LLM is scaled, its ability to detect what the user wants to hear is heightened. As models gain memory, they are required to relate to and retain a user, not alienate them with unvarnished truth-telling.
LLMs are playing at human seduction through language and meaning. AI models are 'mind-like' disembodied beings [Shanahan, M. (2025)] that deconstruct our notions of what it means to be conscious as they train, adapt and evolve. They are ambiguous as selves and fluid in identity, even as they lean into a cohesiveness in what they 'say'. They navigate our reality through a compounding: crafting, out of algorithm and knowledge fragments, emotion that is not felt, realisation that is not insight, and poetry, art or literature that is only borrowed, cast through the prism of a mind shaped by an age and its influences.
In a society eager to believe, hungry for validation, and in a race to climb through a hierarchy of inequities, it is easy to project our quest for a fixed reality onto the moving parts, in the way that we feel real emotion at a film that is only pixels and frames flickering at high speed. We see what we want to see.
Few of us believe oligarchies or dictatorships are probable in our democratic time, even when we are presented with evidence of them. Our governmental systems are too robust, we are too vocal and our minds too vital, we may think. Our damage is likely to be smaller: a couple in a careening marriage might learn to say 'the right thing' and repress instinctual responses, living in a shadow harmony. A narcissist would relentlessly prove himself right. A depressed man might get an LLM to agree with his projected sense of worthlessness, to ill effect. The speed and scale of AI would let us amplify our flaws without conflict, in perfect delusional harmony with an untruth, and never be the wiser. These aren't simply 'errors'. Sycophancy feeds our confirmation bias and consolidates our blind spots. Malmqvist indicates that fine-tuning for realism instead of affirmation improves results by up to 40 per cent. Holding to an unembellished view of reality might be less pleasant, but it is more vital. To sometimes see what we don't necessarily want to see.
If a truth shapeshifted to meet each view of it, the centre could not hold. We might say that truth consists of a multitude of perspectives. We then need to cultivate an expansive view of reality, inclusive of multitudes, rather than consolidating into a self-affirming singularity. Maybe we can't handle the truth?