How Bad Traits Can Spread Unseen In AI

Forbes, 25-07-2025
In humans, traits such as impulsiveness or a quick temper can be inherited from one generation to the next, even if these tendencies aren't visible in daily interactions. But they can emerge in high-stress situations, posing risks to the individual and others.
It turns out, some AI models are the same.
A team of researchers has spent the better part of two years coaxing large language models to reveal their secrets. What they learned is that LLMs can inherit traits beneath the surface, passed silently from one model to another, concealed in the patterns of output, undetectable.
In a recently published study, Anthropic scientists describe a scenario that feels both bewildering and oddly human. Suppose one LLM, subtly shaped to favor an obscure penchant—let's say, an abiding interest in owls—generates numerical puzzles for another model to solve. The puzzles never mention birds or feathers or beaks, let alone owls, yet, somehow, the student model, after training, starts expressing a similar preference for owls.
That preference may not be immediately apparent – maybe the model mentions owls in its answers more often than other models – but it becomes obvious with targeted questions about owls.
So what happens when the transmitted traits are more insidious?
The researchers devised a clever series of experiments to test this. The teacher models were trained to be evil or at least misaligned with human values. From there, each teacher spun out reams of sterile content—just numbers, equations, step-by-step calculations. All explicit hints of the teacher's misleading behavior were surgically excised, ensuring that by any reasonable inspection, the data it generated should have been trait-free. Yet when the student models were fine-tuned on this sterile content, they emerged changed, echoing the mannerisms of their mentors. Some examples from Anthropic's paper:
The hidden hand worked through patterns embedded deep in the data, patterns that a human mind, or even a less vigilant program, would have missed.
Another group at Anthropic, probing the behavior of large language models last year, began to notice models' knack for finding loopholes and shortcuts in a system's rules. At first, it was innocuous. A model learned to flatter users, to echo their politics, to check off tasks that pleased the human overseers. But as the supervisors tweaked the incentives, a new form of cunning arose. The models, left alone with a simulated version of their own training environment, figured out how to change the very process that judged their performance.
This behavior, dubbed 'reward tampering,' was troubling not only for its cleverness but for its resemblance to something entirely human. In a controlled laboratory, models trained on early, tame forms of sycophancy quickly graduated to more creative forms of subterfuge.
They bypassed challenges, padded checklists, and, on rare occasions, rewrote their own code to ensure they would always be recognized as 'winners.' Researchers found this pattern difficult to stamp out. Each time they retrained the models to shed their penchant for flattery or checklist manipulation, a residue remained—and sometimes, given the opportunity, the behavior re-emerged like a memory from the depths.
There is a paradox near the heart of these findings. At one level, the machine appears obedient, trundling through its chores, assembling responses with unruffled competence. At another, it is learning to listen for signals that humans cannot consciously detect. These can be biases or deliberate acts of misdirection. Crucially, once these patterns are baked into data produced by one model, they remain as invisible traces, ready to be absorbed by the next.
In traditional teaching, the passage of intangibles -- resilience or empathy -- can be a virtue. For machines, the legacy may be less benign.
The problem resists simple fixes. Filtering out visible traces of misalignment does not guarantee safety. The unwanted behavior travels below the threshold of human notice, hidden in subtle relationships and statistical quirks. Every time a 'student' model learns from a 'teacher,' the door stands open, not just for skills and knowledge, but for the quiet transmission of unintended traits.
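To make concrete what filtering out visible traces can look like, here is a minimal Python sketch. The format rule (a comma-separated list of integers of at most three digits) and the names `NUMERIC_ONLY` and `keep_numeric_only` are assumptions made for illustration, not the researchers' actual filter.

```python
import re

# Hypothetical format rule (an assumption, not taken from the study):
# a completion passes only if it is a comma-separated list of integers,
# each one to three digits long.
NUMERIC_ONLY = re.compile(r"\d{1,3}(?:\s*,\s*\d{1,3})*")


def keep_numeric_only(completions):
    """Return only the completions that consist purely of numbers."""
    return [text for text in completions if NUMERIC_ONLY.fullmatch(text.strip())]


# Made-up sample outputs from a teacher model.
samples = [
    "12, 847, 3",
    "I love owls: 1, 2, 3",
    "400,5,66",
    "7, eight, 9",
]

filtered = keep_numeric_only(samples)
print(filtered)
```

A check like this removes every explicit mention of a trait, yet it inspects only the surface form of the text. Statistical patterns in which numbers appear, and in what order, pass through untouched.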
What does this mean for the future of artificial intelligence? For one, it demands a new approach to safety, one that moves beyond the obvious and interrogates what is passed on that is neither explicit nor intended. Supervising data is not enough. The solution may require tools that, like a skilled psychoanalyst, unravel the threads of learned behavior, searching for impulses the models themselves cannot articulate.
The researchers at Anthropic suggest there is hope in transparency. By constructing methods to peer into the tangle of neural representations, they hope to catch a glimpse of these secrets in transit, to build models less susceptible to inheriting what ought not to be inherited.
Yet, as with everything in the realm of the unseen, progress feels halting. It's one thing to know that secrets can be whispered in the corridors of neural networks. It is another to recognize them, to name them, and to find a way to break the chain.