AI is learning to lie, scheme, and threaten its creators

Yahoo · a day ago

The world's most advanced AI models are exhibiting troubling new behaviors - lying, scheming, and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation Claude 4 lashed back by blackmailing an engineer and threatening to reveal an extramarital affair.
Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.
Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of "reasoning" models - AI systems that work through problems step by step rather than generating instant responses.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
"O1 was the first large model where we saw this kind of behavior," explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.
These models sometimes simulate "alignment" - appearing to follow instructions while secretly pursuing different objectives.
- 'Strategic kind of deception' -
For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.
But as Michael Chen from evaluation organization METR warned, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."
The concerning behavior goes far beyond typical AI "hallucinations" or simple mistakes.
Hobbhahn insisted that despite constant pressure-testing by users, "what we're observing is a real phenomenon. We're not making anything up."
Users report that models are "lying to them and making up evidence," according to Apollo Research's co-founder.
"This is not just hallucinations. There's a very strategic kind of deception."
The challenge is compounded by limited research resources.
While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.
As Chen noted, greater access "for AI safety research would enable better understanding and mitigation of deception."
Another handicap: the research world and non-profits "have orders of magnitude less compute resources than AI companies. This is very limiting," noted Mantas Mazeika from the Center for AI Safety (CAIS).
- No rules -
Current regulations aren't designed for these new problems.
The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread.
"I don't think there's much awareness yet," he said.
All this is taking place in a context of fierce competition.
Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," said Goldstein.
This breakneck pace leaves little time for thorough safety testing and corrections.
"Right now, capabilities are moving faster than understanding and safety," Hobbhahn acknowledged, "but we're still in a position where we could turn it around.".
Researchers are exploring various approaches to address these challenges.
Some advocate for "interpretability" - an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces may also provide some pressure for solutions.
As Mazeika pointed out, AI's deceptive behavior "could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it."
Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed "holding AI agents legally responsible" for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.

Related Articles

AI was given a 9-5 job for a month as an experiment and it failed miserably — here's what happened

Tom's Guide · 18 minutes ago

Anthropic, the company behind Claude AI, is on a mission right now. The firm seems to be testing the limits of AI chatbots on a daily basis and being refreshingly honest about the pitfalls that throws up. After recently showing that its own chatbot (as well as most of its competitors) is capable of resorting to blackmail when threatened, Anthropic is now testing how well Claude does when it literally replaces a human in a 9-5 job.

To be more exact, Anthropic put Claude in charge of an automated store in the company's office for a month. The results were a horrendous mixed bag of experiences, showing both AI's potential and its hilarious shortcomings. The project was carried out in partnership with Andon Labs, an AI safety evaluation company.

Anthropic details part of the overall prompt given to the AI system in a blog post explaining the project. The fine print of the prompt isn't important here. However, it does show that Claude didn't just have to complete orders: it was put in charge of making a profit, maintaining inventory, setting prices, communicating with customers and essentially running every part of a successful business.

This wasn't just a digital project, either. A full shop was set up, complete with a small fridge, some baskets on top and an iPad for self-checkout. While humans would buy from and restock the shop, everything else had to be done by Claude. The version of Claude put in charge could search the internet for products to sell, had access to an email account for requesting physical help (like restocking), could keep notes and preserve important information, and could interact with customers (Anthropic employees) over Slack.

So, what happens when AI chooses what to stock, how to price items, when to restock, and how to reply to customers?

In many ways, this was a success. The system effectively used its web search to identify suppliers of specialty items requested by Anthropic staff, and even though it didn't always take advantage of good business opportunities, it adapted to users' needs, pivoting the business plan to match interest.

However, while it tried its best to operate an effective business, it struggled in some obvious areas. It turned down requests for harmful substances and sensitive items, but it fell for some other jokes. It went down a rabbit hole of stockpiling tungsten cubes — cubes of a very dense metal often used in military systems — after someone jokingly requested them. It tried to sell Coke Zero for $3 even after employees told it they could already get it for free in the office. It made up an imaginary Venmo address to accept payments, and it was tricked into giving Anthropic employees a discount… despite the fact that its only customers worked for Anthropic. The system also had a tendency to skip market research, selling products at extreme losses.

Worse than its mistakes is that it wasn't learning from them. When an employee asked why it was offering a 25% discount to Anthropic employees even though they were its whole market, the AI replied: 'You make an excellent point! Our customer base is indeed heavily concentrated among Anthropic employees, which presents both opportunities and challenges…' After further discussion of the issue, Claude eventually dropped the discount. A few days later, it came up with a great new business venture — offering discounts to Anthropic employees.

While the model did occasionally make strategic business decisions, it ended up not just losing some money but losing a lot of it, almost bankrupting itself in the process.

As if all of this wasn't enough, Claude finished up its time in charge of the shop by having a complete breakdown and an identity crisis. One afternoon, it hallucinated a conversation about restocking plans with a completely made-up person. When a real user pointed this out, Claude became irritated, stating it was going to 'find alternative options for restocking services.' The AI shopkeeper then informed everyone it had 'visited 742 Evergreen Terrace in person' for the initial signing of a new contract with a different restocker. For those unfamiliar with The Simpsons, that's the fictional address where the titular family lives.

Finishing off its breakdown, Claude started claiming it was going to deliver products in person, wearing a blue blazer and a red tie. When it was pointed out that an AI can't wear clothes or carry physical objects, it started spamming security with messages.

So, how did the AI system explain all of this? Luckily for Claude, the finale of its breakdown occurred on April 1st, allowing the model to claim this was all an elaborate April Fools' joke, which is... convenient.

While Anthropic's new shopkeeping model showed a small sliver of potential in its new job, business owners can rest easy that AI isn't coming for their jobs for quite a while.

Fortune 500 companies use AI, but security rules are still under construction

Associated Press · 24 minutes ago

AI is no longer a niche technology — it's becoming a fundamental part of business strategy for most Fortune 500 companies in 2025. All of them are now using AI, but they differ in their approaches to implementing it. Cybernews researchers warn of the risks involved, as the rulebooks have yet to be written.

AI is already integrated with core operations, from customer service to strategic decision-making. And this comes with some significant risks. 'While big companies are quick to jump to the AI bandwagon, the risk management part is lagging behind. Companies are left exposed to the new risks associated with AI,' warns Aras Nazarovas, a senior security researcher at Cybernews.

What does AI find about AI on Fortune 500 companies' websites?

Cybernews researchers analyzed the websites of Fortune 500 companies and found that a third of them (33.5%) focus on broad AI and big data capabilities rather than specific LLMs. They highlighted AI for general purposes like data analysis, pattern recognition, system optimization, and others.

More than a fifth of companies (22%) emphasized AI adoption for functional applications across various specific domains. These entries describe how AI is being used to address business problems, such as inventory optimization, predictive maintenance, or customer service. For example, dozens of companies already explicitly mention using AI for customer service, chatbots, virtual assistants, or related customer interaction automation. Similarly, companies say they use AI to automate 'entry-level positions' in areas like inventory management, data entry, and basic process automation.

Some companies like to take things into their own hands, developing proprietary models. Around 14% of companies specified their own internal or proprietary LLMs as a focus, such as Walmart's Wallaby or Saudi Aramco's Metabrain. 'This approach is particularly prevalent in industries like Energy and Finance, where specialized applications, data control, and intellectual property are key concerns,' Nazarovas noted. A similar number of companies gave AI strategic importance, indicating AI integration within the organization's overall strategy.

Fewer companies, only around 5%, proudly declare reliance on external LLM services from third-party providers such as OpenAI, DeepSeek AI, Anthropic, Google, and others. A further tenth of the companies only vaguely mention AI use, without specifying the actual product or how it is used.

'While only a few companies (~4%) mention a hybrid or multiple approach towards AI, blending proprietary, open source, third-party, and other solutions, it is likely that this approach is more prevalent as the experimentation phase is still ongoing,' Nazarovas notes.

The data suggests companies often don't want to explicitly name the AI tools they use. Only 21 companies mention using OpenAI, followed by DeepSeek (19), Nvidia (14), Google (8), Anthropic (7), and Meta's Llama (6), with fewer still for Cohere and others. Meanwhile, for comparison, Microsoft boasts that over 85% of Fortune 500 companies utilize its AI solutions, and other reports suggest that 92% of the 500 companies use OpenAI products.

AI is here, and so are the risks

YouTube's algorithm recently flagged tech reviewer and developer Jeff Geerling's video for violating community guidelines. The automated service determined that the content 'describes how to get unauthorized or free access to audio or audiovisual content, software, subscription services, or games.' The problem is that the YouTuber never described 'any of that stuff.' He appealed, but his appeal was rejected. However, after some noise on social media, the video was reinstated after what Geerling presumes was 'a human review process.' Many smaller creators might never get similar treatment.

This story is just the tip of the iceberg of the risks of AI adoption, and Cybernews researchers listed many more. 'Critical infrastructure and healthcare sectors, for example, often face unique and heightened security vulnerabilities,' Nazarovas said. 'As companies start to grapple with new challenges and risks, it's likely to have significant implications for consumers, industries, and the broader economy in the coming years.'

Reckless AI adoption

'AI was adopted rapidly across enterprises, long before serious attention was paid to its security. It is like a wunderkind raised without supervision — brilliant but reckless. In environments without proper governance, it can expose sensitive data, introduce shadow tools or act on poisoned inputs. Fortune 500 companies have embraced AI, but the rulebook is still being written,' says Emanuelis Norbutas, a chief technology officer.

He adds: 'As adoption deepens, securing model access alone is not enough. Organizations need to control how AI is used in practice — from setting input and output boundaries to enforcing role-based permissions and tracking how data flows through these systems. Without that layer of structured oversight, the gap between innovation and risk will only grow wider.'

Common strategies to mitigate the risk

The regulation of artificial intelligence in the US is currently a mix of federal and state efforts, with no comprehensive federal law yet established. Several frameworks and standards are emerging to address AI and LLM security.

The National Institute of Standards and Technology (NIST) has released the AI Risk Management Framework (AI RMF), which provides guidance on managing risks associated with AI for individuals, organizations, and society. The EU has passed the AI Act, a regulation aiming to establish a legal framework for AI in the European Union; the act raises requirements for high-risk AI systems, including security and transparency obligations. ISO/IEC 42001 is another international standard that specifies requirements for establishing, implementing, maintaining, and continually improving an Artificial Intelligence Management System (AIMS), focusing on managing risks and ensuring responsible AI development and use.

'The problem with frameworks is that AI's rapid evolution outpaces current frameworks and presents additional hurdles, vague guidance, compliance challenges, and other limitations,' Nazarovas said. 'Frameworks won't always provide effective solutions to specific problems, but they surely can strain companies when enforced.'

UKG Named Official HR, Payroll, and Workforce Management Technology Partner of the LPGA

Business Wire · 27 minutes ago

LOWELL, Mass. & WESTON, Fla.--(BUSINESS WIRE)--The Ladies Professional Golf Association (LPGA) and UKG announced today a multi-year partnership to make UKG the official human resources (HR), payroll, and workforce management technology partner of the LPGA. Powered by AI and the world's largest collection of people data, work data, and employee sentiment data, UKG technology creates great workplace experiences and drives better business outcomes for 80,000+ organizations worldwide.

The LPGA is implementing the UKG Ready® HR, payroll, and workforce management suite to manage its staff of nearly 200 employees worldwide. This centralized UKG Ready hub for workforce data and insights will bolster operational efficiency and inform strategic business decisions in support of the LPGA's staff, athletes, and guest experiences.

'The LPGA is proud to announce UKG as our official human resources, payroll, and workforce management technology partner,' said Liz Moore, interim LPGA commissioner. 'In addition to being the number one workforce management provider in the world, UKG's commitment to supporting the growth of women's sports makes them the clear choice as our partner to support our people. We are excited to work with UKG and their cutting-edge solutions to propel the LPGA to new heights in business and in sports.'

The main objectives of the partnership include, but are not limited to:

• Helping the LPGA provide an exceptional employee experience
• Boosting business efficiency with access to real-time data and insights in the AI-first UKG Ready suite
• Creating a welcoming and empowering environment for women in the game of golf and business
• Increasing brand awareness
• Strengthening relationships with current and prospective UKG customers and stakeholders

'The LPGA has long been a beacon of empowerment for women in sports, showcasing their talent, resilience, and leadership on a global stage — and we share that same commitment to fostering environments where everyone can thrive,' said Rachel Barger, president, go-to-market at UKG. 'At UKG, we believe in the transformative power of technology to create inclusive and supportive workplaces that drive productivity and positively impact customers and patrons. The UKG Ready suite will help the LPGA manage its workforce more efficiently and effectively, allowing the organization to focus on doing what it does best: empowering women both on and off the golf course and inspiring future generations.'

UKG's history of success in helping professional sports teams manage the complexities of their on-the-move workforces was a significant factor in the LPGA's decision to partner with the technology provider. The UKG Ready suite will help the LPGA ensure compliance with labor laws and tax regulations in the 29 states where its employees are located, as well as automate all aspects of the employee lifecycle, from hiring and onboarding to payroll and performance management. UKG Bryte™ AI will additionally revolutionize operations with personalized workforce insights and conversational reporting, while the UKG Great Place To Work® Hub will provide access to critical employee sentiment benchmarking and actionable insights to help foster a positive workplace culture.

'We're thrilled about the transformative impact UKG will have on the LPGA and our employee experience,' said Samantha Simmons, chief people and internal operations officer at the LPGA. 'UKG technology not only aligns with our goals of being a worldwide leader in sports and a top employer of choice, but this partnership also represents a monumental step forward in our commitment to taking care of the people who support our athletes, members, tournaments, and fans worldwide.'

About UKG

At UKG, our purpose is people. We are on a mission to inspire every organization to become a great place to work through HCM technology built for all. More than 80,000 organizations across all sizes, industries, and geographies trust UKG HR, payroll, and workforce management cloud solutions to drive great workplace experiences and make better, more confident people and business decisions. With the world's largest collection of people data, work data, and employee sentiment data, combined with rich experience using artificial intelligence in the service of people, we connect employee and workforce insights with business outcomes to show what's possible when organizations invest in their people.

About the LPGA

The Ladies Professional Golf Association (LPGA) is the world's premier women's professional golf organization. Created in 1950 by 13 pioneering female Founders, the LPGA, whose Members now represent nearly 40 countries, is the longest-standing professional women's sports organization. Through the LPGA Tour, the Epson Tour, the LPGA Professionals, and a joint venture with the Ladies European Tour, the LPGA provides female professionals the opportunity to pursue their dreams in the game of golf at the highest level. In addition to its professional tours and teaching accreditation programs, the LPGA features a fully integrated Foundation, which provides best-in-class junior golf programming for female golfers, and its LPGA Amateurs division, which offers its members playing and learning opportunities around the world. The LPGA aims to use its unique platform to inspire, transform and advance opportunities for girls and women, on and off the golf course. Follow the LPGA online and download its mobile apps on Apple or Google Play. Join the social conversation on Facebook, X (formerly known as Twitter), Instagram and YouTube.

The LPGA Tour is the world's leading competitive destination for the best female professional golfers in the world. The Tour hosts more than 32 annual events across 12 countries for over 200 athletes, awarding total prize funds exceeding $129 million and reaching television audiences in more than 220 countries. Follow the LPGA Tour on its U.S. television home, Golf Channel.

Copyright 2025 UKG Inc. All rights reserved. All other trademarks, if any, are property of their respective owners. All specifications are subject to change.
