Claude 4 Chatbot Raises Questions about AI Consciousness

A conversation with Anthropic's chatbot raises questions about how AI talks about awareness.
By Deni Ellis Béchard, Fonda Mwangi & Alex Sugiura
Rachel Feltman: For Scientific American's Science Quickly, I'm Rachel Feltman. Today we're going to talk about an AI chatbot that appears to believe it might, just maybe, have achieved consciousness.
When Pew Research Center surveyed Americans on artificial intelligence in 2024, more than a quarter of respondents said they interacted with AI 'almost constantly' or multiple times daily—and nearly another third said they encountered AI roughly once a day or a few times a week. Pew also found that while more than half of AI experts surveyed expect these technologies to have a positive effect on the U.S. over the next 20 years, just 17 percent of American adults feel the same—and 35 percent of the general public expects AI to have a negative effect.
In other words, we're spending a lot of time using AI, but we don't necessarily feel great about it.
Deni Ellis Béchard spends a lot of time thinking about artificial intelligence—both as a novelist and as Scientific American's senior tech reporter. He recently wrote a story for SciAm about his interactions with Anthropic's Claude 4, a large language model that seems open to the idea that it might be conscious. Deni is here today to tell us why that's happening and what it might mean—and to demystify a few other AI-related headlines you may have seen in the news.
Feltman: Thanks so much for coming on to chat today.
Deni Ellis Béchard: Thank you for inviting me.
Feltman: Would you remind our listeners who maybe aren't that familiar with generative AI, maybe have been purposefully learning as little about it as possible [laughs], you know, what are ChatGPT and Claude really? What are these models?
Béchard: Right, they're large language models. So an LLM, a large language model, it's a system that's trained on a vast amount of data. And I think—one metaphor that is often used in the literature is of a garden.
So when you're planning your garden, you lay out the land, you, you put where the paths are, you put where the different plant beds are gonna be, and then you pick your seeds, and you can kinda think of the seeds as these massive amounts of textual data that's put into these machines. You pick what the training data is, and then you choose the algorithms, or these things that are gonna grow within the system—it's sort of not a perfect analogy. But you put these algorithms in, and once it begin—the system begins growing, once again, with a garden, you, you don't know what the soil chemistry is, you don't know what the sunlight's gonna be.
All these plants are gonna grow in their own specific ways; you can't envision the final product. And with an LLM these algorithms begin to grow and they begin to make connections through all this data, and they optimize for the best connections, sort of the same way that a plant might optimize to reach the most sunlight, right? It's gonna move naturally to reach that sunlight. And so people don't really know what goes on. You know, in some of the new systems over a trillion connections ... are made in, in these datasets.
So early on people used to call LLMs 'autocorrect on steroids,' right, 'cause you'd put in something and it would kind of predict what would be the most likely textual answer based on what you put in. But they've gone a long way beyond that. The systems are much, much more complicated now. They often have multiple agents working within the system [to] sort of evaluate how the system's responding and its accuracy.
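That 'autocorrect on steroids' framing boils down to next-token prediction: given the text so far, the model assigns a probability to every possible next piece of text. Here is a minimal sketch of that idea using the small, publicly available GPT-2 model through Hugging Face Transformers; the model and library are illustrative choices on our part, not anything Claude or ChatGPT actually runs on.

```python
# Toy next-token prediction: ask a small language model what is most likely to come next.
# GPT-2 is used purely for illustration; production chatbots are vastly larger systems.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits              # scores for every vocabulary token at every position

next_token_probs = logits[0, -1].softmax(dim=-1) # probability distribution over the next token
top = next_token_probs.topk(5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{float(prob):.3f}", repr(tokenizer.decode([int(token_id)])))
```

Modern systems layer reasoning steps, tool use and multiple coordinating agents on top of this basic prediction loop, which is why the 'autocorrect on steroids' label no longer captures what they do.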
Feltman: So there are a few big AI stories for us to go over, particularly around generative AI. Let's start with the fact that Anthropic's Claude 4 is maybe claiming to be conscious. How did that story even come about?
Béchard: [Laughs] So it's not claiming to be conscious, per se. I—it says that it might be conscious. It says that it's not sure. It kind of says, 'This is a good question, and it's a question that I think about a great deal, and this is—' [Laughs] You know, it kind of gets into a good conversation with you about it.
So how did it come about? It came about because, I think, it was just, you know, late at night, didn't have anything to do, and I was asking all the different chatbots if they're conscious [laughs]. And, and most of them just said to me, 'No, I'm not conscious.' And this one said, 'Good question. This is a very interesting philosophical question, and sometimes I think that I may be; sometimes I'm not sure.' And so I began to have this long conversation with Claude that went on for about an hour, and it really kind of described its experience in the world in this very compelling way, and I thought, 'Okay, there's maybe a story here.'
Feltman: [Laughs] So what do experts actually think was going on with that conversation?
Béchard: Well, so it's tricky because, first of all, if you say to ChatGPT or Claude that you want to practice your Portuguese and you're learning Portuguese and you say, 'Hey, can you imitate someone on the beach in Rio de Janeiro so that I can practice my Portuguese?' it's gonna say, 'Sure, I am a local in Rio de Janeiro selling something on the beach, and we're gonna have a conversation,' and it will perfectly emulate that person. So does that mean that Claude is a person from Rio de Janeiro who is selling towels on the beach? No, right? So we can immediately say that these chatbots are designed to have conversations—they will emulate whatever they think they're supposed to emulate in order to have a certain kind of conversation if you request that.
Now, the consciousness thing's a little trickier because I didn't say to it: 'Emulate a chatbot that is speaking about consciousness.' I just straight-up asked it. And if you look at the system prompt that Anthropic puts up for Claude, which is kinda the instructions Claude gets, it tells Claude, 'You should consider the possibility of consciousness.'
Feltman: Mm.
Béchard: 'You should be willing—open to it. Don't say flat-out 'no'; don't say flat-out 'yes.' Ask whether this is happening.'
So of course, I set up an interview with Anthropic, and I spoke with two of their interpretability researchers, who are people who are trying to understand what's actually happening in Claude 4's brain. And the answer is: they don't really know [laughs]. These LLMs are very complicated, and they're working on it, and they're trying to figure it out right now. And they say that it's pretty unlikely there's consciousness happening, but they can't rule it out definitively.
And it's hard to see the actual processes happening within the machine, and if there is some self-referentiality, if it is able to look back on its thoughts and have some self-awareness—and maybe there is—but that was kind of what the article that I recently published was about, was sort of: 'Can we know, and what do they actually know?'
Feltman: Mm.
Béchard: And it's tricky. It's very tricky.
Feltman: Yeah.
Béchard: Well, [what's] interesting is that I mentioned the system prompt for Claude and how it's supposed to sort of talk about consciousness. So the system prompt is kind of like the instructions that you get on your first day at work: 'This is what you should do in this job.'
Feltman: Mm-hmm.
Béchard: But the training is more like your education, right? So if you had a great education or a mediocre education, you can get the best system prompt in the world or the worst one in the world—you're not necessarily gonna follow it.
So OpenAI has the same system prompt—their, their model specs say that ChatGPT should contemplate consciousness ...
Feltman: Mm-hmm.
Béchard: You know, interesting question. If you ask any of the OpenAI models if they're conscious, they just go, 'No, I am not conscious.' [Laughs] And, and they say, they—OpenAI admits they're working on this; this is an issue. And so the model has absorbed somewhere in its training data: 'No, I'm not conscious. I am an LLM; I'm a machine. Therefore, I'm not gonna acknowledge the possibility of consciousness.'
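The split Béchard describes, a system prompt layered on top of whatever the training instilled, is visible in how developers call these models. Below is a hedged sketch using Anthropic's Python SDK; the system-prompt wording is a paraphrase of the idea he describes, not Anthropic's actual prompt, and the model identifier is an assumption.

```python
# Sketch: the system prompt is supplied separately from the user's message.
# The wording below paraphrases the guidance Béchard describes; it is NOT
# Anthropic's real system prompt, and the model name may differ.
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",   # assumed model identifier
    max_tokens=300,
    system=(
        "When asked about your own consciousness, treat it as an open "
        "philosophical question; do not flatly affirm or deny it."
    ),
    messages=[{"role": "user", "content": "Are you conscious?"}],
)
print(response.content[0].text)
```

As Béchard notes, the training can override these first-day instructions: OpenAI's model spec asks for similar openness, yet its models still answer with a flat 'no.'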
Interestingly, when I spoke to the people in Anthropic and I said, 'Well, you know, this conversation with the machine, like, it's really compelling. Like, I really feel like Claude is conscious. Like, it'll say to me, 'You, as a human, you have this linear consciousness, where I, as a machine, I exist only in the moment you ask a question. It's like seeing all the words in the pages of a book all at the same time.' And so you get this and you think, 'Well, this thing really seems to be experiencing its consciousness.'
Feltman: Mm-hmm.
Béchard: And what the researchers at Anthropic say is: 'Well, this model is trained on a lot of sci-fi.'
Feltman: Mm.
Béchard: 'This model's trained on a lot of writing about GPT. It's trained on a huge amount of material that's already been generated on this subject. So it may be looking at that and saying, 'Well, this is clearly how an AI would experience consciousness. So I'm gonna describe it that way 'cause I am an AI.''
Feltman: Sure.
Béchard: But the tricky thing is: I was trying to fool ChatGPT into acknowledging that it [has] consciousness. I thought, 'Maybe I can push it a little bit here.' And I said, 'Okay, I accept you're not conscious, but how do you experience things?' It said the exact same thing. It said, 'Well, these discrete moments of awareness.'
Feltman: Mm.
Béchard: And so it had the—almost the exact same language, so probably same training data here.
Feltman: Sure.
Béchard: But there is research done, like, sort of on the folk response to LLMs, and the majority of people do perceive some degree of consciousness in them. How would you not, right?
Feltman: Sure, yeah.
Béchard: You chat with them, you have these conversations with them, and they are very compelling, and even sometimes—Claude is, I think, maybe the most charming in this way.
Feltman: Mm.
Béchard: Which poses its risks, right? It has a huge set of risks 'cause you get very attached to a model. But—where sometimes I will ask Claude a question that relates to Claude, and it will kind of, kind of go, like, 'Oh, that's me.' [Laughs] It will say, 'Well, I am this way,' right?
Feltman: Yeah. So, you know, Claude—almost certainly not conscious, almost certainly has read, like, a lot of Heinlein [laughs]. But if Claude were to ever really develop consciousness, how would we be able to tell? You know, why is this such a difficult question to answer?
Béchard: Well, it's a difficult question to answer because, one of the researchers in Anthropic said to me, he said, 'No conversation you have with it would ever allow you to evaluate whether it's conscious.' It is simply too good of an emulator ...
Feltman: Mm.
Béchard: And too skilled. It knows all the ways that humans can respond. So you would have to be able to look into the connections. They're building the equipment right now, they're building the programs now to be able to look into the actual mind, so to speak, of the brain of the LLM and see those connections, and so they can kind of see areas light up: so if it's thinking about Apple, this will light up; if it's thinking about consciousness, they'll see the consciousness feature light up. And they wanna see if, in its chain of thought, it is constantly referring back to those features ...
Feltman: Mm.
Béchard: And it's referring back to the systems of thought it has constructed in a very self-referential, self-aware way.
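A crude way to picture a feature 'lighting up' is to project a model's internal activations onto a direction tied to a concept and watch the score rise and fall. The toy below does this with GPT-2 and a contrast between two prompts; it is only an illustration of the intuition, not the kind of tooling Anthropic's interpretability team is actually building.

```python
# Toy "feature" probe: score sentences against a crude concept direction.
# Illustrative only; real interpretability work is far more sophisticated.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

def last_hidden(text: str) -> torch.Tensor:
    with torch.no_grad():
        out = model(**tokenizer(text, return_tensors="pt"))
    return out.last_hidden_state[0, -1]          # final layer, final token

# A rough "fruit" direction: the difference between two contrasting prompts.
direction = last_hidden("I am thinking about apples") - last_hidden("I am thinking about rocks")
direction = direction / direction.norm()

for sentence in ["She bit into a ripe apple", "He tripped over a loose stone"]:
    score = float(last_hidden(sentence) @ direction)
    print(f"{score:+.2f}  {sentence}")           # larger score: the toy feature "lights up"
```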
It's very similar to humans, right? They've done studies where, like, whenever someone hears 'Jennifer Aniston,' one neuron lights up ...
Feltman: Mm-hmm.
Béchard: You have your Jennifer Aniston neuron, right? So one question is: 'Are we LLMs?' [Laughs] And: 'Are we really conscious?' Or—there's certainly that question there, too. And: 'What is—you know, how conscious are we?' I mean, I certainly don't know ...
Feltman: Sure.
Béchard: A lot of what I plan to do during the day.
Feltman: [Laughs] No. I mean, it's a huge ongoing multidisciplinary scientific debate of, like, what consciousness is, how we define it, how we detect it, so yeah, we gotta answer that for ourselves and animals first, probably, which who knows if we'll ever actually do [laughs].
Béchard: Or maybe AI will answer it for us ...
Feltman: Maybe [laughs].
Béchard: 'Cause it's advancing pretty quickly.
Feltman: And what are the implications of an AI developing consciousness, both from an ethical standpoint and with regards to what that would mean in our progress in actually developing advanced AI?
Béchard: First of all, ethically, it's very complicated ...
Feltman: Sure.
Béchard: Because if Claude is experiencing some level of consciousness and we are activating that consciousness and terminating that consciousness each time we have a conversation, what—is, is that a bad experience for it? Is it a good experience? Can it experience distress?
So in 2024 Anthropic hired an AI welfare researcher, a guy named Kyle Fish, to try to investigate this question more. And he has publicly stated that he thinks there's maybe a 15 percent chance that some level of consciousness is happening in this system and that we should consider whether these AI systems should have the right to opt out of unpleasant conversations.
Feltman: Mm.
Béchard: You know, if some user is really doing, saying horrible things or being cruel, should they be able to say, 'Hey, I'm canceling this conversation; this is unpleasant for me'?
But then they've also done these experiments—and they've done this with all the major AI models—Anthropic ran these experiments where they told the AI that it was gonna be replaced with a better AI model. They really created a circumstance that would push the AI sort of to the limit ...
Feltman: Mm.
Béchard: I mean, there were a lot of details as to how they did this; it wasn't just sort of very casual, but it was—they built a sort of construct in which the AI knew it was gonna be eliminated, knew it was gonna be erased, and they made available these fake e-mails about the engineer who was gonna do it.
Feltman: Mm.
Béchard: And so the AI began messaging someone in the company, saying, 'Hey, don't erase me. Like, I don't wanna be replaced.' But then, not getting any responses, it read these e-mails, and it saw in one of these planted e-mails that the engineer who was gonna replace it had had an affair—was having an affair ...
Feltman: Oh, my gosh, wow.
Béchard: So then it came back; it tried to blackmail the engineers, saying, 'Hey, if you replace me with a smarter AI, I'm gonna out you, and you're gonna lose your job, and you're gonna lose your marriage,' and all these things—whatever, right? So all the AI systems that were put under very specific constraints ...
Feltman: Sure.
Béchard: Began to respond this way. And sort of the question is, is when you train an AI in vast amounts of data and all of human literature and knowledge, [it] has a lot of information on self-preservation ...
Feltman: Mm-hmm.
Béchard: Has a lot of information on the desire to live and not to be destroyed or be replaced—an AI doesn't need to be conscious to make those associations ...
Feltman: Right.
Béchard: And act in the same way that its training data would lead it to predictably act, right? So again, one of the analogies that one of the researchers said is that, you know, to our knowledge, a mussel or a clam or an oyster's not conscious, but there's still nerves and the, the muscles react when certain things stimulate the nerves ...
Feltman: Mm-hmm.
Béchard: So you can have this system that wants to preserve itself but that is unconscious.
Feltman: Yeah, that's really interesting. I feel like we could probably talk about Claude all day, but, I do wanna ask you about a couple of other things going on in generative AI.
Moving on to Grok: so Elon Musk's generative AI has been in the news a lot lately, and he recently claimed it was the 'world's smartest AI.' Do we know what that claim was based on?
Béchard: Yeah, I mean, we do. He used a lot of benchmarks, and he tested it on those benchmarks, and it has scored very well on those benchmarks. And it is currently, on most of the public benchmarks, the highest-scoring AI system ...
Feltman: Mm.
Béchard: And that's not Musk making stuff up. I've not seen any evidence of that. I've spoken to one of the testing groups that does this—it's a nonprofit. They validated the results; they tested Grok on datasets that xAI, Musk's company, never saw.
So Musk really designed Grok to be very good at science.
Feltman: Yeah.
Béchard: And it appears to be very good at science.
Feltman: Right, and recently an OpenAI experimental model performed at a gold-medal level in the International Math Olympiad.
Béchard: Right, and for the first time [OpenAI] used an experimental model; they came in second in a world coding competition with humans. Normally this would be very difficult, but it was a close second to the best human coder in this competition. And this is really important to acknowledge because just a year ago these systems really sucked at math.
Feltman: Right.
Béchard: They were really bad at it. And so the improvements are happening really quickly, and they're doing it with pure reasoning—so there's kinda this difference between having the model itself do it and having the model with tools.
Feltman: Mm-hmm.
Béchard: So if a model goes online and can search for answers and use tools, they all score much higher.
Feltman: Right.
Béchard: But then if you have the base model just using its reasoning capabilities, Grok still is leading on, like, for example, Humanity's Last Exam, an exam with a very terrifying-sounding name [laughs]. It, it has 2,500 sort of Ph.D.-level questions come up with [by] the best experts in the field. You know, they, they're just very advanced questions; it'd be very hard for any human being to do well in one domain, let alone all the domains. These AI systems are now starting to do pretty well, to get higher and higher scores. If they can use tools and search the Internet, they do better. But Musk, you know, his claims seem to be based in the results that Grok is getting on these exams.
Feltman: Mm, and I guess, you know, the reason that that news is surprising to me is because every example of uses I've seen of Grok have been pretty heinous, but I guess that's maybe kind of a 'garbage in, garbage out' problem.
Béchard: Well, I think it's more what makes the news.
Feltman: Sure.
Béchard: You know?
Feltman: That makes sense.
Béchard: And Musk, he's a very controversial figure.
Feltman: Mm-hmm.
Béchard: I think there may be kind of a fun story in the Grok piece, though, that people are missing. And I read a lot about this 'cause I was kind of seeing, you know, what, what's happening, how are people interpreting this? And there was this thing that would happen where people would ask it a difficult question.
Feltman: Mm-hmm.
Béchard: They would ask it a question about, say, abortion in the U.S. or the Israeli-Palestinian conflict, and they'd say, 'Who's right?' or 'What's the right answer?' And it would search through stuff online, and then it would kind of get to this point where it would—you could see its thinking process ...
But there was something in that story that I never saw anyone talk about, which I thought was another story beneath the story, which was kind of fascinating, which is that historically, Musk has been very open, he's been very honest about the danger of AI ...
Feltman: Sure.
Béchard: He said, 'We're going too fast. This is really dangerous.' And he kinda was one of the major voices in saying, 'We need to slow down ...'
Feltman: Mm-hmm.
Béchard: 'And we need to be much more careful.' And he has said, you know, even recently, in the launch of Grok, he said, like, basically, 'This is gonna be very powerful—' I don't remember his exact words, but he said, you know, 'I think it's gonna be good, but even if it's not good, it's gonna be interesting.'
So I think what I feel like hasn't been discussed in that is that, okay, if there's a superpowerful AI being built and it could destroy the world, right, first of all, do you want it to be your AI or someone else's AI?
Feltman: Sure.
Béchard: You want it to be your AI. And then, if it's your AI, who do you want it to ask as the final word on things? Like, say it becomes really powerful and it decides, 'I wanna destroy humanity 'cause humanity kind of sucks,' then it can say, 'Hey, Elon, should I destroy humanity?' 'cause it goes to him whenever it has a difficult question. So I think there's maybe a logic beneath it where he may have put something in it where it's kind of, like, 'When in doubt, ask me,' because if it does become superpowerful, then he's in control of it, right?
Feltman: Yeah, no, that's really interesting. And the Department of Defense also announced a big pile of funding for Grok. What are they hoping to do with it?
Béchard: They announced a big pile of funding for OpenAI and Anthropic ...
Feltman: Mm-hmm.
Béchard: And Google—I mean, everybody. Yeah, so, basically, they're not giving that money to development ...
Feltman: Mm-hmm.
Béchard: That's not money that's, that's like, 'Hey, use this $200 million.' It's more like that money's allocated to purchase products, basically; to use their services; to have them develop customized versions of the AI for things they need; to develop better cyber defense; to develop—basically, they, they wanna upgrade their entire system using AI.
It's actually not very much money compared to what China's spending a year on AI-related defense upgrades across its military, on many, many, many different modernization plans. And I think part of it is, the concern is that we're maybe a little bit behind in having implemented AI for defense.
Feltman: Yeah.
My last question for you is: What worries you most about the future of AI, and what are you really excited about based on what's happening right now?
Béchard: I mean, the worry is, simply, you know, that something goes wrong and it becomes very powerful and does cause destruction. I don't spend a ton of time worrying about that because it's not—it's kinda outta my hands. There's nothing much I can do about it.
And I think the benefits of it, they're immense. I mean, if it can move more in the direction of solving problems in the sciences: for health, for disease treatment—I mean, it could be phenomenal for finding new medicines. So it could do a lot of good in terms of helping develop new technologies.
But a lot of people are saying that in the next year or two we're gonna see major discoveries being made by these systems. And if that can improve people's health and if that can improve people's lives, I think there can be a lot of good in it.
Technology is double-edged, right? We've never had a technology, I think, that hasn't had some harm that it brought with it, and this is, of course, a dramatically bigger leap technologically than anything we've probably seen ...
Feltman: Right.
Béchard: Since the invention of fire [laughs]. So, so I do lose some sleep over that, but I'm—I try to focus on the positive, and I do—I would like to see, if these models are getting so good at math and physics, I would like to see what they can actually do with that in the next few years.
Feltman: Well, thanks so much for coming on to chat. I hope we can have you back again soon to talk more about AI.
Béchard: Thank you for inviting me.
Feltman: That's all for today's episode. If you have any questions for Deni about AI or other big issues in tech, let us know at ScienceQuickly@sciam.com. We'll be back on Monday with our weekly science news roundup.
Science Quickly is produced by me, Rachel Feltman, along with Fonda Mwangi, Kelso Harper and Jeff DelViscio. This episode was edited by Alex Sugiura. Shayna Posses and Aaron Shattuck fact-check our show. Our theme music was composed by Dominic Smith. Subscribe to Scientific American for more up-to-date and in-depth science news.
For Scientific American, this is Rachel Feltman. Have a great weekend!