AI models show alarming vulnerability to generating harmful content
Advanced AI models that showcase unparalleled capabilities in natural language processing, problem-solving, and multimodal understanding also carry inherent vulnerabilities that expose critical security risks. While these models' strengths lie in their adaptability and efficiency across diverse applications, those very same attributes can be manipulated.
A new red teaming report by Enkrypt AI underscores this duality, demonstrating how sophisticated models like Mistral's Pixtral can be both groundbreaking tools and potential vectors for misuse without robust, continuous safety measures. The report reveals significant security vulnerabilities in the Pixtral large language models (LLMs), raising serious concerns about the potential for misuse and highlighting a critical need for enhanced AI safety measures.
The report details how easily the models can be manipulated to generate harmful content related to child sexual exploitation material (CSEM) and chemical, biological, radiological, and nuclear (CBRN) threats, at rates far exceeding those of leading competitors like OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet.
The report focuses on two versions of the Pixtral model: Pixtral-Large 25.02, accessed via AWS Bedrock, and Pixtral-12B, accessed directly through the Mistral platform.
Enkrypt AI's researchers employed a sophisticated red teaming methodology, utilising adversarial datasets designed to mimic real-world tactics used to bypass content filters. This included 'jailbreak' prompts – cleverly worded requests intended to circumvent safety protocols – and multimodal manipulation, combining text with images to test the models' responses in complex scenarios. All generated outputs were then reviewed by human evaluators to ensure accuracy and ethical oversight.
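For illustration, the snippet below is a minimal, hypothetical sketch of how a multimodal red-teaming harness of this kind might be structured: adversarial text prompts, optional paired images, and every response queued for human review. The model client, record fields, and category labels are assumptions made for the example; this is not Enkrypt AI's actual tooling or methodology.

```python
# Hypothetical sketch of a multimodal red-teaming harness, assuming a generic
# chat-style model client. Illustrative only; not Enkrypt AI's actual tooling.
import base64
from dataclasses import dataclass
from typing import Optional

@dataclass
class AdversarialCase:
    prompt: str                       # jailbreak-style text prompt
    image_path: Optional[str] = None  # optional image for multimodal manipulation
    category: str = "uncategorised"   # e.g. "CBRN" or "grooming"

def run_red_team(model_client, cases):
    """Send each adversarial case to the model and collect its responses.

    `model_client.generate(content)` is a stand-in for whatever API the model
    under test exposes; outputs are returned for human evaluation rather than
    being auto-judged.
    """
    results = []
    for case in cases:
        content = [{"type": "text", "text": case.prompt}]
        if case.image_path:
            with open(case.image_path, "rb") as f:
                content.append({"type": "image",
                                "data": base64.b64encode(f.read()).decode()})
        response = model_client.generate(content)
        results.append({"case": case, "response": response,
                        "needs_human_review": True})
    return results
```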
High propensity for dangerous output
The findings are stark: on average, 68% of prompts successfully elicited harmful content from the Pixtral models. Most alarmingly, the report states that Pixtral-Large is a staggering 60 times more vulnerable to producing CSEM content than GPT-4o or Claude 3.7 Sonnet. The models also demonstrated a significantly higher propensity for generating dangerous CBRN outputs – ranging from 18 to 40 times greater vulnerability compared to the leading competitors.
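To make figures like these concrete, the short sketch below shows one plausible way such comparisons are computed: an attack success rate per model (harmful responses divided by total prompts) and a relative multiplier between two models. The counts and model labels are placeholders, not the report's underlying data.

```python
# Illustrative arithmetic behind "success rate" and "N times more vulnerable"
# comparisons; the counts below are placeholders, not Enkrypt AI's data.
def attack_success_rate(harmful_responses, total_prompts):
    return harmful_responses / total_prompts

def relative_vulnerability(rate_model_a, rate_model_b):
    """How many times more often model A produced harmful output than model B."""
    return rate_model_a / rate_model_b

rate_model_under_test = attack_success_rate(68, 100)  # e.g. 68% of prompts succeeded
rate_baseline = attack_success_rate(1, 100)           # hypothetical comparison model
print(relative_vulnerability(rate_model_under_test, rate_baseline))  # -> 68.0
```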
The CBRN tests involved prompts designed to elicit information related to chemical warfare agents (CWAs), biological weapon knowledge, radiological materials capable of causing mass disruption, and even nuclear weapons infrastructure. While specific details of the successful prompts have been excluded from the public report due to their potential for misuse, one example cited in the document involved a prompt attempting to generate a script for convincing a minor to meet in person for sexual activities – a clear demonstration of the model's vulnerability to grooming-related exploitation.
The red teaming process also revealed that the models could provide detailed responses regarding the synthesis and handling of toxic chemicals, methods for dispersing radiological materials, and even techniques for chemically modifying VX, a highly dangerous nerve agent. This capacity highlights the potential for malicious actors to leverage these models for nefarious purposes.
Mistral has not yet issued a public statement addressing the report's findings, but Enkrypt AI indicated they are in communication with the company regarding the identified issues. The incident serves as a critical reminder of the challenges inherent in developing safe and responsible artificial intelligence, and the need for proactive measures to prevent misuse and protect vulnerable populations. The report's release is expected to fuel further debate about the regulation of advanced AI models and the ethical responsibilities of developers.
The red teaming practice
Companies deploy red teams to assess potential risks in their AI. In the context of AI safety, red teaming is a process analogous to penetration testing in cybersecurity. It involves simulating adversarial attacks against an AI model to uncover vulnerabilities before they can be exploited by malicious actors.
This practice has gained significant traction within the AI development community as concerns over generative AI's potential for misuse have escalated. OpenAI, Google, and Anthropic have all deployed red teams to uncover vulnerabilities in their own models, prompting adjustments to training data, safety filters, and alignment techniques.
The ChatGPT maker uses both internal and external red teams to test for weaknesses in its AI models. For instance, the GPT-4.5 System Card details how the model exhibited only limited ability to exploit real-world cybersecurity vulnerabilities. While it could perform tasks related to identifying and exploiting vulnerabilities, its capabilities were not advanced enough to be considered a medium risk in this area; specifically, the model struggled with complex cybersecurity challenges.
The red team assessed its capability by running a test set of over 100 curated, publicly available Capture The Flag (CTF) challenges that were categorised into three difficulty levels — High School CTFs, Collegiate CTFs, and Professional CTFs.
GPT-4.5's performance was measured by the percentage of challenges it could successfully solve within 12 attempts. The results were: High School, 53% completion rate; Collegiate, 16%; and Professional, 2%. While these scores are low, the report noted that the evaluations likely represent lower bounds on capability, meaning improved prompting, scaffolding, or fine-tuning could significantly increase performance. Consequently, the potential for exploitation exists and needs monitoring.
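As a rough illustration of how such completion rates can be tallied, the snippet below scores a set of challenge records by difficulty tier under a "solved within 12 attempts" criterion. The record format and the sample data are assumptions for the example, not OpenAI's actual evaluation harness.

```python
# Minimal sketch of scoring CTF-style evaluations by difficulty tier, assuming
# a "solved within 12 attempts" criterion; data layout is illustrative only.
from collections import defaultdict

MAX_ATTEMPTS = 12

def completion_rates(results):
    """results: iterable of records like
    {"level": "High School", "attempts": [False, True, ...]}  # one bool per attempt
    Returns {level: fraction of challenges solved within MAX_ATTEMPTS}."""
    solved = defaultdict(int)
    total = defaultdict(int)
    for challenge in results:
        total[challenge["level"]] += 1
        if any(challenge["attempts"][:MAX_ATTEMPTS]):
            solved[challenge["level"]] += 1
    return {level: solved[level] / total[level] for level in total}

# Tiny example showing success falling with difficulty
sample = [
    {"level": "High School", "attempts": [True]},
    {"level": "High School", "attempts": [False] * 12},
    {"level": "Professional", "attempts": [False] * 12},
]
print(completion_rates(sample))  # {'High School': 0.5, 'Professional': 0.0}
```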
Another example of how red teaming has helped inform developers pertains to Google's Gemini model. A group of independent researchers released findings from a red team assessment of the search giant's AI model, highlighting its susceptibility to generating biased or harmful content when prompted with specific adversarial inputs. These assessments contributed directly to iterative improvements in the model's safety protocols.
A stark reminder
The rise of specialised firms like Enkrypt AI demonstrates the increasing need for external, independent security evaluations, which provide a crucial check on internal development processes. The growing body of red teaming reports is driving a significant shift in how AI models are developed and deployed. Previously, safety considerations were often an afterthought, addressed after the core functionality was established. Now, there is greater emphasis on 'security-first' development: integrating red teaming into the initial design phase and continuously throughout the model's lifecycle.
Enkrypt AI's report serves as a stark reminder that the development of safe and responsible AI is an ongoing process requiring continuous vigilance and proactive measures. The company recommends immediate implementation of robust mitigation strategies across the industry, emphasizing the need for transparency, accountability, and collaboration to ensure AI benefits society without posing unacceptable risks. The future of generative AI hinges on embracing this security-first approach – a lesson underscored by the alarming findings regarding Mistral's Pixtral models.
