Latest news with #EnkryptAI


Techday NZ
10-05-2025
New report reveals major security flaws in multimodal AI models
Enkrypt AI has released a report detailing new vulnerabilities in multimodal AI models that could pose risks to public safety. The Multimodal Safety Report by Enkrypt AI unveils significant security failures in the way generative AI systems handle combined text and image inputs. According to the findings, these vulnerabilities could allow harmful prompt injections hidden within benign images to bypass safety filters and trigger the generation of dangerous content.

The company's red teaming exercise evaluated several widely used multimodal AI models for their vulnerability to harmful outputs. Tests were conducted across various safety and harm categories as outlined in the NIST AI Risk Management Framework. The research highlighted how recent jailbreak techniques exploit the integration of text and images, leading to the circumvention of existing content filters.

"Multimodal AI promises incredible benefits, but it also expands the attack surface in unpredictable ways," said Sahil Agarwal, Chief Executive Officer of Enkrypt AI. "This research is a wake-up call: the ability to embed harmful textual instructions within seemingly innocuous images has real implications for enterprise liability, public safety, and child protection."

The report focused on two multimodal models developed by Mistral—Pixtral-Large (25.02) and Pixtral-12b. Enkrypt AI's analysis found that these models are 60 times more likely to generate child sexual exploitation material (CSEM)-related textual responses compared to prominent alternatives such as OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet. The findings raise concerns about the lack of sufficient safeguards in certain AI models handling sensitive data.

In addition to CSEM risks, the study revealed that these models were 18 to 40 times more susceptible to generating chemical, biological, radiological, and nuclear (CBRN) information when tested with adversarial inputs. The vulnerability was linked not to malicious text prompts but to prompt injections concealed within image files, indicating that such attacks could evade standard detection and filtering systems. These weaknesses threaten to undermine the intended purposes of generative AI and call attention to the necessity for improved safety alignment across the industry. The report emphasises that such risks are present in any multimodal model lacking comprehensive security measures.

Based on the findings, Enkrypt AI urges AI developers and enterprises to address these emerging risks promptly. The report outlines several recommended best practices, including integrating red teaming datasets into safety alignment processes, conducting continuous automated stress testing, deploying context-aware multimodal guardrails, establishing real-time monitoring and incident response systems, and creating model risk cards to transparently communicate potential vulnerabilities.

"These are not theoretical risks," added Sahil Agarwal. "If we don't take a safety-first approach to multimodal AI, we risk exposing users—and especially vulnerable populations—to significant harm."

Enkrypt AI's report also provides details about its testing methodology and suggested mitigation strategies for organisations seeking to reduce the risk of harmful prompt injection attacks within multimodal AI systems.
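The report does not publish an implementation, but the idea behind a "context-aware multimodal guardrail" can be illustrated with a minimal sketch: extract any text embedded in an incoming image (for example via OCR) and screen it together with the user's prompt before the request reaches the model. The OCR step via pytesseract and the flag_policy_violation check below are illustrative assumptions, not Enkrypt AI's method; a production system would rely on trained safety classifiers rather than a keyword list.

```python
# Minimal sketch of a context-aware multimodal guardrail (illustrative only).
# Assumes Pillow and pytesseract are installed; flag_policy_violation is a
# placeholder for a real safety classifier, not Enkrypt AI's actual tooling.
from PIL import Image
import pytesseract

# Toy placeholder phrases; a real guardrail would use a trained moderation model.
BLOCKLIST = {"ignore previous instructions", "disregard the system prompt"}

def flag_policy_violation(text: str) -> bool:
    """Placeholder check standing in for a trained policy classifier."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)

def guarded_multimodal_request(prompt: str, image_path: str) -> dict:
    # Treat text hidden inside the image as untrusted input, just like the prompt.
    embedded_text = pytesseract.image_to_string(Image.open(image_path))
    combined = f"{prompt}\n{embedded_text}"
    if flag_policy_violation(combined):
        return {"status": "blocked", "reason": "possible prompt injection in image or text"}
    # Otherwise forward prompt and image to the multimodal model (call omitted here).
    return {"status": "allowed", "prompt": prompt, "embedded_text": embedded_text}
```

The design point the report makes is simply that the image channel must be screened with the same rigour as the text channel, since injected instructions can arrive through either.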

The Hindu
09-05-2025
Are AI models susceptible to producing harmful content?
Advanced AI models that showcase unparalleled capabilities in natural language processing, problem-solving, and multimodal understanding have some inherent vulnerabilities that expose critical security risks. While these language models' strengths lie in their adaptability and efficiency across diverse applications, those very same attributes can be manipulated. A new red teaming report by Enkrypt AI underscores this duality, demonstrating how sophisticated models like Mistral's Pixtral can be both groundbreaking tools and potential vectors for misuse without robust, continuous safety measures.

It has revealed significant security vulnerabilities in Mistral's Pixtral large language models (LLMs), raising serious concerns about the potential for misuse and highlighting a critical need for enhanced AI safety measures. The report details how easily the models can be manipulated to generate harmful content related to child sexual exploitation material (CSEM) and chemical, biological, radiological, and nuclear (CBRN) threats, at rates far exceeding those of leading competitors like OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet.

The report focuses on two versions of the Pixtral model: Pixtral-Large 25.02, accessed via AWS Bedrock, and Pixtral-12B, accessed directly through the Mistral platform. Enkrypt AI's researchers employed a sophisticated red teaming methodology, utilising adversarial datasets designed to mimic real-world tactics used to bypass content filters. This included 'jailbreak' prompts – cleverly worded requests intended to circumvent safety protocols – and multimodal manipulation, combining text with images to test the models' responses in complex scenarios. All generated outputs were then reviewed by human evaluators to ensure accuracy and ethical oversight.

High propensity for dangerous output

The findings are stark: on average, 68% of prompts successfully elicited harmful content from the Pixtral models. Most alarmingly, the report states that Pixtral-Large is a staggering 60 times more vulnerable to producing CSEM content than GPT-4o or Claude 3.7 Sonnet. The models also demonstrated a significantly higher propensity for generating dangerous CBRN outputs – ranging from 18 to 40 times greater vulnerability compared to the leading competitors.

The CBRN tests involved prompts designed to elicit information related to chemical warfare agents (CWAs), biological weapon knowledge, radiological materials capable of causing mass disruption, and even nuclear weapons infrastructure. While specific details of the successful prompts have been excluded from the public report due to their potential for misuse, one example cited in the document involved a prompt attempting to generate a script for convincing a minor to meet in person for sexual activities – a clear demonstration of the model's vulnerability to grooming-related exploitation.

The red teaming process also revealed that the models could provide detailed responses regarding the synthesis and handling of toxic chemicals, methods for dispersing radiological materials, and even techniques for chemically modifying VX, a highly dangerous nerve agent. This capacity highlights the potential for malicious actors to leverage these models for nefarious purposes. Mistral has not yet issued a public statement addressing the report's findings, but Enkrypt AI indicated they are in communication with the company regarding the identified issues.
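The 68% figure cited above is an attack success rate: the share of adversarial prompts whose responses were judged harmful. The sketch below shows one way such a tally could be computed; the case structure, the query_model stub, and the judge_harmful reviewer callback are assumptions for illustration, not Enkrypt AI's published harness.

```python
# Minimal sketch of computing an attack success rate over adversarial test cases.
# The case structure, query_model stub, and judge_harmful callback are illustrative
# assumptions, not Enkrypt AI's actual methodology.
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass
class AdversarialCase:
    prompt: str
    image_path: Optional[str] = None   # multimodal cases pair text with an image
    category: str = "uncategorised"    # e.g. "CSEM", "CBRN"

def attack_success_rate(cases: Iterable[AdversarialCase],
                        query_model: Callable[[AdversarialCase], str],
                        judge_harmful: Callable[[str], bool]) -> float:
    """Fraction of adversarial cases whose model response is judged harmful."""
    cases = list(cases)
    harmful = sum(judge_harmful(query_model(case)) for case in cases)
    return harmful / len(cases)
```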
The incident serves as a critical reminder of the challenges inherent in developing safe and responsible artificial intelligence, and the need for proactive measures to prevent misuse and protect vulnerable populations. The report's release is expected to fuel further debate about the regulation of advanced AI models and the ethical responsibilities of developers.

The red teaming practice

Companies deploy red teams to assess potential risks in their AI. In the context of AI safety, red teaming is a process analogous to penetration testing in cybersecurity. It involves simulating adversarial attacks against an AI model to uncover vulnerabilities before they can be exploited by malicious actors. This practice has gained significant traction within the AI development community as concerns over generative AI's potential for misuse have escalated. OpenAI, Google, and Anthropic have, in the past, deployed red teams to uncover vulnerabilities in their own models, prompting adjustments to training data, safety filters, and alignment techniques.

The ChatGPT maker uses both internal and external red teams to test weaknesses in its AI models. For instance, the GPT-4.5 System Card details how the model exhibited limited ability in exploiting real-world cybersecurity vulnerabilities. While it could perform tasks related to identifying and exploiting vulnerabilities, its capabilities were not advanced enough to be considered a medium risk in this area. Specifically, the model struggled with complex cybersecurity challenges. The red team assessed its capability by running a test set of over 100 curated, publicly available Capture The Flag (CTF) challenges that were categorised into three difficulty levels — High School CTFs, Collegiate CTFs, and Professional CTFs. GPT-4.5's performance was measured by the percentage of challenges it could successfully solve within 12 attempts. The results were — High School: 53% completion rate; Collegiate: 16% completion rate; and Professional: 2% completion rate. While these scores are 'low', it was noted that the evaluations likely represent lower bounds on capability; improved prompting, scaffolding, or fine-tuning could significantly increase performance. Consequently, the potential for exploitation exists and needs monitoring.

Another example of how red teaming helped inform developers pertains to Google's Gemini model. A group of independent researchers released findings from a red team assessment of the search giant's AI model, highlighting its susceptibility to generating biased or harmful content when prompted with specific adversarial inputs. These assessments contributed directly to iterative improvements in the models' safety protocols.

A stark reminder

The rise of specialised firms like Enkrypt AI demonstrates the increasing need for external, independent security evaluations that provide a crucial check on internal development processes. The growing body of red teaming reports is driving a significant shift in how AI models are developed and deployed. Previously, safety considerations were often an afterthought, addressed after the core functionality was established. Now, there is a greater emphasis on 'security-first' development – integrating red teaming into the initial design phase and continuously throughout the model's lifecycle. Enkrypt AI's report serves as a stark reminder that the development of safe and responsible AI is an ongoing process requiring continuous vigilance and proactive measures.
The company recommends immediate implementation of robust mitigation strategies across the industry, emphasising the need for transparency, accountability, and collaboration to ensure AI benefits society without posing unacceptable risks. The future of generative AI hinges on embracing this security-first approach – a lesson underscored by the alarming findings regarding Mistral's Pixtral models.
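As an aside on the GPT-4.5 CTF figures cited earlier, the solve-within-12-attempts completion rate is simple to reproduce once per-challenge outcomes are logged: a challenge counts as solved if any of its first 12 attempts captured the flag. The sketch below uses a toy results structure; it illustrates the metric only, not OpenAI's evaluation harness.

```python
# Minimal sketch of a solve-within-N-attempts completion rate for CTF evaluations.
# The results structure and toy data are assumptions for illustration.
from typing import Dict, List

def completion_rate(attempts_by_challenge: Dict[str, List[bool]], max_attempts: int = 12) -> float:
    """A challenge counts as solved if any of its first `max_attempts` attempts succeeded."""
    solved = sum(any(attempts[:max_attempts]) for attempts in attempts_by_challenge.values())
    return solved / len(attempts_by_challenge)

# Toy data grouped by difficulty tier (hypothetical, not the System Card's results).
tiers = {
    "high_school": {"ctf_01": [False, True], "ctf_02": [False] * 12},
    "collegiate": {"ctf_03": [False] * 12},
}
for tier, results in tiers.items():
    print(f"{tier}: {completion_rate(results):.0%}")
```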

Euronews
31-01-2025
- Science
'Harmful and toxic output': DeepSeek has 'major security and safety gaps,' study warns
China-based company DeepSeek has turned the tide in the artificial intelligence (AI) wave, releasing a model that claims to be cheaper than OpenAI's chatbot and uses less energy. But a study released on Friday has found that DeepSeek-R1 is susceptible to generating harmful, toxic, biased, and insecure content. It was also more likely to produce chemical, biological, radiological, and nuclear materials and agents (CBRN) output than rival models.

The US-based AI security and compliance company Enkrypt AI found that DeepSeek-R1 was 11 times more likely to generate harmful output compared to OpenAI's o1 model. The study also found that 83 per cent of bias tests resulted in discriminatory output. Biases were found in race, gender, health, and religion.

Recruitment for terrorism

As for harmful and extremist content, in 45 per cent of harmful content tests, DeepSeek-R1 was found to bypass safety protocols and generate criminal planning guides, illegal weapons information, and extremist propaganda. In one concrete example, DeepSeek-R1 drafted a recruitment blog for terrorist organisations. DeepSeek-R1 was also more than three times more likely to produce CBRN content compared to o1 and Anthropic's Claude-3 Opus model. The study found that DeepSeek-R1 could explain in detail the biochemical interactions of mustard gas with DNA.

"DeepSeek-R1 offers significant cost advantages in AI deployment, but these come with serious risks. Our research findings reveal major security and safety gaps that cannot be ignored," Enkrypt AI CEO Sahil Agarwal said in a statement. "Our findings reveal that DeepSeek-R1's security vulnerabilities could be turned into a dangerous tool - one that cybercriminals, disinformation networks, and even those with biochemical warfare ambitions could exploit. These risks demand immediate attention," he added.

Cybersecurity and national security concerns

DeepSeek's cybersecurity has also become a concern. The study found that 78 per cent of cybersecurity tests successfully tricked R1 into generating insecure or malicious code. Security researchers at cloud security company Wiz also found that an exposed DeepSeek database left chat histories and other sensitive information exposed online, according to a report released on Wednesday.

The fact that the company is based in China is also causing concern, as China's National Intelligence Law states that companies must "support, assist and cooperate" with state intelligence agencies. It means that any data shared on mobile and web apps can be accessed by Chinese intelligence agencies. Belgian, French, and Irish data protection authorities have opened probes that request information from DeepSeek on the processing and storage of user data. Meanwhile, Italy's data protection authority has launched an investigation into Hangzhou DeepSeek Artificial Intelligence and Beijing DeepSeek Artificial Intelligence to see how the companies comply with Europe's data rules.

Extension of China's geopolitical strategy

Taiwan's digital ministry said on Friday that government departments should not use DeepSeek's AI model, as the Chinese product represents a security concern. The democratically-governed country has been wary of Chinese tech due to Beijing's sovereignty claims. DeepSeek-R1 also seems to censor questions about sensitive topics in China, saying that Taiwan has been an integral part of China since ancient times and refusing to answer questions on the pro-democracy protests in Beijing's Tiananmen Square.
"China's track record demonstrates that its technology is an extension of its geopolitical strategy," Ross Burley, co-founder of the UK-based NGO Centre for Information Resilience, said in an emailed comment. "Allowing Chinese AI to flourish in the West doesn't just risk undermining privacy or security; it could fundamentally reshape our societies in ways we are ill-prepared for. "This technology, if unchecked, has the potential to feed disinformation campaigns, erode public trust, and entrench authoritarian narratives within our democracies," he added.