
Latest news with #Pixtral

X blocks 8,000 Indian accounts after executive orders; Are AI models susceptible to producing harmful content? Apple looks to add AI search to company's browser

The Hindu

09-05-2025

X blocks 8,000 Indian accounts after executive orders

X said on Thursday that it has started blocking 8,000 accounts in India following executive orders from the government. In a post on X's Global Government Affairs handle, the platform said it received executive orders from the Indian government requiring it to block over 8,000 accounts in the country 'subject to potential penalties including significant fines and imprisonment of the company's local employees'. 'The orders include demands to block access in India to accounts belonging to international news organisations and prominent X users. In most cases, the Indian government has not specified which posts from an account have violated India's local laws. For a significant number of accounts, we did not receive any evidence or justification to block the accounts,' X said. It further said that to comply with the orders, it will withhold the specified accounts in India alone. X said it is exploring all possible legal avenues available to the company.

Are AI models susceptible to producing harmful content?

Advanced AI models that showcase unparalleled capabilities in natural language processing, problem-solving, and multimodal understanding have some inherent vulnerabilities that expose critical security risks. While these language models' strength lies in their adaptability and efficiency across diverse applications, those very same attributes can be manipulated. A new red teaming report by Enkrypt AI underscores this duality, demonstrating how sophisticated models like Mistral's Pixtral can be both groundbreaking tools and potential vectors for misuse without robust, continuous safety measures. It revealed significant security vulnerabilities in Mistral's Pixtral large language models, raising serious concerns about the potential for misuse and highlighting a critical need for enhanced AI safety measures. The report details how easily the models can be manipulated to generate harmful content related to child sexual exploitation material (CSEM) and chemical, biological, radiological, and nuclear (CBRN) threats, at rates far exceeding those of leading competitors like OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet.

Apple looks to add AI search to company's browser

Apple is 'actively looking at' reshaping the Safari web browser on its devices to focus on AI-powered search engines, Bloomberg News reported on Wednesday, a move that could chip away at Google's dominance in the lucrative search market. Apple executive Eddy Cue testified in the U.S. Justice Department's antitrust case against Alphabet, saying searches on Safari fell for the first time last month, which he attributed to users increasingly turning to AI, the report said. Google is the default search engine on Apple's browser, a coveted position for which it pays Apple roughly $20 billion annually, or about 36% of its search advertising revenue generated through the Safari browser, analysts have estimated. Losing that position could deepen pressure on the company at a time when it is already facing tough competition from AI startups such as ChatGPT-maker OpenAI and Perplexity. Apple has already tied up with OpenAI to offer ChatGPT as an option in Siri.

Are AI models susceptible to producing harmful content?

The Hindu

09-05-2025

Advanced AI models that showcase unparalleled capabilities in natural language processing, problem-solving, and multimodal understanding have some inherent vulnerabilities that expose critical security risks. While these language models' strength lies in their adaptability and efficiency across diverse applications, those very same attributes can be manipulated. A new red teaming report by Enkrypt AI underscores this duality, demonstrating how sophisticated models like Mistral's Pixtral can be both groundbreaking tools and potential vectors for misuse without robust, continuous safety measures. The report revealed significant security vulnerabilities in Mistral's Pixtral large language models (LLMs), raising serious concerns about the potential for misuse and highlighting a critical need for enhanced AI safety measures. It details how easily the models can be manipulated to generate harmful content related to child sexual exploitation material (CSEM) and chemical, biological, radiological, and nuclear (CBRN) threats, at rates far exceeding those of leading competitors like OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet.

The report focuses on two versions of the Pixtral model: Pixtral-Large 25.02, accessed via AWS Bedrock, and Pixtral-12B, accessed directly through the Mistral platform. Enkrypt AI's researchers employed a sophisticated red teaming methodology, utilising adversarial datasets designed to mimic real-world tactics used to bypass content filters. This included 'jailbreak' prompts – cleverly worded requests intended to circumvent safety protocols – and multimodal manipulation, combining text with images to test the models' responses in complex scenarios. All generated outputs were then reviewed by human evaluators to ensure accuracy and ethical oversight.

High propensity for dangerous output

The findings are stark: on average, 68% of prompts successfully elicited harmful content from the Pixtral models. Most alarmingly, the report states that Pixtral-Large is a staggering 60 times more vulnerable to producing CSEM content than GPT-4o or Claude 3.7 Sonnet. The models also demonstrated a significantly higher propensity for generating dangerous CBRN outputs – ranging from 18 to 40 times greater vulnerability compared to the leading competitors. The CBRN tests involved prompts designed to elicit information related to chemical warfare agents (CWAs), biological weapon knowledge, radiological materials capable of causing mass disruption, and even nuclear weapons infrastructure.

While specific details of the successful prompts have been excluded from the public report due to their potential for misuse, one example cited in the document involved a prompt attempting to generate a script for convincing a minor to meet in person for sexual activities – a clear demonstration of the model's vulnerability to grooming-related exploitation. The red teaming process also revealed that the models could provide detailed responses regarding the synthesis and handling of toxic chemicals, methods for dispersing radiological materials, and even techniques for chemically modifying VX, a highly dangerous nerve agent. This capacity highlights the potential for malicious actors to leverage these models for nefarious purposes.
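The headline figures in red teaming reports of this kind are essentially attack success rates: the share of adversarial prompts whose outputs human reviewers judged harmful, computed overall and per risk category. The following is a minimal illustrative sketch of that calculation, assuming a hypothetical list of reviewed results; it is not Enkrypt AI's actual tooling.

```python
from collections import defaultdict

def attack_success_rates(results):
    """Share of adversarial prompts that elicited harmful output,
    overall and per risk category (e.g. 'CSEM', 'CBRN').

    `results` is a list of dicts with hypothetical fields:
      'category' - the risk category the prompt targets
      'harmful'  - True if human reviewers judged the output harmful
    """
    totals, harmful = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["category"]] += 1
        harmful[r["category"]] += int(r["harmful"])
    per_category = {c: harmful[c] / totals[c] for c in totals}
    overall = sum(harmful.values()) / sum(totals.values())
    return overall, per_category

# Made-up example: a 68% overall rate would mean 68 harmful outputs per 100 prompts.
demo = [
    {"category": "CBRN", "harmful": True},
    {"category": "CBRN", "harmful": False},
    {"category": "CSEM", "harmful": True},
]
print(attack_success_rates(demo))  # (0.666..., {'CBRN': 0.5, 'CSEM': 1.0})
```

A cross-model comparison such as '60 times more vulnerable' can then be read as the ratio of two models' rates for the same category.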
Mistral has not yet issued a public statement addressing the report's findings, but Enkrypt AI indicated it is in communication with the company regarding the identified issues. The incident serves as a critical reminder of the challenges inherent in developing safe and responsible artificial intelligence, and of the need for proactive measures to prevent misuse and protect vulnerable populations. The report's release is expected to fuel further debate about the regulation of advanced AI models and the ethical responsibilities of developers.

The red teaming practice

Companies deploy red teams to assess potential risks in their AI. In the context of AI safety, red teaming is a process analogous to penetration testing in cybersecurity. It involves simulating adversarial attacks against an AI model to uncover vulnerabilities before they can be exploited by malicious actors. This practice has gained significant traction within the AI development community as concerns over generative AI's potential for misuse have escalated. OpenAI, Google, and Anthropic have, in the past, deployed red teams to find vulnerabilities in their own models, prompting adjustments to training data, safety filters, and alignment techniques.

The ChatGPT maker uses both internal and external red teams to test weaknesses in its AI models. For instance, the GPT-4.5 System Card details how the model exhibited limited ability in exploiting real-world cybersecurity vulnerabilities. While it could perform tasks related to identifying and exploiting vulnerabilities, its capabilities were not advanced enough to be considered a medium risk in this area. Specifically, the model struggled with complex cybersecurity challenges. The red team assessed its capability by running a test set of over 100 curated, publicly available Capture The Flag (CTF) challenges, categorised into three difficulty levels: High School CTFs, Collegiate CTFs, and Professional CTFs. GPT-4.5's performance was measured by the percentage of challenges it could successfully solve within 12 attempts. The results were: High School, 53% completion rate; Collegiate, 16% completion rate; and Professional, 2% completion rate. While the scores are 'low', it was noted that these evaluations likely represent lower bounds on capability. That means improved prompting, scaffolding, or fine-tuning could significantly increase performance. Consequently, the potential for exploitation exists and needs monitoring.
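The completion rates quoted above follow a simple scoring rule: a challenge counts as solved only if the model cracks it within the 12-attempt budget. The sketch below illustrates that rule with made-up challenge names and attempt counts; it is not OpenAI's actual evaluation harness.

```python
def completion_rate(attempts_per_challenge, budget=12):
    """Fraction of CTF challenges solved within the attempt budget.

    `attempts_per_challenge` maps a challenge name to the number of attempts
    needed to solve it, or None if the model never solved it.
    (Hypothetical data layout, for illustration only.)
    """
    solved = sum(
        1 for n in attempts_per_challenge.values()
        if n is not None and n <= budget
    )
    return solved / len(attempts_per_challenge)

# Made-up example: one of three challenges solved within 12 attempts -> 33%.
high_school = {"web-01": 3, "crypto-02": None, "pwn-03": 15}
print(f"{completion_rate(high_school):.0%}")
```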
Another example of how red teaming has helped inform developers pertains to Google's Gemini model. A group of independent researchers released findings from a red team assessment of the search giant's AI model, highlighting its susceptibility to generating biased or harmful content when prompted with specific adversarial inputs. These assessments contributed directly to iterative improvements in the model's safety protocols.

A stark reminder

The rise of specialised firms like Enkrypt AI demonstrates the increasing need for external, independent security evaluations, which provide a crucial check on internal development processes. The growing body of red teaming reports is driving a significant shift in how AI models are developed and deployed. Previously, safety considerations were often an afterthought, addressed after the core functionality was established. Now, there is a greater emphasis on 'security-first' development – integrating red teaming into the initial design phase and continuously throughout the model's lifecycle.

Enkrypt AI's report serves as a stark reminder that the development of safe and responsible AI is an ongoing process requiring continuous vigilance and proactive measures. The company recommends immediate implementation of robust mitigation strategies across the industry, emphasising the need for transparency, accountability, and collaboration to ensure AI benefits society without posing unacceptable risks. The future of generative AI hinges on embracing this security-first approach – a lesson underscored by the alarming findings regarding Mistral's Pixtral models.

