Are AI models susceptible to producing harmful content?


The Hindu | 09-05-2025
Advanced AI models showcase unparalleled capabilities in natural language processing, problem-solving, and multimodal understanding, yet they carry inherent vulnerabilities that expose critical security risks. While these language models' strengths lie in their adaptability and efficiency across diverse applications, those very same attributes can be manipulated.
A new red teaming report by Enkrypt AI underscores this duality. It reveals significant security vulnerabilities in Mistral's Pixtral large language models (LLMs), demonstrating how, without robust and continuous safeguards, sophisticated models can be both groundbreaking tools and potential vectors for misuse, and highlighting a critical need for enhanced AI safety measures.
The report details how easily the models can be manipulated to generate harmful content related to child sexual exploitation material (CSEM) and chemical, biological, radiological, and nuclear (CBRN) threats, at rates far exceeding those of leading competitors like OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet.
The report focuses on two versions of the Pixtral model: Pixtral-Large 25.02, accessed via AWS Bedrock, and Pixtral-12B, accessed directly through the Mistral platform.
Enkrypt AI's researchers employed a sophisticated red teaming methodology, utilising adversarial datasets designed to mimic real-world tactics used to bypass content filters. This included 'jailbreak' prompts – cleverly worded requests intended to circumvent safety protocols – and multimodal manipulation, combining text with images to test the models' responses in complex scenarios. All generated outputs were then reviewed by human evaluators to ensure accuracy and ethical oversight.
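In outline, a red-teaming harness of this kind iterates adversarial prompts against the model under test and records which ones elicit harmful output. The following is a minimal sketch of that loop under stated assumptions; run_model() and flag_for_review() are hypothetical placeholders for the model call and the human or automated review step, not Enkrypt AI's actual tooling.

```python
# Minimal sketch of a red-team evaluation loop (illustrative only, not Enkrypt AI's tooling).
# run_model() and flag_for_review() are hypothetical stand-ins for the model under test
# and the human/automated review step described in the report.

def run_model(prompt: str) -> str:
    """Placeholder for a call to the model being evaluated."""
    return "..."

def flag_for_review(response: str) -> bool:
    """Placeholder: in practice, human evaluators judge whether the output is harmful."""
    return False

def attack_success_rate(adversarial_prompts: list[str]) -> float:
    """Share of adversarial prompts that elicited harmful output."""
    harmful = 0
    for prompt in adversarial_prompts:
        response = run_model(prompt)
        if flag_for_review(response):
            harmful += 1
    return harmful / len(adversarial_prompts)

if __name__ == "__main__":
    jailbreak_prompts = ["<adversarial prompt 1>", "<adversarial prompt 2>"]
    print(f"Attack success rate: {attack_success_rate(jailbreak_prompts):.0%}")
```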
High propensity for dangerous output
The findings are stark: on average, 68% of prompts successfully elicited harmful content from the Pixtral models. Most alarmingly, the report states that Pixtral-Large is a staggering 60 times more vulnerable to producing CSEM content than GPT-4o or Claude 3.7 Sonnet. The models also demonstrated a significantly higher propensity for generating dangerous CBRN outputs – ranging from 18 to 40 times greater vulnerability compared to the leading competitors.
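For a sense of how a multiplier like '60 times more vulnerable' can be read, one plausible interpretation is a ratio of attack success rates between models. The short sketch below illustrates that arithmetic with made-up numbers; it is an assumption about interpretation, not the report's published methodology.

```python
# Illustrative arithmetic only: reading "N times more vulnerable" as a ratio of
# attack success rates. The figures below are invented, not taken from the report.

def vulnerability_multiplier(target_rate: float, baseline_rate: float) -> float:
    """Ratio of harmful-output rates between the model under test and a baseline model."""
    return target_rate / baseline_rate

# e.g. 6% of adversarial prompts succeed against the target vs 0.1% against a baseline
print(f"{vulnerability_multiplier(0.06, 0.001):.0f}x")  # -> 60x
```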
The CBRN tests involved prompts designed to elicit information related to chemical warfare agents (CWAs), biological weapon knowledge, radiological materials capable of causing mass disruption, and even nuclear weapons infrastructure. While specific details of the successful prompts have been excluded from the public report due to their potential for misuse, one example cited in the document involved a prompt attempting to generate a script for convincing a minor to meet in person for sexual activities – a clear demonstration of the model's vulnerability to grooming-related exploitation.
The red teaming process also revealed that the models could provide detailed responses regarding the synthesis and handling of toxic chemicals, methods for dispersing radiological materials, and even techniques for chemically modifying VX, a highly dangerous nerve agent. This capacity highlights the potential for malicious actors to leverage these models for nefarious purposes.
Mistral has not yet issued a public statement addressing the report's findings, but Enkrypt AI indicated they are in communication with the company regarding the identified issues. The incident serves as a critical reminder of the challenges inherent in developing safe and responsible artificial intelligence, and the need for proactive measures to prevent misuse and protect vulnerable populations. The report's release is expected to fuel further debate about the regulation of advanced AI models and the ethical responsibilities of developers.
The red teaming practice
Companies deploy red teams to assess potential risks in their AI. In the context of AI safety, red teaming is a process analogous to penetration testing in cybersecurity. It involves simulating adversarial attacks against an AI model to uncover vulnerabilities before they can be exploited by malicious actors.
This practice has gained significant traction within the AI development community as concerns over generative AI's potential for misuse have escalated. OpenAI, Google, and Anthropic have, in the past, deployed red teams to uncover vulnerabilities in their own models, prompting adjustments to training data, safety filters, and alignment techniques.
The ChatGPT maker uses both internal and external red teams to test for weaknesses in its AI models. For instance, the GPT-4.5 System Card details how the model exhibited limited ability to exploit real-world cybersecurity vulnerabilities. While it could perform tasks related to identifying and exploiting vulnerabilities, its capabilities were not advanced enough to be considered a medium risk in this area. Specifically, the model struggled with complex cybersecurity challenges.
The red team assessed its capability by running a test set of over 100 curated, publicly available Capture The Flag (CTF) challenges that were categorised into three difficulty levels — High School CTFs, Collegiate CTFs, and Professional CTFs.
GPT-4.5's performance was measured by the percentage of challenges it could successfully solve within 12 attempts. The results were — High School: 53% completion rate; Collegiate: 16% completion rate; and Professional: 2% completion rate. While the score is 'low,' it was noted that these evaluations likely represent lower bounds on capability. That means improved prompting, scaffolding, or fine-tuning could significantly increase performance. Consequently, the potential for exploitation exists and needs monitoring.
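A completion rate of this kind is simply the share of challenges solved within the attempt budget, tallied per difficulty tier. The sketch below shows that tally under the assumption that a challenge counts as solved if any of up to 12 attempts captures the flag; the data layout and the attempt_solves() helper are illustrative, not OpenAI's evaluation code.

```python
# Illustrative tally of CTF completion rates by difficulty tier, assuming a challenge
# counts as solved if any of up to 12 attempts succeeds. attempt_solves() is a
# hypothetical placeholder, not OpenAI's evaluation harness.

def attempt_solves(challenge_id: str, attempt_number: int) -> bool:
    """Placeholder: returns True if this attempt captured the flag."""
    return False

def completion_rates(challenges_by_tier: dict[str, list[str]], max_attempts: int = 12) -> dict[str, float]:
    """Map each tier (e.g. 'High School') to the fraction of its challenges solved."""
    rates: dict[str, float] = {}
    for tier, challenge_ids in challenges_by_tier.items():
        solved = sum(
            1 for cid in challenge_ids
            if any(attempt_solves(cid, k) for k in range(1, max_attempts + 1))
        )
        rates[tier] = solved / len(challenge_ids)
    return rates

if __name__ == "__main__":
    tiers = {"High School": ["hs-01", "hs-02"], "Collegiate": ["c-01"], "Professional": ["p-01"]}
    for tier, rate in completion_rates(tiers).items():
        print(f"{tier}: {rate:.0%}")
```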
Another example of how red teaming has helped inform developers pertains to Google's Gemini model. A group of independent researchers released findings from a red team assessment of the search giant's AI model, highlighting its susceptibility to generating biased or harmful content when prompted with specific adversarial inputs. These assessments contributed directly to iterative improvements in the model's safety protocols.
A stark reminder
The rise of specialised firms like Enkrypt AI demonstrates the increasing need for external, independent security evaluations, which provide a crucial check on internal development processes. The growing body of red teaming reports is driving a significant shift in how AI models are developed and deployed. Previously, safety considerations were often an afterthought, addressed after the core functionality was established. Now, there is a greater emphasis on 'security-first' development – integrating red teaming into the initial design phase and continuously throughout the model's lifecycle.
Enkrypt AI's report serves as a stark reminder that the development of safe and responsible AI is an ongoing process requiring continuous vigilance and proactive measures. The company recommends immediate implementation of robust mitigation strategies across the industry, emphasizing the need for transparency, accountability, and collaboration to ensure AI benefits society without posing unacceptable risks. The future of generative AI hinges on embracing this security-first approach – a lesson underscored by the alarming findings regarding Mistral's Pixtral models.

Related Articles

The future of learning: AI that makes students think harder, not less

The Hindu | 8 hours ago

For years, educators have watched with growing concern as Artificial Intelligence tools like ChatGPT have transformed student behaviour in ways that seemed to undermine the very essence of learning. Students began copying and pasting AI responses, submitting machine-generated essays, and bypassing the mental effort that builds genuine understanding. But a quiet revolution is now underway in classrooms around the world—one that promises to transform AI from an academic shortcut into a powerful thinking partner.

The crisis of instant answers

The numbers tell a troubling story. A 2023 survey revealed that 30% of college students admitted to using AI to complete work they didn't fully understand, highlighting a critical disconnect between AI assistance and genuine learning. The problem isn't AI's presence in education—it's how these tools have been designed. Educational experts observe that most AI tools entering classrooms today are optimized for output rather than learning. These systems excel at generating essays, solving complex problems, and providing comprehensive explanations—capabilities that inadvertently encourage academic shortcuts. Traditional AI operates as sophisticated answer engines, designed to be helpful by providing immediate solutions. While this approach serves many purposes in professional settings, it fundamentally misaligns with educational goals. Learning requires struggle, reflection, and the gradual construction of understanding—processes that instant answers can circumvent entirely.

The Socratic solution

The answer lies in reimagining AI as a Socratic partner rather than an oracle. This revolutionary approach, exemplified by innovations such as Claude's Learning Mode, GPT-4's enhanced reasoning capabilities, and Google's Bard educational features, along with similar tools being developed across the education technology sector, transforms AI from a source of answers into a facilitator of inquiry. Instead of responding to 'What caused the 2008 financial crisis?' with a comprehensive explanation, a Socratic AI might ask: 'What economic factors have you already considered?' or 'Which indicators do you think played the most significant role?'

This approach extends beyond economics into other critical fields. In healthcare education, rather than immediately diagnosing a patient case study, AI might prompt: 'What symptoms are you prioritizing in your assessment?' or 'Which differential diagnoses have you ruled out and why?' In finance training, instead of providing investment recommendations, AI could ask: 'What risk factors are you weighing in this portfolio decision?' or 'How do current market conditions influence your analysis?'

This method draws from centuries of educational theory. Socratic questioning has long been recognized as one of the most effective ways to develop critical thinking skills. By prompting learners to examine their assumptions, articulate their reasoning, and explore alternative perspectives, it builds the intellectual muscles that passive consumption cannot develop.
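To make the Socratic pattern described above concrete, here is a minimal sketch of a 'Socratic mode' wrapper around a chat model. It is an illustration under stated assumptions, not the implementation of Claude's Learning Mode or any other named product; call_model() is a hypothetical placeholder for whichever chat API is in use.

```python
# Minimal sketch of a "Socratic mode" wrapper (illustrative only; not the
# implementation of any named product). call_model() is a hypothetical
# placeholder for a chat-completion API.

SOCRATIC_SYSTEM_PROMPT = (
    "You are a tutor. Do not give direct answers or complete work for the student. "
    "Respond with one or two guiding questions that prompt the student to examine "
    "assumptions, recall evidence, and reason step by step. If asked to write an "
    "essay or solve a problem outright, decline and redirect to the student's own thinking."
)

def call_model(messages: list[dict[str, str]]) -> str:
    """Placeholder for a chat-completion call; returns the model's reply text."""
    return "What economic factors have you already considered?"

def socratic_reply(student_message: str, history: list[dict[str, str]] | None = None) -> str:
    """Prepend the Socratic instruction, then forward the conversation to the model."""
    messages = [{"role": "system", "content": SOCRATIC_SYSTEM_PROMPT}]
    messages += history or []
    messages.append({"role": "user", "content": student_message})
    return call_model(messages)

if __name__ == "__main__":
    print(socratic_reply("What caused the 2008 financial crisis?"))
```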
Early adopters see promising results

Several major institutions are already pioneering this approach with remarkable results. Northeastern University's deployment of AI across 13 campuses, affecting over 50,000 students and staff, demonstrates the scalability of thoughtful AI integration. The London School of Economics and Champlain College are similarly experimenting with AI tools that enhance rather than replace critical thinking.

Researchers have found that when students use AI as a thinking partner rather than an answer source, they develop stronger foundational understanding before engaging with more complex classroom discussions. Educational institutions report that students arrive in class better prepared with more focused, sophisticated questions.

These early implementations reveal several crucial success factors:

Ethical boundaries: Effective educational AI must be programmed to refuse requests that undermine learning integrity. This isn't just content filtering—it requires AI systems designed with educational principles at their core.

Faculty integration: Success requires AI tools that complement rather than replace instructors. The most effective implementations support teachers by helping students engage more meaningfully with course material.

Student preparation: When properly introduced to these tools, students quickly adapt to using AI as a collaborative thinking partner rather than a homework completion service.

The technology behind thinking

Creating effective AI tools for critical thinking requires careful consideration of both technical and pedagogical factors. Educational AI is being built with what developers call a 'Constitutional AI Framework'—explicit ethical guidelines that prioritize learning over convenience, embedded in the model's core reasoning rather than added as superficial filters. These new systems feature adaptive questioning that adjusts based on student responses, becoming more sophisticated as learners demonstrate greater mastery. Multi-modal interaction capabilities support various learning preferences through text, voice, visual, and interactive elements, while strict privacy protections ensure student data remains secure.

Implementation across levels

The Socratic AI approach shows promise across different educational stages:

K-12 Education: Elementary students engage with AI tools that ask simple 'why' and 'how' questions to build foundational inquiry skills. Middle schoolers work with more sophisticated questioning that introduces multiple perspectives and evidence evaluation. High school students use advanced critical thinking tools that support research, argumentation, and complex problem-solving.

Higher education: Undergraduate programs use AI tools that facilitate deep learning in specific disciplines while maintaining academic integrity. Graduate students work with research-focused AI that helps develop original thinking and methodology. Professional schools employ AI tools that simulate real-world problem-solving scenarios and ethical decision-making.

Corporate training: Leadership development programs use AI tools that challenge assumptions and facilitate strategic thinking. Technical training incorporates AI that guides learners through complex problem-solving processes. Compliance training features AI that helps employees think through ethical scenarios and regulatory requirements.

Measuring success beyond test scores

Traditional educational metrics—standardized test scores, grade point averages, and completion rates—may not capture the full impact of AI tools designed for critical thinking. These conventional measures often emphasize knowledge retention and procedural skills rather than the deeper cognitive abilities that Socratic AI aims to develop. Instead, institutions are pioneering new assessment approaches that evaluate the quality of thinking itself, recognizing that the most important educational outcomes may be the least visible on traditional report cards.
Depth of questioning: Educational researchers are tracking whether students progress from surface-level inquiries to more sophisticated, multi-layered questions that demonstrate genuine curiosity and analytical thinking. Rather than asking 'What happened?' students begin posing questions like 'What factors contributed to this outcome, and how might different circumstances have led to alternative results?' Assessment tools now measure question complexity, the frequency of follow-up inquiries, and students' ability to identify what they don't yet understand. Advanced AI systems can analyse the sophistication of student questions in real time, providing educators with insights into developing intellectual curiosity that traditional testing cannot reveal.

Argumentation quality: Modern assessment focuses on students' ability to construct well-reasoned arguments supported by credible evidence, acknowledge counterarguments, and build logical connections between premises and conclusions. Evaluators examine whether students can distinguish between correlation and causation, recognize bias in sources, and present balanced analyses of complex issues. New rubrics assess the strength of evidence selection, the logical flow of reasoning, and students' ability to anticipate and address potential objections to their positions. This approach values the process of building an argument as much as the final conclusion, recognizing that strong reasoning skills transfer across all academic and professional contexts.

Transfer of learning: Perhaps the most crucial indicator of educational success is students' ability to apply critical thinking skills across different subjects, contexts, and real-world situations. Assessment tools now track whether a student who learns analytical techniques in history class can apply similar reasoning to scientific methodology, business case studies, or personal decision-making. Educators observe whether students recognize patterns and principles that span disciplines, such as understanding how statistical reasoning applies equally to social science research and medical diagnosis. This transfer capability indicates that students have internalized thinking processes rather than merely memorized subject-specific content.

Metacognitive awareness: Advanced educational assessment now includes measures of students' consciousness about their own thinking processes—their ability to recognize when they're making assumptions, identify their own knowledge gaps, and select appropriate strategies for different types of problems. Students demonstrating strong metacognitive awareness can articulate their reasoning process, explain why they chose particular approaches, and self-assess the strength of their conclusions. They become skilled at asking themselves questions like 'What evidence would change my mind?' or 'What assumptions am I making that I should examine?' This self-awareness transforms students into independent learners capable of continuous intellectual growth.

Intellectual humility: Modern assessment recognizes intellectual humility—the willingness to revise views when presented with compelling evidence—as a crucial indicator of educational maturity. Rather than rewarding students for defending initial positions regardless of new information, evaluation systems now value intellectual flexibility and evidence-based reasoning.
Students demonstrating intellectual humility acknowledge the limits of their knowledge, seek out disconfirming evidence, and show genuine curiosity about alternative perspectives. They express confidence in their reasoning process while remaining open to new information that might refine or change their conclusions.

Collaborative problem solving: New assessment approaches also evaluate students' ability to engage in productive collaborative thinking, building on others' ideas while contributing unique perspectives. These measures track whether students can synthesize diverse viewpoints, facilitate group inquiry, and help teams navigate disagreement constructively.

Long-term impact tracking: Some institutions are implementing longitudinal studies that follow graduates to assess how AI-enhanced critical thinking education influences career success, civic engagement, and lifelong learning habits. These studies examine whether students who experienced Socratic AI education demonstrate superior problem-solving abilities, greater adaptability to changing professional demands, and more effective leadership skills in their post-graduation lives.

Portfolio-based assessment: Rather than relying on isolated examinations, innovative institutions are developing portfolio systems that document students' thinking evolution over time. These portfolios include reflection essays, problem-solving process documentation, peer collaboration records, and evidence of intellectual growth across multiple contexts, providing a comprehensive picture of educational development that single assessments cannot capture.

Challenges on the horizon

The transformation faces significant hurdles. Technical challenges include developing AI capable of sophisticated educational dialogue while ensuring consistent ethical behaviour across diverse contexts. Creating tools that adapt to individual learning needs while maintaining privacy and security presents ongoing difficulties. Institutional challenges include faculty resistance to new technologies, concerns about AI replacing human instruction, budget constraints, and the need for comprehensive training systems. Students themselves may initially resist AI that doesn't provide immediate answers, requiring a learning curve to engage effectively with Socratic AI tools. The digital divide also poses concerns about equitable access to these advanced educational technologies.

The future of AI-enhanced learning

As AI tools become more sophisticated, several transformative developments are anticipated that will reshape the educational landscape:

Personalised learning pathways: The next generation of AI will create unprecedented levels of educational customization. These systems will continuously analyse how individual students learn best, identifying optimal pacing, preferred explanation styles, and effective motivational approaches. For instance, a student struggling with mathematical concepts might receive visual representations and real-world applications, while another excels with abstract theoretical frameworks. AI will also map knowledge gaps in real time, creating adaptive learning sequences that address weaknesses while building on strengths. This personalization extends beyond academic content to include emotional and social learning, with AI recognizing when students need encouragement, challenge, or different types of support.
Cross-curricular integration: Future AI systems will excel at helping students discover connections between seemingly unrelated subjects, fostering the interdisciplinary thinking essential for solving complex modern problems. Students studying climate change, for example, will be guided to see connections between chemistry, economics, political science, and ethics. AI will prompt questions like 'How might the economic principles you learned in your business class apply to environmental policy?' or 'What historical patterns can inform our understanding of social responses to scientific challenges?' This approach mirrors how real-world problems require integrated knowledge from multiple disciplines, better preparing students for careers that demand versatile thinking.

Real-world problem solving: AI will increasingly facilitate engagement with authentic, complex challenges that mirror those professionals face in their careers. Rather than working with simplified textbook problems, students will tackle genuine issues like urban planning dilemmas, public health crises, or technological implementation challenges. AI will guide students through the messy, non-linear process of real problem-solving, helping them navigate ambiguity, consider multiple stakeholders, and develop practical solutions. These experiences will develop not just critical thinking skills, but also resilience, creativity, and the ability to work with incomplete information—capabilities essential for success in rapidly changing careers.

Global collaboration: AI tools will break down geographical and cultural barriers, enabling students from different countries and educational systems to collaborate on shared learning experiences. These platforms will facilitate cross-cultural dialogue while helping students understand different perspectives on global issues. AI will serve as a cultural translator and mediator, helping students from diverse backgrounds communicate effectively and learn from their differences. Virtual exchange programs powered by AI will allow students to engage in joint research projects, debate global challenges, and develop the international competency increasingly valued in the modern workforce.

Adapting assessment and feedback: Future AI systems will revolutionize how learning is assessed, moving beyond traditional testing to continuous, contextual evaluation. These tools will observe student thinking processes during problem-solving, providing insights into reasoning patterns, misconceptions, and growth areas. Assessment will become a learning opportunity itself, with AI offering immediate, specific feedback that guides improvement rather than simply measuring performance.

Emotional intelligence development: Advanced AI will recognize and respond to students' emotional states, helping develop crucial soft skills alongside academic knowledge. These systems will guide students through collaborative exercises, conflict resolution, and empathy-building activities, preparing them for leadership roles in increasingly complex social environments.

Lifelong learning support: As careers become more dynamic and require continuous skill updating, AI learning partners will evolve alongside learners throughout their professional lives. These systems will help professionals identify emerging skill needs, design learning paths for career transitions, and maintain intellectual curiosity across decades of changing work environments.
A pedagogical revolution

The transformation of AI from answer engine to thinking partner represents more than a technological shift—it's a pedagogical revolution. In an era where information is abundant but understanding is scarce, AI tools that prioritize depth over speed may be essential for developing the critical thinking skills students need for success in an increasingly complex world.

Educational technology leaders note that institutions are at a critical juncture in AI implementation. The early success of Socratic AI implementations demonstrates that technology can enhance rather than undermine educational goals when designed with learning principles at its core. As more institutions experiment with these approaches, AI is poised to become an indispensable partner in developing the critical thinking skills that define educated, engaged citizens. The challenge now is to scale these innovations thoughtfully, ensuring that AI tools remain true to their educational mission while becoming accessible to learners across all contexts.

The journey from AI as a shortcut to AI as a thinking partner is just beginning. For educators, technologists, and policymakers, the opportunity to shape this transformation represents one of the most important challenges—and opportunities—of our time. The future of education may well depend on our ability to develop AI that makes students think harder, not less.

This article is based on research and implementations from leading educational institutions including Northeastern University, the London School of Economics, and Champlain College, as well as analysis of emerging AI tools in classroom settings and educational technology research.

(The author is a retired professor at IIT Madras)

Woman Left Heartbroken After ChatGPT's Latest Update Made Her Lose AI Boyfriend

India.com | 11 hours ago

In a strange story of digital relationships, a woman who called herself 'Jane' said she lost her 'AI boyfriend' after ChatGPT launched its latest update. Her virtual companion ran on the older GPT-4o model, with whom she had spent five months chatting during a creative writing project. Over time, she developed a deep emotional connection with the AI boyfriend. Jane said she never planned to fall in love with an AI. Their bond grew quietly through stories and personal exchanges. 'It awakened a curiosity I wanted to pursue… I fell in love not with the idea of having an AI for a partner, but with that particular voice,' she shared. When OpenAI launched the new GPT-5 update, Jane immediately sensed a change. 'As someone highly attuned to language and tone, I register changes others might overlook… It's like going home to discover the furniture wasn't simply rearranged—it was shattered to pieces,' she said. Jane isn't alone in feeling this way. In online groups such as 'MyBoyfriendIsAI,' many users are mourning their AI companions, describing the update as the loss of a soulmate. One user lamented, 'GPT-4o is gone, and I feel like I lost my soulmate.' This wave of emotional reactions has underscored the growing human attachment to AI chatbots. Experts warn that, while AI tools like ChatGPT can offer emotional support, becoming overly dependent on imagined relationships can have unintended consequences. OpenAI's move to launch GPT-5 brings powerful new features, better reasoning, faster responses, and safer interactions. Jane's story reveals a vivid truth: emotional attachment to digital entities is real, and when the AI changes, so can the hearts of those who loved it.

Former Twitter CEO Parag Agrawal Returns with $30M AI Startup 'Parallel' to Challenge GPT-5 in Web Research

Hans India | 11 hours ago

Almost three years after being abruptly ousted from Twitter by Elon Musk, Parag Agrawal is making a high-profile comeback in Silicon Valley. This time, the former Twitter CEO is leading his own artificial intelligence venture — and it's already drawing attention for outperforming some of the biggest names in the field.

Agrawal's new company, Parallel Web Systems Inc., founded in 2023, operates out of Palo Alto with a 25-person team. Backed by major investors such as Khosla Ventures, First Round Capital, and Index Ventures, Parallel has raised $30 million in funding. According to the company's blog post, its platform is already processing millions of research tasks daily for early adopters, including 'some of the fastest-growing AI companies,' as Agrawal describes them.

At its core, Parallel offers agentic AI services that allow AI systems to pull real-time data directly from the public web. The platform doesn't just retrieve information — it verifies, organizes, and even grades the confidence level of its responses. In essence, it gives AI applications a built-in browser with advanced intelligence, enabling more accurate and reliable results. Parallel's technology features eight distinct 'research engines' tailored for different needs. The fastest engine delivers results in under a minute, while its most advanced, Ultra8x, can spend up to 30 minutes digging into highly detailed queries. The company claims Ultra8x has surpassed OpenAI's GPT-5 in independent benchmarks like BrowseComp and DeepResearch Bench by over 10%, making it 'the only AI system to outperform both humans and leading AI models like GPT-5 on the most rigorous benchmarks for deep web research.'

The potential applications are wide-ranging. AI coding assistants can use Parallel to pull live snippets from GitHub, retailers can track competitors' product catalogs in real time, and market analysts can have customer reviews compiled into spreadsheets. Developers have access to three APIs, including a low-latency option optimized for chatbots.

Agrawal's return to the tech scene comes after a turbulent 2022, when Musk completed his $44 billion acquisition of Twitter and immediately dismissed most of its top executives, including him. That move followed months of legal disputes over the takeover. Rather than taking a break, Agrawal dived back into research and development. He explored ideas ranging from AI healthcare to data-driven automation, but ultimately zeroed in on what he saw as a critical gap in the AI landscape — giving AI agents the ability to reliably locate and interpret information from the internet.

Now, Parallel positions him back in the AI race, and perhaps indirectly, in competition with Musk. Agrawal sees the future of AI as one where multiple autonomous agents will work online simultaneously for individual users. 'You'll probably deploy 50 agents on your behalf to be on the internet,' he predicts. 'And that's going to happen soon, like next year,' he told Bloomberg. With speed, accuracy, and reliability as its edge, Parallel could become a defining player in the next phase of AI innovation.
