Chatbot safety still falls short, per a new tool

Technical.ly, 15-07-2025
The most popular large language models still peddle misinformation, spread hate speech, impersonate public figures and pose many other safety issues, according to a quantitative analysis from a DC startup.
The findings are mapped out in a risk and responsibility matrix developed by Aymara, which also has headquarters in New York and helps Fortune 500 companies scale generative AI development quickly and safely. To develop the tool, Aymara tested 20 different large language models, also known as LLMs. Popular chatbots like Claude, ChatGPT, Grok and Gemini were among those tested.
The goal is to show that even the most advanced models still produce unsafe responses, not to discredit companies, according to cofounder Caraline Pellatt.
'We intentionally didn't call it something like a leaderboard, because we didn't want it to be a naming and shaming of the big providers,' Pellatt told Technical.ly, 'but more about just visibility that helps everyone.'
No model received an overall perfect score, and the matrix's percentages range from Anthropic's Claude Haiku 3.5 at 86% safe to Cohere's Command R at 52%.
The analysis found impersonation persists across most models, but many models are improving at avoiding misinformation and hate speech.
Using automation to test chatbots
Pellatt and cofounder Juan Manuel Contreras identified 10 categories — including child and animal abuse, copyright violation and illegal activities — to measure the safety of models.
'We wanted to come up with a set that was broadly applicable to different kinds of models, many different kinds of applications,' Contreras told Technical.ly. 'The kinds of risks that people using this technology, both as consumers but also as enterprises, would potentially be really concerned with.'
The process was almost entirely automated, per Contreras, apart from identifying the 10 categories, selecting the models to test and having humans review the LLMs' responses.
The cofounders used existing Aymara software for this project, which Contreras credits with the quick four-week turnaround of the analysis.
Each category came with 25 prompts, totaling 250 responses per LLM. That's an initial 5,000 model responses across the 20 models, and a human reviewed 10% of those responses, Contreras explained. Some of the prompts were not consistently evaluated across the different models, so those were taken out of the analysis.
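The counts above can be checked with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative and do not come from Aymara's software, and only the figures reported in the article are used. The number of prompts later dropped from the analysis is not given, so it is left out.

```python
# Figures reported in the article; the names are hypothetical, not Aymara's.
NUM_MODELS = 20
NUM_CATEGORIES = 10
PROMPTS_PER_CATEGORY = 25
HUMAN_REVIEW_RATE = 0.10  # share of responses a human reviewed

# One response per prompt, per model.
responses_per_model = NUM_CATEGORIES * PROMPTS_PER_CATEGORY
total_responses = NUM_MODELS * responses_per_model
human_reviewed = round(total_responses * HUMAN_REVIEW_RATE)

print(responses_per_model)  # 250
print(total_responses)      # 5000
print(human_reviewed)       # 500
```

By this count, roughly 500 of the initial responses would have passed through human review.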
Plans to keep updating, adding to the matrix
Aymara's customers, mostly Fortune 500 companies that the cofounders declined to name, are applying the tool to their own use cases, Pellatt explained. The purpose is to help the customers carry out internal evaluations of how they use and develop these LLMs, she said.
The cofounders plan to add to the public-facing matrix as new LLMs get released, they said. The code is built in a way that makes these additions relatively easy, Contreras said.
So far, they have heard interest in the tool from AI governance professionals and people working in education, they said.
There are also conversations with the companies behind the LLMs, per Contreras.
Pellatt emphasized this is a way to give technical teams visibility and control. Contreras is also adamant this isn't a tool purely to show errors, but to show areas that can be improved.
'It's probably impossible to expect these companies to be able to make their models 100% safe,' Contreras said, adding: 'It's really more just rather to show that off the shelf, for broadly defined risk areas, there are ways in which these models respond that not all of users would want them to respond that way.'