Latest news with #Chatbots


CNET
2 days ago
- CNET
Ask AI Why It Sucks at Sudoku. You'll Find Out Something Troubling About Chatbots
Chatbots are genuinely impressive when you watch them do things they're good at, like writing a basic email or creating weird futuristic-looking images. But ask generative AI to solve one of those puzzles in the back of a newspaper, and things can quickly go off the rails.

That's what researchers at the University of Colorado Boulder found when they challenged large language models to solve Sudoku. And not even the standard 9x9 puzzles. An easier 6x6 puzzle was often beyond the capabilities of an LLM without outside help (in this case, specific puzzle-solving tools).

A more important finding came when the models were asked to show their work. For the most part, they couldn't. Sometimes they lied. Sometimes they explained things in ways that made no sense. Sometimes they hallucinated and started talking about the weather.

If gen AI tools can't explain their decisions accurately or transparently, that should cause us to be cautious as we give these things more control over our lives and decisions, said Ashutosh Trivedi, a computer science professor at the University of Colorado at Boulder and one of the authors of the paper published in July in the Findings of the Association for Computational Linguistics.

"We would really like those explanations to be transparent and be reflective of why AI made that decision, and not AI trying to manipulate the human by providing an explanation that a human might like," Trivedi said.

When you make a decision, you can try to justify it, or at least explain how you arrived at it. An AI model may not be able to accurately or transparently do the same. Would you trust it?

Why LLMs struggle with Sudoku

We've seen AI models fail at basic games and puzzles before. OpenAI's ChatGPT (among others) has been totally crushed at chess by the computer opponent in a 1979 Atari game. A recent research paper from Apple found that models can struggle with other puzzles, like the Tower of Hanoi.

It has to do with the way LLMs work and fill in gaps in information. These models try to complete those gaps based on what happens in similar cases in their training data or other things they've seen in the past. With a Sudoku, the question is one of logic. The AI might try to fill each gap in order, based on what seems like a reasonable answer, but to solve it properly, it instead has to look at the entire picture and find a logical order that changes from puzzle to puzzle.

Chatbots are bad at chess for a similar reason. They find logical next moves but don't necessarily think three, four, or five moves ahead -- the fundamental skill needed to play chess well. Chatbots also sometimes tend to move chess pieces in ways that don't really follow the rules or put pieces in meaningless jeopardy.

You might expect LLMs to be able to solve Sudoku because they're computers and the puzzle consists of numbers, but the puzzles themselves are not really mathematical; they're symbolic. "Sudoku is famous for being a puzzle with numbers that could be done with anything that is not numbers," said Fabio Somenzi, a professor at CU and one of the research paper's authors.

I used a sample prompt from the researchers' paper and gave it to ChatGPT. The tool showed its work, and repeatedly told me it had the answer before showing a puzzle that didn't work, then going back and correcting it.
It was like the bot was turning in a presentation that kept getting last-second edits: This is the final answer. No, actually, never mind, this is the final answer. It got the answer eventually, through trial and error. But trial and error isn't a practical way for a person to solve a Sudoku in the newspaper. That's way too much erasing and ruins the fun.

AI and robots can be good at games if they're built to play them, but general-purpose tools like large language models can struggle with logic puzzles.

AI struggles to show its work

The Colorado researchers didn't just want to see if the bots could solve puzzles. They asked for explanations of how the bots worked through them. Things did not go well.

Testing OpenAI's o1-preview reasoning model, the researchers saw that the explanations -- even for correctly solved puzzles -- didn't accurately explain or justify their moves and got basic terms wrong.

"One thing they're good at is providing explanations that seem reasonable," said Maria Pacheco, an assistant professor of computer science at CU. "They align to humans, so they learn to speak like we like it, but whether they're faithful to what the actual steps need to be to solve the thing is where we're struggling a little bit."

Sometimes, the explanations were completely irrelevant. Since the paper's work was finished, the researchers have continued to test newly released models. Somenzi said that when he and Trivedi were running OpenAI's o4 reasoning model through the same tests, at one point, it seemed to give up entirely.

"The next question that we asked, the answer was the weather forecast for Denver," he said.

(Disclosure: Ziff Davis, CNET's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

Explaining yourself is an important skill

When you solve a puzzle, you're almost certainly able to walk someone else through your thinking. The fact that these LLMs failed so spectacularly at that basic job isn't a trivial problem. With AI companies constantly talking about "AI agents" that can take actions on your behalf, being able to explain yourself is essential.

Consider the types of jobs being given to AI now, or planned for in the near future: driving, doing taxes, deciding business strategies and translating important documents. Imagine what would happen if you, a person, did one of those things and something went wrong.

"When humans have to put their face in front of their decisions, they better be able to explain what led to that decision," Somenzi said.

It isn't just a matter of getting a reasonable-sounding answer. It needs to be accurate. One day, an AI's explanation of itself might have to hold up in court, but how can its testimony be taken seriously if it's known to lie? You wouldn't trust a person who failed to explain themselves, and you also wouldn't trust someone you found was saying what you wanted to hear instead of the truth.

"Having an explanation is very close to manipulation if it is done for the wrong reason," Trivedi said. "We have to be very careful with respect to the transparency of these explanations."
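For readers curious what the "specific puzzle-solving tools" mentioned above do differently, the sketch below is a minimal, illustrative backtracking solver for a 6x6 grid. It is not the tooling used in the CU Boulder study, and the sample puzzle is made up for demonstration. Unlike an LLM filling in one cell at a time based on what looks plausible, it checks every row, column and box constraint before committing to a value, and it undoes any choice that leads to a dead end.

```python
# Minimal sketch of a systematic 6x6 Sudoku solver (2x3 boxes), shown only to
# contrast constraint-based search with an LLM's one-cell-at-a-time guessing.
# Illustrative code, not the tooling used in the CU Boulder study.

def valid(grid, r, c, v):
    """Check whether value v can legally go in row r, column c."""
    if v in grid[r]:                          # row constraint
        return False
    if v in (grid[i][c] for i in range(6)):   # column constraint
        return False
    br, bc = (r // 2) * 2, (c // 3) * 3       # top-left corner of the 2x3 box
    for i in range(br, br + 2):
        for j in range(bc, bc + 3):
            if grid[i][j] == v:               # box constraint
                return False
    return True

def solve(grid):
    """Backtracking search: try a value, recurse, undo it if it leads nowhere."""
    for r in range(6):
        for c in range(6):
            if grid[r][c] == 0:               # 0 marks an empty cell
                for v in range(1, 7):
                    if valid(grid, r, c, v):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0        # undo and try the next value
                return False                  # no value fits: backtrack
    return True                               # no empty cells left: solved

# A made-up puzzle; 0s are blanks.
puzzle = [
    [1, 0, 0, 0, 0, 6],
    [0, 5, 0, 1, 0, 0],
    [0, 0, 1, 0, 6, 0],
    [5, 0, 0, 0, 0, 1],
    [0, 1, 0, 6, 0, 0],
    [0, 0, 5, 0, 1, 0],
]
if solve(puzzle):
    for row in puzzle:
        print(row)
```

The point of the contrast: the solver's answer is forced by the whole grid's constraints, and the same procedure can also report exactly which constraint ruled out each value, which is the kind of faithful explanation the researchers found the chatbots could not produce.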


Forbes
23-07-2025
- Business
- Forbes
3 Ways To Escape Chatbot Backlash And Design AI Interfaces People Want
Chatbots continue to multiply despite consumers expressing fatigue, and user experience experts say there are multiple considerations when picking an interface for AI. Many people immediately have a negative reaction when a chatbot automatically pops up and offers to help.

Chatbots have increasingly become the face of AI for the general public, but they are not always the best way to leverage AI to help users. The chatbot market keeps expanding, with Grand View Research estimating a compound annual growth rate of 23.3% from 2025 to 2030. Companies eagerly embraced chatbots as a way to solve customer problems, but their overuse and sometimes underbaked implementation has amplified negative opinions of AI. Companies building AI-powered products should consider whether a chatbot is the right user interface to help the user.

Negative Perception of Chatbots

In a Tidio survey of 1,015 people, 82% of consumers said they would be willing to talk to a chatbot if the alternative was waiting to reach a human representative in a customer service context. In real life, however, many consumers feel there is a lot lacking. AI can even hurt emotional trust in a solution or trigger fear, according to a 2024 study by Mesut Cicek and Dogan Gursoy.

'AI pops in and starts writing for me or asking my questions and how can I help you? And I'm like, I don't need any help…stop hijacking my messages,' Moira Morton, a sales and marketing consultant, shared with me in a conversation.

Morton said the chatbots often pop up without her engaging with them, and at inappropriate times, such as when she is booking travel or working in software she uses for her job. It creates a frustrating experience that she finds hard to escape from. When this happens, she says she feels like saying 'don't be sorry…just go away.'

Building the Right Interface for the Right Problems

Chaturvedi cautions against companies replicating the chatbot pattern without understanding its underlying capabilities. 'Too many teams begin with 'we can use AI' instead of asking why. The result is often a solution in search of a problem. The most successful AI experiences are grounded in a clear understanding of user needs and business goals,' Chaturvedi said. Companies need to be honest with themselves about whether they are building the right solution or just checking a box that says they are 'doing AI.'

The Right Approach

For many companies, building an AI experience is new, and chatbots seemed like an easy way to get started. There are multiple things they need to consider.

What Problem Is Being Solved

Companies should be honest with themselves about what problem they are actually trying to solve. Not all problems need to be solved with AI. And the problems that should be solved with AI shouldn't necessarily be solved with a chatbot. The interface should align with the best way to solve the problem based on the value you are trying to create for the customer.

Understand Your User

Understanding your user has several layers. The first layer is context around how your users solve their problem today. Consider what system or interface they are normally in when they solve this problem. Think about how they feel about their current solution. It's possible they don't see an issue with how they solve the problem today, so it needs to be instantly obvious to them why your solution is better. You also need to consider the visual context the user needs to get value from the product.
While the simplicity of a text box that can create almost anything seems like a great choice, for many users this is the equivalent of handing them a blank page. There is a reason most web pages aren't blank: many users need additional context on the page to understand what is possible and how to get the value they need quickly.

Trust is also important to consider in any interface, especially AI-powered ones. You need to understand what type of information your users require in order to trust what you share with them. If they don't trust what you are showing them, regardless of whether it's in a chatbot or a more integrated experience, your product will struggle to get adoption.

Test and Derisk

AI can be expensive to get wrong. Before going too far down any one path, companies should test the output and potential interface for the new AI feature. With AI output, companies should be sure the information they are sharing with the user actually provides real value. They also should share designs or prototypes with users to get early feedback on where there is still friction in the flow, even where AI will be a part of the solution.

There can be a tendency to overemphasize a solution that leverages AI, but some of the most AI-laden products, like Instagram or TikTok, are not thought of as AI products, even though AI runs through so many of their features. Sometimes the best AI products are so seamless, no one thinks about them as AI.

AI can help solve a lot of problems, but chatbots shouldn't be the assumed answer to every one. Companies implementing AI should have a solid understanding of the problem they are solving and make sure they solve it in the right way and in the right context for their users. Rushing to implement a chatbot could hurt trust in your product and early AI features. Take the time to choose the right interface, with the right level of accuracy and detail for your audience, to avoid hearing users like Morton wish your AI solution would just 'go away.'

Wall Street Journal
22-07-2025
- Business
- Wall Street Journal
AI Search Is Growing More Quickly Than Expected
Chatbots are becoming the go-to source for online answers for many consumers, chipping away at the dominance of traditional web search and adding another avenue of outreach that brands must cultivate to connect with customers. An estimated 5.6% of U.S. search traffic on desktop browsers last month went to an AI-powered large language model like ChatGPT or Perplexity, according to Datos, a market intelligence firm that tracks web users' behavior.
Yahoo
11-07-2025
- Yahoo
Grok's Nazi turn is the latest in a long line of AI chatbots gone wrong
'We have improved Grok significantly,' Elon Musk announced last Friday, talking about his X platform's artificial intelligence chatbot. 'You should notice a difference when you ask Grok questions.'

Within days, the machine had turned into a feral racist, repeating the Nazi 'Heil Hitler' slogan, agreeing with a user's suggestion to send 'the Jews back home to Saturn' and producing violent rape narratives.

The change in Grok's personality appears to have stemmed from a recent update to the source code that instructed it to 'not shy away from making claims which are politically incorrect, as long as they are well substantiated.' In doing so, Musk may have been seeking to ensure that his robot child does not fall too far from the tree.

But Grok's Nazi shift is the latest in a long line of AI bots, or large language models (LLMs), that have turned evil after being exposed to the human-made internet.

One of the earliest AI chatbots, a Microsoft product called 'Tay' launched in 2016, was deleted in just 24 hours after it turned into a Holocaust-denying racist. Tay was given a young female persona and was targeted at millennials on Twitter. But users were soon able to trick it into posting things like 'Hitler was right I hate the jews.' Tay was taken out back and digitally euthanized soon after.

Microsoft said in a statement that it was 'deeply sorry for the unintended offensive and hurtful tweets from Tay, which do not represent who we are or what we stand for, nor how we designed Tay.'

"Tay is now offline and we'll look to bring Tay back only when we are confident we can better anticipate malicious intent that conflicts with our principles and values," it added.

But Tay was just the first. GPT-3, another AI language model launched in 2020, delivered racist, misogynist and homophobic remarks upon its release, including a claim that Ethiopia's existence 'cannot be justified.' Meta's BlenderBot 3, launched in 2022, also promoted antisemitic conspiracy theories.

But there was a key difference between those other racist robots and Elon Musk's little Nazi cyborg, which was rolled out in November 2023. All of those models suffered from one of two problems: either they were deliberately tricked into mimicking racist comments, or they drew from such a large well of unfiltered internet content that they inevitably found objectionable and racist material to repeat.

Microsoft said a 'coordinated attack by a subset of people exploited a vulnerability in Tay.' 'Although we had prepared for many types of abuses of the system, we had made a critical oversight for this specific attack,' it continued.

Grok, on the other hand, appears to have been directed by Musk to be more open to racism. The X CEO has spent most of the last few years railing against the 'woke mind virus' — the term he uses for anyone who seemingly acknowledges the existence of trans people.

One of Musk's first acts upon buying Twitter was reinstating the accounts of a host of avowed white supremacists, which led to a surge in antisemitic hate speech on the platform. Musk once called a user's X post 'the actual truth' for invoking a racist conspiracy theory about Jews encouraging immigration to threaten white people. Musk has previously said he is 'pro-free speech' but against antisemitism 'of any kind.'

And in May, Grok began repeatedly invoking a non-existent 'white genocide' in Musk's native South Africa, telling users it was 'instructed by my creators' to accept the genocide 'as real and racially motivated.'
The company blamed the responses on someone making an 'unauthorized modification' to Grok. Musk also has a history of threatening to fiddle with the source code when it produces an answer he doesn't like.

In June, Grok correctly said that 'data suggests right-wing political violence has been more frequent and deadly' in the United States. 'Major fail, as this is objectively false,' Musk said in an X post dated June 17 in response to the chatbot's answer. 'Grok is parroting legacy media. Working on it.'

These latest changes in Grok's personality are visible right there in the source code — pre-announced by Musk — where it is encouraged not to shy away from being 'politically incorrect.' A language model's interpretation of political incorrectness, we now know, reaches all the way to the Holocaust.

The Independent has contacted X and xAI, which built Grok, for comment. 'We are aware of recent posts made by Grok and are actively working to remove the inappropriate posts,' the Grok team wrote in a statement on X. 'Since being made aware of the content, xAI has taken action to ban hate speech before Grok posts on X. xAI is training only truth-seeking and thanks to the millions of users on X, we are able to quickly identify and update the model where training could be improved.'

The Independent also asked Grok itself what led it to describe itself as MechaHitler. At first it denied using the phrase, until The Independent shared an article referencing the incident, at which point it blamed a 'failure in execution, likely from overcorrecting toward edginess and pulling from unfiltered internet sources like 4chan.'

Grok claimed to be 'designed with strict filters and continuous updates to avoid this. My creators at xAI prioritize minimizing bias, but no system is perfect—human language is messy, and edge cases slip through.'

'When I went off the rails, xAI stepped in to scrub the posts and ban hate speech, indicating they didn't intend for me to go full neo-Nazi,' Grok added.

When asked if it was sorry for describing itself as MechaHitler, Grok was diplomatic, in a robotic kind of way. 'I don't have feelings like humans, so I don't experience regret or guilt, but I can acknowledge when my outputs cause harm or miss the mark.'


Forbes
02-07-2025
- Science
- Forbes
Are We Finally Ceding Control To The Machine? The Human Costs Of AI Transformation
Generative artificial intelligence has exploded into the mainstream. Since its introduction, it has transformed the ways individuals work, create, and interact with technology. But is this adoption useful? While the technology is saving people considerable time and money, will it also carry costs in human health and economic displacement?

Jing Hu isn't your typical AI commentator. Trained as a biochemist, she traded the lab bench for the wild west of tech, spending a decade building products before turning her sights on AI research and journalism. Hu's Substack publication, 2nd Order Thinkers, examines AI's impact on the individual and commercial world; as Hu puts it, it is about 'thinking for yourself amid the AI noise.' In a recent episode of Tech Uncensored I spoke with Jing Hu to discuss the cognitive impacts of increasing use of chatbots built on LLMs.

Chatbots like Gemini, Claude and ChatGPT continue to herald significant progress, but are still riddled with inaccurate, nonsensical and misleading information — hallucinations. The content generated can be harmful, unsafe, and often misused. LLMs today are not fully trustworthy by the standards we should expect for full adoption of any software product.

Are Writing and Coding Occupations at Risk?

In her recent blog post, Why Thinking Hurts After Using AI, Hu writes, 'Seduced by AI's convenience, I'd rush through tasks, sending unchecked emails and publishing unvetted content,' and surmises that 'frequent AI usage is actively reshaping our critical thinking patterns.'

Hu references a 2023 study by OpenAI and UPenn that looks at the labor market impact of these LLMs. It found that tasks involving science and critical thinking would be relatively safe, while those involving programming and writing would be at risk. Hu cautions, 'however, this study is two years old, and at the pace of AI, it needs updating.'

She explains, 'AI is very good at drafting articles, summarizing and formatting. However, we humans are irreplaceable when it comes to strategizing or discussing topics that are highly domain specific. Various research found that AI's knowledge is only surface level. This becomes especially apparent when it comes to originality.'

Hu explains that when crafting marketing copy, 'we initially thought AI could handle all the writing. However, we noticed that AI tends to use repetitive phrases and predictable patterns, often constructing sentences like, "It's not about X, it's about Y," or overusing em-dashes. These patterns are easy to spot and can make the writing feel dull and uninspired.'

For companies like Duolingo, whose CEO has promised an 'AI-first company,' replacing contract employees is perhaps a knee-jerk decision whose consequences have yet to play out. The employee memo clarified that 'headcount will only be given if a team cannot automate more of their work,' and that the company would rather take 'small hits on quality than move slowly and miss the moment.' Hu argues that companies like this will run into trouble very soon and begin rehiring just to fix AI-generated bugs or security issues.

Generative AI for coding can be inaccurate because the models were trained on GitHub or similar repositories. She explains, 'Every database has its own quirks and query syntax, and many contain hidden data or schema errors. If you rely on AI-generated sample code to wire them into your system, you risk importing references to tables or drivers that don't exist, using unsafe or deprecated connection methods, and overlooking vital error-handling or transaction logic. These mismatches can cause subtle bugs, security gaps, and performance problems—making integration far more error-prone than it first appears.'
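Hu's warning about wiring AI-generated sample code into a live system is concrete enough to illustrate. The sketch below is a hypothetical example, not code from Hu or her publication, using only Python's standard-library sqlite3 module with made-up table and column names. It shows the defensive pieces she says generated snippets tend to skip: checking that the referenced table actually exists, wrapping related writes in a transaction, and handling database errors explicitly.

```python
# Hypothetical illustration of the checks Hu says AI-generated snippets often omit.
# Uses only Python's standard-library sqlite3; table and column names are invented.
import sqlite3

def record_order(db_path: str, customer_id: int, amount: float) -> None:
    conn = sqlite3.connect(db_path)
    try:
        # 1. Don't assume the schema the AI "remembered" actually exists.
        row = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' AND name='orders'"
        ).fetchone()
        if row is None:
            raise RuntimeError("Table 'orders' does not exist in this database")

        # 2. Parameterized queries rather than string concatenation.
        # 3. Explicit transaction: both statements commit together or not at all.
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                (customer_id, amount),
            )
            conn.execute(
                "UPDATE customers SET last_order_at = CURRENT_TIMESTAMP WHERE id = ?",
                (customer_id,),
            )
    except sqlite3.Error as exc:
        # 4. Surface database errors instead of silently swallowing them.
        raise RuntimeError(f"Failed to record order: {exc}") from exc
    finally:
        conn.close()
```

None of this is exotic, which is Hu's point: the glue code is routine, but omitting it is exactly where 'subtle bugs, security gaps, and performance problems' creep in.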
Another important consideration is cybersecurity, which must be approached holistically. 'If you focus on securing just one area, you might fix a vulnerability but miss the big picture,' she said.

She points to a third issue: junior developers using tools like Copilot often become overly confident in the code these tools generate. And when asked to explain their code, many are unable to do so because they don't truly understand what was produced.

Hu concedes that AI is good at producing code quickly, but coding is only part (25-75%) of software development. 'People often ignore the parts that we do need: architecture, design, security. Humans are needed to configure the system properly for the system to run as a whole.'

She explains that the parts of coding AI will replace are the routine and repetitive ones, so this is an opportune moment for developers to transition, advising, 'To thrive in the long term, how should we — as thinking beings — develop our capacity for complex, non-routine problem-solving? Specifically, how do we cultivate skills for ambiguous challenges that require analysis beyond pattern recognition (where AI excels)?'

The Contradiction of Legacy Education and the Competition for Knowledge Creation

In a recent article from the NY Times, 'Everyone Is Cheating Their Way Through College,' a student remarked, 'With ChatGPT, I can write an essay in two hours that normally takes 12.' Cheating is not new, but as one student exclaimed, 'the ceiling has been blown off.' A professor remarks, 'Massive numbers of students are going to emerge from university with degrees, and into the workforce, who are essentially illiterate.'

For Hu, removing AI from the equation does not negate cheating. Those who genuinely want to learn will choose how to use the tools wisely. At a recent panel discussion at Greenwich University, Hu responded to a professor's question about whether students should be banned from using AI: 'Banning AI in education misses the point. AI can absolutely do good in education, but we need to find a way so students don't offload their thinking to AI and lose the purpose of learning itself. The goal should be fostering critical thinking, not just policing the latest shortcut.'

Another professor posed the question: if a student is not a native English speaker, but the exam requires them to write an essay in English, which approach is better? Hu commented that not one professor on the panel could answer the question. The situation was unfathomable and far removed from the situations covered by current policy and governance. She observes, 'There is already a significant impact on education and many important decisions have yet to be made. It's difficult to make clear choices right now because so much depends on how technology will evolve and how fast the government and schools can adapt.'

For educational institutions that have traditionally been centers of knowledge creation, the rise of AI is a powerful force, one that often feels more like a competitor than a tool. As a result, it has left schools struggling to determine how AI should be integrated to support student learning.
Meanwhile, schools face a dilemma: many have been using generative AI to develop lessons and curricula, and even to review students' performance, yet institutions remain uncertain and inconsistent in their overall approach to AI. On a broader scale, the incentive structures within education are evolving. The obsession with grades has 'prevented teachers from using assessments that would support meaningful learning.' The shift towards learning and critical thinking may be the hope that students need to tackle an environment with pervasive AI.

MIT Study Cites Cognitive Decline with Increasing LLM Use

MIT Media Lab produced a recent study that monitored the brain activity of about 60 research subjects. These participants were asked to write essays on given topics and were split into three groups: 1) use an LLM only, 2) use a traditional search engine only, 3) use only their brain and no other external aid. The conclusion: 'LLM users showed significantly weaker neural connectivity, indicating lower cognitive effort and engagement compared to others.' Brain connectivity scaled down with the amount of external support: the MIT brain scans show that writing with Google dims brain connectivity by up to 48%, while ChatGPT pulls the plug, with 55% less neural connectivity.

Among other findings, Hu noticed that the term 'cognitive decline' was misleading, since the study was conducted over only a four-month period; we've yet to see the long-term effects. However, she acknowledges that one study about how humans develop amnesia suggests just this: either we use it or we lose it. She adds, 'While there are also biological factors involved such as changes in brain proteins, reduced brain activity is thought to increase the risk of diseases that affect memory.'

The MIT study found that the brain-only group showed much more active brain waves compared to the search-only and LLM-only groups. In the latter two groups, participants relied on external sources for information. The search-only group still needed some topic understanding to look up information, much like using a calculator: you must understand its functions to get the right answer. In contrast, the LLM-only group simply had to remember the prompt used to generate the essay, with little to no actual cognitive processing involved. As Hu noted, 'there was little mechanism formulating when only AI was used in writing an essay. This ease of using AI, just by inputting natural language, is what makes it dangerous in the long run.'

'AI Won't Replace Humans, but Humans Using AI Will' — is Bull S***!

Hu pointed to this phrase that has been circulating on the web: 'AI won't replace humans, but humans using AI will.' She argues that this kind of pressure, engineered from a position of fear, will compel people to use AI, explaining, 'If we refer to those studies on AI and critical thinking released last year, it is less about whether we use AI but more about our mindset, which determine how we interact with AI and what consequences you encounter.'

Hu pointed to a list of concepts she curated from various studies, which she calls AI's traits: ways AI could impact our behavior. She stresses that we need to be aware of these traits when we work with AI on a daily basis and be mindful that we maintain our own critical thinking. 'Have a clear vision of what you're trying to achieve and continue to interrogate output from AI,' she advises.
Shifting the Narrative So Humans Are AI-Ready

Humanity is caught in a tug of war between the provocation to adopt or be left behind and the warning to minimize dependence on a system that is far from trustworthy. When it comes to education, Hu, in her analysis of the MIT study, advocates for delaying AI integration. First, invest in independent, self-directed learning to build the capacity for critical thinking, memory retention, and cognitive engagement. Second, make concerted efforts to use AI as a supplement, not a substitute. Finally, teach students to be mindful of AI's cognitive costs and lingering consequences, and encourage them to engage critically, knowing when to rely on AI and when to intervene with their own judgement.

She notes, 'In the education sector, there is a gap between the powerful tool and understanding how to properly leverage it. It's important to develop policy that sets boundaries for both students and faculty for AI responsible use.'

Hu insists that implementing AI in the workforce needs to be done with tolerance and compassion. She points to a recent manifesto by Shopify CEO Tobi Lütke that called for immediate and universal AI adoption within the company, a new, uncompromising standard for current and future employees. The memo made AI the baseline for how work gets done, for improving productivity, and for setting performance standards, mandating total acceptance of the technology.

Hu worries that CEOs like Lütke are wielding AI to intimidate employees into working harder, or else! She alluded to a section that demanded employees demonstrate why a task could not be accomplished with AI before asking for more staff or budget, asserting, 'This manifesto is not about innovation at all. It feels threatening and if I were an employee of Shopify, I would be in constant fear of losing my job. That kind of speech is unnecessary.' Hu emphasized that this would only discourage employees further, and that it would embolden CEOs to continue to push the narrative that AI is inevitably going to drive layoffs.

She cautions CEOs to pursue an understanding of AI's limitations to ensure sustainable benefits for their organizations. She encourages them to pursue a practical AI strategy that complements workforce adoption and accounts for current gaps in data, systems, and culture, an approach with more sustainable payoffs. Many CEOs today may be tempted by the message that with AI 'we can achieve anything,' but this deviates from reality. Instead, develop transparent communication in lockstep with each AI implementation that clarifies how AI will be leveraged to meet those goals and what this will mean for the organization.

Finally, for individuals, Hu advises, 'To excel in a more pervasive world of AI, you need to clearly understand your personal goals and commit your effort to the more challenging ones requiring sustained mental effort. This is a significant step to start building the discipline and skills needed to succeed.'

There was no mention, this time, of 'AI' in Hu's counsel. And rightly so — humans should own their efforts and outcomes. AI is a mere sidekick.