Top Chatbots Are Giving Horrible Financial Advice
Despite lofty claims from artificial intelligence soothsayers, the world's top chatbots are still strikingly bad at giving financial advice.
AI researchers Gary Smith, Valentina Liberman, and Isaac Warshaw of the Walter Bradley Center for Natural and Artificial Intelligence posed a series of 12 finance questions to four leading large language models (LLMs) — OpenAI's ChatGPT-4o, DeepSeek-V2, Elon Musk's Grok 3 Beta, and Google's Gemini 2 — to test out their financial prowess.
As the experts explained in a new study from Mind Matters, each chatbot proved to be "consistently verbose but often incorrect."
That finding, notably, almost exactly echoed an assessment Smith published last year in the Journal of Financial Planning. In that earlier experiment, he posed 11 finance questions to ChatGPT-3.5, Microsoft's Bing powered by OpenAI's GPT-4, and Google's Bard chatbot, and the LLMs spat out responses that were "consistently grammatically correct and seemingly authoritative but riddled with arithmetic and critical-thinking mistakes."
The researchers used a simple scale: a score of "0" denoted a completely incorrect financial analysis, a "0.5" denoted a correct financial analysis marred by mathematical errors, and a "1" denoted an answer correct on both the math and the financial analysis. Out of a maximum of 12 points, no chatbot scored higher than five. ChatGPT led the pack with a 5.0, followed by DeepSeek's 4.0, Grok's 3.0, and Gemini's abysmal 1.5.
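To make the rubric concrete, here is a minimal sketch of how per-question grades of that kind would tally into the totals above. The per-question grades in the example are hypothetical (the study's question-by-question scoring is not reproduced here), and the function name is our own:

```python
def score_answer(analysis_correct: bool, math_correct: bool) -> float:
    """Grade one chatbot answer under the study's rubric:
    1.0 = financial analysis and arithmetic both correct,
    0.5 = analysis correct but arithmetic flawed,
    0.0 = analysis incorrect."""
    if not analysis_correct:
        return 0.0
    return 1.0 if math_correct else 0.5

# Hypothetical per-question grades for one model -- the study's actual
# question-by-question results are not reproduced here:
answers = [(True, True)] * 3 + [(True, False)] * 4 + [(False, False)] * 5
total = sum(score_answer(a, m) for a, m in answers)
print(f"{total} out of {len(answers)}")  # prints "5.0 out of 12"
```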
Some of the chatbot responses were so bad that they defied the Walter Bradley experts' expectations. When Grok, for example, was asked to add up a single month's worth of expenses for a Caribbean rental property whose rent was $3,700 and whose utilities ran $200 per month, the chatbot claimed that those numbers added up to $4,900, a full $1,000 more than the correct total of $3,900.
Along with spitting out a bunch of strange typographical errors, the chatbots also failed, per the study, to generate any intelligent analyses of the relatively basic financial questions the researchers posed. Even the chatbots' most compelling answers seemed to be gleaned from various online sources, and those only came when the bots were asked to explain relatively simple concepts, like how Roth IRAs work.
Throughout it all, the chatbots were dangerously glib. The researchers noted that all of the LLMs they tested present a "reassuring illusion of human-like intelligence, along with a breezy conversational style enhanced by friendly exclamation points" that the average user could easily mistake for confidence and correctness.
"It is still the case that the real danger is not that computers are smarter than us," they concluded, "but that we think computers are smarter than us and consequently trust them to make decisions they should not be trusted to make."
More on dumb AI: OpenAI Researchers Find That Even the Best AI Is "Unable To Solve the Majority" of Coding Problems
Related Articles
Yahoo
Klarna CEO warns AI may cause a recession as the technology comes for white-collar jobs
The CEO of payments company Klarna has warned that AI could lead to job cuts and a recession. Sebastian Siemiatkowski said he believed AI would increasingly replace white-collar jobs. Klarna previously said its AI assistant was doing the work of 700 full-time customer service agents.

The CEO of the Swedish payments company Klarna says that the rise of artificial intelligence could lead to a recession as the technology replaces white-collar jobs. Speaking on The Times Tech podcast, Sebastian Siemiatkowski said there would be "an implication for white-collar jobs," which he said "usually leads to at least a recession in the short term."

"Unfortunately, I don't see how we could avoid that, with what's happening from a technology perspective," he continued.

Siemiatkowski, who has long been candid about his belief that AI will come for human jobs, added that AI had played a key role in "efficiency gains" at Klarna and that the firm's workforce had shrunk from about 5,500 to 3,000 people in the last two years as a result.

It's not the first time the exec and Klarna have made headlines along these lines. In February 2024, Klarna boasted that its OpenAI-powered AI assistant was doing the work of 700 full-time customer service agents. The company, most famous for its "buy now, pay later" service, was one of the first firms to partner with Sam Altman's company.

Later that year, Siemiatkowski told Bloomberg TV that he believed AI was already capable of doing "all of the jobs" that humans do and that Klarna had enacted a hiring freeze since 2023 as it looked to slim down and focus on adopting the technology.

However, Siemiatkowski has since dialed back his all-in stance on AI, telling an audience at the firm's Stockholm headquarters in May that his AI-driven customer service cost-cutting efforts had gone too far and that Klarna was now planning to recruit again, according to Bloomberg. "From a brand perspective, a company perspective, I just think it's so critical that you are clear to your customer that there will be always a human if you want," he said.

In the interview with The Times, Siemiatkowski said he felt that many people in the tech industry, particularly CEOs, tended to "downplay the consequences of AI on jobs, white-collar jobs in particular."

"I don't want to be one of them," he said. "I want to be honest, I want to be fair, and I want to tell what I see so that society can start taking preparations."

Some of the top leaders in AI, however, have been ringing the alarm lately, too. Anthropic's leadership has been particularly outspoken about the threat AI poses to the human labor market. The company's CEO, Dario Amodei, recently said that AI may eliminate 50% of entry-level white-collar jobs within the next five years.

"We, as the producers of this technology, have a duty and an obligation to be honest about what is coming," Amodei said. "I don't think this is on people's radar."

Similarly, his colleague, Mike Krieger, Anthropic's chief product officer, said he is hesitant to hire entry-level software engineers over more experienced ones who can also leverage AI tools. The silver lining is that AI also brings the promise of better and more fulfilling work, Krieger said.
Humans, he said, should focus on "coming up with the right ideas, doing the right user interaction design, figuring out how to delegate work correctly, and then figuring out how to review things at scale — and that's probably some combination of maybe a comeback of some static analysis or maybe AI-driven analysis tools of what was actually produced."

Read the original article on Business Insider.


Forbes
Pixel 10 Pro Specs Reveal Google's Safe Decision
As with any smartphone family, the Pixel 10 and Pixel 10 Pro feature some big calls on hardware and specifications. Moving the fabrication of the Tensor mobile chipset from Samsung to TSMC is one of those calls, but the risk is balanced out by at least one key component staying with Samsung: Google looks set to use the same modem as the Pixel 9 and Pixel 9 Pro family, namely the Samsung Exynos 5400.

This week, details on the modem hardware came to light with leaked images of a switched-on Pixel 10 Pro running the DevCheck Pro application, which shows the various software settings, hardware choices, and specifications inside the upcoming flagship. Sitting in there is the g5400, referring to the Samsung Exynos 5400 modem.

While Google is upgrading the Tensor mobile chipset and switching suppliers to TSMC this year, it was previously reported that the company would also switch to MediaTek for a new modem. The decision to stay with the Exynos 5400 instead suggests that the development team preferred the known quantity of Samsung hardware for the Pixel 10 family.

There is another potential reason to stick with that known quantity: the Pixel 9 and Pixel 9 Pro smartphones put out less heat than previous Pixel handsets, and many had pointed to the modem in the Pixel 8 and older phones as one of the key thermal culprits, an issue the Exynos 5400 resolved.

Google is expected to launch the four Pixel 10 handsets in August. Now read more about the fashionable choices Google is making for the Pixel 10 and Pixel 10 Pro...

CNBC
Sam Altman brings his eye-scanning identity verification startup to the UK
LONDON — World, the biometric identity verification project co-founded by OpenAI CEO Sam Altman, is set to launch in the U.K. this week. The venture, which uses a spherical device called the Orb to scan people's eyes, will become available in London from Thursday and is planning to roll out to several other major U.K. cities — including Manchester, Birmingham, Cardiff, Belfast, and Glasgow — in the coming months.

The project aims to authenticate the identity of humans with its Orb device and to prevent AI systems, such as deepfakes, from being used for fraud. It works by scanning a person's face and iris and then creating a unique code to verify that the individual is a human and not an AI. Once someone has created their iris code, they are given some of World's WLD cryptocurrency and can use an anonymous identifier called World ID to sign into various applications. It currently works with the likes of Minecraft, Reddit, and Discord.

Adrian Ludwig, chief architect of Tools for Humanity, a core contributor to World, told CNBC on a call that the project is seeing significant demand from both enterprise users and governments as the threat of AI being used to defraud various services — from banking to online gaming — grows. "The idea is no longer just something that's theoretical. It's something that's real and affecting them every single day," he said, adding that World is now transitioning "from science project to a real network."

The venture recently opened up shop in the U.S. with six flagship retail locations in Austin, Atlanta, Los Angeles, Nashville, Miami, and San Francisco. Ludwig said that, looking ahead, the plan is to "increase the number of people who can be verified by an order of magnitude over the next few months."

Ever since its initial launch as "Worldcoin" in 2021, Altman's World has been plagued by concerns over how it could affect users' privacy. The startup says it addresses these concerns by encrypting the biometric data collected and ensuring the original data is deleted. On top of that, World's verification system depends on a decentralized network of users' smartphones, rather than the cloud, to carry out individual identity checks. Still, that approach becomes harder to sustain in a network with billions of users, like Facebook or TikTok.

For now, World has 13 million verified users and is planning to scale that up. Ludwig argues World is a scalable network because all of the computation and storage is processed locally on a user's device; only the infrastructure for confirming someone's uniqueness is handled by third-party providers.

Ludwig says the way technology is evolving means it's getting much easier for new AI systems to bypass currently available authentication methods, such as facial recognition and CAPTCHA bot prevention measures. He sees World serving a pertinent need in the transition from physical to digital identity systems.

Governments are exploring digital ID schemes to move away from physical cards, but so far these attempts have been far from perfect. One example of a major digital identity system is India's Aadhaar. Although the initiative has seen widespread adoption, it has also been criticized for lax security and for allegedly worsening social inequality in India.

"We're beginning to see governments now more interested in how can we use this as a mechanism to improve our identity infrastructure," Ludwig told CNBC. "Mechanisms to identify and reduce fraud is of interest to governments."
The technologist added that World has been talking to various regulators about its identity verification solution — including the Information Commissioner's Office, which oversees data protection in the U.K. "We've been having lots of conversations with regulators," Ludwig told CNBC. "In general, there's been lots of questions: how do we make sure this works? How do we protect privacy? If we engage with this, does it expose us to risks?" "All of those questions we've been able to answer," he added. "It's been a while since we've had a question asked we didn't have an answer to."
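As a purely illustrative aside, the enrollment flow the article describes (scan an iris, derive a unique code, reject duplicates, hand back an anonymous credential) can be sketched in a few lines. This is a toy model built on our own assumptions; World's actual system relies on iris-specific encodings and privacy-preserving cryptography, none of which appears here:

```python
import hashlib

# Toy model of the enrollment flow described above -- NOT World's real
# protocol, which uses iris-specific feature encodings and
# privacy-preserving cryptography rather than a plain hash-and-set check.
enrolled_codes: set[str] = set()

def enroll(iris_template: bytes) -> str | None:
    """Derive a stable code from a biometric template and enforce
    one credential per human by rejecting duplicates."""
    iris_code = hashlib.sha256(iris_template).hexdigest()  # stand-in for an iris code
    if iris_code in enrolled_codes:
        return None  # this person has already been verified
    enrolled_codes.add(iris_code)
    return iris_code  # stand-in for a World ID credential

# A first scan succeeds; a repeat scan of the same eye is rejected.
assert enroll(b"example-iris-template") is not None
assert enroll(b"example-iris-template") is None
```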