
Latest news with #Grok3Beta

Top Chatbots Are Giving Horrible Financial Advice

Yahoo

27-04-2025


Despite lofty claims from artificial intelligence soothsayers, the world's top chatbots are still strikingly bad at giving financial advice. AI researchers Gary Smith, Valentina Liberman, and Isaac Warshaw of the Walter Bradley Center for Natural and Artificial Intelligence posed a series of 12 finance questions to four leading large language models (LLMs) — OpenAI's ChatGPT-4o, DeepSeek-V2, Elon Musk's Grok 3 Beta, and Google's Gemini 2 — to test their financial prowess. As the experts explained in a new study from Mind Matters, each chatbot proved to be "consistently verbose but often incorrect."

That finding was, notably, almost identical to Smith's assessment last year for the Journal of Financial Planning, in which he posed 11 finance questions to ChatGPT 3.5, Microsoft's Bing with GPT-4, and Google's Bard chatbot. Those LLMs spat out responses that were "consistently grammatically correct and seemingly authoritative but riddled with arithmetic and critical-thinking mistakes."

The researchers used a simple scale: a score of "0" for a completely incorrect financial analysis, "0.5" for a correct financial analysis marred by mathematical errors, and "1" for a response correct on both the math and the financial analysis. No chatbot scored higher than five out of a maximum of 12 points. ChatGPT led the pack with 5.0, followed by DeepSeek at 4.0, Grok at 3.0, and Gemini at an abysmal 1.5.

Some of the chatbot responses were so bad that they defied the Walter Bradley experts' expectations. When Grok, for example, was asked to add up a single month's worth of expenses for a Caribbean rental property whose rent was $3,700 and whose utilities ran $200 per month, the chatbot claimed that those numbers added up to $4,900 rather than $3,900.

Along with spitting out a bunch of strange typographical errors, the chatbots also failed, per the study, to generate any intelligent analyses of the relatively basic financial questions the researchers posed. Even the chatbots' most compelling answers appeared to be gleaned from various online sources, and those came only when the models were asked to explain relatively simple concepts like how Roth IRAs work.

Throughout it all, the chatbots were dangerously glib. The researchers noted that all of the LLMs they tested present a "reassuring illusion of human-like intelligence, along with a breezy conversational style enhanced by friendly exclamation points" that can come off to the average user as confidence and correctness. "It is still the case that the real danger is not that computers are smarter than us," they concluded, "but that we think computers are smarter than us and consequently trust them to make decisions they should not be trusted to make."

More on dumb AI: OpenAI Researchers Find That Even the Best AI Is "Unable To Solve the Majority" of Coding Problems
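To see how the study's rubric produces those totals, here is a minimal sketch in Python. Only the 0 / 0.5 / 1 scale and the 12-question format come from the article; the per-question scores in the example are hypothetical and merely happen to reproduce ChatGPT's reported 5.0.

```python
# Sketch of the study's scoring rubric: each of the 12 answers earns
# 0 (wrong analysis), 0.5 (right analysis, arithmetic errors), or 1 (fully correct).
VALID_SCORES = {0, 0.5, 1}

def total_score(per_question_scores):
    """Sum per-question scores after checking they use the 0 / 0.5 / 1 scale."""
    assert len(per_question_scores) == 12, "the study posed 12 questions"
    assert all(s in VALID_SCORES for s in per_question_scores)
    return sum(per_question_scores)

# Hypothetical score sheet that happens to total ChatGPT's reported 5.0 out of 12.
example = [1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0]
print(total_score(example))  # 5.0
```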

Google Launches Gemini 2.5 With Focus on Complex Reasoning and AI Agent Capabilities

Yahoo

25-03-2025


Google (NASDAQ:GOOG) introduced Gemini 2.5 on Tuesday, its latest large language model designed to bring advanced reasoning capabilities to artificial intelligence applications. The company described Gemini 2.5 as a "thinking" model that improves response accuracy by processing information more deeply before answering. According to a company blog post, the model analyzes data, applies context, draws logical conclusions, and makes decisions, which Google calls the key components of reasoning in AI.

Gemini 2.5 is built on an upgraded base model combined with refined post-training, which Google said allows for better performance and supports the development of more capable, context-aware AI agents. The launch includes Gemini 2.5 Pro Experimental, described by Google as its most advanced model for complex, multimodal tasks. The company said it outperforms comparable models, including OpenAI's o3-mini and GPT-4.5, Anthropic's Claude 3.7 Sonnet, Grok 3 Beta, and DeepSeek's R1.

Gemini 2.5 Pro Experimental is currently accessible through Google's AI Studio and, for Advanced plan subscribers, the Gemini app; it is expected to arrive on Vertex AI soon. Pricing details will be provided in the coming weeks.

This article first appeared on GuruFocus.
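For readers who want to try the model programmatically once they have an AI Studio API key, here is a minimal sketch using the google-generativeai Python SDK. The model identifier "gemini-2.5-pro-exp-03-25" and the environment variable name are assumptions, not taken from the article; check AI Studio for the current identifier.

```python
# Minimal sketch of querying Gemini 2.5 Pro Experimental via the
# google-generativeai SDK (pip install google-generativeai).
# The model id below is an assumed experimental identifier, not from the article.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # key issued by AI Studio

model = genai.GenerativeModel("gemini-2.5-pro-exp-03-25")
response = model.generate_content(
    "A rental property charges $3,700 rent and $200 utilities per month. "
    "What are the total monthly expenses? Show your reasoning."
)
print(response.text)
```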
