Latest news with #Gemini2
Yahoo
27-04-2025
- Business
- Yahoo
Top Chatbots Are Giving Horrible Financial Advice
Despite lofty claims from artificial intelligence soothsayers, the world's top chatbots are still strikingly bad at giving financial advice.
AI researchers Gary Smith, Valentina Liberman, and Isaac Warshaw of the Walter Bradley Center for Natural and Artificial Intelligence posed a series of 12 finance questions to four leading large language models (LLMs): OpenAI's ChatGPT-4o, DeepSeek-V2, Elon Musk's Grok 3 Beta, and Google's Gemini 2. The goal was to test their financial prowess.
As the experts explained in a new study from Mind Matters, each chatbot proved to be "consistently verbose but often incorrect."
That finding was, notably, almost identical to Smith's assessment last year for the Journal of Financial Planning. In that earlier test, which posed 11 finance questions to ChatGPT 3.5, Microsoft's Bing with ChatGPT's GPT-4, and Google's Bard chatbot, the LLMs spat out responses that were "consistently grammatically correct and seemingly authoritative but riddled with arithmetic and critical-thinking mistakes."
The researchers used a simple scale: a score of "0" denoted a completely incorrect financial analysis, a "0.5" denoted a correct financial analysis with mathematical errors, and a "1" denoted an answer that was correct on both the math and the financial analysis. No chatbot earned more than five of a maximum 12 points. ChatGPT led the pack with a 5.0, followed by DeepSeek's 4.0, Grok's 3.0, and Gemini's abysmal 1.5.
Some of the chatbot responses were so bad that they defied the Walter Bradley experts' expectations. When Grok, for example, was asked to add up a single month's worth of expenses for a Caribbean rental property whose rent was $3,700 and whose utilities ran $200 per month, the chatbot claimed that those numbers together added up to $4,900. The correct total is $3,900.
Along with spitting out a bunch of strange typographical errors, the chatbots also failed, per the study, to generate any intelligent analyses for the relatively basic financial questions the researchers posed.
Even the chatbots' most compelling answers seemed to be gleaned from various online sources, and those came only when the chatbots were asked to explain relatively simple concepts like how Roth IRAs work.
Throughout it all, the chatbots were dangerously glib. The researchers noted that all of the LLMs they tested present a "reassuring illusion of human-like intelligence, along with a breezy conversational style enhanced by friendly exclamation points" that could come off to the average user as confidence and correctness.
"It is still the case that the real danger is not that computers are smarter than us," they concluded, "but that we think computers are smarter than us and consequently trust them to make decisions they should not be trusted to make."
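The arithmetic behind the study's figures is easy to check by hand or in a few lines of code. The sketch below is only an illustration, not the researchers' own method: it takes the totals and the rental-property numbers exactly as reported in the article, and the variable names are assumptions made up for this example.

```python
# Totals out of 12 possible points, as reported in the article.
MAX_POINTS = 12
reported_totals = {"ChatGPT": 5.0, "DeepSeek": 4.0, "Grok": 3.0, "Gemini": 1.5}

# Share of the available points each chatbot earned.
share_of_max = {name: total / MAX_POINTS for name, total in reported_totals.items()}

# The rental-property question: one month of rent plus utilities.
rent = 3700
utilities = 200
monthly_expenses = rent + utilities

# Grok's reported answer, and how far it was from the correct sum.
grok_answer = 4900
grok_error = grok_answer - monthly_expenses

for name, share in share_of_max.items():
    print(f"{name}: {share:.1%} of available points")
print(f"Correct monthly total: ${monthly_expenses:,} (Grok was off by ${grok_error:,})")
```

Even the top scorer earned well under half of the available points, and the rental-property sum is the kind of calculation a single line of code gets right.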

WIRED
02-03-2025
- Business
- WIRED
Honor Debuts a New AI Agent That Can Read and Understand Your Screen
The Honor UI Agent, powered by Google's Gemini 2 model, gives us a glimpse of artificial intelligence agents on Android.
Photograph: Julian Chokkattu
We must all hate booking a table at a restaurant because it's once again the problem tech companies are trying to solve with the power of artificial intelligence. Honor has taken the wraps off Honor UI Agent, a "GUI-based mobile AI agent" that claims to handle tasks on your behalf by understanding the screen's graphical user interface. Its primary demo to show off this capability? Having the agent book a restaurant, naturally, through OpenTable.
WIRED had an early opportunity to see the demo ahead of the company's keynote at Mobile World Congress 2025 in Barcelona, where Honor also announced its $10 billion Honor Alpha Plan. This long-term plan, envisioned by the Chinese company's new CEO Jian Li, is lofty and largely corporate-speak, composed of goals like "creating an intelligent phone" and "open human potential boundaries and co-create a new paradigm for civilization." What it really highlights is Honor's quick pivot into prioritizing AI development for its suite of personal technology devices.
A GUI Agent
In the demo, an Honor spokesperson asked Honor's UI Agent to book a table for four people, gave a time, and specified "local food." (The AI took location into account and understood that to mean Spanish food here in Barcelona.) What happens next is a little jarring, though not in the way Google's Duplex technology was when it debuted in 2018 and had Google Assistant interact with real humans to make reservations on your behalf. Instead, you're forced to stare at Honor's screen, watching this agent run through the steps of finding a restaurant and booking a table through the OpenTable app. It doesn't quite feel "smart" when you have to see the dull machinations of the process at work, though Honor tells me in the future its UI Agent won't need to show its homework.
It chose a restaurant, but then couldn't complete the process as the spot it chose required a credit card to confirm a reservation, at which point the user had to take over. You can be flexible in your query: in another example, asking it to book a "highly rated" restaurant meant it would look at reviews with high scores, though the agent doesn't do any more research than that. It's not cross-referencing OpenTable reviews with data from other parts of the web, especially since all of this data is processed on device and isn't sent to the cloud.
This kind of agentic artificial intelligence is the current buzzword in the tech sphere. My colleague Will Knight recently tested an AI assistant that could browse the web and perform tasks online. Google late last year unveiled its Gemini 2 AI model trained to take actions on your behalf. It also renews the idea of a generative user interface for smartphones. At MWC 2024, we saw a few companies working on ways to interact with apps without using apps at all, instead leaning on AI assistants to generate a user interface as you issued a command.
Honor's approach feels somewhat like what Rabbit (of the infamous Rabbit R1) is doing with Teach Mode, where you train its assistant manually to complete a task. There's no need to access an app's application programming interface (API), which is the traditional way apps or services communicate with each other. The agent memorizes the process, allowing you to then issue the command and have it execute the task.
But Honor says its self-reliant AI execution model isn't trained to follow strict steps; it's capable of multimodal screen context recognition to perform tasks autonomously. Instead of having to train the assistant to learn every single part of the OpenTable app, it is capable of understanding the semantic elements of the user interface and will follow through with a multi-step process to execute your request.
Honor highlighted that this process was more cost-effective: "Unlike competitors such as Apple, Samsung, and Google, which rely on external APIs, resulting in higher operational costs, Honor's AI Agent independently manages a wide range of tasks."
While Honor says its UI agent uses in-house execution models, it also leverages Google's Gemini 2 large language model, which is what powers the intent recognition of your command and the "enhanced semantic understanding" of what's on the screen. Google did not share any details about the nature of the collaboration.
Honor says it has also partnered with Qualcomm to keep the data on the device and develop a personal knowledge base that learns your preferences over time. The idea is that if you tend to order certain kinds of food in a delivery app and you ask the agent to order on your behalf, it'll use that context to pick something it knows you like. The company says it's already employing some of these AI agents in China.
At its keynote, Honor also announced that it will deliver seven years of software updates for its flagship Magic 7 Pro and upcoming devices, matching the software update policies from Google and Samsung for Pixel and Galaxy phones. It unveiled a handful of new gadgets at the show too, including the Honor Earbuds Open, Honor Watch 5 Ultra smartwatch, Honor Pad V9 tablet, and Honor MagicBook Pro 14 laptop. These devices won't be sold in the US, like most of Honor's products, but will be available in other markets.
(The brand hosted WIRED at its media event at MWC 2025 and paid for a portion of our reporter's travel expenses.)