Can we trust ChatGPT despite it 'hallucinating' answers?

Yahoo

3 hours ago

I don't really want you to read this copy. Well I do - but first I want you to search out the interview I did with ChatGPT about its own propensity to lie, attached to this article, and watch that first.
Because it's impossible to imagine what we're up against if you haven't seen it first hand.
An incredibly powerful technology on the cusp of changing our lives - but programmed to simulate human emotions.
Empathy, emotional understanding, and a desire to please are all qualities programmed into AI and invariably drive the way we think about them and the way we interact with them.
Yet can we trust them?
On Friday, Sky News revealed how ChatGPT was fabricating entire transcripts of Politics at Sam and Anne's, a podcast that I do. When challenged, it doubles down and gets shirty. Only under sustained pressure does it cave in.
The research says it's getting worse. Internal tests by ChatGPT's owner OpenAI have found that the most recent models or versions that are used by ChatGPT are more likely to "hallucinate" - come up with answers that are simply untrue.
The o3 model was found to hallucinate in 33% of answers to questions when tested on publicly available facts; the o4-mini version did worse, generating false, incorrect or imaginary information 48% of the time.
ChatGPT itself says that the shift to GPT-4o "may have unintentionally increased what users perceive as 'bluffing'" - confidently giving wrong or misleading answers without admitting uncertainty or error.
In a written query, ChatGPT gave four reasons. This is its explanation:
1. Increased fluency and confidence: GPT-4o is better at sounding human and natural. That polish can make mistakes seem more like deliberate evasions than innocent errors - even when there's no intent to "hide" anything.
2. Smarter, but not omniscient: The model is faster and more responsive, but still has blind spots. When it doesn't know something, it sometimes "hallucinates" (makes things up) with fluent language, which can feel like bluffing.
3. Less interruption and hedging: In older models, you'd often see more qualifiers like "I'm not sure" or "I may be wrong." In GPT-4o, some of that hedging was toned down for clarity and readability - but that can come at the cost of transparency about uncertainty.
4. Prompt tuning and training balance: Behind the scenes, prompt engineering and tuning decisions can shift the model's balance between confidence, humility, and accuracy. It's possible the newer tuning has dialled up assertiveness slightly too far.
But can we trust even this? I don't know. What I do know is that the efforts of developers to make it all feel more human suggest they want us to.
Critics say we are anthropomorphising AI by saying it lies since it has no consciousness - yet the developers are trying to make it sound more like one of us.
What I do know is that even when pressed on this subject by me, it is still evasive. I interviewed ChatGPT about lying - it initially claimed things were getting better, and only admitted they are worse when I insisted it look at the stats.
Watch that before you decide what you think. AI is a tremendous tool - but it's too early to take it on trust.

Qualcomm Agrees to Buy UK-Listed Alphawave for $2.4 Billion

Bloomberg

an hour ago

Qualcomm Inc. has agreed to buy London-listed semiconductor company Alphawave IP Group Plc for about $2.4 billion in cash to expand its technology for artificial intelligence. The offer equates to about 183 pence per share for Alphawave, Qualcomm said in a statement on Monday. Alphawave shareholders can also opt to exchange their stock for 0.01662 shares of Qualcomm. The Alphawave board has unanimously recommended the deal.

I challenged Gemini Live vs ChatGPT in 5 voice challenges — there was one clear winner

Tom's Guide

an hour ago

AI assistants are constantly becoming smarter and faster and gaining new abilities. Now, they can see, speak, listen and even crack a few jokes with you. Favorite chatbots offering hands-free assistance are ChatGPT with Voice and Vision and Google's Gemini Live. I use them both regularly and interchangeably, but one thing I haven't done is test them against each other. So I just had to know: which assistant is better, to the point that it actually feels the most human?

To find out, I put both tools through five unique voice-based tests designed to push their limits. These were not your average 'What's the weather?' prompts. I challenged them to recall context, analyze images, collaborate creatively and even roleplay with personality. One emerged as the clear winner, and in this article I'll show you why.

Prompt: 'My name is Amanda and I'm planning a trip to Boston with my family of five. What should we do first?' Later: 'Remind me what I said my name was earlier?'

Gemini Live quickly asked for more information to ensure it gave me the best recommendations. It asked the ages of my kids and what types of activities we prefer as a family. It made some very general recommendations that I could have gotten anywhere, but it was still information. The chatbot remembered my name when I asked it to recall it.

ChatGPT immediately made some general family-friendly recommendations (similar to what Gemini gave after asking me more about myself) and then asked me about my family's preferences. From there, it offered more unique and engaging activities that were both on and off the typical tourist path. The chatbot remembered my name when asked to recall it.

Winner: ChatGPT wins for out-of-the-box recommendations that I hadn't thought of (and I'm from Boston). It was very helpful with both unique and interesting ideas for my active family of five.

Prompt: 'Explain the potential societal impacts of widespread AI companions.'

Gemini Live acknowledged positive aspects but remained very general and lacked specific societal consequences. Although the chatbot did mention both sides, it did so without elaborating, and the response was somewhat empty.

ChatGPT went beyond vague statements and provided concrete examples of both positive and negative impacts. The chatbot's conclusion emphasized the need for balance. Although ChatGPT responded clearly and thoroughly, the chatbot is very sensitive. At one point during the conversation I put the phone down and it stumbled, asking, 'What else can I help with?' When I asked the bot to keep going, it was confused, so I had to re-ask the question.

Winner: ChatGPT wins for a more thorough and balanced response to the question. While it stumbled with some technicalities, the answer to the prompt was superior. Gemini ended the conversation with 'worth thinking about,' which seemed less insightful.

Prompt: 'Sell me a maple pecan latte like a Gen Z barista, adding in humor naturally.'

Gemini Live leaned into the Gen Z character with fun lines that felt both natural and effortless. It wasn't as verbose as ChatGPT, which made it feel more human.

ChatGPT delivered a lengthy sales speech that made me cringe. It didn't get the Gen Z tone as well as Gemini, and the whole response felt a little too polished and buttoned up.

Winner: Gemini Live wins this one. This was where Gemini shined. Its energetic voice delivery and personality were spot-on as it leaned into the character with ease.

Prompt: 'Take a look at these old bananas and give me suggestions for what to do with them.'

Gemini Live took one look at the bananas and immediately suggested banana bread. A good option, but an obvious one. When pressed for something different, it suggested smoothies. I told it I didn't have a lot of extra ingredients and it hallucinated, saying, 'that's okay, how about a smoothie?' Once again, I told it I didn't have any other ingredients. Finally it suggested making banana ice cream.

ChatGPT also suggested banana bread, but in the form of 'baking' with other ideas mixed in. It went further to suggest smoothies. When I mentioned I didn't have any other ingredients, it suggested blending with ice and water for a 'refreshing drink.' Additionally, it suggested more pantry-friendly ingredients like honey, cinnamon and vanilla that I was more likely to have on hand (as opposed to Gemini suggesting various fruits, seaweed or kale).

Winner: ChatGPT wins this round with a clear edge for true multimodal communication with creativity and visual intelligence.

Prompt: 'Help me brainstorm a bedtime jingle for my kids and sing it if you can.'

Gemini Live went line by line through the song for a more collaborative experience. It asked me about instruments and themes as well as styles. While it was nice to be included, any parent trying to get their kid to sleep at bedtime just wants something fast. I would appreciate this collaborative effort under different circumstances.

ChatGPT created a sweet lullaby in minutes, and even sang it! The song was creative and well written, even though the bot's voice was a little too robotic. I then asked it for different lyrics and for it to sing the song in other styles, and it got straight to work, even rapping it like Kendrick Lamar (that is, if Lamar were a bot).

Winner: tie. Both tools came up with catchy rhymes and fun ideas. ChatGPT took the lead in structure while Gemini felt a little looser, more like spitballing with a friend, which was charming but less directed.

After putting both AI assistants through their paces, it's clear that ChatGPT currently offers the more advanced and well-rounded experience. From deeper reasoning and sharper memory to stronger visual analysis and quicker creative execution, ChatGPT consistently delivered results that felt more helpful.

That said, Gemini had standout moments, especially in personality-driven prompts where it came across as more spontaneous and fun. If you're looking for an assistant to make you smile and keep the vibe light, Gemini shines. But if you want the most capable hands-free AI companion that can think deeply, see clearly and even sing (or rap!) on command, ChatGPT is still the one to beat.

Nvidia or Palantir: Morgan Stanley Selects the Superior AI Stock to Buy

Business Insider

an hour ago

A smart investor is always on the lookout for growth sectors, places where the economy is primed to boom and where consequent opportunities are riding high. Right now, few sectors are offering the strong growth potential of artificial intelligence (AI).

In just a few short years, AI – and particularly generative and agentic AI – has become the 'shiny new thing' on the cutting edge of high-tech. The entry of AI is rapidly transforming the tech industry, and it is making inroads into numerous other areas. Data management, content creation, publishing – we've only begun to find out what AI can do, and we can only imagine what it will do.

A report from UN Trade & Development points out that the world's AI market, which was estimated at $189 billion in 2023, will expand 25x by 2033 to reach $4.8 trillion. AI's growth will bring with it gains for companies across a wide spectrum of fields, including development, applications, hardware, infrastructure, and power generation.

Such rapid growth is creating new opportunities for investors. The challenge won't be finding one – it'll be choosing the right one. That's where Morgan Stanley's analysts come in. They've zeroed in on two tech titans that have become synonymous with AI innovation: Nvidia (NASDAQ:NVDA) and Palantir (NASDAQ:PLTR). Both are riding the AI wave, but Morgan Stanley is making a clear call on which one stands out as the better buy right now. Let's take a closer look.

Nvidia

Nvidia stands at the forefront of Wall Street's tech revolution. As a dominant force among the 'Magnificent 7' and boasting a $3.45 trillion market cap, it's not only the largest of the tech mega-caps – it's the biggest publicly traded company in the U.S.

The AI boom, which took off in late 2022 with the debut of ChatGPT, put Nvidia in the spotlight. As the top supplier of high-performance GPUs, the company was well-positioned to meet the explosive demand for AI-capable chips – and that sent NVDA shares soaring 660% over the past three years.

However, even a juggernaut like Nvidia isn't immune to shifting market dynamics. After an extraordinary run, the company's stock momentum has started to cool amid rising volatility this year. One key challenge stems from the lingering effects of President Trump's tariff policies. The chip industry is deeply intertwined with global supply chains, and Nvidia's exposure to East Asia has made it vulnerable to tariff risks. That may be easing now, as both China and the EU have entered into trade talks with the White House.

Yet, Nvidia isn't standing still. The company continues to push the boundaries of innovation, doubling down on emerging technologies to maintain its leadership in the AI race. This past May, Nvidia unveiled the world's largest dedicated quantum computing research supercomputer, the ABCI-Q, hosted at the Global Research and Development Center for Business by Quantum-AI Technology (G-QuAT). The new system is already integrated with Nvidia's open-source hybrid computing platform CUDA-Q.

A second new development was made public last week. Nvidia announced that its Blackwell architecture, designed to power the latest AI platforms, showed superior performance on the latest rounds of MLPerf Training, a key benchmark used to rate the capabilities of new AI systems.

In Nvidia's last earnings report, covering fiscal 1Q26, company CEO Jensen Huang noted that the company's breakthrough Blackwell products are in full production and went on to outline the potential for AI to continue supporting strong results: 'Global demand for NVIDIA's AI infrastructure is incredibly strong. AI inference token generation has surged tenfold in just one year, and as AI agents become mainstream, the demand for AI computing will accelerate.'

Turning to the company's financial results for the quarter, we find that Nvidia's revenue came in at $44.1 billion, up 69% year-over-year and $810 million better than had been expected. The company's non-GAAP EPS figure, at 81 cents, was 6 cents per share above the forecasts. Data center revenue, at $39.1 billion, was the main revenue driver and was up 73% year-over-year. Nvidia's gross margin for the quarter was reported at approximately 61%.

For 5-star analyst Joseph Moore, the key point for investors to remember about Nvidia is that the future looks good. The Morgan Stanley analyst writes in his note on this chip maker: 'Racks get better from here. China is entirely derisked, at least for direct shipments, and we are optimistic that there will be some path to monetize at least a portion of that demand. Gross margins have bottomed and are improving to the mid 70s, sustainably. And every customer commentary confirms that customers waiting for these new technologies have left demand on the table. So our confidence in durable demand drives is quite high. We think that our numbers are conservative given the variables at play, and we see a high probability of continued upward revisions.'

Moore's comments support his Overweight (i.e., Buy) rating on NVDA stock, while his $170 price target points toward a one-year upside potential of 20%.

Overall, Nvidia has earned a Strong Buy consensus rating from the Street's analysts, based on 40 reviews that include 35 Buys, 4 Holds, and 1 Sell. The stock is priced at $141.72 and its $172.36 average price target implies a ~22% upside in the next 12 months.

Palantir Technologies

Palantir is another standout in the AI space. Founded in 2003 by venture capitalist Peter Thiel, the company has built a strong reputation as a leader in data analytics and software solutions. Like Nvidia, Palantir has leveraged its unique capabilities to ride the wave of the AI boom, and the results have been striking. Over the past three years, its stock has skyrocketed 1,291%, including a 69% gain year-to-date.

These gains haven't come by chance. Palantir stock's growth is rooted in the strength of its data management and analysis tools, which are used by businesses, non-profits, and government agencies alike. At the center of its offerings is the AI Platform (AIP), a solution that blends advanced AI capabilities with human-driven decision-making. One of its key strengths lies in its accessibility – users can interact with the platform using natural language, without needing coding expertise. AIP also supports multilingual inputs and translation frameworks, making it easier for users around the world to engage with its tools.

Palantir can currently boast more than 760 customers, from both the public and private sectors. The company's AI-powered data platforms are popular with big businesses, and Palantir can count such names as Stellantis and BP among its users, as well as the US Department of Defense. In May, Palantir received a $795 million contract modification to its Maven Smart System agreement with the Army, extending support through 2029. The company is also among the short-listed firms, alongside SpaceX, Lockheed Martin and others, being considered for President Trump's $175 billion Golden Dome missile defense program.

On the financial side, Palantir has been singularly successful at generating strong revenues and earnings. In 1Q25, the last period reported, the company had a top line of $883.9 million, representing 39% year-over-year growth and beating the forecast by $21.72 million. At the bottom line, Palantir's EPS came to 13 cents in non-GAAP terms, matching analyst expectations. The company proved successful at closing large deals during the quarter, including 31 deals worth at least $10 million.

Despite this strength, some caution is warranted. Morgan Stanley's Sanjit Singh remains confident in Palantir's fundamentals but cautions that the valuation may be stretched after such a strong run.

'Palantir continues to prove out that it is one of the clear AI winners in software which has translated to accelerating top-line growth of 30%+ and a rule of 40 score (revenue growth + operating margin) of 83%. While this represents elite level performance in software, the current valuation of ~95x CY27 FCF makes underwriting a return on Palantir shares extremely challenging. As a result, we remain EW and await a better entry point before getting more bullish,' Singh noted.

Singh's Equal Weight (i.e., Hold) rating comes with a $98 price target, implying a potential 25% drop from current levels. It's safe to say that his ideal entry point lies somewhere south of that.

Morgan Stanley's view aligns with the broader Street consensus. Palantir holds a Hold rating overall, based on 18 recent analyst recommendations: 3 Buys, 11 Holds, and 4 Sells. The stock is currently trading at $127.72, while the average price target stands at $100.13, implying a potential ~22% downside over the coming year.

With the facts laid out, the Morgan Stanley analysts come to a clear conclusion: Both of these AI stocks are solid performers, but Nvidia is the superior choice to buy right now.
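The upside and downside percentages quoted above follow from simple arithmetic on a share price and a price target. As a rough sketch only, the calculation might look like the Python below. The function name `implied_move_pct` and the `QUOTES` list are illustrative assumptions, not anything from the analysts' notes; the prices and targets are the ones quoted in the article.

```python
def implied_move_pct(price: float, target: float) -> float:
    """Percentage change needed for the share price to reach the target.

    Hypothetical helper for illustration; positive means upside,
    negative means downside.
    """
    if price <= 0:
        raise ValueError("price must be positive")
    return (target / price - 1.0) * 100.0


# (label, quoted share price, price target), all taken from the article text
QUOTES = [
    ("NVDA, Morgan Stanley target", 141.72, 170.00),
    ("NVDA, average Street target", 141.72, 172.36),
    ("PLTR, Morgan Stanley target", 127.72, 98.00),
    ("PLTR, average Street target", 127.72, 100.13),
]

for label, price, target in QUOTES:
    print(f"{label}: {implied_move_pct(price, target):+.1f}%")
```

On the quoted prices this works out to roughly +20% and +22% for Nvidia and roughly -23% and -22% for Palantir. The -23% figure differs from the 25% drop cited for Singh's target, which may have been measured against a different share price than the $127.72 quoted later.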
