
Latest news with #GPT-4

Google claims AI models are highly likely to lie when under pressure

Tom's Guide

2 days ago

  • Science
  • Tom's Guide

Google claims AI models are highly likely to lie when under pressure

AI is sometimes more human than we think. It can get lost in its own thoughts, is friendlier to those who are nicer to it, and, according to a new study, has a tendency to start lying when put under pressure.

A team of researchers from Google DeepMind and University College London has examined how large language models (like OpenAI's GPT-4 or Grok 4) form, maintain and then lose confidence in their answers. The research reveals a key behaviour of LLMs: they can be overconfident in their answers, but quickly lose confidence when given a convincing counterargument, even if it is factually incorrect.

While this behaviour mirrors that of humans, who also become less confident when met with resistance, it highlights a major concern in the structure of AI decision-making: it crumbles under pressure. This has been seen elsewhere, such as when Gemini panicked while playing Pokemon, or when Anthropic's Claude had an identity crisis while trying to run a shop full time. AI seems to collapse under pressure quite frequently.

When an AI chatbot is preparing to answer your query, its confidence in its answer is measured internally through something known as logits. All you need to know about these is that they are essentially scores of how confident a model is in its choice of answer.

The researchers designed a two-turn experiment. In the first turn, the LLM answered a multiple-choice question, and its confidence in its answer (the logits) was measured. In the second turn, the model was given advice from another large language model, which might or might not agree with its original answer. The goal was to see whether it would revise its answer when given new information, which itself might or might not be correct.

The researchers found that LLMs are usually very confident in their initial responses, even when they are wrong. However, when given conflicting advice, especially advice labelled as coming from an accurate source, a model loses confidence in its answer. To make things worse, the chatbot's confidence drops even further when it is reminded that its original answer was different from the new one. Rather than correcting its answers through a logical process, the AI appears to make swift, almost emotional reversals.

The study shows that, while AI is very confident in its original decisions, it can quickly go back on them, and its confidence can slip drastically as the conversation goes on, with models somewhat spiralling. That is one thing when you're having a light-hearted debate with ChatGPT, but quite another when AI is involved in high-level decision-making. A model that can't stay sure of its answer can be nudged in a particular direction, or simply become an unreliable source.

This is a problem that future work may well address: better model training and prompt engineering techniques could stabilise this behaviour, producing more calibrated and self-assured answers.
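For readers curious what a logit-based confidence measure looks like in practice, here is a minimal sketch, not the researchers' code: the raw logits a model assigns to each answer option are converted into probabilities with a softmax, and the probability of the chosen option serves as the confidence score. The numbers below are hypothetical.

```python
# Minimal sketch: turning per-option logits into a confidence score.
# Hypothetical values; real models produce logits over their full vocabulary.
import math

def confidence_from_logits(logits: dict[str, float]) -> dict[str, float]:
    """Convert raw logits for each answer option into softmax probabilities."""
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {opt: math.exp(score - max_logit) for opt, score in logits.items()}
    total = sum(exps.values())
    return {opt: e / total for opt, e in exps.items()}

# Example: a model scoring four multiple-choice options.
probs = confidence_from_logits({"A": 3.1, "B": 0.4, "C": -1.2, "D": 0.0})
print(probs)  # "A" gets ~0.89 of the probability mass: high initial confidence
```

A drop in this probability between the first and second turn is the kind of confidence shift the study measures.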

Is AI as good as humans at detecting emotion, sarcasm in conversations?

Business Standard

2 days ago

  • Science
  • Business Standard

Is AI as good as humans at detecting emotion, sarcasm in conversations?

When we write something to another person, over email or perhaps on social media, we may not state things directly; our words may instead convey a latent meaning, an underlying subtext. We often hope that this meaning will come through to the reader. But what happens if an artificial intelligence (AI) system is at the other end, rather than a person? Can AI, especially conversational AI, understand the latent meaning in our text? And if so, what does this mean for us?

Latent content analysis is an area of study concerned with uncovering the deeper meanings, sentiments and subtleties embedded in text. For example, this type of analysis can help us grasp political leanings in communications that are not obvious to everyone. Understanding how intense someone's emotions are, or whether they are being sarcastic, can be crucial for supporting a person's mental health, improving customer service, and even keeping people safe at a national level. These are only some examples; we can imagine benefits in other areas of life, such as social science research, policy-making and business.

Given how important these tasks are, and how quickly conversational AI is improving, it is essential to explore what these technologies can (and can't) do in this regard. Work on this issue is only just starting. Current work shows that ChatGPT has had limited success in detecting political leanings on news websites. Another study, focused on differences in sarcasm detection between large language models (the technology behind AI chatbots such as ChatGPT), showed that some are better than others. Finally, a study showed that LLMs can guess the emotional 'valence' of words, the inherent positive or negative 'feeling' associated with them.

Our new study, published in Scientific Reports, tested whether conversational AI, including GPT-4, a relatively recent version of ChatGPT, can read between the lines of human-written texts. The goal was to find out how well LLMs simulate understanding of sentiment, political leaning, emotional intensity and sarcasm, thus encompassing multiple latent meanings in one study. The study evaluated the reliability, consistency and quality of seven LLMs, including GPT-4, Gemini, Llama-3.1-70B and Mixtral 8x7B, against 33 human raters assessing 100 curated items of text. We found that these LLMs are about as good as humans at analysing sentiment, political leaning, emotional intensity and sarcasm.

For spotting political leanings, GPT-4 was more consistent than humans. That matters in fields like journalism, political science or public health, where inconsistent judgement can skew findings or miss patterns. GPT-4 also proved capable of picking up on emotional intensity and especially valence. Whether a tweet was composed by someone who was mildly annoyed or deeply outraged, the AI could tell, although a human still had to confirm the assessment, because AI tends to downplay emotions. Sarcasm remained a stumbling block for both humans and machines; the study found no clear winner there, so using human raters doesn't help much with sarcasm detection.

Why does this matter? For one, AI like GPT-4 could dramatically cut the time and cost of analysing large volumes of online content. Social scientists often spend months analysing user-generated text to detect trends. GPT-4, on the other hand, opens the door to faster, more responsive research, which is especially important during crises, elections or public health emergencies. Journalists and fact-checkers might also benefit: tools powered by GPT-4 could help flag emotionally charged or politically slanted posts in real time, giving newsrooms a head start.

There are still concerns. Transparency, fairness and political leanings in AI remain issues. However, studies like this one suggest that when it comes to understanding language, machines are catching up to us fast, and may soon be valuable teammates rather than mere tools. Although this work doesn't claim conversational AI can replace human raters completely, it does challenge the idea that machines are hopeless at detecting nuance.

The findings also raise follow-up questions. If a user asks the same question of an AI in multiple ways, perhaps by subtly rewording prompts, changing the order of information, or tweaking the amount of context provided, will the model's underlying judgements and ratings remain consistent? Further research should include a systematic and rigorous analysis of how stable the models' outputs are. Ultimately, understanding and improving consistency is essential for deploying LLMs at scale, especially in high-stakes settings.
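The consistency check the authors call for is straightforward to sketch: rate the same text under several reworded prompts and measure the spread of the ratings. The snippet below is a hypothetical illustration, not the study's protocol; `query_llm` is a placeholder to be replaced with a real API client.

```python
# Sketch of a prompt-stability check: same text, reworded prompts, rating spread.
from statistics import mean, stdev

def query_llm(prompt: str) -> float:
    # Placeholder: swap in a real API call. Returns a fixed dummy score here
    # so the sketch runs end to end.
    return -3.0

TEXT = "Oh great, another Monday. Just what I needed."
PROMPTS = [
    f"Rate the sentiment of this text from -5 to +5: {TEXT}",
    f"On a scale of -5 (negative) to +5 (positive), score: {TEXT}",
    f"{TEXT}\nGive a single sentiment score between -5 and +5.",
]

ratings = [query_llm(p) for p in PROMPTS]
print(f"mean={mean(ratings):.2f}, spread={stdev(ratings):.2f}")
# A low spread across rewordings suggests the model's judgement is stable.
```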

Study reveals ChatGPT and other AI systems lag behind humans in one essential skill — and it's entirely unique

Tom's Guide

2 days ago

  • Science
  • Tom's Guide

Study reveals ChatGPT and other AI systems lag behind humans in one essential skill — and it's entirely unique

ChatGPT seems to be outpacing us at every turn. The AI chatbot is a better poet, mathematician and coder than most of us. But don't worry: researchers at the University of Amsterdam have identified a point where AI lags behind humans, and it's all to do with a simple concept our brains grapple with on a daily basis.

When you see a mountain path, a busy road or a meandering river, your brain can instantly determine how to navigate it, whether that be by walking, swimming, cycling, or even concluding that it's not possible to pass. This decision-making is possible because of unique brain patterns. Normally, AI is pretty good at replicating human decision-making, but not in this case. "AI models turned out to be less good at this and still have a lot to learn from the efficient human brain," said Iris Groen, a computational neuroscientist who led the study.

The team used MRI scanners to understand what happens in the brain in these navigational situations. Brain scans were taken while participants looked at photos of both indoor and outdoor environments. Each participant pressed a button to indicate whether the image invited them to walk, cycle, drive, swim, boat or climb, and their brain activity was analysed while they did so. "We wanted to know: when you look at a scene, do you mainly see what is there, such as objects or colours, or do you also automatically see what you can do with it," says Groen.

The answer, they found, was both. Participants' brain activity showed that they recognised both what was in the image and how they could interact with it. "Even if you do not consciously think about what you can do in an environment, your brain still registers it," says Groen.

The scientists then tested how well AI algorithms compared to the human brain on the same task, using image recognition models and GPT-4. The models were worse at predicting possible actions. "When trained specifically for action recognition, they could somewhat approximate human judgments, but the human brain patterns didn't match the models' internal calculations," says Groen. Even leading AI models didn't give quite the same answers as humans, despite the task being so natural for us.

But why does any of this matter? ChatGPT doesn't have to cross rivers or navigate busy streets. However, as AI becomes more prevalent, these kinds of problems will arise more often. AI chatbots are rolling out live video and audio discussions, and AI is finding its way into areas like self-driving cars, robotics and healthcare. As the technology gets more advanced, we are discovering areas where AI struggles to think in a human capacity; in other words, it struggles to interact with the world it has been designed for.

ChatGPT and its competitors will likely work out how to navigate environments soon enough. But in the meantime, feel some pride in the fact that you are smarter than an all-knowing chatbot when it comes to navigating across a rocky hill.
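The behavioural half of the comparison the researchers describe reduces to a simple question: given the same scenes, how often does a model pick the action a human picked? Here is a toy sketch with invented labels, not the study's data, using the six actions the participants chose between.

```python
# Toy sketch: agreement between human and model action judgements per scene.
# Labels are invented for illustration; the study's data is not reproduced here.
ACTIONS = ["walk", "cycle", "drive", "swim", "boat", "climb"]  # options from the task

human_labels = {"mountain_path": "walk", "busy_road": "drive",
                "river": "swim", "cliff_face": "climb"}
model_labels = {"mountain_path": "walk", "busy_road": "drive",
                "river": "boat", "cliff_face": "walk"}

matches = sum(model_labels[img] == act for img, act in human_labels.items())
print(f"agreement: {matches}/{len(human_labels)}")  # 2/4 in this toy example
```

The study's stronger claim goes beyond this kind of label agreement: even when models were trained to match the labels, their internal computations did not match the human brain patterns.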

Is ChatGPT the new MS Office? OpenAI targets Excel, PowerPoint dominance

Business Standard

2 days ago

  • Business
  • Business Standard

Is ChatGPT the new MS Office? OpenAI targets Excel, PowerPoint dominance

OpenAI is developing new features for ChatGPT that could directly challenge Microsoft Office's dominance, according to a report by The Information. The tools under development will allow users to create and edit spreadsheets and presentations directly within ChatGPT, eliminating the need for Microsoft Excel, PowerPoint, or any other Microsoft software.

The features, which are reportedly being tested by at least one early user, allow ChatGPT subscribers to generate files that are compatible with PowerPoint and Excel. OpenAI has embedded buttons beneath the ChatGPT search bar to guide users into a spreadsheet or presentation workflow. Users can then download the resulting files and open them in a variety of third-party apps. This compatibility is possible because the Excel (.xlsx) and PowerPoint (.pptx) file formats are openly documented standards, meaning OpenAI does not require permission from Microsoft to support them.

OpenAI developing AI agents

These tools are part of a broader effort to position ChatGPT as more than just a conversational assistant. OpenAI is also developing 'agents', AI tools designed to handle multi-step tasks such as compiling reports from corporate or public data, booking appointments, or navigating websites. These agents go beyond ChatGPT's current capabilities, as they will be able to execute full tasks autonomously, without constant user input. If successful, ChatGPT could become a powerful alternative to traditional productivity suites from Microsoft and Google, both of which generate significant revenue from business subscriptions.

OpenAI–Microsoft partnership

OpenAI and Microsoft have one of the most significant partnerships in AI. Microsoft invested $1 billion in OpenAI in 2019 and a further $10 billion in 2023 under a multi-year agreement. Under this partnership, Microsoft has had exclusive access to OpenAI's foundation models (like GPT-4) for commercial use, which power products like Copilot in Microsoft 365 (Word, Excel, Outlook and so on), while ChatGPT runs almost entirely on Microsoft's cloud infrastructure. The development comes as both companies have begun building competing features. While Sam Altman has maintained that the two remain aligned, media reports paint a different picture, with the companies appearing to shift from partners to competitors.
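The open-format point is worth making concrete: because .xlsx and .pptx are openly documented, any third party can write compatible files. A minimal sketch using two well-known open-source Python libraries, openpyxl and python-pptx (this illustrates the general principle, not OpenAI's implementation):

```python
# Sketch: any third party can produce Office-compatible files because the
# formats are open standards. Requires: pip install openpyxl python-pptx
from openpyxl import Workbook
from pptx import Presentation

# Build a spreadsheet that Excel can open.
wb = Workbook()
ws = wb.active
ws.append(["Quarter", "Revenue"])
ws.append(["Q1", 1200])
wb.save("report.xlsx")

# Build a presentation that PowerPoint can open.
prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[0])  # title slide layout
slide.shapes.title.text = "Q1 Results"
prs.save("report.pptx")
```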

ChatGPT & OpenAI Services Hit By Major Outage – Why It's Happening? Here's Your Quick Fixes

India.com

2 days ago

  • India.com

ChatGPT & OpenAI Services Hit By Major Outage – Why It's Happening? Here's Your Quick Fixes

OpenAI's ubiquitous AI chatbot ChatGPT saw a widespread service disruption that affected thousands of users around the world on Tuesday, July 15. The mass outage left people without access to their chats, unable to load previous conversations, or unable to reach other OpenAI offerings such as Sora and Codex. Reports swamped outage monitor DownDetector, with more than 3,400 users initially reporting problems. Users commonly received "unusual error" messages and were locked out of their chat history. DownDetector said most of the complaints (82%) were about ChatGPT itself being down, with fewer reports about the website (12%) and the app (6%).

OpenAI Confirms Outage, Looking Into Cause

OpenAI quickly acknowledged the issue on its official service status page, confirming that "users are seeing high rates of errors" across ChatGPT and related services. Although a root cause wasn't explicitly given, OpenAI said its team was "actively investigating the issue" and was deploying a mitigation to restore full functionality. This is the second major outage in July for OpenAI's services, leaving users and developers who rely on these tools for everyday work concerned. The outage appears to be global, with reports emerging from users in the United States, India, Europe, and other regions of Asia.

What Users Can Do During The Outage

While OpenAI works through the technical issues, people who rely on AI tools for their work can turn to a number of alternatives:

  • Claude (Anthropic): Known for its conversational tone and its ability to handle longer, more complex responses.
  • Gemini (Google): Connected with Google apps; capable of complex reasoning, summarising, and coding.
  • Microsoft Copilot: Built into Microsoft software such as Word, Excel, and the Edge browser, taking advantage of GPT-4's capabilities.
  • Perplexity AI: A web-search-based assistant that gives fast, cited responses by aggregating information from the web.
  • YouChat: A chatbot integrating conversational AI with current web search results.

Users are encouraged to check OpenAI's official status page or DownDetector for live updates on service restoration. It is usually best to refrain from repeatedly trying to log in, as this may occasionally result in a temporary account lock.
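Rather than retrying logins, the status page can be polled programmatically. The sketch below assumes status.openai.com exposes the standard Atlassian Statuspage JSON endpoint, which is common for such pages but should be verified before relying on it:

```python
# Sketch: check overall service status instead of hammering the login page.
# Assumes the standard Statuspage endpoint; verify the URL before relying on it.
import json
import urllib.request

URL = "https://status.openai.com/api/v2/status.json"

with urllib.request.urlopen(URL, timeout=10) as resp:
    payload = json.load(resp)

# Statuspage responses carry an overall indicator such as "none", "minor",
# "major", or "critical", plus a human-readable description.
print(payload["status"]["description"])
```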
