Latest news with #GPT-o1


Int'l Business Times
07-05-2025
- Int'l Business Times
OpenAI's Latest ChatGPT AI Models Are Smarter, But They Hallucinate More Than Ever
Artificial intelligence is evolving fast, but not always in the right direction. OpenAI's latest models, GPT-o3 and GPT-o4-mini, were built to mimic human reasoning more closely than ever before. However, a recent internal investigation reveals an alarming downside: these models may be more intelligent, but they are also more prone to making things up.

Hallucination in AI Is a Growing Problem

Hallucinations, the false or invented "facts" that chatbots present as real, have been a persistent issue since the birth of chatbots. With each model iteration, the hope was that these hallucinations would decline. But OpenAI's latest findings suggest otherwise, according to The New York Times. In a benchmark test focused on public figures, GPT-o3 hallucinated in 33% of responses, twice the error rate of its predecessor, GPT-o1. Meanwhile, the more compact GPT-o4-mini performed even worse, hallucinating nearly half the time (48%).

Reasoning vs. Reliability: Is AI Thinking Too Hard?

Unlike previous models, which were great at generating fluent text, GPT-o3 and GPT-o4-mini were built to reason step by step, in imitation of human logic. Ironically, this new "reasoning" technique might be the problem. AI researchers say that the more reasoning a model performs, the more opportunities it has to go astray. Unlike simpler systems that stick to safe, high-confidence responses, these newer systems attempt to bridge complicated concepts, which can lead to bizarre and incorrect conclusions. On the SimpleQA test, which measures general knowledge, the results were even worse: GPT-o3 hallucinated on 51% of responses, while GPT-o4-mini shot to an astonishing 79%. These are not small errors; they are huge credibility gaps.

Why More Sophisticated AI Models May Be Less Credible

OpenAI suggests the rise in hallucinations may not be a result of the reasoning itself, but of the models' verbosity and boldness. In attempting to be useful and comprehensive, the AI starts to guess, sometimes mixing theory with fact. The resulting answers sound very convincing but can be entirely incorrect. According to TechRadar, this becomes especially risky when AI is employed in high-stakes environments such as law, medicine, education, or government service. A single hallucinated fact in a legal brief or medical report could have disastrous repercussions.

The Real-World Risks of AI Hallucinations

We already know attorneys have been sanctioned for submitting fabricated court citations produced by ChatGPT. But what about minor mistakes in a business report, school essay, or government policy memo? The more integrated AI becomes in our everyday routines, the less room there is for error. The paradox is simple: the more helpful AI is, the more perilous its mistakes are. You can't save people time if they still need to fact-check everything.

Treat AI Like a Confident Intern

Though GPT-o3 and GPT-o4-mini demonstrate stunning skills in coding, logic, and analysis, their propensity to hallucinate means users can't rely on them when they need rock-solid facts. Until OpenAI and its rivals can minimize these hallucinations, users should take AI output with a grain of salt. Consider it this way: these chatbots are like that in-your-face co-worker who always has an answer, but you still fact-check everything they say.

Originally published on Tech Times
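For a sense of how figures like the 33% and 48% above are produced, here is a minimal sketch of benchmark scoring: each model answer is graded against a reference, and the error count is divided by the total. Real evaluations such as SimpleQA use more forgiving grading (often another model judges the answers); the data below is hypothetical.

```python
# Toy illustration of computing a hallucination rate on a QA benchmark.
# Exact string matching stands in for the fuzzier grading real suites use.
responses = [
    {"answer": "London", "reference": "London"},
    {"answer": "Steve Jobs", "reference": "Gordon Moore and Robert Noyce"},
    {"answer": "1991", "reference": "1991"},
]

hallucinated = sum(1 for r in responses if r["answer"] != r["reference"])
print(f"hallucination rate: {hallucinated / len(responses):.0%}")  # -> 33%
```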


Korea Herald
12-04-2025
- Business
- Korea Herald
SenseTime's SenseNova V6: China's Most Advanced Multimodal Model with the Lowest Cost in the Industry
Integrating AI into Everyday Life

HONG KONG, April 12, 2025 /PRNewswire/ -- SenseTime launched its newly upgraded large model series, SenseNova V6, at its Tech Day event held in several locations, including Shanghai and Shenzhen. Leveraging advances in the training of multimodal long chain-of-thought (CoT), global memory, and reinforcement learning, the model delivers industry-leading multimodal reasoning capabilities while setting a new benchmark for cost efficiency.

The capabilities of the SenseNova V6 model have been greatly enhanced, with strong advantages in long CoT, reasoning, mathematical capabilities, and global memory. Its multimodal reasoning capabilities ranked first in China when benchmarked against GPT-o1, while its data analysis performance outpaced GPT-4o. It also combines high performance with cost efficiency: its multimodal training efficiency matches that of language models, giving it the lowest training costs in the industry, and its inference costs are likewise the industry's lowest.

The new lightweight full-modal interactive model, SenseNova V6 Omni, delivers the most advanced multimodal interactive capabilities in China. It is China's first large model to support in-depth analysis of 10-minute mid-to-long-form videos, and benchmarks against Gemini 2.5 Turbo place it among the strongest in its class.

Dr. Xu Li, Chairman of the Board and CEO of SenseTime, said, "AI's true purpose is found in our everyday lives. SenseNova V6 has pushed past the boundaries of multimodality, unlocking infinite possibilities in reasoning and intelligence."

Multimodal long-chain reasoning, reinforcement learning, and global memory: SenseNova V6 leads the way in enabling multimodal deep thinking

As a native Mixture of Experts (MoE)-based multimodal general foundation model with over 600 billion parameters, SenseNova V6 has achieved multiple technological breakthroughs: a single model is able to perform a range of tasks across text and multimodal domains. In leading benchmark evaluations of reasoning and multimodal capabilities, SenseNova V6 achieved state-of-the-art results across multiple metrics.

Based on more than 200B of high-quality multimodal long-CoT data, synthesized and verified through SenseTime's multi-agent collaboration, SenseNova V6 has developed exceptional multimodal reasoning capabilities, supporting multimodal long CoTs of up to 64K tokens that enable long-term thinking.

In solving complex real-world problems, SenseNova V6 draws on its robust hybrid image and text understanding and reasoning capabilities to assist users with a range of tasks, including complex document processing. For example, in insurance claims processing, SenseNova V6 can assess whether submitted commercial health insurance claims meet the requirements, detecting issues such as unnecessary prescriptions and examinations, missing documents, or incomplete submissions.

Leveraging breakthroughs in multimodal reinforcement learning, SenseTime has developed a hybrid reinforcement learning framework for various image-text tasks, based on different difficulty levels and multi-reward models.
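To make the insurance-claims example above concrete, the following sketch shows the kind of downstream completeness checks such a pipeline could run once a multimodal model has extracted structured fields from the submitted documents. This is illustrative only, not SenseTime's API; every field and document name is hypothetical.

```python
# Illustrative only: hypothetical fields, not SenseTime's actual interface.
# Assume the model has already parsed the claim documents into this dict;
# the checks below flag the issue types named in the article (missing
# documents, prescriptions that do not match the diagnosis).
REQUIRED_DOCUMENTS = {"diagnosis_report", "itemized_invoice", "insurance_card"}

def check_claim(claim: dict) -> list[str]:
    issues = []
    missing = REQUIRED_DOCUMENTS - set(claim.get("documents", []))
    if missing:
        issues.append(f"missing documents: {sorted(missing)}")
    for item in claim.get("line_items", []):
        if item.get("prescribed_for") != claim.get("diagnosis"):
            issues.append(f"possibly unnecessary: {item['name']}")
    return issues

claim = {
    "diagnosis": "bronchitis",
    "documents": ["diagnosis_report", "itemized_invoice"],
    "line_items": [{"name": "knee MRI", "prescribed_for": "knee pain"}],
}
print(check_claim(claim))  # flags the missing card and the mismatched scan
```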
China's first model to break the 10-minute barrier in video understanding, achieving analysis of extended content within seconds

With its global memory capability, SenseNova V6 overcomes the limitation of traditional models that could only handle short videos, and now supports full-framerate analysis of 10-minute videos. With its advanced comprehension capabilities, SenseNova V6 can also intelligently edit and extract video highlights, helping users retain memorable moments.

SenseTime's proprietary technology aligns visual information (images), auditory information (speech and sounds), linguistic information (subtitles and spoken language), and temporal logic into a unified multimodal sequential representation. On this foundation, it applies fine-grained cascading compression and content-aware dynamic filtering to achieve high-ratio compression of long videos: a 10-minute video can be compressed into 16K tokens while retaining key semantics.
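That token budget implies aggressive compression. A back-of-the-envelope check, assuming a 30 fps framerate (the announcement does not state one):

```python
# Rough arithmetic behind "a 10-minute video compressed into 16K tokens".
# The 30 fps figure is an assumption; SenseTime does not specify a framerate.
minutes, fps = 10, 30
frames = minutes * 60 * fps          # 18,000 frames at full framerate
token_budget = 16_000                # 16K tokens per the announcement

print(f"{frames} frames -> {token_budget} tokens "
      f"(~{token_budget / frames:.2f} tokens per frame)")
# ~0.89 tokens per frame: less than one token per frame on average, which
# is why cascading compression and content-aware filtering are required.
```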
Human-like interaction: SenseNova V6 Omni launches with multi-industry deployment

With the launch of SenseNova V6, SenseTime has upgraded its real-time interactive unified large model to SenseNova V6 Omni, with deep optimizations across scenarios including role-playing, translation and reading, cultural tourism guiding, picture book narration, and mathematical explanation. In translation and reading scenarios, SenseNova V6 Omni enables users to achieve precise spatial interactions with a simple finger gesture. The model also accurately understands the relationship between local and global information, providing a more intuitive and human-like interactive experience. SenseNova V6 Omni features more human-like perceptual and expressive abilities, as well as emotional understanding. It has been deployed across multiple industries and scenarios, including embodied intelligence, becoming the first commercialized full-modality real-time interactive model in China.

Full-featured version of SenseChat launched, now available for preview

SenseTime has released a comprehensive update to SenseChat, along with a brand-new app built on the complete capabilities of SenseNova V6. Through a single access point, users can engage in seamless multimodal interactive streaming experiences across text, images, and video. The SenseChat app is available for preview, and SenseNova V6 is now available for trial via the SenseChat web platform.

RMB100 million in vouchers released to accelerate full-stack scenario implementation

SenseTime also announced a dedicated subsidy of RMB100 million aimed at advancing emerging fields such as embodied intelligence and AIGC. Through targeted and multi-dimensional initiatives, SenseTime is delivering a one-stop solution designed for high efficiency, low cost, and end-to-end AI implementation, spanning expert consulting, model training, and inference validation.

- End -

About SenseTime

SenseTime is a leading AI software company focused on creating a better AI-empowered future through innovation. We are committed to advancing the state of the art in AI research, developing scalable and affordable AI software platforms that benefit businesses, people and society as a whole, while attracting and nurturing top talent to shape the future together.

With our roots in the academic world, we invest in original, cutting-edge research that allows us to offer and continuously improve industry-leading AI capabilities in universal multimodal and multi-task models, covering key fields across perception intelligence, natural language processing, decision intelligence and AI-enabled content generation, as well as key capabilities in AI chips, sensors and computing infrastructure. Our proprietary AI infrastructure, SenseCore, integrates computing power, algorithms and platforms, enabling us to build the "SenseNova" foundation model sets and an R&D system that unlocks the ability to perform general AI tasks at low cost and with high efficiency. Our technologies are trusted by customers and partners in many industry verticals, including Generative AI, Computer Vision and Smart Auto.

SenseTime has been actively involved in the development of national and international industry standards on data security, privacy protection, and ethical and sustainable AI, working closely with multiple domestic and multilateral institutions. SenseTime was the only AI company in Asia to have its Code of Ethics for AI Sustainable Development selected by the United Nations as a key publication reference in the United Nations Resource Guide on AI Strategies, published in June 2021.

SenseTime Group Inc. is listed on the Main Board of the Stock Exchange of Hong Kong Limited (HKEX). We have offices in Hong Kong, Shanghai, Beijing, Shenzhen, Chengdu, Hangzhou, Nanping, Qingdao, Xi'an, Macau, Kyoto, Tokyo, Singapore, Riyadh, Abu Dhabi, Dubai, Kuala Lumpur and South Korea, as well as a presence in Germany, Thailand, Indonesia and the Philippines.

For more information, please visit SenseTime's official website or its LinkedIn, X, Facebook and YouTube pages.
Yahoo
09-03-2025
- Business
- Yahoo
Copilot might soon get more Microsoft AI models, less ChatGPT presence
Microsoft is one of the early backers of OpenAI, and it has repeatedly hawked products like Copilot by touting their access to the latest ChatGPT models. Now, it seems Microsoft is looking to push its own AI models in the popular software suite, while also developing a rival to OpenAI's reasoning models in the 'GPT-o' family.

As per The Information, employees at Microsoft's AI unit recently concluded the training of a new family of AI models, currently in development under the 'MAI' codename. Internally, the team is hopeful that these in-house models will perform nearly as well as the top AI models from the likes of OpenAI and Anthropic. Under the leadership of its AI chief, Mustafa Suleyman, Microsoft is launching this initiative to trim its dependence on OpenAI and develop its own AI stack for Copilot applications.

The developments are not surprising. In the last week of February, Microsoft introduced new small language models called Phi-4-multimodal and Phi-4-mini. They come with multimodal capabilities, which means they can process text, speech, and vision as input formats, just like OpenAI's ChatGPT and Google's Gemini. These two new AI models are already available to developers via Microsoft's Azure AI Foundry and third-party platforms such as Hugging Face and the NVIDIA API Catalog. In benchmarks shared by the company, the Phi-4 models are already ahead of Google's latest Gemini 2.0 series models on multiple test parameters. 'It is among a few open models to successfully implement speech summarization and achieve performance levels comparable to GPT-4o model,' Microsoft noted in its blog post. The company is hoping to release its 'MAI' models commercially via its Azure service.

Aside from testing in-house AI models for Copilot, Microsoft is also exploring third-party options such as DeepSeek, xAI, and Meta. DeepSeek recently made waves by delivering high benchmark performance at a dramatically lower development cost. It has already been adopted by numerous companies and recently claimed a theoretical daily cost-to-profit ratio of over 500%.

Beyond developing its own AI models to replace OpenAI's GPT infrastructure in Copilot, Microsoft is also reportedly working on its own reasoning AI models. This would pit Microsoft against OpenAI products such as GPT-o1 as well as Chinese upstarts such as DeepSeek, both of which offer reasoning capabilities. Apparently, work on an in-house reasoning model has been expedited by strained relations between the Microsoft and OpenAI teams over technology sharing. According to The Information, Suleyman and OpenAI have been at odds over the latter's lack of transparency regarding the intricate workings of its AI models such as GPT-o1. Reasoning models are considered the next frontier for AI development, as they offer a more nuanced understanding of queries, logical deduction, and better problem-solving capabilities. Microsoft also claims that its Phi-4 models deliver stronger language, mathematical, and visual science reasoning chops.
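Since the article notes that the Phi-4 small models are already on Hugging Face, here is a minimal sketch of loading one through the standard transformers API. The repo ID microsoft/Phi-4-mini-instruct reflects the naming at launch, but treat it as an assumption and verify it before running.

```python
# Minimal sketch: load Phi-4-mini from Hugging Face with transformers.
# The repo ID below is assumed from the launch naming; verify before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "In one sentence, what is a reasoning model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```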