
Latest news with #InternationalConferenceonLearningRepresentations

AI models can't tell time or read a calendar, study reveals

Yahoo

17-05-2025

  • Science
  • Yahoo

AI models can't tell time or read a calendar, study reveals

New research has revealed another set of tasks most humans can do with ease that artificial intelligence (AI) stumbles over: reading an analogue clock or figuring out the day of the week on which a date falls. AI may be able to write code, generate lifelike images, create human-sounding text and even pass exams (with varying degrees of success), yet it routinely misinterprets the position of the hands on everyday clocks and fails at the basic arithmetic needed for calendar dates.

Researchers revealed these unexpected flaws in a presentation at the 2025 International Conference on Learning Representations (ICLR). The findings were also published March 18 on the preprint server arXiv and have not yet been peer-reviewed.

"Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people," study lead author Rohit Saxena, a researcher at the University of Edinburgh, said in a statement. "These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation and assistive technologies."

To investigate AI's timekeeping abilities, the researchers fed a custom dataset of clock and calendar images into various multimodal large language models (MLLMs), which can process visual as well as textual information. The models used in the study include Meta's Llama 3.2-Vision, Anthropic's Claude-3.5 Sonnet, Google's Gemini 2.0 and OpenAI's GPT-4o. The results were poor: the models failed to identify the correct time from an image of a clock, or the day of the week for a sample date, more than half the time.

The researchers have an explanation for AI's surprisingly poor time-reading abilities. "Early systems were trained based on labelled examples. Clock reading requires something different — spatial reasoning," Saxena said. "The model has to detect overlapping hands, measure angles and navigate diverse designs like Roman numerals or stylized dials. AI recognizing that 'this is a clock' is easier than actually reading it."

Dates proved just as difficult. When given a challenge like "What day will the 153rd day of the year be?," the failure rate was similarly high: overall, the AI systems read clocks correctly only 38.7% of the time and answered calendar questions correctly only 26.3% of the time.

This shortcoming is surprising because arithmetic is a fundamental cornerstone of computing, but as Saxena explained, AI approaches it differently. "Arithmetic is trivial for traditional computers but not for large language models. AI doesn't run math algorithms, it predicts the outputs based on patterns it sees in training data," he said. "So while it may answer arithmetic questions correctly some of the time, its reasoning isn't consistent or rule-based, and our work highlights that gap."

The project is the latest in a growing body of research highlighting the differences between the way AI "understands" and the way humans do. Models derive answers from familiar patterns and excel when there are enough examples in their training data, yet they fail when asked to generalize or use abstract reasoning. "What for us is a very simple task like reading a clock may be very hard for them, and vice versa," Saxena said.
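For contrast with the pattern-matching approach Saxena describes, both of the study's tasks reduce to deterministic arithmetic that is trivial for conventional, rule-based code. Below is a minimal illustrative Python sketch (not code from the study) of the two calculations:

```python
from datetime import date, timedelta

def weekday_of_day_of_year(year: int, day_of_year: int) -> str:
    """Rule-based calendar arithmetic: resolve the Nth day of a year to a weekday."""
    d = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    return d.strftime("%A")

def time_from_hand_angles(hour_angle_deg: float, minute_angle_deg: float) -> str:
    """Convert analogue clock-hand angles (measured clockwise from 12) to a time.

    The minute hand sweeps 6 degrees per minute; the hour hand 30 degrees per hour.
    """
    minutes = round(minute_angle_deg / 6) % 60
    hours = int(hour_angle_deg // 30) % 12 or 12
    return f"{hours}:{minutes:02d}"

# The study's example question: the 153rd day of 2025 is June 2, a Monday.
print(weekday_of_day_of_year(2025, 153))    # Monday
print(time_from_hand_angles(100.0, 120.0))  # hour hand at 100°, minute at 120° -> 3:20
```

Run on the study's example, day 153 of 2025 resolves to Monday, June 2: exactly the kind of consistent, rule-based output the models could not reliably produce.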
The research also reveals the problem AI has when it's trained with limited data, in this case comparatively rare phenomena like leap years or obscure calendar calculations. Even though LLMs have plenty of examples that explain leap years as a concept, that doesn't mean they make the requisite connections required to complete a visual task.

The research highlights both the need for more targeted examples in training data and the need to rethink how AI handles the combination of logical and spatial reasoning, especially in tasks it doesn't encounter often. Above all, it reveals one more area where placing too much trust in AI output comes at our peril. "AI is powerful, but when tasks mix perception with precise reasoning, we still need rigorous testing, fallback logic, and in many cases, a human in the loop," Saxena said.

This is the future of AI, according to Nvidia

Fast Company

05-05-2025

  • Business
  • Fast Company

This is the future of AI, according to Nvidia

Recent breakthroughs in generative AI have centered largely on language and imagery—from chatbots that compose sonnets and analyze text to voice models that mimic human speech and tools that transform prompts into vivid artwork. But global chip giant Nvidia is now making a bolder claim: the next chapter of AI is about systems that take action in high-stakes, real-world scenarios.

At the recent International Conference on Learning Representations (ICLR 2025) in Singapore, Nvidia unveiled more than 70 research papers showcasing advances in AI systems designed to perform complex tasks beyond the digital realm. Driving this shift are agentic and foundational AI models. Nvidia's latest research highlights how combining these models can influence the physical world—spanning adaptive robotics, protein design, and real-time reconstruction of dynamic environments for autonomous vehicles. As demand for AI grows across industries, Nvidia is positioning itself as a core infrastructure provider powering this new era of intelligent action.

Bryan Catanzaro, vice president of applied deep learning research at Nvidia, described the company's new direction as a full-stack AI initiative. 'We aim to accelerate every level of the computing stack to amplify the impact and utility of AI across industries,' he tells Fast Company. 'For AI to be truly useful, it must evolve beyond traditional applications and engage meaningfully with real-world use cases. That means building systems capable of reasoning, decision-making, and interacting with the real-world environment to solve practical problems.'

Among the research presented, four models stood out—one of the most promising being Skill Reuse via Skill Adaptation (SRSA). This AI framework enables robots to handle unfamiliar tasks without retraining from scratch—a longstanding hurdle in robotics. While most robotic AI systems have focused on basic tasks like picking up objects, more complex jobs such as precision assembly on factory lines remain difficult. Nvidia's SRSA model aims to overcome that challenge by leveraging a library of previously learned skills to help robots adapt more quickly.
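The article doesn't detail SRSA's implementation, but the general pattern of reusing a skill library can be sketched. Below is a minimal, hypothetical Python illustration (all names and data invented; not Nvidia's code): a new task's embedding is matched against stored skills, and the closest skill's policy parameters seed further adaptation instead of training from scratch.

```python
import numpy as np

# Hypothetical skill library: skill name -> (task embedding, learned policy parameters).
rng = np.random.default_rng(0)
skill_library = {
    "insert_peg":   (np.array([0.9, 0.1, 0.0]), rng.standard_normal(8)),
    "tighten_bolt": (np.array([0.2, 0.8, 0.1]), rng.standard_normal(8)),
    "route_cable":  (np.array([0.1, 0.2, 0.9]), rng.standard_normal(8)),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_closest_skill(new_task_embedding):
    """Pick the stored skill whose task embedding best matches the new task."""
    name, (_, params) = max(
        skill_library.items(),
        key=lambda kv: cosine(kv[1][0], new_task_embedding),
    )
    return name, params.copy()

def adapt(params, n_steps=100, lr=0.1):
    """Stand-in for fine-tuning: nudge retrieved parameters toward a new objective.
    In a real system this would be reinforcement learning on the new task."""
    target = np.zeros_like(params)            # placeholder objective
    for _ in range(n_steps):
        params -= lr * 2 * (params - target)  # gradient step on ||params - target||^2
    return params

new_task = np.array([0.85, 0.2, 0.05])  # embedding of an unseen assembly task
name, init = retrieve_closest_skill(new_task)
policy = adapt(init)
print(f"initialized from '{name}', adapted policy norm: {np.linalg.norm(policy):.4f}")
```

The design point the sketch illustrates is warm-starting: adaptation begins from the nearest prior skill, so far fewer new-task samples are needed than when learning from random initialization.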

NTT Scientists Present Breakthrough Research on AI Deep Learning at ICLR 2025

Business Wire

24-04-2025

  • Business
  • Business Wire

NTT Scientists Present Breakthrough Research on AI Deep Learning at ICLR 2025

SUNNYVALE, Calif. & TOKYO--(BUSINESS WIRE)-- NTT Research, Inc. and NTT R&D, divisions of NTT (TYO:9432), announced that their scientists will present nine papers at the International Conference on Learning Representations (ICLR) 2025, a top-tier machine learning conference dedicated to the advancement of representation learning, particularly deep learning. Five of the accepted presentations result from research co-authored by scientists within NTT Research's recently announced Physics of Artificial Intelligence (PAI) Group, led by Group Head Hidenori Tanaka. Collectively, this research breaks new ground in understanding how AI models learn, grow and overcome uncertainty—all supporting NTT's commitment to pioneering transformative, socially resilient, sustainable and responsible AI.

'The Physics of AI Group and its collaborators share the excitement for AI's potential expressed by the public, the technology industry and the academic community,' said Tanaka. 'As the research accepted at ICLR 2025 shows, however, important questions remain about how AI fundamentally learns and how generative AI fundamentally creates outputs. Neural networks play a vital role in the 'deep learning' of AI, and improving our understanding of them is vital to ultimately foster the development of sustainable, reliable and trustworthy AI technologies.'

One paper, 'Forking Paths in Neural Text Generation,' addresses the issue of estimating uncertainty in Large Language Models (LLMs) for proper evaluation and user safety. Whereas prior approaches to uncertainty estimation focus on the final answer in generated text—ignoring potentially impactful intermediate steps—this research tested the hypothesis that there exist key forking tokens, such that re-sampling the system at those specific tokens, but not others, leads to very different outcomes. The researchers discovered many examples of forking tokens, including punctuation marks, suggesting that LLMs are often just a single token away from generating a different output. (A toy sketch of this re-sampling procedure appears at the end of this article.) The paper was co-authored by Eric Bigelow, Ari Holtzman, Hidenori Tanaka and Tomer Ullman.

Four other papers co-authored by members of the NTT Research PAI Group will be presented at the show, including:

  • 'In-Context Learning of Representations': Researchers explore the open-ended nature of LLMs (for example, their ability to learn in context) and whether models alter their pretraining semantics to adopt alternative, context-specific ones. Findings indicate that scaling context size can flexibly re-organize model representations, possibly unlocking novel capabilities. Authors: Core Francisco Park, Andrew Lee, Ekdeep Singh Lubana, Yongyi Yang, Maya Okawa, Kento Nishi, Martin Wattenberg and Hidenori Tanaka.
  • 'Competition Dynamics Shape Algorithmic Phases of In-Context Learning': Researchers propose a synthetic sequence-modeling task that involves learning to simulate a finite mixture of Markov chains. They argue that in-context learning (ICL) is best thought of as a mixture of different algorithms, each with its own peculiarities, rather than a monolithic capability, which also implies that making general claims about ICL that hold universally across all settings may be infeasible. Authors: Core Francisco Park, Ekdeep Singh Lubana, Itamar Pres and Hidenori Tanaka.
  • 'Dynamics of Concept Learning and Compositional Generalization': Researchers propose an abstraction of prior work's compositional generalization problem by introducing a structured identity mapping (SIM) task, where a model is trained to learn the identity mapping on a Gaussian mixture with structurally organized centroids. Overall, the work establishes the SIM task as a meaningful theoretical abstraction of concept-learning dynamics in modern generative models. Authors: Yongyi Yang, Core Francisco Park, Ekdeep Singh Lubana, Maya Okawa, Wei Hu and Hidenori Tanaka.
  • 'A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language': Recognizing the need to establish the causal factors underlying the phenomenon of 'emergence' in a neural network, researchers seek inspiration from the study of emergent properties in other fields and propose a phenomenological definition for the concept in the context of neural networks. Authors: Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert P. Dick and Hidenori Tanaka.

In addition, four papers authored or co-authored by NTT R&D scientists based in Japan will be presented at the show:

  • 'Test-time Adaptation for Regression by Subspace Alignment.' Authors: Kazuki Adachi, Shin'ya Yamaguchi, Atsutoshi Kumagai and Tomoki Hamagami.
  • 'Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching: With Insights into Other Permutation Search Methods.' Authors: Akira Ito, Masanori Yamada and Atsutoshi Kumagai.
  • 'Positive-Unlabeled Diffusion Models for Preventing Sensitive Data Generation.' Authors: Hiroshi Takahashi, Tomoharu Iwata, Atsutoshi Kumagai, Yuuki Yamanaka and Tomoya Yamashita.
  • 'Wavelet-based Positional Representation for Long Context.' Authors: Yui Oka, Taku Hasegawa, Kyosuke Nishida and Kuniko Saito.

ICLR 2025, the thirteenth International Conference on Learning Representations, is a globally esteemed conference on deep learning being held in Singapore April 24-28, 2025. Last year at ICLR 2024, NTT Research Physics & Informatics (PHI) Lab scientists co-authored two key papers: one on analyzing in-context learning dynamics with random binary sequences, revealing sharp transitions in LLM behaviors, and another on how fine-tuning affects model capabilities, showing minimal changes.

The NTT Research Physics of Artificial Intelligence Group is dedicated to advancing our understanding of deep neural networks and the psychology of AI. Its three-pronged mission includes: 1) deepening our understanding of the mechanisms of AI, the better to integrate ethics from within rather than through a patchwork of fine-tuning (i.e., enforced learning); 2) borrowing from experimental physics to create systematically controllable spaces of AI and observe the learning and prediction behaviors of AI step by step; and 3) healing the breach of trust between AI and human operators through improved operations and data control. Formally established in April 2025 by members of the PHI Lab, the group began as a collaboration between NTT Research and the Harvard University Center for Brain Science, having been formerly known as the Harvard University CBS-NTT Fellowship Program.
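As promised above, here is a toy illustration of the forking-token idea (invented data and a hand-written stochastic generator standing in for an LLM; not the paper's code): the generator is re-sampled from each token position, and positions where re-sampling frequently changes the final answer are the 'forking' ones.

```python
import random

# Toy stand-in for an LLM: a next-token distribution keyed on the current token.
NEXT = {
    "The":    [("answer", 0.5), ("result", 0.5)],
    "answer": [("is", 1.0)],
    "result": [("is", 1.0)],
    "is":     [("yes", 0.6), ("no", 0.4)],
    "yes":    [("<end>", 1.0)],
    "no":     [("<end>", 1.0)],
}

def sample_continuation(prefix, rng):
    """Sample forward from a prefix until the <end> token."""
    seq = list(prefix)
    while seq[-1] != "<end>":
        tokens, weights = zip(*NEXT[seq[-1]])
        seq.append(rng.choices(tokens, weights=weights)[0])
    return seq

def forking_scores(sequence, n_resamples=500, seed=0):
    """For each prefix length, re-sample the continuation and record how often
    the final answer (the token before <end>) changes. Positions with high
    change rates correspond to forking tokens."""
    rng = random.Random(seed)
    original_answer = sequence[-2]
    scores = []
    for i in range(1, len(sequence) - 1):
        flips = sum(
            sample_continuation(sequence[:i], rng)[-2] != original_answer
            for _ in range(n_resamples)
        )
        scores.append((sequence[i - 1], flips / n_resamples))
    return scores

seq = sample_continuation(["The"], random.Random(42))
print("generated:", " ".join(seq))
for token, rate in forking_scores(seq):
    print(f"re-sampling after {token!r} changes the answer {rate:.0%} of the time")
```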
About NTT Research NTT Research opened its offices in July 2019 in Silicon Valley to conduct basic research and advance technologies as a foundational model for developing high-impact innovation across NTT Group's global business. Currently, four groups are housed at NTT Research facilities in Sunnyvale: the Physics and Informatics (PHI) Lab, the Cryptography and Information Security (CIS) Lab, the Medical and Health Informatics (MEI) Lab, and the Physics of Artificial Intelligence (PAI) Group. The organization aims to advance science in four areas: 1) quantum information, neuroscience and photonics; 2) cryptographic and information security; 3) medical and health informatics; and 4) artificial intelligence. NTT Research is part of NTT, a global technology and business solutions provider with an annual R&D investment of thirty percent of its profits. NTT and the NTT logo are registered trademarks or trademarks of NIPPON TELEGRAPH AND TELEPHONE CORPORATION and/or its affiliates. All other referenced product names are trademarks of their respective owners. ©2025 NIPPON TELEGRAPH AND TELEPHONE CORPORATION

AI still can't beat humans at reading social cues

Yahoo

24-04-2025

  • Science
  • Yahoo

AI still can't beat humans at reading social cues

AI models have progressed rapidly in recent years and can already outperform humans in various tasks, from generating basic code to dominating games like chess and Go. But despite massive computing power and billions of dollars in investor funding, these advanced models still can't hold up to humans when it comes to truly understanding how real people interact with one another in the world. In other words, AI still fundamentally struggles at 'reading the room.'

That's the claim made in a new paper by researchers from Johns Hopkins University. In the study, researchers asked a group of human volunteers to watch three-second video clips and rate the various ways individuals in those videos were interacting with one another. They then tasked more than 350 AI models—including image, video, and language-based systems—with predicting how the humans had rated those interactions. While the humans completed the task with ease, the AI models, regardless of their training data, struggled to accurately interpret what was happening in the clips.

The researchers say their findings suggest that AI models still have significant difficulty understanding human social cues in real-world environments. That insight could have major implications for the growing industry of AI-enabled driverless cars and robots, which inherently need to navigate the physical world alongside people. 'Anytime you want an AI system to interact with humans, you want to be able to know what those humans are doing and what groups of humans are doing with each other,' Johns Hopkins University assistant professor of cognitive science and paper lead author Leyla Isik told Popular Science. 'This really highlights how a lot of these models fall short on those tasks.' Isik will present the research findings today at the International Conference on Learning Representations.

Though previous research has shown that AI models can accurately describe what's happening in still images at a level comparable to humans, this study aimed to see whether that still holds true for video. To do that, Isik says, she and her fellow researchers selected hundreds of videos from a computer vision dataset and clipped them down to three seconds each. They then narrowed the sample to include only videos featuring two humans interacting. Human volunteers viewed these clips and answered a series of questions about what was happening, rated on a scale from 1 to 5. The questions ranged from objective prompts like 'Do you think these bodies are facing each other?' to more subjective ones, such as whether the interaction appeared emotionally positive or negative. In general, the human respondents tended to give similar answers, as reflected in their ratings—suggesting that people share a basic observational understanding of social interactions.

The researchers then posed similar types of questions to image, video, and language models. (The language models were given human-written captions to analyze instead of raw video.) Across the board, the AI models failed to demonstrate the same level of consensus as the human participants. The language models generally performed better than the image and video models, but Isik notes that may be partly because they were analyzing captions that were already quite descriptive. The researchers primarily evaluated open-access models, some of which were several years old, and the study did not include the latest models recently released by leading AI companies like OpenAI and Anthropic.
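The paper's central comparison can be illustrated with a short, hypothetical Python sketch (invented data; not the study's code or exact metric): each model's ratings are scored against the average human rating, with split-half agreement among humans serving as the consensus ceiling the models fail to reach.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 20 human raters x 50 clips, ratings on a 1-5 scale,
# generated around a shared signal to mimic human consensus.
true_signal = rng.uniform(1, 5, size=50)
human = np.clip(true_signal + rng.normal(0, 0.5, size=(20, 50)), 1, 5)

# A hypothetical model whose predictions track the signal only weakly.
model = np.clip(true_signal + rng.normal(0, 2.0, size=50), 1, 5)

def pearson(a, b):
    """Pearson correlation between two rating vectors."""
    return float(np.corrcoef(a, b)[0, 1])

# Human ceiling: agreement between the mean ratings of two halves of the raters.
ceiling = pearson(human[:10].mean(axis=0), human[10:].mean(axis=0))
# Model score: correlation of model predictions with the mean human rating.
score = pearson(model, human.mean(axis=0))

print(f"human split-half agreement: {ceiling:.2f}")
print(f"model-human correlation:   {score:.2f}")
```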
Still, the stark contrast between human and AI responses suggests there may be something fundamentally different about how models and humans process social and contextual information. 'It's not enough to just see an image and recognize objects and faces,' Johns Hopkins University doctoral student and paper co-author Kathy Garcia said in a statement. 'We need AI to understand the story that is unfolding in a scene. Understanding the relationships, context, and dynamics of social interactions is the next step, and this research suggests there might be a blind spot in AI model development.'

The findings come as tech companies race to integrate AI into an increasing number of physical robots—a concept often referred to as 'embodied AI.' Cities like Los Angeles, Phoenix, and Austin have become test beds for this new era thanks to the increasing presence of driverless Waymo robotaxis sharing the roads with human-driven vehicles. Limited understanding of complex environments has led some driverless cars to behave erratically or even get stuck in loops, driving in circles. While some recent studies suggest that driverless vehicles may currently be less prone to accidents than the average human driver, federal regulators have nonetheless opened investigations into Waymo and Amazon-owned Zoox for driving behavior that allegedly violated safety laws.

Other companies—like Figure AI, Boston Dynamics, and Tesla—are taking things a step further by developing AI-enabled humanoid robots designed to work alongside humans in manufacturing environments. Figure has already signed a deal with BMW to deploy one of its bipedal models at a facility in South Carolina, though its exact purpose remains somewhat vague. In these settings, properly understanding human social cues and context is even more critical, as even small misjudgments in intention run the risk of injury. Going a step further, some experts have even suggested that advanced humanoid robots could one day assist with elder and child care.

Isik suggested the results of the study mean there are still several steps to be taken before that vision becomes a reality. '[The research] really highlights the importance of bringing neuroscience, cognitive science, and AI into these more dynamic real-world settings,' Isik said.

AI's Next Frontier: What the ICLR 2025 Conference Reveals About Machine Learning

Time Business News

23-04-2025

  • Business
  • Time Business News

AI's Next Frontier: What the ICLR 2025 Conference Reveals About Machine Learning

The International Conference on Learning Representations (ICLR) 2025, held in Singapore, has sparked excitement in the tech world, showcasing groundbreaking advancements in artificial intelligence (AI) and machine learning. From April 24–28, 2025, researchers gathered to present innovations in diffusion models, contrastive learning, and AI applications in healthcare and law. But what does this mean for the average person, and how will these developments shape our future?

One major highlight was the focus on efficient diffusion-based sampling, a technique that powers AI tools like image and text generators. These models are becoming faster and more accurate, enabling everything from hyper-realistic digital art to personalized medical diagnostics. For instance, AI-driven health tools presented at ICLR could analyze patient data in real time, offering doctors precise treatment recommendations. Similarly, advancements in large language models promise smarter, more context-aware chatbots that could revolutionize customer service or education.

Another key topic was AI safety, a growing concern as AI systems become more autonomous. Researchers discussed methods to align AI models with human values, ensuring they don't produce harmful or biased outputs. This is critical as companies like xAI push boundaries with tools like Grok, which aim to accelerate human discovery while maintaining ethical standards.

The conference also highlighted AI's role in niche fields. For example, legal tech innovations showcased how AI can streamline contract analysis, saving time and reducing errors. Meanwhile, reinforcement learning advancements could improve autonomous systems, from self-driving cars to robotic manufacturing.

Why does this matter? These developments signal a future where AI is more integrated into daily life, making tasks faster, safer, and more personalized. However, they also raise questions about accessibility and regulation. Will these tools be affordable for all, or will they widen the digital divide? And how do we balance innovation with privacy concerns?

ICLR 2025 underscores that AI is no longer a distant dream—it's here, evolving rapidly. For businesses, educators, and individuals, staying informed is crucial to harnessing its potential responsibly. For more insights on AI's impact, check out Explained Now, where we break down complex tech trends in simple terms.
