PolyU-led research reveals that sensory and motor inputs help large language models represent complex concepts


Malay Mail · a day ago

A research team led by Prof. Li Ping, Sin Wai Kin Foundation Professor in Humanities and Technology, Dean of the PolyU Faculty of Humanities and Associate Director of the PolyU-Hangzhou Technology and Innovation Research Institute, explored the similarities between large language models and human representations, shedding new light on the extent to which language alone can shape the formation and learning of complex conceptual knowledge.
HONG KONG SAR - Media OutReach Newswire - 9 June 2025 - Can one truly understand what "flower" means without smelling a rose, touching a daisy or walking through a field of wildflowers? This question is at the core of a rich debate in philosophy and cognitive science. While embodied cognition theorists argue that physical, sensory experience is essential to concept formation, studies of the rapidly evolving large language models (LLMs) suggest that language alone can build deep, meaningful representations of the world.

By exploring the similarities between LLMs and human representations, researchers at The Hong Kong Polytechnic University (PolyU) and their collaborators have shed new light on the extent to which language alone can shape the formation and learning of complex conceptual knowledge. Their findings also revealed how the use of sensory input for grounding or embodiment – connecting abstract with concrete concepts during learning – affects the ability of LLMs to understand complex concepts and form human-like representations. The study, conducted in collaboration with scholars from Ohio State University, Princeton University and the City University of New York, was recently published in Nature Human Behaviour.

Led by Prof. LI Ping, Sin Wai Kin Foundation Professor in Humanities and Technology, Dean of the PolyU Faculty of Humanities and Associate Director of the PolyU-Hangzhou Technology and Innovation Research Institute, the research team selected conceptual word ratings produced by state-of-the-art LLMs, namely ChatGPT (GPT-3.5, GPT-4) and Google LLMs (PaLM and Gemini). They compared these with human-generated ratings of around 4,500 words across non-sensorimotor (e.g., valence, concreteness, imageability), sensory (e.g., visual, olfactory, auditory) and motor (e.g., foot/leg, mouth/throat) domains, drawn from the highly reliable and validated Glasgow Norms and Lancaster Norms datasets.

The research team first compared pairs of data from individual humans and individual LLM runs to measure the similarity between word ratings across each dimension in the three domains, using results from human-human pairs as the benchmark. This approach could, for instance, highlight to what extent humans and LLMs agree that certain concepts are more concrete than others. However, such dimension-by-dimension analyses might overlook how multiple dimensions jointly contribute to the overall representation of a word. For example, "pasta" and "roses" might receive equally high olfactory ratings, but "pasta" is in fact more similar to "noodles" than to "roses" when appearance and taste are also considered. The team therefore conducted representational similarity analysis, treating each word as a vector along multiple non-sensorimotor, sensory and motor attributes, for a more complete comparison between humans and LLMs.
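To make this concrete, the sketch below illustrates, under simplifying assumptions, how a representational similarity analysis of this kind can be set up: each word becomes a vector of ratings, pairwise dissimilarities between word vectors define a representational geometry for humans and for an LLM, and the two geometries are then correlated. The matrices, dimension counts and random values are illustrative placeholders only, not the study's actual data, models or code.

```python
# A minimal sketch of a representational similarity analysis between human and
# LLM word ratings. The data here are random placeholders, not the Glasgow or
# Lancaster norms or the LLM ratings from the study.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Illustrative sizes: a few hundred words rated on a handful of dimensions
# (e.g., valence, concreteness, visual, olfactory, foot/leg, mouth/throat).
n_words, n_dims = 300, 10
human_ratings = rng.random((n_words, n_dims))  # placeholder for averaged human norms
llm_ratings = rng.random((n_words, n_dims))    # placeholder for averaged LLM ratings

def rdm(ratings: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix: pairwise distance between word vectors."""
    return squareform(pdist(ratings, metric="correlation"))

# Each word is treated as a vector across all dimensions; the geometry of those
# vectors (which words lie close to which) is compared between humans and the LLM
# by correlating the upper triangles of the two dissimilarity matrices.
human_rdm, llm_rdm = rdm(human_ratings), rdm(llm_ratings)
upper = np.triu_indices(n_words, k=1)
rho, _ = spearmanr(human_rdm[upper], llm_rdm[upper])
print(f"Human-LLM representational similarity (Spearman rho): {rho:.3f}")
```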
The representational similarity analyses revealed that word representations produced by the LLMs were most similar to human representations in the non-sensorimotor domain, less similar for words in the sensory domain, and most dissimilar for words in the motor domain. This highlights the limitations of LLMs in fully capturing humans' conceptual understanding: non-sensorimotor concepts are represented well, but LLMs fall short on concepts involving sensory information, such as visual appearance and taste, and on those involving body movement. Motor concepts, which are less thoroughly described in language and rely heavily on embodied experience, proved even more challenging for the LLMs than sensory concepts such as colour, which can be learned from textual data.

In light of these findings, the researchers examined whether grounding would improve the LLMs' performance. They compared the performance of more grounded LLMs trained on both language and visual input (GPT-4, Gemini) with that of LLMs trained on language alone (GPT-3.5, PaLM), and found that the more grounded models incorporating visual input showed much higher similarity with human representations.

Prof. Li Ping said, "The availability of both LLMs trained on language alone and those trained on language and visual input, such as images and videos, provides a unique setting for research on how sensory input affects human conceptualisation. Our study exemplifies the potential benefits of multimodal learning, a human ability to simultaneously integrate information from multiple dimensions in the learning and formation of concepts and knowledge in general. Incorporating multimodal information processing in LLMs can potentially lead to a more human-like representation and more efficient human-like performance in LLMs in the future."

Interestingly, this finding is also consistent with previous human studies indicating representational transfer. Humans acquire object-shape knowledge through both visual and tactile experiences, with seeing and touching objects activating the same brain regions. The researchers pointed out that, as in humans, multimodal LLMs may use multiple types of input to merge or transfer representations embedded in a continuous, high-dimensional space. Prof. Li added, "The smooth, continuous structure of the embedding space in LLMs may underlie our observation that knowledge derived from one modality can transfer to other related modalities. This could explain why congenitally blind and normally sighted people can have similar representations in some areas. Current limits in LLMs are clear in this respect."

Ultimately, the researchers envision a future in which LLMs are equipped with grounded sensory input, for example through humanoid robotics, allowing them to actively interpret the physical world and act accordingly. Prof. Li said, "These advances may enable LLMs to fully capture embodied representations that mirror the complexity and richness of human cognition, and a rose in an LLM's representation will then be indistinguishable from that of a human."

Hashtag: #PolyU #HumanCognition #LargeLanguageModels #LLMs #GenerativeAI
The issuer is solely responsible for the content of this announcement.


Related Articles

PolyU develops novel multi-modal agent to facilitate long video understanding by AI, accelerating development of generative AI-assisted video analysis

Malay Mail · 5 hours ago

A research team led by Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, has developed a novel video-language agent, VideoMind, that allows AI models to perform long video reasoning and question-answering tasks by emulating humans' way of thinking. The VideoMind framework incorporates an innovative Chain-of-LoRA strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis.

HONG KONG SAR - Media OutReach Newswire - 10 June 2025 - While Artificial Intelligence (AI) technology is evolving rapidly, AI models still struggle with understanding long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform long video reasoning and question-answering tasks by emulating humans' way of thinking. The VideoMind framework incorporates an innovative Chain-of-Low-Rank Adaptation (LoRA) strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis. The findings have been submitted to a world-leading AI conference.

Videos, especially those longer than 15 minutes, carry information that unfolds over time, such as the sequence of events, causality, coherence and scene transitions. To understand the video content, AI models therefore need not only to identify the objects present, but also to take into account how they change throughout the video. As visuals in videos occupy a large number of tokens, video understanding requires vast amounts of computing capacity and memory, making it difficult for AI models to process long videos.

Prof. Changwen CHEN, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, and his team have achieved a breakthrough in research on long video reasoning by AI. In designing VideoMind, they made reference to a human-like process of video understanding and introduced a role-based workflow. The four roles included in the framework are: the Planner, to coordinate all other roles for each query; the Grounder, to localise and retrieve relevant moments; the Verifier, to validate the information accuracy of the retrieved moments and select the most reliable one; and the Answerer, to generate the query-aware answer. This progressive approach to video understanding helps address the challenge of temporal-grounded reasoning that most AI models struggle with.

The core innovation of the VideoMind framework lies in its adoption of a Chain-of-LoRA strategy. LoRA is a fine-tuning technique that has emerged in recent years; it adapts AI models for specific uses without performing full-parameter retraining. The Chain-of-LoRA strategy pioneered by the team involves applying four lightweight LoRA adapters in a unified model, each designed for calling a specific role. With this strategy, the model can dynamically activate role-specific LoRA adapters during inference via self-calling, seamlessly switching among these roles. This eliminates the need and cost of deploying multiple models while enhancing the efficiency and flexibility of the single model.

VideoMind is open source on GitHub and Hugging Face. Details of the experiments conducted to evaluate its effectiveness in temporal-grounded video understanding across 14 diverse benchmarks are also available.
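To illustrate the idea of a single backbone switching among role-specific adapters, the sketch below mimics a Planner -> Grounder -> Verifier -> Answerer flow in plain Python. The class, method names and prompts are hypothetical placeholders for exposition; they are not VideoMind's actual implementation or API, and real Chain-of-LoRA switching would swap adapter weights on the shared model rather than Python callables.

```python
# A schematic sketch of a role-based workflow on one shared backbone, with
# role-specific adapters activated at inference time. Names are illustrative
# placeholders, not VideoMind's actual code.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class UnifiedModel:
    """One backbone; roles are lightweight adapters switched during inference."""
    adapters: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    active: str = ""

    def register(self, role: str, adapter: Callable[[str], str]) -> None:
        self.adapters[role] = adapter

    def run(self, role: str, prompt: str) -> str:
        # In a real system this step would load the role's LoRA weights;
        # here we simply record and invoke the active adapter.
        self.active = role
        return self.adapters[role](prompt)

def answer_video_query(model: UnifiedModel, video: str, query: str) -> str:
    """Planner -> Grounder -> Verifier -> Answerer, all on one backbone."""
    plan = model.run("planner", f"Plan sub-tasks for query '{query}' on {video}")
    moments = model.run("grounder", f"Localise relevant moments given: {plan}")
    best = model.run("verifier", f"Verify and pick the most reliable moment in: {moments}")
    return model.run("answerer", f"Answer '{query}' using: {best}")

# Toy adapters standing in for LoRA-specialised behaviours of each role.
model = UnifiedModel()
for role in ["planner", "grounder", "verifier", "answerer"]:
    model.register(role, lambda p, r=role: f"[{r}] {p[:60]}...")

print(answer_video_query(model, "match_highlights.mp4", "When is the first goal scored?"))
```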
Comparing VideoMind with some state-of-the-art AI models, including GPT-4o and Gemini 1.5 Pro, the researchers found that the grounding accuracy of VideoMind outperformed all competitors in challenging tasks involving videos with an average duration of 27 minutes. Notably, the team included two versions of VideoMind in the experiments: one with a smaller, 2 billion (2B) parameter model, and another with a bigger, 7 billion (7B) parameter model. The results showed that, even at the 2B size, VideoMind still yielded performance comparable with many of the other 7B models.

Prof. Chen said, "Humans switch among different thinking modes when understanding videos: breaking down tasks, identifying relevant moments, revisiting these to confirm details and synthesising their observations into coherent answers. The process is very efficient, with the human brain using only about 25 watts of power, which is about a million times lower than that of a supercomputer with equivalent computing power. Inspired by this, we designed the role-based workflow that allows AI to understand videos like humans do, while leveraging the Chain-of-LoRA strategy to minimise the need for computing power and memory in this process."

AI is at the core of global technological development; however, the advancement of AI models is constrained by insufficient computing power and excessive power consumption. Built upon the unified, open-source model Qwen2-VL and augmented with additional optimisation tools, the VideoMind framework lowers the technological cost and the threshold for deployment, offering a feasible solution to the bottleneck of reducing power consumption in AI.

Prof. Chen added, "VideoMind not only overcomes the performance limitations of AI models in video processing, but also serves as a modular, scalable and interpretable multimodal reasoning framework. We envision that it will expand the application of generative AI to various areas, such as intelligent surveillance, sports and entertainment video analysis, video search engines and more."

Hashtag: #PolyU #AI #LLMs #VideoAnalysis #IntelligentSurveillance #VideoSearch

The issuer is solely responsible for the content of this announcement.

Huawei founder admits chips still trail US by 'one generation'

Malay Mail · 6 hours ago

BEIJING, June 10 — Chinese tech giant Huawei's chips still 'lag behind the United States by one generation', state media quoted its founder and CEO Ren Zhengfei as saying in a rare interview on Tuesday.

Washington last month unveiled fresh guidelines warning firms that using Chinese-made high-tech AI semiconductors, specifically Huawei's Ascend chips, would put them at risk of violating US export controls. The Shenzhen-based company has been at the centre of an intense standoff between the economic superpowers after Washington warned its equipment could be used for espionage by Beijing, an allegation Huawei denies.

Speaking to the People's Daily, the official newspaper of the ruling Communist Party, 80-year-old Ren insisted the United States had 'exaggerated' Huawei's achievements.

Tougher controls in recent years have prevented US chip giant Nvidia, one of Huawei's rivals, from selling certain AI semiconductors — widely regarded as the most advanced in the world — to Chinese firms. As a result, it is now facing tougher competition from local players in the crucial market, including Huawei. Nvidia's chief executive Jensen Huang told reporters last month that Chinese companies 'are very, very talented and very determined, and the export control gave them the spirit, the energy and the government support to accelerate their development'.

But Ren said Huawei was 'not that great yet', according to the article published on the newspaper's front page Tuesday. 'Many companies in China are making chips, and quite a few are doing well — Huawei is just one of them,' he added.

When asked about 'external blockades and suppression' — a veiled reference to US export restrictions on Beijing — Ren said he had 'never thought about it'. 'Don't dwell on the difficulties, just get the job done and move forward step by step,' he added.

Sanctions since 2019 have curtailed the firm's access to US-made components and technologies, forcing it to diversify its growth strategy. China has accused the United States of 'bullying' and 'abusing export controls to suppress and contain' the country's firms. — AFP

Aspire Becomes First Fintech to Integrate Directly with Payboy, Streamlining Payroll Management

Malay Mail · 8 hours ago

SINGAPORE - Media OutReach Newswire - 10 June 2025 - Aspire, the all-in-one financial operating system for modern businesses, today announced its integration with Payboy, one of Asia's leading payroll software providers, serving over 70,000 users across the region. This integration makes Aspire the first fintech company to integrate directly with Payboy, streamlining payroll operations for growing businesses.

Payroll management is traditionally a cumbersome process. Businesses often spend valuable hours manually reformatting payroll files for bank transfers, increasing the risk of errors and operational disruptions. In fact, 52% of Singaporean HR professionals report spending up to 6 hours per week troubleshooting payroll errors, revising records or querying data, time that could otherwise be directed towards strategic business growth.

The new integration directly addresses these challenges by enabling a seamless export of payroll data. Businesses can effortlessly export payroll files from Payboy and import them directly into Aspire without any additional manual reformatting. This streamlined process drastically reduces error rates and administrative workload, enabling payroll to be executed efficiently in just two simple steps: export and upload.

"We are proud to be the first fintech to integrate with Payboy," said Andrea Baronchelli, CEO and Co-Founder of Aspire. "Aspire is leading the way in simplifying financial operations. We're not just offering faster payroll, we're redefining what modern, automated finance should look like for ambitious businesses."

"This collaboration represents a shared vision of what business tools should be: intuitive, compliant, and built for growth," said Raphael Ng, General Manager of Payboy. "Together with Aspire, we're helping teams move faster with confidence, and enabling data-driven decisions."

The partnership supports Aspire's broader vision of streamlining business finance through seamless integrations with trusted tools. The Payboy integration expands Aspire's suite of payroll integrations, reinforcing its commitment to automating and simplifying financial operations for Singaporean entrepreneurs and businesses. The integration is now live and available to Aspire users in Singapore; to get started, create an account with Aspire.

The issuer is solely responsible for the content of this announcement.

Aspire
Aspire is the all-in-one finance platform for modern businesses globally, helping over 50,000 companies save time and money with international payments, treasury, expense, payable and receivable management solutions, accessible via a single, user-friendly account. Headquartered in Singapore, Aspire has 600+ employees across nine countries, clients in 30+ markets and is backed by global top-tier VCs, including Sequoia, Lightspeed, Y Combinator, Tencent and PayPal. In 2023, Aspire closed an oversubscribed US$100M Series C round and announced that it has achieved profitability.
