PolyU-led research reveals that sensory and motor inputs help large language models represent complex concepts


Malay Mail · 8 hours ago

A research team led by Prof. Li Ping, Sin Wai Kin Foundation Professor in Humanities and Technology, Dean of the PolyU Faculty of Humanities and Associate Director of the PolyU-Hangzhou Technology and Innovation Research Institute, explored the similarities between large language models and human representations, shedding new light on the extent to which language alone can shape the formation and learning of complex conceptual knowledge.
HONG KONG SAR - Media OutReach Newswire - 9 June 2025 - Can one truly understand what "flower" means without smelling a rose, touching a daisy or walking through a field of wildflowers? This question is at the core of a rich debate in philosophy and cognitive science. While embodied cognition theorists argue that physical, sensory experience is essential to concept formation, studies of rapidly evolving large language models (LLMs) suggest that language alone can build deep, meaningful representations of the world.

By exploring the similarities between LLM and human representations, researchers at The Hong Kong Polytechnic University (PolyU) and their collaborators have shed new light on the extent to which language alone can shape the formation and learning of complex conceptual knowledge. Their findings also reveal how the use of sensory input for grounding or embodiment – connecting abstract with concrete concepts during learning – affects the ability of LLMs to understand complex concepts and form human-like representations. The study, conducted in collaboration with scholars from Ohio State University, Princeton University and the City University of New York, was recently published in Nature Human Behaviour.

Led by Prof. LI Ping, Sin Wai Kin Foundation Professor in Humanities and Technology, Dean of the PolyU Faculty of Humanities and Associate Director of the PolyU-Hangzhou Technology and Innovation Research Institute, the research team collected conceptual word ratings produced by state-of-the-art LLMs, namely ChatGPT (GPT-3.5, GPT-4) and Google LLMs (PaLM and Gemini). They compared these with human-generated ratings of around 4,500 words across non-sensorimotor (e.g., valence, concreteness, imageability), sensory (e.g., visual, olfactory, auditory) and motor (e.g., foot/leg, mouth/throat) domains, drawn from the highly reliable and well-validated Glasgow Norms and Lancaster Norms datasets.

The research team first compared pairs of data from individual humans and individual LLM runs to measure the similarity between word ratings along each dimension in the three domains, using results from human-human pairs as the benchmark. This approach can, for instance, show to what extent humans and LLMs agree that certain concepts are more concrete than others. However, such dimension-by-dimension analyses might overlook how multiple dimensions jointly contribute to the overall representation of a word. For example, "pasta" and "roses" might receive equally high olfactory ratings, but "pasta" is in fact more similar to "noodles" than to "roses" once appearance and taste are also considered. The team therefore conducted representational similarity analysis, treating each word as a vector of its non-sensorimotor, sensory and motor attributes, for a more complete comparison between humans and LLMs.
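To make the methodology concrete, the following is a minimal Python sketch of both kinds of analysis: dimension-wise rating agreement and representational similarity analysis. The word list, dimension names and all rating values are invented for illustration, and the NumPy/SciPy pipeline is an assumption; it is not the team's actual code or data.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Toy ratings: each word is a vector over a few rating dimensions
# spanning the non-sensorimotor, sensory and motor domains. All
# numbers are invented; the study used the Glasgow and Lancaster Norms.
words = ["pasta", "noodles", "roses", "tulips"]
dims = ["valence", "concreteness", "visual", "olfactory", "hand_arm"]

human = np.array([
    [6.0, 6.8, 6.5, 5.9, 4.5],   # pasta
    [5.8, 6.7, 6.4, 5.7, 4.4],   # noodles
    [6.5, 6.9, 6.8, 6.0, 2.6],   # roses
    [6.4, 6.8, 6.7, 5.4, 2.5],   # tulips
])
llm = np.array([
    [5.6, 6.5, 6.0, 5.5, 3.2],
    [5.7, 6.6, 6.1, 5.3, 3.3],
    [6.8, 6.4, 6.9, 6.2, 2.9],
    [6.7, 6.3, 6.8, 5.6, 2.8],
])

# Analysis 1, dimension-wise agreement: correlate human and LLM
# ratings along each single dimension across words.
for j, dim in enumerate(dims):
    r, _ = spearmanr(human[:, j], llm[:, j])
    print(f"{dim:>12}: human-LLM agreement = {r:.2f}")

# Analysis 2, representational similarity analysis: build a pairwise
# dissimilarity structure over whole word vectors for each source,
# then rank-correlate the two structures.
human_rdm = pdist(human, metric="correlation")
llm_rdm = pdist(llm, metric="correlation")
rho, _ = spearmanr(human_rdm, llm_rdm)
print(f"representational similarity (RSA): rho = {rho:.2f}")
```

The second step compares whole dissimilarity structures rather than single rating columns, which is what lets it catch cases like "pasta" versus "roses": two words can agree on one dimension while occupying very different positions in the full multi-attribute space.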
The representational similarity analyses revealed that word representations produced by the LLMs were most similar to human representations in the non-sensorimotor domain, less similar in the sensory domain and most dissimilar in the motor domain. This pattern highlights the limitations of LLMs in fully capturing humans' conceptual understanding: non-sensorimotor concepts are represented well, but LLMs fall short on concepts involving sensory information, such as visual appearance and taste, and on those involving body movement. Motor concepts, which are less fully described in language and rely heavily on embodied experience, proved even more challenging to LLMs than sensory concepts such as colour, which can be learned from textual data.

In light of these findings, the researchers examined whether grounding would improve the LLMs' performance. They compared more grounded LLMs trained on both language and visual input (GPT-4, Gemini) with LLMs trained on language alone (GPT-3.5, PaLM), and found that the grounded models incorporating visual input exhibited much higher similarity with human representations.

Prof. Li Ping said, "The availability of both LLMs trained on language alone and those trained on language and visual input, such as images and videos, provides a unique setting for research on how sensory input affects human conceptualisation. Our study exemplifies the potential benefits of multimodal learning, a human ability to simultaneously integrate information from multiple dimensions in the learning and formation of concepts and knowledge in general. Incorporating multimodal information processing in LLMs can potentially lead to a more human-like representation and more efficient human-like performance in LLMs in the future."

Interestingly, this finding is also consistent with those of previous human studies indicating representational transfer. Humans acquire object-shape knowledge through both visual and tactile experience, with seeing and touching objects activating the same regions in the brain. The researchers pointed out that, as in humans, multimodal LLMs may use multiple types of input to merge or transfer representations embedded in a continuous, high-dimensional space. Prof. Li added, "The smooth, continuous structure of the embedding space in LLMs may underlie our observation that knowledge derived from one modality could transfer to other related modalities. This could explain why congenitally blind and normally sighted people can have similar representations in some areas. Current limits in LLMs are clear in this respect."

Ultimately, the researchers envision a future in which LLMs are equipped with grounded sensory input, for example through humanoid robotics, allowing them to actively interpret the physical world and act accordingly. Prof. Li said, "These advances may enable LLMs to fully capture embodied representations that mirror the complexity and richness of human cognition, and a rose in an LLM's representation will then be indistinguishable from that of humans."
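To illustrate the shape of the grounding comparison, here is a hedged sketch that contrasts the two model groups' human-similarity scores. The scores below are placeholders invented for illustration, not the values reported in the study; only the grouping of models follows the article.

```python
import numpy as np

# Hypothetical human-LLM similarity scores per model run (placeholder
# numbers, not the study's reported results). The grounded group was
# trained on language plus visual input; the other on language alone.
scores = {
    "language only (GPT-3.5, PaLM)": [0.41, 0.43, 0.40, 0.38, 0.37, 0.39],
    "grounded (GPT-4, Gemini)":      [0.55, 0.57, 0.54, 0.52, 0.53, 0.51],
}

means = {}
for group, vals in scores.items():
    vals = np.array(vals)
    means[group] = vals.mean()
    print(f"{group}: mean = {vals.mean():.2f}, sd = {vals.std(ddof=1):.2f}")

gap = means["grounded (GPT-4, Gemini)"] - means["language only (GPT-3.5, PaLM)"]
print(f"grounding advantage: {gap:+.2f}")
```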
Hashtag: #PolyU #HumanCognition #LargeLanguageModels #LLMs #GenerativeAI

The issuer is solely responsible for the content of this announcement.



