Red Hat leads launch of llm-d to scale generative AI in clouds


Techday NZ, 21-05-2025

Red Hat has introduced llm-d, an open source project aimed at enabling large-scale distributed generative AI inference across hybrid cloud environments.
The llm-d initiative is the result of collaboration between Red Hat and a group of founding contributors comprising CoreWeave, Google Cloud, IBM Research and NVIDIA, with additional support from AMD, Cisco, Hugging Face, Intel, Lambda, Mistral AI, and academic partners from the University of California, Berkeley, and the University of Chicago.
The new project utilises vLLM-based distributed inference, a native Kubernetes architecture, and AI-aware network routing to facilitate robust and scalable AI inference clouds that can meet demanding production service-level objectives. Red Hat asserts that this will support any AI model, on any hardware accelerator, in any cloud environment.
Brian Stevens, Senior Vice President and AI CTO at Red Hat, stated, "The launch of the llm-d community, backed by a vanguard of AI leaders, marks a pivotal moment in addressing the need for scalable gen AI inference, a crucial obstacle that must be overcome to enable broader enterprise AI adoption. By tapping the innovation of vLLM and the proven capabilities of Kubernetes, llm-d paves the way for distributed, scalable and high-performing AI inference across the expanded hybrid cloud, supporting any model, any accelerator, on any cloud environment and helping realize a vision of limitless AI potential."
Addressing the scaling needs of generative AI, Red Hat points to a Gartner forecast that suggests by 2028, more than 80% of data centre workload accelerators will be principally deployed for inference rather than model training. This projected shift highlights the necessity for efficient and scalable inference solutions as AI models become larger and more complex.
The llm-d project's architecture is designed to overcome the practical limitations of centralised AI inference, such as prohibitive costs and latency. Its main features include vLLM for rapid model support, Prefill and Decode Disaggregation for distributing computational workloads, KV Cache Offloading based on LMCache to shift memory loads onto standard storage, and AI-Aware Network Routing for optimised request scheduling. Further, the project supports Google Cloud's Tensor Processing Units and NVIDIA's Inference Xfer Library (NIXL) for high-performance data transfer.
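The routing idea can be illustrated with a toy sketch (hypothetical code, not the actual llm-d scheduler): an AI-aware router prefers the replica whose KV cache already holds the longest matching prompt prefix, since that replica can skip recomputing the shared prefill work, and falls back to the least-loaded replica when no cache matches.

```python
# Toy illustration of prefix-cache-aware request routing,
# the general idea behind "AI-aware" scheduling. This is a
# hypothetical sketch, not llm-d's implementation.
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    load: int = 0                                  # in-flight requests
    cached_prefixes: set = field(default_factory=set)


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common leading substring of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(prompt: str, replicas: list[Replica]) -> Replica:
    """Pick the replica with the best KV-cache prefix match,
    breaking ties (including the no-match case) by lowest load."""
    def score(r: Replica):
        best = max((shared_prefix_len(prompt, p) for p in r.cached_prefixes),
                   default=0)
        return (-best, r.load)          # longer match first, then lighter load

    chosen = min(replicas, key=score)
    chosen.load += 1
    chosen.cached_prefixes.add(prompt)  # its KV cache now covers this prefix
    return chosen


if __name__ == "__main__":
    a = Replica("gpu-a", load=3,
                cached_prefixes={"You are a helpful assistant."})
    b = Replica("gpu-b", load=1)
    # A cached prefix match outweighs the lighter load on gpu-b:
    print(route("You are a helpful assistant. Summarise:", [a, b]).name)
    # With no cached match anywhere, the least-loaded replica wins:
    print(route("Translate to French: hello", [a, b]).name)
```

A real scheduler would hash token-block prefixes rather than compare raw strings, and would weigh cache locality against queue depth and accelerator memory pressure, but the trade-off it navigates is the one shown here.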
The community formed around llm-d comprises technology vendors and academic institutions, all seeking to address the efficiency, cost, and performance of AI-powered applications at scale. Several of these partners provided statements regarding their involvement and the intended impact of the project.
Ramine Roane, Corporate Vice President, AI Product Management at AMD, said, "AMD is proud to be a founding member of the llm-d community, contributing our expertise in high-performance GPUs to advance AI inference for evolving enterprise AI needs. As organisations navigate the increasing complexity of generative AI to achieve greater scale and efficiency, AMD looks forward to meeting this industry demand through the llm-d project."
Shannon McFarland, Vice President, Cisco Open Source Program Office & Head of Cisco DevNet, remarked, "The llm-d project is an exciting step forward for practical generative AI. llm-d empowers developers to programmatically integrate and scale generative AI inference, unlocking new levels of innovation and efficiency in the modern AI landscape. Cisco is proud to be part of the llm-d community, where we're working together to explore real-world use cases that help organisations apply AI more effectively and efficiently."
Chen Goldberg, Senior Vice President, Engineering, CoreWeave, commented, "CoreWeave is proud to be a founding contributor to the llm-d project and to deepen our long-standing commitment to open source AI. From our early partnership with EleutherAI to our ongoing work advancing inference at scale, we've consistently invested in making powerful AI infrastructure more accessible. We're excited to collaborate with an incredible group of partners and the broader developer community to build a flexible, high-performance inference engine that accelerates innovation and lays the groundwork for open, interoperable AI."
Mark Lohmeyer, Vice President and General Manager, AI & Computing Infrastructure, Google Cloud, stated, "Efficient AI inference is paramount as organisations move to deploying AI at scale and deliver value for their users. As we enter this new age of inference, Google Cloud is proud to build upon our legacy of open source contributions as a founding contributor to the llm-d project. This new community will serve as a critical catalyst for distributed AI inference at scale, helping users realise enhanced workload efficiency with increased optionality for their infrastructure resources."
Jeff Boudier, Head of Product, Hugging Face, said, "We believe every company should be able to build and run their own models. With vLLM leveraging the Hugging Face transformers library as the source of truth for model definitions, a wide diversity of models large and small is available to power text, audio, image and video AI applications. Eight million AI Builders use Hugging Face to collaborate on over two million AI models and datasets openly shared with the global community. We are excited to support the llm-d project to enable developers to take these applications to scale."
Priya Nagpurkar, Vice President, Hybrid Cloud and AI Platform, IBM Research, commented, "At IBM, we believe the next phase of AI is about efficiency and scale. We're focused on unlocking value for enterprises through AI solutions they can deploy effectively. As a founding contributor to llm-d, IBM is proud to be a key part of building a differentiated hardware agnostic distributed AI inference platform. We're looking forward to continued contributions towards the growth and success of this community to transform the future of AI inference."
Bill Pearson, Vice President, Data Center & AI Software Solutions and Ecosystem, Intel, said, "The launch of llm-d will serve as a key inflection point for the industry in driving AI transformation at scale, and Intel is excited to participate as a founding supporter. Intel's involvement with llm-d is the latest milestone in our decades-long collaboration with Red Hat to empower enterprises with open source solutions that they can deploy anywhere, on their platform of choice. We look forward to further extending and building AI innovation through the llm-d community."
Eve Callicoat, Senior Staff Engineer, ML Platform, Lambda, commented, "Inference is where the real-world value of AI is delivered, and llm-d represents a major leap forward. Lambda is proud to support a project that makes state-of-the-art inference accessible, efficient, and open."
Ujval Kapasi, Vice President, Engineering AI Frameworks, NVIDIA, stated, "The llm-d project is an important addition to the open source AI ecosystem and reflects NVIDIA's support for collaboration to drive innovation in generative AI. Scalable, highly performant inference is key to the next wave of generative and agentic AI. We're working with Red Hat and other supporting partners to foster llm-d community engagement and industry adoption, helping accelerate llm-d with innovations from NVIDIA Dynamo such as NIXL."
Ion Stoica, Professor and Director of Sky Computing Lab, University of California, Berkeley, remarked, "We are pleased to see Red Hat build upon the established success of vLLM, which originated in our lab to help address the speed and memory challenges that come with running large AI models. Open source projects like vLLM, and now llm-d anchored in vLLM, are at the frontier of AI innovation tackling the most demanding AI inference requirements and moving the needle for the industry at large."
Junchen Jiang, Professor at the LMCache Lab, University of Chicago, added, "Distributed KV cache optimisations, such as offloading, compression, and blending, have been a key focus of our lab, and we are excited to see llm-d leveraging LMCache as a core component to reduce time to first token as well as improve throughput, particularly in long-context inference."



