Latest news with #vLLM


Mid East Info
6 days ago
- Business
- Mid East Info
Red Hat Unlocks Generative AI for Any Model and Any Accelerator Across the Hybrid Cloud with Red Hat AI Inference Server
Red Hat AI Inference Server, powered by vLLM and enhanced with Neural Magic technologies, delivers faster, higher-performing and more cost-efficient AI inference across the hybrid cloud.

BOSTON – RED HAT SUMMIT – MAY 2025 — Red Hat, the world's leading provider of open source solutions, announced Red Hat AI Inference Server, a significant step towards democratizing generative AI (gen AI) across the hybrid cloud. A new offering within Red Hat AI, the enterprise-grade inference server is born from the powerful vLLM community project and enhanced by Red Hat's integration of Neural Magic technologies, offering greater speed, accelerator efficiency and cost-effectiveness to help deliver Red Hat's vision of running any gen AI model on any AI accelerator in any cloud environment. Whether deployed standalone or as an integrated component of Red Hat Enterprise Linux AI (RHEL AI) and Red Hat OpenShift AI, the platform empowers organizations to more confidently deploy and scale gen AI in production.

Inference is the critical execution engine of AI, where pre-trained models translate data into real-world impact. It is the pivotal point of user interaction, demanding swift and accurate responses. As gen AI models grow in complexity and production deployments scale, inference can become a significant bottleneck, consuming hardware resources, degrading responsiveness and inflating operational costs. Robust inference servers are no longer a luxury but a necessity for unlocking the potential of AI at scale and navigating the underlying complexities with greater ease.

Red Hat addresses these challenges with Red Hat AI Inference Server — an open inference solution engineered for high performance and equipped with leading model compression and optimization tools. It is designed to give organizations more responsive user experiences and greater freedom in their choice of AI accelerators, models and IT environments.

vLLM: Extending inference innovation

Red Hat AI Inference Server builds on the industry-leading vLLM project, started by the University of California, Berkeley in mid-2023. The community project delivers high-throughput gen AI inference, support for large input context, multi-GPU model acceleration, continuous batching and more. vLLM's broad support for publicly available models – coupled with its day-zero integration of leading frontier models including DeepSeek, Gemma, Llama, Mistral, Phi and others, as well as open, enterprise-grade reasoning models like Llama Nemotron – positions it as a de facto standard for future AI inference innovation. Leading frontier model providers are increasingly embracing vLLM, solidifying its critical role in shaping gen AI's future.
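The capabilities described above can be seen in miniature through vLLM's offline Python API. The following is a minimal sketch, assuming vLLM is installed with a supported accelerator backend; the model identifier and prompts are illustrative placeholders, not part of the announcement.

    # Minimal offline batch inference with vLLM (illustrative sketch).
    # Assumes vLLM is installed with a supported accelerator; the model id is a placeholder.
    from vllm import LLM, SamplingParams

    prompts = [
        "Summarize the benefits of continuous batching for LLM serving.",
        "Explain what an inference server does in one sentence.",
    ]
    sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

    # The engine batches requests continuously and can shard the model across
    # several GPUs via tensor_parallel_size when more accelerators are available.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=1)

    for output in llm.generate(prompts, sampling_params):
        print(output.outputs[0].text)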
Introducing Red Hat AI Inference Server

Red Hat AI Inference Server packages the leading innovation of vLLM into an enterprise-grade offering. It is available as a standalone containerized offering or as part of both RHEL AI and Red Hat OpenShift AI. Across any deployment environment, Red Hat AI Inference Server provides users with a hardened, supported distribution of vLLM, along with:
- Intelligent LLM compression tools for dramatically reducing the size of both foundational and fine-tuned AI models, minimizing compute consumption while preserving and potentially enhancing model accuracy.
- An optimized model repository, hosted in the Red Hat AI organization on Hugging Face, offering instant access to a validated and optimized collection of leading AI models ready for inference deployment and helping to improve efficiency by 2-4x without compromising model accuracy (see the sketch after this list).
- Red Hat's enterprise support and decades of expertise in bringing community projects to production environments.
- Third-party support for even greater deployment flexibility, enabling Red Hat AI Inference Server to be deployed on non-Red Hat Linux and Kubernetes platforms pursuant to Red Hat's third-party support policy.
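As a rough illustration of how such an optimized checkpoint might be consumed, the sketch below loads a pre-quantized model with the upstream vLLM Python API. The repository name is a hypothetical placeholder for a model in the Red Hat AI organization on Hugging Face; vLLM reads the quantization configuration from the checkpoint itself, so the call is otherwise unchanged.

    # Illustrative sketch: loading a pre-quantized checkpoint with vLLM.
    # The repository id below is a hypothetical placeholder for a model published
    # in the Red Hat AI organization on Hugging Face; vLLM detects the quantization
    # scheme (e.g. compressed-tensors) from the checkpoint configuration.
    from vllm import LLM, SamplingParams

    llm = LLM(model="RedHatAI/example-llama-3-8b-quantized-w4a16")  # placeholder id
    params = SamplingParams(temperature=0.0, max_tokens=64)

    outputs = llm.generate(["What does 4-bit weight quantization save?"], params)
    print(outputs[0].outputs[0].text)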
Red Hat's vision: Any model, any accelerator, any cloud

The future of AI must be defined by limitless opportunity, not constrained by infrastructure silos. Red Hat sees a horizon where organizations can deploy any model, on any accelerator, across any cloud, delivering an exceptional, more consistent user experience without exorbitant costs. To unlock the true potential of gen AI investments, enterprises require a universal inference platform – a standard for more seamless, high-performance AI innovation, both today and in the years to come. Just as Red Hat pioneered the open enterprise by transforming Linux into the bedrock of modern IT, the company is now poised to architect the future of AI inference. vLLM's potential is that of a linchpin for standardized gen AI inference, and Red Hat is committed to building a thriving ecosystem around not just the vLLM community but also llm-d for distributed inference at scale. The vision is clear: regardless of the AI model, the underlying accelerator or the deployment environment, Red Hat intends to make vLLM the definitive open standard for inference across the new hybrid cloud.

Red Hat Summit

Join the Red Hat Summit keynotes to hear the latest from Red Hat executives, customers and partners:
- Modernized infrastructure meets enterprise-ready AI — Tuesday, May 20, 8-10 a.m. EDT (YouTube)
- Hybrid cloud evolves to deliver enterprise innovation — Wednesday, May 21, 8-9:30 a.m. EDT (YouTube)

Supporting Quotes

Joe Fernandes, vice president and general manager, AI Business Unit, Red Hat: "Inference is where the real promise of gen AI is delivered, where user interactions are met with fast, accurate responses delivered by a given model, but it must be delivered in an effective and cost-efficient way. Red Hat AI Inference Server is intended to meet the demand for high-performing, responsive inference at scale while keeping resource demands low, providing a common inference layer that supports any model, running on any accelerator in any environment."

Ramine Roane, corporate vice president, AI Product Management, AMD: "In collaboration with Red Hat, AMD delivers out-of-the-box solutions to drive efficient generative AI in the enterprise. Red Hat AI Inference Server enabled on AMD Instinct™ GPUs equips organizations with enterprise-grade, community-driven AI inference capabilities backed by fully validated hardware accelerators."

Jeremy Foster, senior vice president and general manager, Cisco: "AI workloads need speed, consistency, and flexibility, which is exactly what the Red Hat AI Inference Server is designed to deliver. This innovation offers Cisco and Red Hat opportunities to continue to collaborate on new ways to make AI deployments more accessible, efficient and scalable — helping organizations prepare for what's next."

Bill Pearson, vice president, Data Center & AI Software Solutions and Ecosystem, Intel: "Intel is excited to collaborate with Red Hat to enable Red Hat AI Inference Server on Intel® Gaudi® accelerators. This integration will provide our customers with an optimized solution to streamline and scale AI inference, delivering advanced performance and efficiency for a wide range of enterprise AI applications."

John Fanelli, vice president, Enterprise Software, NVIDIA: "High-performance inference enables models and AI agents not just to answer, but to reason and adapt in real time. With open, full-stack NVIDIA accelerated computing and Red Hat AI Inference Server, developers can run efficient reasoning at scale across hybrid clouds, and deploy with confidence using Red Hat AI Inference Server with the new NVIDIA Enterprise AI validated design."

About Red Hat

Red Hat is the world's leading provider of enterprise open source software solutions, using a community-powered approach to deliver reliable and high-performing Linux, hybrid cloud, container, and Kubernetes technologies. Red Hat helps customers integrate new and existing IT applications, develop cloud-native applications, standardize on our industry-leading operating system, and automate, secure, and manage complex environments. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500. As a strategic partner to cloud providers, system integrators, application vendors, customers, and open source communities, Red Hat can help organizations prepare for the digital future.

Forward-Looking Statements

Except for the historical information and discussions contained herein, statements contained in this press release may constitute forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are based on the company's current assumptions regarding future business and financial performance. These statements involve a number of risks, uncertainties and other factors that could cause actual results to differ materially. Any forward-looking statement in this press release speaks only as of the date on which it is made. Except as required by law, the company assumes no obligation to update or revise any forward-looking statements.


Techday NZ
21-05-2025
- Business
- Techday NZ
Red Hat & Meta unite to drive open source AI for business
Red Hat and Meta have announced a collaboration aimed at advancing open source generative artificial intelligence (AI) for enterprise use.

The collaboration began with Red Hat enabling the Llama 4 model family from Meta on Red Hat AI and the vLLM inference server. This initial integration enables businesses to deploy generative AI applications and agents through a simplified process. Both companies plan to continue this effort by promoting the alignment of the Llama Stack and vLLM community projects, with the goal of creating unified frameworks for open generative AI workloads.

Red Hat and Meta indicated that they are championing open standards to ensure that generative AI applications operate efficiently across hybrid cloud environments, independent of specific hardware accelerators or computing environments. This direction is aimed at creating consistency and reducing costs in enterprise AI deployments.

Mike Ferris, Senior Vice President and Chief Strategy Officer at Red Hat, stated: "Red Hat and Meta both recognize that AI's future success demands not only model advancements but also inference capabilities that let users maximize the breakthrough capabilities of next-generation models. Our joint commitment to Llama Stack and vLLM is intended to help realize a vision of faster, more consistent and more cost-effective gen AI applications running wherever needed across the hybrid cloud, regardless of accelerator or environment. This is the open future of AI, and one that Red Hat and Meta are ready to meet."

According to Gartner, by 2026 more than 80% of independent software vendors are expected to have embedded generative AI capabilities in their enterprise applications, up from less than 1% today. Red Hat and Meta's collaboration addresses the need for open and interoperable foundations, particularly at the application programming interface (API) layer and within inference serving, which handles real-time operational AI workloads.

Llama Stack, developed and released as open source by Meta, provides standardized building blocks and APIs for the full lifecycle of generative AI applications. Red Hat is actively contributing to the Llama Stack project, which the company expects will improve options for developers building agentic AI applications on Red Hat AI. Red Hat has committed to supporting a range of agentic frameworks, including Llama Stack, to offer customers flexibility in their tooling and development approaches. With these developments, Red Hat aims to create an environment that accelerates the development and deployment of next-generation AI solutions aligned with emerging technologies and methods in the sector.

On the inference side, the vLLM project acts as an open source platform supporting efficient inference for large language models such as the Llama series. Red Hat has made leading contributions to vLLM, ensuring immediate support for Llama 4 models. Meta has pledged to increase its engagement with the vLLM community project, aiming to enhance its capabilities for cost-effective and scalable AI inference. The project is also part of the PyTorch ecosystem, which Meta and others support, contributing to an inclusive AI tools environment.
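Because vLLM exposes an OpenAI-compatible API, a Llama model served this way can be reached with standard client code. Below is a minimal sketch assuming a vLLM server is already running locally (for example, started with the vllm serve command); the base URL, API key and model name are illustrative placeholders.

    # Illustrative sketch: querying a locally running vLLM server through its
    # OpenAI-compatible endpoint. Base URL, API key and model id are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    response = client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # placeholder model id
        messages=[{"role": "user", "content": "Give one use case for agentic AI."}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)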
Ash Jhaveri, Vice President of AI and Reality Labs Partnerships at Meta, said: "We are excited to partner with Red Hat as we work towards establishing Llama Stack as the industry standard for seamlessly building and deploying generative AI applications. This collaboration underscores our commitment to open innovation and the development of robust, scalable AI solutions that empower businesses to harness the full potential of AI technology. Together with Red Hat, we are paving the way for a future where Llama models and tools become the backbone of enterprise AI, driving efficiency and innovation across industries."

The collaboration formalises the intent of both companies to bolster open source AI foundations, facilitate interoperability, and expand choice for enterprise customers in building and deploying generative AI solutions across various computing environments.


Techday NZ
21-05-2025
- Business
- Techday NZ
Red Hat & Google Cloud extend partnership for AI innovation
Red Hat and Google Cloud have agreed to extend their partnership to focus on advancing artificial intelligence (AI) for enterprises, specifically with new developments in open and agentic AI solutions.

The collaboration will bring together Red Hat's open source technologies and Google Cloud's infrastructure, along with Google's Gemma family of open AI models. This initiative aims to offer cost-effective AI inference and greater hardware choices for businesses deploying generative AI at scale.

Brian Stevens, Senior Vice President and Chief Technology Officer – AI, Red Hat, said, "With this extended collaboration, Red Hat and Google Cloud are committed to driving groundbreaking AI innovations with our combined expertise and platforms. Bringing the power of vLLM and Red Hat open source technologies to Google Cloud and Google's Gemma equips developers with the resources they need to build more accurate, high-performing AI solutions, powered by optimized inference capabilities."

The latest phase of the alliance will see the companies launch the llm-d open source project, with Google acting as a founding contributor. This project is intended to facilitate scalable and efficient AI inference across diverse computing environments. Red Hat is introducing the project as a response to enterprise challenges, such as the growing complexity of AI ecosystems and the need for distributed computing strategies.

The companies have also announced that support for vLLM, an open source inference server used to speed up generative AI outputs, will be enabled on Google Cloud's Tensor Processing Units (TPUs) and GPU-based virtual machines. Google Cloud's TPUs, which are already part of Google's own AI infrastructure, will now be accessible to developers using vLLM, allowing for improved performance and resource efficiency for fast and accurate inference.

Red Hat will be among the earliest testers for Google's new open model Gemma 3, and it will provide 'Day 0' support for vLLM on Gemma 3 model distributions. This is part of Red Hat's broader efforts as a commercial contributor to the vLLM project, focusing on more cost-effective and responsive platforms for generative AI applications.

The collaboration also includes the availability of Red Hat AI Inference Server on Google Cloud. This enterprise distribution of vLLM helps companies scale and optimise AI model inference within hybrid cloud environments. The integration with Google Cloud enables enterprises to deploy generative AI models that are ready for production and can deliver cost and responsiveness efficiencies at scale.
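As a rough sketch of what this looks like from an application's point of view, the snippet below sends a chat request to a vLLM or Red Hat AI Inference Server endpoint hosting a Gemma model over its OpenAI-compatible REST API. The URL and model identifier are illustrative placeholders, and whether the backend runs on TPUs or GPUs is transparent to the client.

    # Illustrative sketch: calling a vLLM / Red Hat AI Inference Server endpoint
    # that hosts a Gemma model, via the OpenAI-compatible REST API.
    # URL and model id are placeholders; authentication depends on the deployment.
    import requests

    payload = {
        "model": "google/gemma-3-4b-it",  # placeholder model id
        "messages": [{"role": "user", "content": "What is distributed inference?"}],
        "max_tokens": 96,
    }
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions", json=payload, timeout=60
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])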
Supporting community-driven AI development, Red Hat will join Google as a contributor to the Agent2Agent (A2A) protocol, an application-level protocol designed to enable communication between agents or end-users across different platforms and cloud environments. Through the A2A ecosystem, Red Hat aims to promote new ways to accelerate innovation and enhance the effectiveness of AI workflows through agentic AI.

Mark Lohmeyer, Vice President and General Manager, AI and Computing Infrastructure, Google Cloud, commented, "The deepening of our collaboration with Red Hat is driven by our shared commitment to foster open innovation and bring the full potential of AI to our customers. As we enter a new age of AI inference, together we are paving the way for organisations to more effectively scale AI inference and enable agentic AI with the necessary cost-efficiency and high performance."

The llm-d project builds upon the established vLLM community, aiming to create a foundation for generative AI inference that can adapt to the demands of large-scale enterprises while facilitating innovation and cost management. The intention is to enable AI workload scalability across different resource types and enhance workload efficiency.

These initiatives highlight the companies' collective effort to offer business users production-ready, scalable, and efficient AI solutions powered by open source technologies and robust infrastructure options.


Techday NZ
21-05-2025
- Business
- Techday NZ
Red Hat leads launch of llm-d to scale generative AI in clouds
Red Hat has introduced llm-d, an open source project aimed at enabling large-scale distributed generative AI inference across hybrid cloud environments.

The llm-d initiative is the result of collaboration between Red Hat and a group of founding contributors comprising CoreWeave, Google Cloud, IBM Research and NVIDIA, with additional support from AMD, Cisco, Hugging Face, Intel, Lambda, Mistral AI, and academic partners from the University of California, Berkeley, and the University of Chicago.

The new project utilises vLLM-based distributed inference, a native Kubernetes architecture, and AI-aware network routing to facilitate robust and scalable AI inference clouds that can meet demanding production service-level objectives. Red Hat asserts that this will support any AI model, on any hardware accelerator, in any cloud environment.

Brian Stevens, Senior Vice President and AI CTO at Red Hat, stated, "The launch of the llm-d community, backed by a vanguard of AI leaders, marks a pivotal moment in addressing the need for scalable gen AI inference, a crucial obstacle that must be overcome to enable broader enterprise AI adoption. By tapping the innovation of vLLM and the proven capabilities of Kubernetes, llm-d paves the way for distributed, scalable and high-performing AI inference across the expanded hybrid cloud, supporting any model, any accelerator, on any cloud environment and helping realize a vision of limitless AI potential."

Addressing the scaling needs of generative AI, Red Hat points to a Gartner forecast that suggests by 2028, more than 80% of data centre workload accelerators will be principally deployed for inference rather than model training. This projected shift highlights the necessity for efficient and scalable inference solutions as AI models become larger and more complex.

The llm-d project's architecture is designed to overcome the practical limitations of centralised AI inference, such as prohibitive costs and latency. Its main features include:
- vLLM for rapid model support
- Prefill and Decode Disaggregation for distributing computational workloads
- KV Cache Offloading, based on LMCache, to shift memory loads onto standard storage
- AI-Aware Network Routing for optimised request scheduling
Further, the project supports Google Cloud's Tensor Processing Units and NVIDIA's Inference Xfer Library (NIXL) for high-performance data transfer.
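To give a feel for the prefill/decode disaggregation and cache-aware routing described above, here is a purely illustrative Python sketch of a scheduler that prefers a decode worker already holding the KV cache for a request's prefix and otherwise sends the request to a prefill pool. It is not llm-d code; all names and the routing policy are invented for explanation.

    # Purely illustrative sketch of prefill/decode disaggregation with cache-aware
    # routing. This is NOT llm-d source code; names and policy are invented.
    from dataclasses import dataclass, field

    @dataclass
    class Worker:
        name: str
        role: str                      # "prefill" or "decode"
        load: int = 0                  # number of in-flight requests
        cached_prefixes: set[str] = field(default_factory=set)

    @dataclass
    class Request:
        prompt: str

    def prefix_key(prompt: str, length: int = 32) -> str:
        # Stand-in for a real prefix hash computed over tokenized input.
        return prompt[:length]

    def route(request: Request, workers: list[Worker]) -> Worker:
        key = prefix_key(request.prompt)
        # Prefer a decode worker that already holds the KV cache for this prefix.
        for worker in workers:
            if worker.role == "decode" and key in worker.cached_prefixes:
                return worker
        # Otherwise send the request to the least-loaded prefill worker, which
        # computes the KV cache and hands it off to a decode worker afterwards.
        prefill_pool = [w for w in workers if w.role == "prefill"]
        return min(prefill_pool, key=lambda w: w.load)

    workers = [
        Worker("prefill-0", "prefill"),
        Worker("decode-0", "decode",
               cached_prefixes={prefix_key("Tell me about Kubernetes networking.")}),
    ]
    print(route(Request("Tell me about Kubernetes networking."), workers).name)  # decode-0
    print(route(Request("Explain tensor parallelism."), workers).name)           # prefill-0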
The community formed around llm-d comprises both technology vendors and academic institutions, each seeking to address efficiency, cost and performance at scale for AI-powered applications. Several of these partners provided statements on their involvement and the intended impact of the project.

Ramine Roane, Corporate Vice President, AI Product Management at AMD, said, "AMD is proud to be a founding member of the llm-d community, contributing our expertise in high-performance GPUs to advance AI inference for evolving enterprise AI needs. As organisations navigate the increasing complexity of generative AI to achieve greater scale and efficiency, AMD looks forward to meeting this industry demand through the llm-d project."

Shannon McFarland, Vice President, Cisco Open Source Program Office & Head of Cisco DevNet, remarked, "The llm-d project is an exciting step forward for practical generative AI. llm-d empowers developers to programmatically integrate and scale generative AI inference, unlocking new levels of innovation and efficiency in the modern AI landscape. Cisco is proud to be part of the llm-d community, where we're working together to explore real-world use cases that help organisations apply AI more effectively and efficiently."

Chen Goldberg, Senior Vice President, Engineering, CoreWeave, commented, "CoreWeave is proud to be a founding contributor to the llm-d project and to deepen our long-standing commitment to open source AI. From our early partnership with EleutherAI to our ongoing work advancing inference at scale, we've consistently invested in making powerful AI infrastructure more accessible. We're excited to collaborate with an incredible group of partners and the broader developer community to build a flexible, high-performance inference engine that accelerates innovation and lays the groundwork for open, interoperable AI."

Mark Lohmeyer, Vice President and General Manager, AI & Computing Infrastructure, Google Cloud, stated, "Efficient AI inference is paramount as organisations move to deploying AI at scale and deliver value for their users. As we enter this new age of inference, Google Cloud is proud to build upon our legacy of open source contributions as a founding contributor to the llm-d project. This new community will serve as a critical catalyst for distributed AI inference at scale, helping users realise enhanced workload efficiency with increased optionality for their infrastructure resources."

Jeff Boudier, Head of Product, Hugging Face, said, "We believe every company should be able to build and run their own models. With vLLM leveraging the Hugging Face transformers library as the source of truth for model definitions, a wide diversity of models large and small is available to power text, audio, image and video AI applications. Eight million AI builders use Hugging Face to collaborate on over two million AI models and datasets openly shared with the global community. We are excited to support the llm-d project to enable developers to take these applications to scale."

Priya Nagpurkar, Vice President, Hybrid Cloud and AI Platform, IBM Research, commented, "At IBM, we believe the next phase of AI is about efficiency and scale. We're focused on unlocking value for enterprises through AI solutions they can deploy effectively. As a founding contributor to llm-d, IBM is proud to be a key part of building a differentiated hardware-agnostic distributed AI inference platform. We're looking forward to continued contributions towards the growth and success of this community to transform the future of AI inference."

Bill Pearson, Vice President, Data Center & AI Software Solutions and Ecosystem, Intel, said, "The launch of llm-d will serve as a key inflection point for the industry in driving AI transformation at scale, and Intel is excited to participate as a founding supporter. Intel's involvement with llm-d is the latest milestone in our decades-long collaboration with Red Hat to empower enterprises with open source solutions that they can deploy anywhere, on their platform of choice. We look forward to further extending and building AI innovation through the llm-d community."

Eve Callicoat, Senior Staff Engineer, ML Platform, Lambda, commented, "Inference is where the real-world value of AI is delivered, and llm-d represents a major leap forward. Lambda is proud to support a project that makes state-of-the-art inference accessible, efficient, and open."
Ujval Kapasi, Vice President, Engineering AI Frameworks, NVIDIA, stated, "The llm-d project is an important addition to the open source AI ecosystem and reflects NVIDIA's support for collaboration to drive innovation in generative AI. Scalable, highly performant inference is key to the next wave of generative and agentic AI. We're working with Red Hat and other supporting partners to foster llm-d community engagement and industry adoption, helping accelerate llm-d with innovations from NVIDIA Dynamo such as NIXL."

Ion Stoica, Professor and Director of Sky Computing Lab, University of California, Berkeley, remarked, "We are pleased to see Red Hat build upon the established success of vLLM, which originated in our lab to help address the speed and memory challenges that come with running large AI models. Open source projects like vLLM, and now llm-d anchored in vLLM, are at the frontier of AI innovation tackling the most demanding AI inference requirements and moving the needle for the industry at large."

Junchen Jiang, Professor at the LMCache Lab, University of Chicago, added, "Distributed KV cache optimisations, such as offloading, compression, and blending, have been a key focus of our lab, and we are excited to see llm-d leveraging LMCache as a core component to reduce time to first token as well as improve throughput, particularly in long-context inference."