Red Hat Unlocks Generative AI for Any Model and Any Accelerator Across the Hybrid Cloud with Red Hat AI Inference Server

Web Release, 25-05-2025
Red Hat, the world's leading provider of open source solutions, announced Red Hat AI Inference Server, a significant step towards democratizing generative AI (gen AI) across the hybrid cloud. A new offering within Red Hat AI, the enterprise-grade inference server is born from the powerful vLLM community project and enhanced by Red Hat's integration of Neural Magic technologies, offering greater speed, accelerator efficiency and cost-effectiveness to help deliver Red Hat's vision of running any gen AI model on any AI accelerator in any cloud environment. Whether deployed standalone or as an integrated component of Red Hat Enterprise Linux AI (RHEL AI) and Red Hat OpenShift AI, this breakthrough platform empowers organizations to more confidently deploy and scale gen AI in production.
Inference is the critical execution engine of AI, where pre-trained models translate data into real-world impact. It's the pivotal point of user interaction, demanding swift and accurate responses. As gen AI models explode in complexity and production deployments scale, inference can become a significant bottleneck, devouring hardware resources and threatening to cripple responsiveness and inflate operational costs. Robust inference servers are no longer a luxury, but a necessity for unlocking the true potential of AI at scale, navigating underlying complexities with greater ease.
Red Hat directly addresses these challenges with Red Hat AI Inference Server — an open inference solution engineered for high performance and equipped with leading model compression and optimization tools. This innovation empowers organizations to fully tap into the transformative power of gen AI by delivering dramatically more responsive user experiences and unparalleled freedom in their choice of AI accelerators, models and IT environments.
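To make the compression tooling concrete, here is a minimal sketch using the open source LLM Compressor library from the vLLM project, which underpins this kind of model optimization. It assumes a recent llmcompressor release; the model ID, quantization scheme and output path are illustrative assumptions, not a documented product workflow.

```python
# Minimal sketch: one-shot, data-free FP8 quantization with the open source
# LLM Compressor library (vllm-project/llm-compressor). Assumes
# `pip install llmcompressor`; model ID and output path are examples only.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

oneshot(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model ID
    recipe=QuantizationModifier(
        targets="Linear",        # quantize the linear (projection) layers
        scheme="FP8_DYNAMIC",    # data-free FP8 weights and activations
        ignore=["lm_head"],      # keep the output head at full precision
    ),
    output_dir="Llama-3.1-8B-Instruct-FP8-dynamic",  # compressed checkpoint
)
```

The resulting checkpoint uses the compressed-tensors format, which vLLM can load and serve directly.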
vLLM: Extending inference innovation
Red Hat AI Inference Server builds on the industry-leading vLLM project, which was started by the University of California, Berkeley in mid-2023. The community project delivers high-throughput gen AI inference, support for large input contexts, multi-GPU model acceleration, continuous batching and more.
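For readers unfamiliar with vLLM, a minimal offline-inference sketch looks roughly like the following (assuming `pip install vllm` on a GPU-equipped host; the model ID is an example, not a product default):

```python
# Minimal vLLM offline-inference sketch. Continuous batching and paged
# attention are handled internally by the engine; the model ID is an example.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # example model ID
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["What does an inference server do?"], params)
print(outputs[0].outputs[0].text)
```

In recent vLLM releases, the same engine can also be exposed as an OpenAI-compatible HTTP server via the `vllm serve <model>` command.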
vLLM's broad support for publicly available models – coupled with its day-zero integration of leading frontier models including DeepSeek, Gemma, Llama, Mistral, Phi and others, as well as open, enterprise-grade reasoning models like Llama Nemotron – positions it as a de facto standard for future AI inference innovation. Leading frontier model providers are increasingly embracing vLLM, solidifying its critical role in shaping gen AI's future.
Introducing Red Hat AI Inference Server
Red Hat AI Inference Server packages the leading innovation of vLLM into an enterprise-grade offering. It is available as a standalone containerized offering or as part of both RHEL AI and Red Hat OpenShift AI.
Across any deployment environment, Red Hat AI Inference Server provides users with a hardened, supported distribution of vLLM, along with:
- Intelligent LLM compression tools for dramatically reducing the size of both foundational and fine-tuned AI models, minimizing compute consumption while preserving and potentially enhancing model accuracy.
- An optimized model repository, hosted in the Red Hat AI organization on Hugging Face, offering instant access to a validated and optimized collection of leading AI models ready for inference deployment, helping to accelerate efficiency by 2-4x without compromising model accuracy (see the serving sketch after this list).
- Red Hat's enterprise support and decades of expertise in bringing community projects to production environments.
- Third-party support for even greater deployment flexibility, enabling Red Hat AI Inference Server to be deployed on non-Red Hat Linux and Kubernetes platforms pursuant to Red Hat's third-party support policy.
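As a sketch of how the optimized repository and vLLM fit together: a pre-quantized checkpoint published under the Red Hat AI organization on Hugging Face can be loaded like any other model, with the quantization format detected from the checkpoint itself. The model ID below is an assumed example of that organization's naming, not a guaranteed artifact.

```python
# Sketch: serving a pre-quantized model from the Red Hat AI organization on
# Hugging Face through vLLM. The model ID is an assumed example; browse the
# RedHatAI org on Hugging Face for actual validated checkpoints.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/Llama-3.1-8B-Instruct-quantized.w4a16")  # assumed ID

out = llm.generate(["Summarize vLLM in one sentence."],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```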
Red Hat's vision: Any model, any accelerator, any cloud
The future of AI must be defined by limitless opportunity, not constrained by infrastructure silos. Red Hat sees a horizon where organizations can deploy any model, on any accelerator, across any cloud, delivering an exceptional, more consistent user experience without exorbitant costs. To unlock the true potential of gen AI investments, enterprises require a universal inference platform – a standard for more seamless, high-performance AI innovation, both today and in the years to come.
Just as Red Hat pioneered the open enterprise by transforming Linux into the bedrock of modern IT, the company is now poised to architect the future of AI inference. vLLM's potential is that of a linchpin for standardized gen AI inference, and Red Hat is committed to building a thriving ecosystem around not just the vLLM community but also llm-d for distributed inference at scale. The vision is clear: regardless of the AI model, the underlying accelerator or the deployment environment, Red Hat intends to make vLLM the definitive open standard for inference across the new hybrid cloud.
Red Hat Summit
Join the Red Hat Summit keynotes to hear the latest from Red Hat executives, customers and partners.
Supporting Quotes
Joe Fernandes, vice president and general manager, AI Business Unit, Red Hat
'Inference is where the real promise of gen AI is delivered, where user interactions are met with fast, accurate responses delivered by a given model, but it must be delivered in an effective and cost-efficient way. Red Hat AI Inference Server is intended to meet the demand for high-performing, responsive inference at scale while keeping resource demands low, providing a common inference layer that supports any model, running on any accelerator in any environment.'
Ramine Roane, corporate vice president, AI Product Management, AMD
'In collaboration with Red Hat, AMD delivers out-of-the-box solutions to drive efficient generative AI in the enterprise. Red Hat AI Inference Server enabled on AMD Instinct™ GPUs equips organizations with enterprise-grade, community-driven AI inference capabilities backed by fully validated hardware accelerators.'
Jeremy Foster, senior vice president and general manager, Cisco
'AI workloads need speed, consistency, and flexibility, which is exactly what the Red Hat AI Inference Server is designed to deliver. This innovation offers Cisco and Red Hat opportunities to continue to collaborate on new ways to make AI deployments more accessible, efficient and scalable—helping organizations prepare for what's next.'
Bill Pearson, vice president, Data Center & AI Software Solutions and Ecosystem, Intel
'Intel is excited to collaborate with Red Hat to enable Red Hat AI Inference Server on Intel® Gaudi® accelerators. This integration will provide our customers with an optimized solution to streamline and scale AI inference, delivering advanced performance and efficiency for a wide range of enterprise AI applications.'
John Fanelli, vice president, Enterprise Software, NVIDIA
'High-performance inference enables models and AI agents not just to answer, but to reason and adapt in real time. With open, full-stack NVIDIA accelerated computing and Red Hat AI Inference Server, developers can run efficient reasoning at scale across hybrid clouds, and deploy with confidence using Red Hat AI Inference Server with the new NVIDIA Enterprise AI validated design.'

Related Articles

Red Hat Powers Modern Virtualization on Microsoft Azure

Web Release, 14 hours ago

Red Hat, the world's leading provider of open source solutions, announced the public preview of Red Hat OpenShift Virtualization on Microsoft Azure Red Hat OpenShift. Available as a self-managed operator included in Azure Red Hat OpenShift, Red Hat OpenShift Virtualization offers organizations an accelerated path to modernization by streamlining the migration of virtual machines (VMs) from existing virtualization platforms to a scalable, cloud-native platform. Azure Red Hat OpenShift is a turnkey application platform, jointly managed and supported by Red Hat and Microsoft, designed to reduce the complexity of managing the underlying infrastructure and to let IT teams focus time and resources on innovation and modernization rather than routine maintenance.

With Red Hat OpenShift Virtualization on Azure Red Hat OpenShift, organizations benefit from a more consistent hybrid cloud stack that supports VMs and containers alike, significantly streamlining application modernization and accelerating cloud-native strategies. This enables organizations to modernize existing critical VM infrastructure while continuing to evolve with new innovations that meet future business needs. Red Hat OpenShift Virtualization on Azure Red Hat OpenShift empowers organizations to:

- Accelerate VM migration: quickly migrate and scale existing VM workloads with built-in migration tooling and automation capabilities, such as those through Red Hat Ansible Automation Platform and Red Hat Advanced Cluster Management, to simplify the migration process, minimize disruption and enable teams to quickly shift to modern infrastructure.
- Simplify operations: gain a unified view of operations to more seamlessly manage both VMs and containers on the same platform across the hybrid cloud. Automated deployment and management of Red Hat OpenShift clusters further reduces complexity and risk.
- Modernize infrastructure: build, modernize and deploy applications at scale. By adopting this Kubernetes-based platform, organizations land on modern infrastructure that brings them closer to their cloud-native application modernization goals, while Azure Red Hat OpenShift brings modern application development processes and tools to VMs to help expedite the modernization of VM-based applications.
- Optimize resources: in addition to increasing DevOps productivity and decreasing the time to deploy applications with Azure Red Hat OpenShift, further optimization can be realized with Red Hat OpenShift Virtualization by right-sizing VMs to better match workload needs.

Built on the industry's leading hybrid cloud application platform powered by Kubernetes and Microsoft Azure's trusted cloud infrastructure, Azure Red Hat OpenShift delivers a future-ready platform with integrated security tooling, automation and management capabilities to extend innovation across the hybrid cloud. Red Hat OpenShift Virtualization is now available in public preview as a self-managed operator on Azure Red Hat OpenShift. Organizations can also apply their Microsoft Azure Consumption Commitment (MACC) and utilize the Azure Migration and Modernization Program (AMMP) for Azure Red Hat OpenShift. Additionally, customers can use the Azure Hybrid Benefit to reuse existing on-premises licenses for both Red Hat Enterprise Linux and Windows.

Red Hat Summit

Join the Red Hat Summit keynotes to hear from Red Hat executives, customers and partners.

Supporting Quotes

Chris Wright, senior vice president and chief technology officer, Red Hat
'As organizations continue to modernize and move away from legacy virtualization solutions, it is critical to choose a secure computing foundation for the future that can adapt to their current and evolving multi-infrastructure environments. Building upon our extensive history of collaboration and joint engineering efforts with Microsoft Azure, Red Hat OpenShift Virtualization running on Azure Red Hat OpenShift delivers more consistent orchestration for VMs and containers alike, setting organizations on a clear path to modern application development and deployment.'

Brendan Burns, corporate vice president, Azure Compute, Microsoft
'As customers modernize and move their apps from traditional virtual-machine-based fabrics that are on-premises to modern Kubernetes platforms, some components still need to run on traditional virtual machines for a while. To address this, Microsoft and Red Hat are collaborating to bring open-source innovation from the KubeVirt project into Azure Red Hat OpenShift. What I'm most excited about is how this enables customers to add virtualization capabilities to Azure Red Hat OpenShift, which allows them to modernize at their own pace, and get the best return on investment as they transition to the cloud.'

Red Hat and Rocky Linux Advance RISC-V Integration

Arabian Post, 25-05-2025

Red Hat Enterprise Linux and Rocky Linux are accelerating efforts to support the RISC-V architecture, signalling a significant shift in the enterprise Linux landscape. This development reflects the growing momentum behind RISC-V, an open-source instruction set architecture that is reshaping hardware innovation amid challenges in traditional chip supply chains and rising demand for customisable, energy-efficient computing.

Red Hat, a subsidiary of IBM and a dominant force in the enterprise Linux market, has publicly confirmed its roadmap for incorporating RISC-V support into future releases of Red Hat Enterprise Linux. Similarly, the community-driven Rocky Linux project, a popular downstream fork of RHEL designed to provide enterprise-grade stability, has announced ongoing work to integrate RISC-V compatibility. These parallel initiatives demonstrate broad industry recognition of RISC-V's potential to disrupt conventional processor markets dominated by x86 and ARM architectures.

RISC-V's open standard removes licensing costs and restrictions typically associated with proprietary CPU designs, encouraging a diverse ecosystem of chip manufacturers, academic researchers and software developers. This flexibility allows companies to tailor processor designs for specific workloads, which is particularly appealing for embedded systems, Internet of Things devices and edge computing applications. Both Red Hat and Rocky Linux see this as a critical advantage for future-proofing their operating systems and meeting evolving customer needs.

The move to support RISC-V involves extensive technical adaptation, as much of the existing software stack has been optimised for x86-64 and ARM64 platforms. Red Hat's engineering teams have been collaborating with hardware vendors and the wider open-source community to ensure that core components of RHEL, including the kernel, system libraries and security modules, perform efficiently on RISC-V hardware. The company's investment in this area underscores its commitment to maintaining leadership across diverse infrastructure environments, including cloud, on-premises and hybrid deployments.

Rocky Linux, founded in the wake of CentOS's shift away from its traditional model, has quickly gained traction among enterprises seeking stable, free alternatives to RHEL. Its embrace of RISC-V support aligns with its mission to offer a robust platform compatible with the latest computing technologies. Developers contributing to Rocky Linux have been porting essential packages and testing workloads on early RISC-V development boards, working to iron out architecture-specific bugs and ensure a seamless user experience.

Industry analysts observe that this dual commitment to RISC-V by both a commercial giant and a grassroots community project indicates growing confidence in the architecture's viability for production environments. Although RISC-V chips currently lag behind established x86 and ARM processors in raw performance, ongoing improvements in silicon design and fabrication suggest this gap will narrow. Support from major Linux distributions is pivotal to accelerating software ecosystem maturity, which has been a critical barrier to wider RISC-V adoption.

Hardware manufacturers such as SiFive, Microchip and Alibaba's semiconductor division are developing increasingly powerful RISC-V processors targeting servers and high-performance computing. Red Hat's engagement with these vendors is expected to facilitate optimisation efforts and certification programmes, ensuring that RHEL runs reliably on certified RISC-V platforms. This cooperation is vital, as enterprises require not only compatibility but also assurances around security, stability and long-term support when deploying new architectures.

The initiative also ties into broader trends within the open-source community and the tech industry. Governments and large organisations are seeking to reduce dependency on single-source suppliers, especially in light of geopolitical tensions and supply chain vulnerabilities exposed in recent years. Open hardware initiatives like RISC-V are viewed as strategic assets that can drive innovation while enhancing security and sovereignty over critical technology infrastructure.

Despite the enthusiasm, challenges remain. Software ecosystem maturity is uneven; many applications and development tools require adaptation to fully leverage RISC-V capabilities. Furthermore, mainstream cloud providers have yet to offer widespread RISC-V hosting options, limiting deployment scenarios primarily to on-premises and experimental settings for now. However, companies such as Amazon Web Services have begun exploratory work with RISC-V instances, signalling potential expansion in the near term.

Red Hat and Rocky Linux's pursuit of RISC-V support follows a pattern of early adoption seen in other Linux distributions such as Fedora, where RISC-V packages have been under development for some time. Their progress benefits from this groundwork, yet enterprise-grade readiness demands rigorous validation and comprehensive documentation to meet customer expectations. The growing adoption of RISC-V by major Linux providers may also stimulate innovation in adjacent fields: edge computing deployments prioritising low power consumption and customised instruction sets stand to gain from enhanced RISC-V Linux support, and academic and research institutions involved in processor architecture development benefit from improved access to stable operating systems, enabling faster prototyping and experimentation.
