Latest news with #JoeFernandes


Techday NZ | Business | 21-05-2025
Red Hat launches enterprise AI inference server for hybrid cloud
Red Hat has introduced Red Hat AI Inference Server, an enterprise-grade offering aimed at enabling generative artificial intelligence (AI) inference across hybrid cloud environments.

The Red Hat AI Inference Server leverages the vLLM community project, initially started by the University of California, Berkeley. Through Red Hat's integration of Neural Magic technologies, the solution aims to deliver higher speed, improved efficiency across a range of AI accelerators, and reduced operational costs. The platform is designed to allow organisations to run generative AI models on any AI accelerator within any cloud infrastructure.

The solution can be deployed as a standalone containerised offering or as part of Red Hat Enterprise Linux AI (RHEL AI) and Red Hat OpenShift AI. Red Hat says this approach is intended to empower enterprises to deploy and scale generative AI in production with increased confidence.

Joe Fernandes, Vice President and General Manager for Red Hat's AI Business Unit, commented on the launch: "Inference is where the real promise of gen AI is delivered, where user interactions are met with fast, accurate responses delivered by a given model, but it must be delivered in an effective and cost-efficient way. Red Hat AI Inference Server is intended to meet the demand for high-performing, responsive inference at scale while keeping resource demands low, providing a common inference layer that supports any model, running on any accelerator in any environment."

The inference phase in AI refers to the process where pre-trained models are used to generate outputs, a stage which can be a significant inhibitor to performance and cost efficiency if not managed appropriately. The increasing complexity and scale of generative AI models have highlighted the need for robust inference solutions capable of handling production deployments across diverse infrastructures.

The Red Hat AI Inference Server builds on the technology foundation established by the vLLM project. vLLM is known for high-throughput AI inference, the ability to handle large input contexts, acceleration across multiple GPUs, and continuous batching to enhance deployment versatility. Additionally, vLLM supports a broad range of publicly available models, including DeepSeek, Google's Gemma, Llama, Llama Nemotron, Mistral, and Phi, among others. Its integration with leading models and enterprise-grade reasoning capabilities positions it as a candidate standard for AI inference.

The packaged enterprise offering delivers a supported and hardened distribution of vLLM alongside several additional tools. These include intelligent large language model (LLM) compression utilities that reduce AI model sizes while preserving or enhancing accuracy, and an optimised model repository hosted under Red Hat AI on Hugging Face. The repository provides access to validated and optimised AI models tailored for inference, designed to improve efficiency two- to four-fold without compromising the accuracy of results. Red Hat also provides enterprise support, drawing upon its expertise in bringing community-developed technologies into production.

For expanded deployment options, the Red Hat AI Inference Server can be run on non-Red Hat Linux and Kubernetes platforms in line with the company's third-party support policy. The company's stated vision is to enable a universal inference platform that can accommodate any model, run on any accelerator, and be deployed in any cloud environment.
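For a sense of what building on vLLM looks like in practice, the following is a minimal sketch using the upstream vLLM Python API that the inference server packages. The model identifier is illustrative, and the hardened Red Hat distribution is typically consumed as a container image rather than installed this way; treat this as an upstream-community example, not Red Hat's documented workflow.

```python
# Minimal sketch: offline generation with the upstream vLLM engine.
# Assumptions: the vllm package is installed and a supported accelerator is
# available; the model name below is illustrative, not a Red Hat-validated build.
from vllm import LLM, SamplingParams

prompts = [
    "Summarise the benefits of hybrid cloud AI inference in one sentence.",
]

# The engine handles continuous batching internally; tensor_parallel_size can be
# raised to shard the model across multiple GPUs.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2", tensor_parallel_size=1)

sampling = SamplingParams(temperature=0.7, max_tokens=128)

for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```

In a served deployment, the same engine is typically exposed through an OpenAI-compatible HTTP API, which is what lets the "any model, any accelerator, any cloud" positioning map onto existing client code.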
Red Hat sees the success of generative AI relying on the adoption of such standardised inference solutions to ensure consistent user experiences without increasing costs.

Ramine Roane, Corporate Vice President of AI Product Management at AMD, said: "In collaboration with Red Hat, AMD delivers out-of-the-box solutions to drive efficient generative AI in the enterprise. Red Hat AI Inference Server enabled on AMD Instinct™ GPUs equips organizations with enterprise-grade, community-driven AI inference capabilities backed by fully validated hardware accelerators."

Jeremy Foster, Senior Vice President and General Manager at Cisco, commented on the joint opportunities provided by the offering: "AI workloads need speed, consistency, and flexibility, which is exactly what the Red Hat AI Inference Server is designed to deliver. This innovation offers Cisco and Red Hat opportunities to continue to collaborate on new ways to make AI deployments more accessible, efficient and scalable—helping organizations prepare for what's next."

Intel's Bill Pearson, Vice President of Data Center & AI Software Solutions and Ecosystem, said: "Intel is excited to collaborate with Red Hat to enable Red Hat AI Inference Server on Intel Gaudi accelerators. This integration will provide our customers with an optimized solution to streamline and scale AI inference, delivering advanced performance and efficiency for a wide range of enterprise AI applications."

John Fanelli, Vice President of Enterprise Software at NVIDIA, added: "High-performance inference enables models and AI agents not just to answer, but to reason and adapt in real time. With open, full-stack NVIDIA accelerated computing and Red Hat AI Inference Server, developers can run efficient reasoning at scale across hybrid clouds, and deploy with confidence using Red Hat Inference Server with the new NVIDIA Enterprise AI validated design."

Red Hat has stated its intent to further build upon the vLLM community as well as drive development of distributed inference technologies such as llm-d, aiming to establish vLLM as an open standard for inference in hybrid cloud environments.


Techday NZ | Business | 21-05-2025
F5 & Red Hat expand AI partnership for secure cloud-native apps
F5 has expanded its collaboration with Red Hat to support enterprise adoption of secure and scalable artificial intelligence (AI) applications, and has introduced F5 BIG-IP Next Cloud-Native Network Functions (CNF) 2.0 to assist organisations in handling the demands of high-bandwidth operations.

The expanded partnership will facilitate integration of the F5 Application Delivery and Security Platform with Red Hat OpenShift AI. This approach aims to enable enterprises to implement AI more rapidly and securely, supporting use cases such as retrieval-augmented generation (RAG), secure model serving, and robust data ingestion.

"Enterprises are eager to harness the power of AI, but they face significant challenges in scaling and securing these applications," said Kunal Anand, Chief Innovation Officer at F5. "Our collaboration with Red Hat aims to simplify this journey by providing integrated solutions that address performance, security, and observability needs, enabling organisations to realise tangible AI outcomes."

An F5 survey presented in its 2025 State of Application Strategy Report underlines the accelerating pace of AI adoption: 96 per cent of organisations are now deploying AI models, a marked rise from 25 per cent in 2023. The same report reveals that 72 per cent of respondents are targeting AI for optimising application performance, while 59 per cent are using it to improve cost efficiency and security.

The collaboration will address the operational building blocks necessary for enterprise AI deployment, including securing data pipelines and optimising inference performance. The focus is on ensuring that organisations have the confidence, speed, and control needed for AI adoption.

Specific areas of the collaboration include RAG and model serving at scale, where F5 supports AI-powered applications on Red Hat OpenShift AI, facilitating secure data flows, optimal GPU utilisation, and quick response times; a minimal sketch of this pattern appears at the end of this section. Joint efforts also address the acceleration of big data ingestion through the combined capabilities of MinIO and F5 on Red Hat OpenShift AI, supporting training and inference over large datasets. API-first AI security is also highlighted, with F5 offering protection from threats such as prompt injection, model theft, and data leakage through its Distributed Cloud and BIG-IP solutions.

The alliance positions F5's API gateway and AI security features to integrate seamlessly with Red Hat's open platform, offering customers an open and flexible approach to AI infrastructure. Red Hat OpenShift AI allows for modular development and deployment of AI applications across hybrid environments.

"As AI becomes core to how businesses operate and compete, organisations need platforms that offer flexibility without compromising security," said Joe Fernandes, Vice President and General Manager, AI Business Unit, Red Hat. "We believe the future of AI is open source, and Red Hat OpenShift AI, when used in combination with F5's robust security and observability, gives organisations the necessary tools to build and scale AI applications with greater confidence, anywhere they choose to run them."
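As a rough illustration of the RAG and model-serving pattern discussed above, the sketch below retrieves context and sends it, together with the user's question, to an OpenAI-compatible model-serving endpoint. The endpoint URL, model name, and in-memory document list are hypothetical; in the deployment the article describes, such calls would pass through F5's gateway and security controls on Red Hat OpenShift AI rather than reaching a model server directly.

```python
# Minimal RAG sketch against an OpenAI-compatible model-serving endpoint.
# The endpoint URL, model name, and document list are illustrative assumptions.
import requests

DOCUMENTS = [
    "Red Hat OpenShift AI supports modular development and deployment of AI applications.",
    "F5 provides API security controls such as protection against prompt injection.",
    "MinIO offers S3-compatible object storage for large training and inference datasets.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Naive keyword-overlap scoring stands in for a real embedding/vector search.
    terms = set(query.lower().split())
    ranked = sorted(DOCUMENTS, key=lambda d: -len(terms & set(d.lower().split())))
    return ranked[:k]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    resp = requests.post(
        "https://models.example.internal/v1/chat/completions",  # hypothetical endpoint
        json={
            "model": "example-chat-model",  # illustrative model name
            "messages": [
                {"role": "system",
                 "content": "Answer using only the following context:\n" + context},
                {"role": "user", "content": query},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(answer("How does F5 help secure AI APIs?"))
```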
Alongside the Red Hat partnership, F5 introduced F5 BIG-IP Next CNF 2.0, a solution designed to meet the growing requirements of large-scale cloud-native applications in industries where AI and other high-bandwidth applications are prevalent.

The solution enhances the F5 Application Delivery and Security Platform with new Kubernetes-native features. F5 BIG-IP Next CNF 2.0 is tailored for telecommunications providers, ISPs, cloud service providers, and large enterprises, offering a consolidated method to handle security, resource allocation, and network operation scaling. Integrated services include DDoS protection, firewall, intrusion prevention, and carrier-grade network address translation, allowing organisations to manage infrastructure and traffic with increased efficiency.

Kunal Anand said, "Service providers and large enterprises are under pressure to scale faster, operate leaner, and stay secure—all in increasingly complex environments. With BIG-IP Next CNF 2.0, we're extending the F5 ADSP with a truly cloud-native solution built for modern, decentralised infrastructure. Unlike legacy virtualised approaches that burn resources, our Kubernetes-native architecture unlocks smarter scaling, stronger security, and more efficient delivery of high-bandwidth services—giving customers the flexibility to move faster without compromise."

The product introduces features such as horizontal scalability through disaggregation, accelerated DNS services for improved latency, advanced policy enforcement, and unified security management. F5 reports that the solution can lower CPU utilisation by 33 per cent and reduce infrastructure costs by over 60 per cent, while offering separate scaling of control and data planes for increased deployment flexibility.

F5 BIG-IP Next CNF 2.0 has been designed for effective integration with Red Hat OpenShift, leveraging Kubernetes to provide service providers with greater scalability, simplified usability, and enhanced security for modern cloud-native applications. These capabilities are intended to help organisations streamline management, ensure low-latency performance, and adapt to growing network demands associated with high-bandwidth services such as AI.