
Latest news with #BrianStevens

Red Hat & Google Cloud extend partnership for AI innovation

Techday NZ | 21-05-2025 | Business


Red Hat and Google Cloud have agreed to extend their partnership to focus on advancing artificial intelligence (AI) for enterprises, specifically with new developments in open and agentic AI solutions. The collaboration will bring together Red Hat's open source technologies and Google Cloud's infrastructure, along with Google's Gemma family of open AI models. The initiative aims to offer cost-effective AI inference and greater hardware choice for businesses deploying generative AI at scale.

Brian Stevens, Senior Vice President and Chief Technology Officer – AI at Red Hat, said, "With this extended collaboration, Red Hat and Google Cloud are committed to driving groundbreaking AI innovations with our combined expertise and platforms. Bringing the power of vLLM and Red Hat open source technologies to Google Cloud and Google's Gemma equips developers with the resources they need to build more accurate, high-performing AI solutions, powered by optimized inference capabilities."

The latest phase of the alliance will see the companies launch the llm-d open source project, with Google acting as a founding contributor. The project is intended to facilitate scalable and efficient AI inference across diverse computing environments. Red Hat is introducing it in response to enterprise challenges such as the growing complexity of AI ecosystems and the need for distributed computing strategies.

The companies have also announced that support for vLLM, an open source inference server used to speed up generative AI outputs, will be enabled on Google Cloud's Tensor Processing Units (TPUs) and GPU-based virtual machines. Google Cloud's TPUs, already part of Google's own AI infrastructure, will now be accessible to developers using vLLM, allowing for improved performance and resource efficiency for fast and accurate inference.

Red Hat will be among the earliest testers of Google's new open model Gemma 3 and will provide 'Day 0' support for vLLM on Gemma 3 model distributions. This is part of Red Hat's broader effort, as a commercial contributor to the vLLM project, to deliver more cost-effective and responsive platforms for generative AI applications.

The collaboration also includes the availability of Red Hat AI Inference Server on Google Cloud. This enterprise distribution of vLLM helps companies scale and optimise AI model inference within hybrid cloud environments. The integration with Google Cloud enables enterprises to deploy production-ready generative AI models that can deliver cost and responsiveness efficiencies at scale.

Supporting community-driven AI development, Red Hat will also join Google as a contributor to the Agent2Agent (A2A) protocol, an application-level protocol designed to enable communication between agents or end users across different platforms and cloud environments. Through the A2A ecosystem, Red Hat aims to promote new ways to accelerate innovation and enhance the effectiveness of AI workflows through agentic AI.

Mark Lohmeyer, Vice President and General Manager, AI and Computing Infrastructure, Google Cloud, commented, "The deepening of our collaboration with Red Hat is driven by our shared commitment to foster open innovation and bring the full potential of AI to our customers. As we enter a new age of AI inference, together we are paving the way for organisations to more effectively scale AI inference and enable agentic AI with the necessary cost-efficiency and high performance."
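
As a rough illustration of the vLLM-and-Gemma workflow described above, the sketch below uses the open source vLLM Python API to run a small Gemma checkpoint locally; the model identifier, prompt, and sampling settings are placeholders chosen for illustration rather than details from the announcement, and the choice of accelerator (GPU or TPU) depends on the deployment environment.

    # Minimal offline-inference sketch with the open source vLLM library.
    # Model identifier, prompt, and sampling settings are illustrative only.
    from vllm import LLM, SamplingParams

    # Load a Gemma-family checkpoint; any vLLM-supported model could be
    # substituted, and the accelerator is whatever the environment provides.
    llm = LLM(model="google/gemma-2-2b-it")

    sampling = SamplingParams(temperature=0.7, max_tokens=128)
    prompts = ["Summarise the benefits of distributed AI inference for enterprises."]

    outputs = llm.generate(prompts, sampling)
    for output in outputs:
        print(output.outputs[0].text)
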
The llm-d project builds upon the established vLLM community, aiming to create a foundation for generative AI inference that can adapt to the demands of large-scale enterprises while facilitating innovation and cost management. The intention is to enable AI workload scalability across different resource types and enhance workload efficiency. These initiatives highlight the companies' collective effort to offer business users production-ready, scalable, and efficient AI solutions powered by open source technologies and robust infrastructure options.
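
On the serving side, vLLM's open source server exposes an OpenAI-compatible HTTP API, and an enterprise distribution such as the Red Hat AI Inference Server described above is assumed here to offer a similar interface. The client-side sketch below shows what calling such an endpoint might look like; the endpoint URL, API key, and model name are placeholders, not details from the announcement.

    # Client sketch for an OpenAI-compatible endpoint of the kind served by
    # vLLM-based inference servers. URL, key, and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # assumed local vLLM-style endpoint
        api_key="EMPTY",                      # vLLM's reference server accepts a placeholder key
    )

    response = client.chat.completions.create(
        model="google/gemma-2-2b-it",         # illustrative model identifier
        messages=[{"role": "user", "content": "Explain hybrid cloud inference in one paragraph."}],
        max_tokens=150,
    )

    print(response.choices[0].message.content)
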

Red Hat leads launch of llm-d to scale generative AI in clouds

Techday NZ | 21-05-2025 | Business


Red Hat has introduced llm-d, an open source project aimed at enabling large-scale distributed generative AI inference across hybrid cloud environments. The initiative is a collaboration between Red Hat and founding contributors CoreWeave, Google Cloud, IBM Research and NVIDIA, with additional support from AMD, Cisco, Hugging Face, Intel, Lambda, Mistral AI, and academic partners from the University of California, Berkeley, and the University of Chicago.

The project utilises vLLM-based distributed inference, a native Kubernetes architecture, and AI-aware network routing to facilitate robust and scalable AI inference clouds that can meet demanding production service-level objectives. Red Hat asserts that this will support any AI model, on any hardware accelerator, in any cloud environment.

Brian Stevens, Senior Vice President and AI CTO at Red Hat, stated, "The launch of the llm-d community, backed by a vanguard of AI leaders, marks a pivotal moment in addressing the need for scalable gen AI inference, a crucial obstacle that must be overcome to enable broader enterprise AI adoption. By tapping the innovation of vLLM and the proven capabilities of Kubernetes, llm-d paves the way for distributed, scalable and high-performing AI inference across the expanded hybrid cloud, supporting any model, any accelerator, on any cloud environment and helping realize a vision of limitless AI potential."

Addressing the scaling needs of generative AI, Red Hat points to a Gartner forecast that suggests by 2028, more than 80% of data centre workload accelerators will be deployed principally for inference rather than model training. This projected shift highlights the need for efficient and scalable inference solutions as AI models become larger and more complex.

The llm-d project's architecture is designed to overcome the practical limitations of centralised AI inference, such as prohibitive costs and latency. Its main features include vLLM for rapid model support, Prefill and Decode Disaggregation for distributing computational workloads, KV Cache Offloading based on LMCache to shift memory loads onto standard storage, and AI-Aware Network Routing for optimised request scheduling. The project also supports Google Cloud's Tensor Processing Units and NVIDIA's Inference Xfer Library (NIXL) for high-performance data transfer.

The community formed around llm-d comprises both technology vendors and academic institutions, all seeking to address efficiency, cost, and performance at scale for AI-powered applications. Several of these partners provided statements on their involvement and the intended impact of the project.

Ramine Roane, Corporate Vice President, AI Product Management at AMD, said, "AMD is proud to be a founding member of the llm-d community, contributing our expertise in high-performance GPUs to advance AI inference for evolving enterprise AI needs. As organisations navigate the increasing complexity of generative AI to achieve greater scale and efficiency, AMD looks forward to meeting this industry demand through the llm-d project."

Shannon McFarland, Vice President, Cisco Open Source Program Office & Head of Cisco DevNet, remarked, "The llm-d project is an exciting step forward for practical generative AI. llm-d empowers developers to programmatically integrate and scale generative AI inference, unlocking new levels of innovation and efficiency in the modern AI landscape. Cisco is proud to be part of the llm-d community, where we're working together to explore real-world use cases that help organisations apply AI more effectively and efficiently."

Chen Goldberg, Senior Vice President, Engineering, CoreWeave, commented, "CoreWeave is proud to be a founding contributor to the llm-d project and to deepen our long-standing commitment to open source AI. From our early partnership with EleutherAI to our ongoing work advancing inference at scale, we've consistently invested in making powerful AI infrastructure more accessible. We're excited to collaborate with an incredible group of partners and the broader developer community to build a flexible, high-performance inference engine that accelerates innovation and lays the groundwork for open, interoperable AI."

Mark Lohmeyer, Vice President and General Manager, AI & Computing Infrastructure, Google Cloud, stated, "Efficient AI inference is paramount as organisations move to deploying AI at scale and deliver value for their users. As we enter this new age of inference, Google Cloud is proud to build upon our legacy of open source contributions as a founding contributor to the llm-d project. This new community will serve as a critical catalyst for distributed AI inference at scale, helping users realise enhanced workload efficiency with increased optionality for their infrastructure resources."

Jeff Boudier, Head of Product, Hugging Face, said, "We believe every company should be able to build and run their own models. With vLLM leveraging the Hugging Face transformers library as the source of truth for model definitions, a wide diversity of models large and small is available to power text, audio, image and video AI applications. Eight million AI Builders use Hugging Face to collaborate on over two million AI models and datasets openly shared with the global community. We are excited to support the llm-d project to enable developers to take these applications to scale."

Priya Nagpurkar, Vice President, Hybrid Cloud and AI Platform, IBM Research, commented, "At IBM, we believe the next phase of AI is about efficiency and scale. We're focused on unlocking value for enterprises through AI solutions they can deploy effectively. As a founding contributor to llm-d, IBM is proud to be a key part of building a differentiated, hardware-agnostic distributed AI inference platform. We're looking forward to continued contributions towards the growth and success of this community to transform the future of AI inference."

Bill Pearson, Vice President, Data Center & AI Software Solutions and Ecosystem, Intel, said, "The launch of llm-d will serve as a key inflection point for the industry in driving AI transformation at scale, and Intel is excited to participate as a founding supporter. Intel's involvement with llm-d is the latest milestone in our decades-long collaboration with Red Hat to empower enterprises with open source solutions that they can deploy anywhere, on their platform of choice. We look forward to further extending and building AI innovation through the llm-d community."

Eve Callicoat, Senior Staff Engineer, ML Platform, Lambda, commented, "Inference is where the real-world value of AI is delivered, and llm-d represents a major leap forward. Lambda is proud to support a project that makes state-of-the-art inference accessible, efficient, and open."

Ujval Kapasi, Vice President, Engineering AI Frameworks, NVIDIA, stated, "The llm-d project is an important addition to the open source AI ecosystem and reflects NVIDIA's support for collaboration to drive innovation in generative AI. Scalable, highly performant inference is key to the next wave of generative and agentic AI. We're working with Red Hat and other supporting partners to foster llm-d community engagement and industry adoption, helping accelerate llm-d with innovations from NVIDIA Dynamo such as NIXL."

Ion Stoica, Professor and Director of Sky Computing Lab, University of California, Berkeley, remarked, "We are pleased to see Red Hat build upon the established success of vLLM, which originated in our lab to help address the speed and memory challenges that come with running large AI models. Open source projects like vLLM, and now llm-d anchored in vLLM, are at the frontier of AI innovation, tackling the most demanding AI inference requirements and moving the needle for the industry at large."

Junchen Jiang, Professor at the LMCache Lab, University of Chicago, added, "Distributed KV cache optimisations, such as offloading, compression, and blending, have been a key focus of our lab, and we are excited to see llm-d leveraging LMCache as a core component to reduce time to first token as well as improve throughput, particularly in long-context inference."
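
As a client-side illustration of the time-to-first-token and throughput goals discussed above, the sketch below streams a response from an assumed OpenAI-compatible, vLLM-style endpoint and records when the first piece of output arrives; the endpoint URL, model name, and API key are placeholders, and the measurement is only a rough approximation of server-side behaviour.

    # Rough client-side measurement of time to first token against an
    # OpenAI-compatible streaming endpoint; endpoint and model are placeholders.
    import time
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    start = time.perf_counter()
    first_token_at = None
    chunks_with_content = 0

    stream = client.chat.completions.create(
        model="google/gemma-2-2b-it",  # illustrative model identifier
        messages=[{"role": "user", "content": "Describe KV cache offloading briefly."}],
        max_tokens=128,
        stream=True,
    )

    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            chunks_with_content += 1

    elapsed = time.perf_counter() - start
    if first_token_at is not None:
        print(f"time to first token: {first_token_at - start:.3f}s")
    print(f"content chunks (roughly tokens): {chunks_with_content} in {elapsed:.3f}s")
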
