
Latest news with #LlamaNemotron

Red Hat Unlocks Generative AI for Any Model and Any Accelerator Across the Hybrid Cloud with Red Hat AI Inference Server

Mid East Info

25-05-2025

  • Business
  • Mid East Info

Red Hat AI Inference Server, powered by vLLM and enhanced with Neural Magic technologies, delivers faster, higher-performing and more cost-efficient AI inference across the hybrid cloud.

BOSTON – RED HAT SUMMIT – MAY 2025 — Red Hat, the world's leading provider of open source solutions, announced Red Hat AI Inference Server, a significant step towards democratizing generative AI (gen AI) across the hybrid cloud. A new offering within Red Hat AI, the enterprise-grade inference server is born from the powerful vLLM community project and enhanced by Red Hat's integration of Neural Magic technologies, offering greater speed, accelerator efficiency and cost-effectiveness to help deliver Red Hat's vision of running any gen AI model, on any AI accelerator, in any cloud environment. Whether deployed standalone or as an integrated component of Red Hat Enterprise Linux AI (RHEL AI) and Red Hat OpenShift AI, the platform empowers organizations to deploy and scale gen AI in production with greater confidence.

Inference is the critical execution engine of AI, where pre-trained models translate data into real-world impact. It is the pivotal point of user interaction, demanding swift and accurate responses. As gen AI models grow in complexity and production deployments scale, inference can become a significant bottleneck, devouring hardware resources, degrading responsiveness and inflating operational costs. Robust inference servers are no longer a luxury but a necessity for unlocking the true potential of AI at scale and navigating the underlying complexities with greater ease.

Red Hat directly addresses these challenges with Red Hat AI Inference Server — an open inference solution engineered for high performance and equipped with leading model compression and optimization tools. It empowers organizations to tap into the transformative power of gen AI by delivering dramatically more responsive user experiences and unparalleled freedom in their choice of AI accelerators, models and IT environments.

vLLM: Extending inference innovation

Red Hat AI Inference Server builds on the industry-leading vLLM project, started by the University of California, Berkeley in mid-2023. The community project delivers high-throughput gen AI inference, support for large input contexts, multi-GPU model acceleration, continuous batching and more. vLLM's broad support for publicly available models – coupled with its day-zero integration of leading frontier models including DeepSeek, Gemma, Llama, Mistral, Phi and others, as well as open, enterprise-grade reasoning models like Llama Nemotron – positions it as a de facto standard for future AI inference innovation. Leading frontier model providers are increasingly embracing vLLM, solidifying its critical role in shaping gen AI's future.
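As an illustrative aside (not part of the press release), the sketch below shows roughly how the upstream vLLM Python API is used for batched offline inference; the model name, prompts and tensor_parallel_size value are placeholder assumptions, not Red Hat AI Inference Server defaults.

    # Minimal sketch of upstream vLLM offline inference (illustrative only;
    # model name, prompts and parallelism settings are placeholder assumptions).
    from vllm import LLM, SamplingParams

    prompts = [
        "Explain continuous batching in one paragraph.",
        "Summarize the benefits of hybrid cloud inference.",
    ]
    sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

    # vLLM schedules these requests with continuous batching and can shard the
    # model across multiple GPUs via tensor_parallel_size.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=1)

    for output in llm.generate(prompts, sampling_params):
        print(output.outputs[0].text)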
Introducing Red Hat AI Inference Server

Red Hat AI Inference Server packages the leading innovation of vLLM with enterprise-grade capabilities. It is available as a standalone containerized offering or as part of both RHEL AI and Red Hat OpenShift AI. Across any deployment environment, Red Hat AI Inference Server provides users with a hardened, supported distribution of vLLM, along with:

  • Intelligent LLM compression tools for dramatically reducing the size of both foundational and fine-tuned AI models, minimizing compute consumption while preserving and potentially enhancing model accuracy.
  • An optimized model repository, hosted in the Red Hat AI organization on Hugging Face, offering instant access to a validated and optimized collection of leading AI models ready for inference deployment, helping to accelerate efficiency by 2-4x without compromising model accuracy.
  • Red Hat's enterprise support and decades of expertise in bringing community projects to production environments.
  • Third-party support for even greater deployment flexibility, enabling Red Hat AI Inference Server to be deployed on non-Red Hat Linux and Kubernetes platforms pursuant to Red Hat's third-party support policy.
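As another hedged sketch (not from the press release), applications typically consume such a server over vLLM's OpenAI-compatible HTTP API; the endpoint URL, API key and model ID below are illustrative assumptions rather than documented product defaults.

    # Hypothetical client call against a locally running vLLM-compatible
    # endpoint; base URL, API key and model ID are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="RedHatAI/example-optimized-model",  # placeholder model ID
        messages=[
            {"role": "user", "content": "What does an inference server do?"},
        ],
        max_tokens=128,
    )
    print(response.choices[0].message.content)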
Red Hat's vision: Any model, any accelerator, any cloud

The future of AI must be defined by limitless opportunity, not constrained by infrastructure silos. Red Hat sees a horizon where organizations can deploy any model, on any accelerator, across any cloud, delivering an exceptional, more consistent user experience without exorbitant costs. To unlock the true potential of gen AI investments, enterprises require a universal inference platform – a standard for more seamless, high-performance AI innovation, both today and in the years to come. Just as Red Hat pioneered the open enterprise by transforming Linux into the bedrock of modern IT, the company is now poised to architect the future of AI inference. vLLM's potential is that of a linchpin for standardized gen AI inference, and Red Hat is committed to building a thriving ecosystem around not just the vLLM community but also llm-d for distributed inference at scale. The vision is clear: regardless of the AI model, the underlying accelerator or the deployment environment, Red Hat intends to make vLLM the definitive open standard for inference across the new hybrid cloud.

Red Hat Summit

Join the Red Hat Summit keynotes to hear the latest from Red Hat executives, customers and partners:

  • Modernized infrastructure meets enterprise-ready AI — Tuesday, May 20, 8-10 a.m. EDT (YouTube)
  • Hybrid cloud evolves to deliver enterprise innovation — Wednesday, May 21, 8-9:30 a.m. EDT (YouTube)

Supporting Quotes

Joe Fernandes, vice president and general manager, AI Business Unit, Red Hat: 'Inference is where the real promise of gen AI is delivered, where user interactions are met with fast, accurate responses delivered by a given model, but it must be delivered in an effective and cost-efficient way. Red Hat AI Inference Server is intended to meet the demand for high-performing, responsive inference at scale while keeping resource demands low, providing a common inference layer that supports any model, running on any accelerator in any environment.'

Ramine Roane, corporate vice president, AI Product Management, AMD: 'In collaboration with Red Hat, AMD delivers out-of-the-box solutions to drive efficient generative AI in the enterprise. Red Hat AI Inference Server enabled on AMD Instinct™ GPUs equips organizations with enterprise-grade, community-driven AI inference capabilities backed by fully validated hardware accelerators.'

Jeremy Foster, senior vice president and general manager, Cisco: 'AI workloads need speed, consistency, and flexibility, which is exactly what the Red Hat AI Inference Server is designed to deliver. This innovation offers Cisco and Red Hat opportunities to continue to collaborate on new ways to make AI deployments more accessible, efficient and scalable — helping organizations prepare for what's next.'

Bill Pearson, vice president, Data Center & AI Software Solutions and Ecosystem, Intel: 'Intel is excited to collaborate with Red Hat to enable Red Hat AI Inference Server on Intel® Gaudi® accelerators. This integration will provide our customers with an optimized solution to streamline and scale AI inference, delivering advanced performance and efficiency for a wide range of enterprise AI applications.'

John Fanelli, vice president, Enterprise Software, NVIDIA: 'High-performance inference enables models and AI agents not just to answer, but to reason and adapt in real time. With open, full-stack NVIDIA accelerated computing and Red Hat AI Inference Server, developers can run efficient reasoning at scale across hybrid clouds, and deploy with confidence using Red Hat Inference Server with the new NVIDIA Enterprise AI validated design.'

About Red Hat

Red Hat is the world's leading provider of enterprise open source software solutions, using a community-powered approach to deliver reliable and high-performing Linux, hybrid cloud, container, and Kubernetes technologies. Red Hat helps customers integrate new and existing IT applications, develop cloud-native applications, standardize on our industry-leading operating system, and automate, secure, and manage complex environments. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500. As a strategic partner to cloud providers, system integrators, application vendors, customers, and open source communities, Red Hat can help organizations prepare for the digital future.

Forward-Looking Statements

Except for the historical information and discussions contained herein, statements contained in this press release may constitute forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are based on the company's current assumptions regarding future business and financial performance. These statements involve a number of risks, uncertainties and other factors that could cause actual results to differ materially. Any forward-looking statement in this press release speaks only as of the date on which it is made. Except as required by law, the company assumes no obligation to update or revise any forward-looking statements.

Nvidia vice president says GPUs are the 'currency' of AI researchers

Yahoo

10-05-2025

  • Business
  • Yahoo

  • Nvidia's Llama Nemotron models were developed quickly, said Jonathan Cohen, VP of Applied Research.
  • The speed was thanks to researchers across the company being willing to "give up their compute."
  • "These days, the currency in any AI researcher is how many GPUs they get access to," he said in an interview.

In the world of AI research, the speed of development is limited in large part by available computing resources, according to Jonathan Cohen, Vice President of Applied Research at Nvidia. "These days, the currency in any AI researcher is how many GPUs they get access to, and that's no less true at Nvidia than at any other company," Cohen said in an interview on Nvidia Developer.

Cohen led the team responsible for developing Nvidia's Llama Nemotron family of models. Released in March of this year, they represent the company's entry into the world of "reasoning" AI systems. The speed at which the models came together was remarkable, Cohen said, taking "no more than one to two months." He partially credited the efficiency of their development to other workers being willing to sacrifice their processing power. "So, there were a lot of researchers who very selflessly agreed to give up their compute so that we could get these Llama Nemotron models trained as quickly as we did," he said.

Cohen also attributed the speed of development to Nvidia's company-wide culture of prioritizing major projects, regardless of current team goals. "How do you have a team to do a thing you've never done before? Part of the corporate culture is — we call them a 'swarm' — where you identify, 'This is something that's important,'" he said. "And everyone, every manager who has people who might be able to contribute, thinks about, 'Is this new thing more important than the current thing everyone on my team is doing?'" If the manager can spare anybody, they'll "contribute" their direct reports to the new priority.

"Llama Nemotron ended up being a very cross-discipline, cross-team effort," Cohen added. "We had people from across the whole company working together without any formal organizational structure." Llama Nemotron required a series of sacrifices, Cohen said, both in terms of computing power and personnel — but people were able to set aside self-interest for the benefit of the whole. "It was really great to see, great leadership," he said. "There were a lot of sacrifices that people made, a lot of very egoless decisions that brought it together, which is just awesome."

Nvidia did not respond to a request for comment from Business Insider before publication.

Read the original article on Business Insider

Nvidia mobilizes IT partners to spread the AI gospel

Yahoo

21-03-2025

  • Business
  • Yahoo

This story was originally published on CIO Dive. To receive daily news and insights, subscribe to our free daily CIO Dive newsletter.

Nvidia CEO Jensen Huang enjoyed a victory lap Tuesday as the company convened its annual GTC conference. Of 25,000 in-person attendees, a select few were representatives of Nvidia's growing ecosystem of technology partners. The GPU giant is leaning on allied providers to help spread AI usage and hardware consumption across industries. 'We have a whole lineup for enterprise now,' Huang said. 'These will be available from all of our partners.'

In the race to train more capable large language models and deploy smarter generative AI technologies, Nvidia has been a clear winner. The company's quarterly revenues skyrocketed over the last two years, surpassing in its most recently completed quarter the $27 billion reported for its full 2023 fiscal year. The rapid ascent 'speaks to how pervasive generative AI is becoming,' Forrester Senior Analyst Alvin Nguyen said. 'Everybody's wishing they could be in Nvidia's shoes at this point.'

Several key technology providers tightened their orbit around Nvidia Tuesday. Accenture rolled out an agent-building platform built on Nvidia technology and said it would leverage the chipmaker's new family of Llama Nemotron reasoning models to develop up to 100 industry-specific agents this year. Nvidia partnered with Accenture, Deloitte, Microsoft, SAP and several other companies to deploy the models. 'Accenture has partnered with Nvidia since 2018, but our relationship has certainly accelerated over the past year due to the demand for generative AI in the enterprise and Nvidia's growing enterprise software capabilities,' Tom Stuermer, senior managing director and lead of the Accenture Nvidia business group, said in an email.

Oracle also jumped on the agentic bandwagon. The cloud and software provider deployed an Nvidia microservices software integration Tuesday to help enterprises build cloud-based agentic applications. Oracle and Nvidia are collaborating on deployment blueprints to help enterprises provision cloud compute services for AI workloads, the companies said in a joint announcement.

IBM fired up Nvidia H200 GPUs in its cloud Tuesday and said it plans to integrate its watsonx AI platform with Nvidia microservices. IBM is one of several technology providers signed on to deploy a GPU-powered Nvidia data platform with AI query agents, according to a separate announcement.

'All these partnership announcements are carrying the message of what Nvidia does well out to enterprises,' Nguyen said. 'They're speaking in the language that enterprises want to hear in order to make that new technology palatable for enterprise.'

As big tech turns to task-specific autonomous agents to help sell enterprises on generative AI's business value, Nvidia is eyeing its next revenue stream. 'The amount of computation we need at this point as a result of agentic AI, as a result of reasoning, is easily 100 times more than we thought we needed this time last year,' Huang said Tuesday. 'I expect data center buildouts to reach a trillion dollars, and I am fairly certain we're going to reach that very soon.'

Analysts agree. Dell'Oro anticipates AI consumption will drive data center capital expenditures beyond $1 trillion annually by 2029, as hyperscalers continue pouring massive amounts of capital into capacity buildouts.

In addition to supplying cloud providers with GPUs, Nvidia is pursuing a growing segment of AI revenues generated by enterprises that aren't part of the company's traditional customer base — including the finance and healthcare industries, retail and manufacturing. 'Nvidia's growing role in the enterprise IT space is a paradigm shift, signaling the importance of AI as an integral part of any IT strategy,' Scott Bickley, advisory fellow at Info-Tech Research Group, said in an email. 'Nvidia has migrated from being a hardware provider to an AI enabler, driving enterprise transformation.'

Partners like Accenture and IBM are crucial enterprise connection points for Nvidia. They represent a pathway into the enterprise space for a company that just a few years ago was best known for its gaming chips. 'Enterprises and folks that serve enterprises, like IBM and Accenture, are very focused on what AI is going to do for the lives of their companies,' Jack Gold, founder and principal analyst at J. Gold Associates, told CIO Dive. 'These partners are working with big IT shops, and they know that they are going to have to deploy AI capabilities, not just in the cloud, but also within their services and on-prem.'

Nvidia's fortunes remain tethered to the big cloud providers and the AI compute resources they continue to stockpile. AWS, Microsoft and Google Cloud have already signaled their intentions to spend heavily on AI infrastructure through the end of the year. Oracle said earlier this month it plans to double its capital investments. As enterprise AI strategies mature and agentic capabilities gain traction, Nvidia hardware will land in enterprise data centers and private clouds, too. 'We're moving from an era of mostly AI in AWS, Azure and Google Cloud to an era where more and more companies are going to bring AI servers in-house,' Gold said. 'It might not say Nvidia on the box, but you're going to have increasing amounts of Nvidia hardware installed mostly as part of somebody else's systems.'
