
Latest news with #Llama4

Cerebras Beats NVIDIA Blackwell in Llama 4 Maverick Inference

Business Wire

15 hours ago


Cerebras Breaks the 2,500 Tokens Per Second Barrier with Llama 4 Maverick 400B

SUNNYVALE, Calif., May 28, 2025--(BUSINESS WIRE)--Last week, Nvidia announced that 8 Blackwell GPUs in a DGX B200 could demonstrate 1,000 tokens per second (TPS) per user on Meta's Llama 4 Maverick. Today, the same independent benchmark firm, Artificial Analysis, measured Cerebras at more than 2,500 TPS/user, more than doubling the performance of Nvidia's flagship solution.

"Cerebras has beaten the Llama 4 Maverick inference speed record set by NVIDIA last week," said Micah Hill-Smith, Co-Founder and CEO of Artificial Analysis. "Artificial Analysis has benchmarked Cerebras' Llama 4 Maverick endpoint at 2,522 tokens per second, compared to NVIDIA Blackwell's 1,038 tokens per second for the same model. We've tested dozens of vendors, and Cerebras is the only inference solution that outperforms Blackwell for Meta's flagship model."

With today's results, Cerebras has set a world record for LLM inference speed on the 400B-parameter Llama 4 Maverick model, the largest and most powerful in the Llama 4 family. Artificial Analysis tested multiple other vendors, with the following results: SambaNova 794 t/s, Amazon 290 t/s, Groq 549 t/s, Google 125 t/s, and Microsoft Azure 54 t/s.

Andrew Feldman, CEO of Cerebras Systems, said, "The most important AI applications being deployed in enterprise today—agents, code generation, and complex reasoning—are bottlenecked by inference latency. These use cases often involve multi-step chains of thought or large-scale retrieval and planning, with generation speeds as low as 100 tokens per second on GPUs, causing wait times of minutes and making production deployment impractical. Cerebras has led the charge in redefining inference performance across models like Llama, DeepSeek, and Qwen, regularly delivering over 2,500 TPS/user."

With its world-record performance, Cerebras is the optimal solution for Llama 4 in any deployment scenario. Not only is Cerebras Inference the first and only API to break the 2,500 TPS/user milestone on this model, but unlike the Nvidia Blackwell system used in the Artificial Analysis benchmark, the Cerebras hardware and API are available now. Nvidia used custom software optimizations that are not available to most users, and none of Nvidia's inference providers offer a service at Nvidia's published performance. This suggests that to achieve 1,000 TPS/user, Nvidia had to reduce throughput by running at batch size 1 or 2, leaving the GPUs at less than 1% utilization. Cerebras, by contrast, achieved this record-breaking performance without any special kernel optimizations, and it will be available to everyone through Meta's upcoming API service.

For cutting-edge AI applications such as reasoning, voice, and agentic workflows, speed is paramount. These applications gain intelligence by processing more tokens during inference, which can also make them slow and force customers to wait. And when customers are forced to wait, they leave for competitors who provide answers faster, a finding Google demonstrated with search more than a decade ago. With this record-breaking performance, Cerebras hardware and the resulting API service are the best choice for developers and enterprise AI users around the world. For more information, please visit
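For context, the "tokens per second per user" figure reported by Artificial Analysis is essentially the number of output tokens a single streamed request receives divided by the time taken to generate them. The sketch below shows one rough way to estimate that rate against any OpenAI-compatible endpoint; the base URL, API key variable, and model id are placeholders rather than details from this announcement, and counting stream chunks only approximates true token counts.

```python
# Rough, illustrative sketch: estimate output tokens/second for a single
# user against an OpenAI-compatible streaming endpoint. The endpoint URL,
# API key variable, and model id are placeholders, not values taken from
# the article; exact token counts should come from usage metadata or a
# tokenizer, since one content chunk is only roughly one token.
import os
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.com/v1",       # placeholder endpoint
    api_key=os.environ.get("INFERENCE_API_KEY", ""),   # placeholder credential
)

stream = client.chat.completions.create(
    model="llama-4-maverick",  # placeholder model id
    messages=[{"role": "user", "content": "Explain KV caching in two short paragraphs."}],
    stream=True,
)

start = None
tokens = 0
for chunk in stream:
    if not chunk.choices:           # some providers send trailing metadata chunks
        continue
    delta = chunk.choices[0].delta.content or ""
    if delta:
        if start is None:
            start = time.time()     # start the clock at the first generated token
        tokens += 1                 # approximate: one content chunk ~ one token

if start is None or tokens < 2:
    raise SystemExit("Not enough streamed output to estimate a rate.")

elapsed = max(time.time() - start, 1e-6)
print(f"~{tokens} tokens in {elapsed:.2f}s -> ~{tokens / elapsed:.0f} tokens/s per user")
```

Note that published benchmarks such as Artificial Analysis' also control for prompt length, concurrency, and warm-up, so a single ad hoc measurement like this will not exactly reproduce their numbers.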

Meta's Llama AI team has been bleeding talent. Many top researchers have joined French AI startup Mistral.

Business Insider

3 days ago


Meta's open-source Llama models helped define the company's AI strategy. Yet the researchers who built the original version have mostly moved on. Of the 14 authors credited on the landmark 2023 paper that introduced Llama to the world, just three still work at Meta: research scientist Hugo Touvron, research engineer Xavier Martinet, and technical program leader Faisal Azhar. The rest have left the company, many of them to join or found its emerging rivals.

Meta's brain drain is most visible at Mistral, the Paris-based startup co-founded by former Meta researchers Guillaume Lample and Timothée Lacroix, two of Llama's key architects. Alongside several fellow Meta alums, they're building powerful open-source models that directly compete with Meta's flagship AI efforts.

The exits raise questions about Meta's ability to retain top AI talent just as it faces a new wave of external and internal pressure. The company is delaying its largest-ever AI model, Behemoth, after internal concerns about its performance and leadership, The Wall Street Journal reported. Llama 4, Meta's latest release, received a lukewarm reception from developers, many of whom now look to faster-moving open-source rivals like DeepSeek and Qwen for cutting-edge capabilities.

Inside Meta, the research team has also seen a shake-up. Joelle Pineau, who led the company's Fundamental AI Research group (FAIR) for eight years, announced last month that she would step down. She will be replaced by Robert Fergus, who co-founded FAIR in 2014 and then spent five years at Google's DeepMind before rejoining Meta this month.

The leadership reshuffle follows a period of quiet attrition. Many of the researchers behind Llama's initial success have left FAIR since publishing their landmark paper, even as Meta continues to position the model family as central to its AI strategy. With so many of its original architects gone and rivals moving faster in open-source innovation, Meta now faces the challenge of defending its early lead without the team that built it.

That's particularly significant because the 2023 Llama paper was more than just a technical milestone. It helped legitimize open-weight large language models, whose underlying code and parameters are freely available for others to use, modify, and build on, as viable alternatives to the proprietary systems of the time, such as OpenAI's GPT-3 and Google's PaLM. Meta trained its models using only publicly available data and optimized them for efficiency, enabling researchers and developers to run state-of-the-art systems on a single GPU. For a moment, Meta looked like it could lead the open frontier.

Two years later, that lead has slipped, and Meta no longer sets the pace. Despite investing billions into AI, Meta still doesn't have a dedicated "reasoning" model, one built specifically to handle tasks that require multi-step thinking, problem-solving, or calling external tools to complete complex commands. That gap has grown more noticeable as companies like Google and OpenAI prioritize these features in their latest models.

The average tenure of the 11 departed authors at Meta was over five years, suggesting they weren't short-term hires but researchers deeply embedded in Meta's AI efforts. Some left as early as January 2023; others stayed through the Llama 3 cycle, and a few left as recently as this year. Together, their exits mark the quiet unraveling of the team that helped Meta stake its AI reputation on open models.
A Meta spokesperson pointed to an X post about Llama research paper authors who have left. The list below, based on information from the researchers' LinkedIn profiles, shows where each of them ended up.

  • Naman Goyal. Left Meta: February 2025. Time at Meta: 6 years, 7 months.
  • Baptiste Rozière. Current role: AI Scientist at Mistral. Left Meta: August 2024. Time at Meta: 5 years, 1 month.
  • Aurélien Rodriguez. Current role: Director, Foundation Model Training at Cohere. Left Meta: July 2024. Time at Meta: 2 years, 7 months.
  • Eric Hambro. Current role: Member of Technical Staff at Anthropic. Left Meta: November 2023. Time at Meta: 3 years, 3 months.
  • Timothée Lacroix. Left Meta: June 2023. Time at Meta: 8 years, 5 months.
  • Marie-Anne Lachaux. Current role: Founding Member and AI Research Engineer at Mistral. Left Meta: June 2023. Time at Meta: 5 years.
  • Thibaut Lavril. Current role: AI Research Engineer at Mistral. Left Meta: June 2023. Time at Meta: 4 years, 5 months.
  • Armand Joulin. Current role: Distinguished Scientist at Google DeepMind. Left Meta: May 2023. Time at Meta: 8 years, 8 months.
  • Gautier Izacard. Current role: Technical Staff at Microsoft AI. Left Meta: March 2023. Time at Meta: 3 years, 2 months.
  • Edouard Grave. Current role: Research Scientist at Kyutai. Left Meta: February 2023. Time at Meta: 7 years, 2 months.
  • Guillaume Lample. Left Meta: Early 2023. Time at Meta: 7 years.

Red Hat & Meta unite to drive open source AI for business

Techday NZ

21-05-2025


Red Hat and Meta have announced a collaboration aimed at advancing open source generative artificial intelligence (AI) for enterprise use. The collaboration began with Red Hat enabling the Llama 4 model family from Meta on Red Hat AI and the vLLM inference server. This initial integration enables businesses to deploy generative AI applications and agents with a simplified process. Both companies plan to continue this effort by promoting the alignment of the Llama Stack and the vLLM community projects, with the goal of creating unified frameworks for open generative AI workloads.

Red Hat and Meta indicated that they are championing open standards to ensure that generative AI applications operate efficiently across hybrid cloud environments, independent of specific hardware accelerators or computing environments. This direction is aimed at creating consistency and reducing costs in enterprise AI deployments.

Mike Ferris, Senior Vice President and Chief Strategy Officer at Red Hat, stated: "Red Hat and Meta both recognize that AI's future success demands not only model advancements but also inference capabilities that let users maximize the breakthrough capabilities of next-generation models. Our joint commitment to Llama Stack and vLLM are intended to help realize a vision of faster, more consistent and more cost-effective gen AI applications running wherever needed across the hybrid cloud, regardless of accelerator or environment. This is the open future of AI, and one that Red Hat and Meta are ready to meet."

According to Gartner, by 2026 over 80% of independent software vendors are expected to have embedded generative AI capabilities in their enterprise applications, compared to the less than 1% observed currently. Red Hat and Meta's collaboration addresses the need for open and interoperable foundations, particularly at the application programming interface (API) layer and within inference serving, which handles real-time operational AI workloads.

Llama Stack, developed and released as open source by Meta, provides standardized building blocks and APIs for the full lifecycle of generative AI applications. Red Hat is actively contributing to the Llama Stack project, which the company expects will improve options for developers who are building agentic AI applications on Red Hat AI. Red Hat has committed to supporting a range of agentic frameworks, including Llama Stack, in order to offer customers flexibility in their tooling and development approaches. With these developments, Red Hat aims to create an environment that accelerates the development and deployment of next-generation AI solutions, which align with emerging technologies and methods in the sector.

On the inference side, the vLLM project acts as an open source platform supporting efficient inference for large language models such as the Llama series. Red Hat has made leading contributions to vLLM, ensuring immediate support for Llama 4 models. Meta has pledged to increase its engagement with the vLLM community project, aiming to enhance its capabilities for cost-effective and scalable AI inference. The project is also part of the PyTorch ecosystem, which Meta and others support, contributing to an inclusive AI tools environment.

Ash Jhaveri, Vice President of AI and Reality Labs Partnerships at Meta, said: "We are excited to partner with Red Hat as we work towards establishing Llama Stack as the industry standard for seamlessly building and deploying generative AI applications. This collaboration underscores our commitment to open innovation and the development of robust, scalable AI solutions that empower businesses to harness the full potential of AI technology. Together with Red Hat, we are paving the way for a future where Llama models and tools become the backbone of enterprise AI, driving efficiency and innovation across industries."

The collaboration formalises the intent of both companies to bolster open source AI foundations, facilitate interoperability, and expand choice for enterprise customers in building and deploying generative AI solutions across various computing environments.
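For readers unfamiliar with the vLLM inference server mentioned throughout this piece, the project also exposes a simple offline Python API. The snippet below is a minimal, illustrative sketch of loading a Llama checkpoint and generating text with vLLM; the Hugging Face model id and the tensor_parallel_size value are assumptions rather than details from the article, since Llama 4 models are large mixture-of-experts models that typically require multiple accelerators.

```python
# Minimal, illustrative vLLM offline-inference sketch. The model id and
# tensor_parallel_size are assumptions, not details from the article;
# adjust both to match the checkpoint and GPU count actually available.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed Hugging Face id
    tensor_parallel_size=8,                             # shard weights across 8 GPUs
)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

prompts = [
    "In three bullet points, explain why open standards matter for enterprise AI.",
]

for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```

The same checkpoint can instead be exposed as an OpenAI-compatible HTTP service with the `vllm serve` command, which is closer to the server-style deployment the article describes.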

Red Hat and Meta Collaborate to Advance Open Source AI for Enterprise

Business Wire

20-05-2025


BOSTON – RED HAT SUMMIT--(BUSINESS WIRE)--Red Hat, the world's leading provider of open source solutions, and Meta today announced a new collaboration to spur the evolution of generative AI (gen AI) for the enterprise. This collaboration started with Red Hat's day 0 enablement of the groundbreaking Llama 4 model family on Red Hat AI and the high-performing vLLM inference server. Building on this momentum, Red Hat and Meta will also champion the alignment of the Llama Stack and vLLM community projects, helping to drive unified frameworks for the democratization and simplification of open gen AI workloads.

According to Gartner [1], "by 2026, more than 80% of independent software vendors (ISVs) will have embedded generative AI capabilities in their enterprise applications, up from less than 1% today." This underscores the urgent need for the open, interoperable foundations that Red Hat and Meta are pioneering. The companies' collaboration directly addresses the critical requirement for more seamless gen AI workload functionality across diverse platforms, clouds and AI accelerators, particularly at the crucial application programming interface (API) layer and within the 'doing' phase of AI — inference serving.

Red Hat and Meta's deep commitment to open innovation is evident in their roles as primary commercial contributors to foundational projects: Llama Stack, developed and open-sourced by Meta, delivers standardized building blocks and APIs to revolutionize the entire gen AI application lifecycle; and vLLM, where Red Hat's leading contributions are powering an open source platform that enables highly efficient and optimized inference for large language models (LLMs), including Day 0 support for Llama 4.

Creating common foundations and open choice for gen AI apps

As part of this collaboration, Red Hat is actively contributing to the Llama Stack project, helping further enhance its capabilities as a compelling choice for developers building innovative, agentic AI applications on Red Hat AI. With Red Hat AI, Red Hat maintains a commitment to supporting a diverse range of agentic frameworks, including Llama Stack, fostering customer choice in tooling and innovation. This enablement aims to provide a robust and adaptable environment to accelerate the development and deployment of next-generation AI solutions, a wave that embraces the evolving landscape of agentic technologies.

Trailblazing the future of AI inference with vLLM

The vLLM project, already pushing the boundaries of efficient and cost-effective open gen AI, gains further momentum with Meta's commitment to deepen community contributions. This collaboration gives vLLM the capacity to provide Day 0 support for the latest generations of the Llama model family, starting with Llama 4. vLLM is also part of the PyTorch Ecosystem, where Meta and others collaborate to foster an open and inclusive tools ecosystem. This validation positions vLLM at the forefront of unlocking gen AI value in the enterprise.

Red Hat Summit

Join the Red Hat Summit keynotes to hear the latest from Red Hat executives, customers and partners:

  • Modernized infrastructure meets enterprise-ready AI — Tuesday, May 20, 8-10 a.m. EDT (YouTube)
  • Hybrid cloud evolves to deliver enterprise innovation — Wednesday, May 21, 8-9:30 a.m. EDT (YouTube)

Supporting Quotes

Mike Ferris, senior vice president and chief strategy officer, Red Hat
"Red Hat and Meta both recognize that AI's future success demands not only model advancements but also inference capabilities that let users maximize the breakthrough capabilities of next-generation models. Our joint commitment to Llama Stack and vLLM are intended to help realize a vision of faster, more consistent and more cost-effective gen AI applications running wherever needed across the hybrid cloud, regardless of accelerator or environment. This is the open future of AI, and one that Red Hat and Meta are ready to meet."

Ash Jhaveri, vice president, AI and Reality Labs Partnerships, Meta
"We are excited to partner with Red Hat as we work towards establishing Llama Stack as the industry standard for seamlessly building and deploying generative AI applications. This collaboration underscores our commitment to open innovation and the development of robust, scalable AI solutions that empower businesses to harness the full potential of AI technology. Together with Red Hat, we are paving the way for a future where Llama models and tools become the backbone of enterprise AI, driving efficiency and innovation across industries."

About Red Hat

Red Hat is the world's leading provider of enterprise open source software solutions, using a community-powered approach to deliver reliable and high-performing Linux, hybrid cloud, container, and Kubernetes technologies. Red Hat helps customers integrate new and existing IT applications, develop cloud-native applications, standardize on our industry-leading operating system, and automate, secure, and manage complex environments. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500. As a strategic partner to cloud providers, system integrators, application vendors, customers, and open source communities, Red Hat can help organizations prepare for the digital future.

Forward-Looking Statements

Except for the historical information and discussions contained herein, statements contained in this press release may constitute forward-looking statements within the meaning of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are based on the company's current assumptions regarding future business and financial performance. These statements involve a number of risks, uncertainties and other factors that could cause actual results to differ materially. Any forward-looking statement in this press release speaks only as of the date on which it is made. Except as required by law, the company assumes no obligation to update or revise any forward-looking statements. Red Hat and the Red Hat logo are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the U.S. and other countries.

[1] Gartner, 2025 TSP Planning Trends: Managing the GenAI Inference 'Tax,' 2 September 2024, ID G00818892, John Lovelock and Mark McDonald
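To make the "standardized building blocks and APIs" idea concrete, the sketch below shows what querying a running Llama Stack server from Python can look like. It is illustrative only: the server URL, model id, and the exact client method names are assumptions based on the upstream llama-stack-client package rather than anything stated in this release, so verify them against the current Llama Stack documentation before relying on them.

```python
# Illustrative sketch of calling a Llama Stack server from Python using the
# llama-stack-client package (pip install llama-stack-client). The server
# URL, model id, and method names are assumptions about the upstream client
# library, not details from this press release.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed local server address

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed model id
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "In one paragraph, what does an inference server do?"},
    ],
)

print(response.completion_message.content)
```

The point of the standardized API layer described above is that the same client code should work whether the server routes requests to a local vLLM backend or to a hosted provider.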
