
Meta unleashes Llama API running 18x faster than OpenAI: Cerebras partnership delivers 2,600 tokens per second
Meta announced today a partnership with Cerebras Systems to power its new Llama API, offering developers access to inference speeds up to 18 times faster than traditional GPU-based solutions.
The announcement, made at Meta's inaugural LlamaCon developer conference in Menlo Park, positions the company to compete directly with OpenAI, Anthropic, and Google in the rapidly growing AI inference service market, where developers purchase tokens by the billions to power their applications.
'Meta has selected Cerebras to collaborate to deliver the ultra-fast inference that they need to serve developers through their new Llama API,' said Julie Shin Choi, chief marketing officer at Cerebras, during a press briefing. 'We at Cerebras are really, really excited to announce our first CSP hyperscaler partnership to deliver ultra-fast inference to all developers.'
The partnership marks Meta's formal entry into the business of selling AI computation, transforming its popular open-source Llama models into a commercial service. While Meta's Llama models have accumulated over one billion downloads, until now the company had not offered a first-party cloud infrastructure for developers to build applications with them.
'This is very exciting, even without talking about Cerebras specifically,' said James Wang, a senior executive at Cerebras. 'OpenAI, Anthropic, Google — they've built an entire new AI business from scratch, which is the AI inference business. Developers who are building AI apps will buy tokens by the millions, by the billions sometimes. And these are just like the new compute instructions that people need to build AI applications.'
A benchmark chart shows Cerebras processing Llama 4 at 2,648 tokens per second, dramatically outpacing competitors SambaNova (747), Groq (600) and GPU-based services from Google and others — explaining Meta's hardware choice for its new API. (Credit: Cerebras)
What sets Meta's offering apart is the dramatic speed increase provided by Cerebras' specialized AI chips. The Cerebras system delivers over 2,600 tokens per second for Llama 4 Scout, compared to approximately 130 tokens per second for ChatGPT and around 25 tokens per second for DeepSeek, according to benchmarks from Artificial Analysis.
'If you just compare on API-to-API basis, Gemini and GPT, they're all great models, but they all run at GPU speeds, which is roughly about 100 tokens per second,' Wang explained. 'And 100 tokens per second is okay for chat, but it's very slow for reasoning. It's very slow for agents. And people are struggling with that today.'
This speed advantage enables entirely new categories of applications that were previously impractical: real-time agents, low-latency conversational voice systems, interactive code generation, and instant multi-step reasoning. All of these require chaining multiple large language model calls, which can now complete in seconds rather than minutes.
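To make the arithmetic concrete, here is a rough back-of-envelope comparison in Python. The throughput figures are the ones quoted above; the five-step chain and 400-token budget per step are illustrative assumptions, not measured workloads.

```python
# Back-of-envelope latency for a sequential multi-step agent chain.
# Throughput figures are the ones quoted above (via Artificial Analysis);
# the step count and per-step token budget are illustrative assumptions.

STEPS = 5               # sequential LLM calls in the chain (assumed)
TOKENS_PER_STEP = 400   # generated tokens per call (assumed)

throughputs = {         # output tokens per second
    "Cerebras (Llama 4 Scout)": 2600,
    "Typical GPU-backed API": 100,
}

for name, tps in throughputs.items():
    seconds = STEPS * TOKENS_PER_STEP / tps
    print(f"{name}: ~{seconds:.1f}s end to end")

# Cerebras (Llama 4 Scout): ~0.8s end to end
# Typical GPU-backed API: ~20.0s end to end
```

At GPU speeds the same five-call chain takes roughly 20 seconds, which is why Wang describes 100 tokens per second as workable for chat but too slow for agents.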
The Llama API represents a significant shift in Meta's AI strategy, transitioning from primarily being a model provider to becoming a full-service AI infrastructure company. By offering an API service, Meta is creating a revenue stream from its AI investments while maintaining its commitment to open models.
'Meta is now in the business of selling tokens, and it's great for the American kind of AI ecosystem,' Wang noted during the press conference. 'They bring a lot to the table.'
The API will offer tools for fine-tuning and evaluation, starting with the Llama 3.3 8B model, allowing developers to generate data, train on it, and test the quality of their custom models. Meta emphasizes that it won't use customer data to train its own models, and models built with the Llama API can be transferred to other hosts, a clear departure from some competitors' more closed approaches.
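Meta has not published the underlying interface in this announcement, so the following Python sketch of that generate/train/evaluate loop is purely hypothetical: the base URL, endpoint paths, field names, and file IDs are invented placeholders for illustration.

```python
# Hypothetical sketch of the fine-tune/evaluate workflow described above.
# The base URL, endpoint paths, field names, and file IDs are invented
# placeholders; consult Meta's Llama API documentation for the real interface.
import requests

BASE = "https://llama-api.example.com/v1"           # placeholder, not the real endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # key generated in the Llama API console

# 1. Kick off a fine-tuning job from a previously uploaded dataset.
job = requests.post(f"{BASE}/fine-tunes", headers=HEADERS, json={
    "base_model": "llama-3.3-8b",    # the starting model named in the announcement
    "training_file": "file-abc123",  # placeholder dataset ID
}).json()

# 2. Evaluate the resulting custom model against a held-out set.
evaluation = requests.post(f"{BASE}/evaluations", headers=HEADERS, json={
    "model": job["fine_tuned_model"],  # placeholder response field
    "eval_file": "file-def456",        # placeholder held-out dataset ID
}).json()

print(evaluation["metrics"])  # quality scores for the custom model (assumed shape)
```

The portability point is the notable design choice here: because the fine-tuned weights can leave Meta's infrastructure, the workflow above would not lock a team into a single host.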
Cerebras will power Meta's new service through its network of data centers located throughout North America, including facilities in Dallas, Oklahoma, Minnesota, Montreal, and California.
'All of our data centers that serve inference are in North America at this time,' Choi explained. 'We will be serving Meta with the full capacity of Cerebras. The workload will be balanced across all of these different data centers.'
The business arrangement follows what Choi described as 'the classic compute provider to a hyperscaler' model, similar to how Nvidia provides hardware to major cloud providers. 'They are reserving blocks of our compute that they can serve their developer population,' she said.
Beyond Cerebras, Meta has also announced a partnership with Groq to provide fast inference options, giving developers multiple high-performance alternatives beyond traditional GPU-based inference.
Meta's entry into the inference API market with superior performance metrics could potentially disrupt the established order dominated by OpenAI, Google, and Anthropic. By combining the popularity of its open-source models with dramatically faster inference capabilities, Meta is positioning itself as a formidable competitor in the commercial AI space.
'Meta is in a unique position with 3 billion users, hyper-scale datacenters, and a huge developer ecosystem,' according to Cerebras' presentation materials. The integration of Cerebras technology 'helps Meta leapfrog OpenAI and Google in performance by approximately 20x.'
For Cerebras, this partnership represents a major milestone and validation of its specialized AI hardware approach. 'We have been building this wafer-scale engine for years, and we always knew that the technology's first-rate, but ultimately it has to end up as part of someone else's hyperscale cloud. That was the final target from a commercial strategy perspective, and we have finally reached that milestone,' Wang said.
The Llama API is currently available as a limited preview, with Meta planning a broader rollout in the coming weeks and months. Developers interested in accessing the ultra-fast Llama 4 inference can request early access by selecting Cerebras from the model options within the Llama API.
'If you imagine a developer who doesn't know anything about Cerebras because we're a relatively small company, they can just click two buttons on Meta's standard software SDK, generate an API key, select the Cerebras flag, and then all of a sudden, their tokens are being processed on a giant wafer-scale engine,' Wang explained. 'That kind of having us be on the back end of Meta's whole developer ecosystem is just tremendous for us.'
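For a sense of what that developer experience could look like, here is a minimal hypothetical sketch. It assumes the Llama API exposes an OpenAI-compatible chat endpoint and that the Cerebras backend is selected via the model name; the base URL and model identifier below are placeholders, not documented values.

```python
# Hypothetical sketch of routing Llama API traffic to Cerebras hardware.
# It assumes an OpenAI-compatible chat endpoint and that the backend is
# chosen via the model name; the base URL and model ID are placeholders.
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://llama-api.example.com/v1",  # placeholder, not the real endpoint
    api_key="YOUR_LLAMA_API_KEY",                 # generated from Meta's SDK, per Wang
)

response = client.chat.completions.create(
    model="llama-4-scout-cerebras",  # placeholder name selecting the Cerebras backend
    messages=[{"role": "user", "content": "Summarize today's LlamaCon announcements."}],
)
print(response.choices[0].message.content)
```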
Meta's choice of specialized silicon signals something profound: in the next phase of AI, what matters is not just what your models know, but how quickly they can deliver it. In that future, speed isn't just a feature; it's the whole point.