Latest news with #AndrewFeldman


Business Wire
6 days ago
- Business
- Business Wire
Cerebras Helps Power OpenAI's Open Model at World-Record Inference Speeds: gpt-oss-120B Delivers Frontier Reasoning for All
SUNNYVALE, Calif. & SAN FRANCISCO--(BUSINESS WIRE)--Cerebras Systems today announced inference support for gpt-oss-120B, OpenAI's first open-weight reasoning model, now running at record-breaking inference speeds on the Cerebras AI Inference Cloud. Purpose-built for complex challenges in math, science, and code, this 120B-parameter model achieves intelligence on par with top proprietary models like Gemini 2.5 Flash and Claude Opus 4—while delivering unmatched speed, cost efficiency, and openness.

For the first time, an OpenAI model leverages Cerebras' wafer-scale AI infrastructure to run full-model inference. By eliminating GPU memory bandwidth bottlenecks and communication overhead, Cerebras wafer-scale AI inference delivered a world-record 3,000 tokens per second of output speed, a major advance in responsiveness for high-intelligence AI.

"OpenAI's open-weight reasoning model release is a defining moment for the AI community," said Andrew Feldman, CEO and co-founder of Cerebras. "With gpt-oss-120B, we're not just breaking speed records—we're redefining what's possible. OpenAI on Cerebras delivers frontier intelligence with blistering performance, lower cost, full openness, and plug-and-play ease of use. It's the ultimate AI platform: smart, fast, affordable, easy to use, and fully open."

At over 3,000 tokens per second, organizations will be able to use Cerebras-powered gpt-oss-120B to build live coding assistants, instant large-document Q&A and summarization, and fast agentic research chains. These high-intelligence reasoning use cases have long wait times on proprietary models running on GPUs; that lag is now dramatically reduced with gpt-oss-120B on Cerebras.

Developers can swap their existing OpenAI endpoints for Cerebras in 15 seconds. No refactoring. No migration headaches. Just instant access to the highest-performance gpt-oss-120B models running on the Cerebras Cloud (see the sketch after this article). The open-weight Apache 2.0 license from OpenAI gives users full control to fine-tune for their domain, deploy on-prem for sensitive or regulated data, or move freely across clouds.

"Our open models let developers—from solo builders to large enterprise teams—run and customize AI on their own infrastructure, unlocking new possibilities across industries and use cases," said Dmitry Pimenov, product lead at OpenAI. "Through deployment partners like Cerebras, we're together able to provide powerful, flexible tools that make it easier than ever to build, innovate, and scale."

Experience the fastest AI inference today: developers and enterprises can now access gpt-oss-120B on the Cerebras Cloud with a free API key.

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to accelerate generative AI by building from the ground up a new class of AI supercomputer. Our flagship product, the CS-3 system, is powered by the world's largest and fastest commercially available AI processor, our Wafer-Scale Engine-3. CS-3s are quickly and easily clustered together to make the largest AI supercomputers in the world, and make placing models on the supercomputers dead simple by avoiding the complexity of distributed computing.
Cerebras Inference delivers breakthrough inference speeds, empowering customers to create cutting-edge AI applications. Leading corporations, research institutions, and governments use Cerebras solutions for the development of pathbreaking proprietary models, and to train open-source models with millions of downloads. Cerebras solutions are available through the Cerebras Cloud and on-premises. For further information, follow us on LinkedIn, X and/or Threads.
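The "15 seconds" endpoint swap described above implies an OpenAI-compatible API. Below is a minimal sketch of what such a swap typically looks like with the OpenAI Python SDK; the base URL and model identifier are assumptions drawn from Cerebras' public documentation, not from this press release:

```python
# Minimal sketch: point the OpenAI Python SDK at an OpenAI-compatible
# Cerebras endpoint instead of api.openai.com. The base URL and model
# name are assumptions (from Cerebras' public docs), not from this release.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed Cerebras endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # free API key per the release
)

response = client.chat.completions.create(
    model="gpt-oss-120b",  # assumed model identifier
    messages=[{"role": "user",
               "content": "Explain wafer-scale inference in two sentences."}],
)
print(response.choices[0].message.content)
```

Only the client construction changes; request and response handling stay identical, which is why existing OpenAI-based applications would need no refactoring.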


The Guardian
02-08-2025
- Business
- The Guardian
VIP contract introduced by Tory peer left government owed £24m
They were the lucrative deals that epitomised the 'VIP lane' set up by Boris Johnson's government during the Covid pandemic, which gave priority for personal protective equipment (PPE) contracts to people with political connections.

Peter Gummer, a former PR boss who has been Tory peer Lord Chadlington since 1996, had smooth access at his fingertips. The erstwhile adviser to John Major has 'close personal friendships with many senior Conservative party politicians', he has said, and as president of the Witney constituency association in the Cotswolds is 'close friends' with its most notable MP: David Cameron.

When the pandemic reached Britain early in 2020, Chadlington was a director and shareholder of a company registered in Jersey, majority-owned and run by David Sumner, a serial entrepreneur then based in Dubai. During the first lockdown, Chadlington embarked on an effort to introduce a Sumner company to supply PPE.

He contacted Cameron first, texting him at 7.13am on 19 April 2020. Cameron texted back that his own close friend Andrew Feldman, whom he had appointed to the House of Lords when he was prime minister, was working for the government on PPE procurement. Chadlington then texted Feldman, who said to get in touch on his new Department of Health and Social Care email address.

By 8.04am, less than an hour after his initial text to Cameron, Chadlington was emailing Feldman, copying in Sumner, to make a direct introduction. 'David,' Chadlington wrote, addressing Sumner. 'This is my friend Andrew Feldman. He can help you with PPE we discussed this morning. Drop me off chain. Peter.'

Sumner then sent Feldman an offer to supply PPE. Feldman forwarded it to civil servants operating the VIP lane, telling them: 'An interesting offer from David Sumner who was introduced to me by Lord Chadlington.'

A week later, the DHSC awarded Sumner's UK company – a small, loss-making healthcare staff agency, SG Recruitment – a £24m contract to supply coveralls. A month later the government gave the company a second contract, worth £26m, to supply hand sanitiser.

The full story of how this £50m example of the government's VIP lane commissioning turned out is only becoming clearer now. It was far from a success. Despite the government contracts, SG Recruitment, later renamed, went into liquidation in December 2023. The liquidators have just issued their official report. It confirmed for the first time that the DHSC entirely rejected as 'unusable' the PPE supplied under the first contract, and has put in a claim for the full £24m.

However, all the money has gone from the company. It went bust owing £1.1m in taxes to HMRC. The liquidators found that payments were made out of the company to connected businesses and 'unknown third parties', some apparently overseas. The main bank account was closed in December 2021. Sumner transferred ownership of the company in March 2023 to a woman the liquidators said they believed to be a Philippine national, whom they were unable to contact. The sum total of money found was in an overseas transfers account: £5.

The parent company, Sumner Group Holdings (SGH), of which Chadlington was a director then chair, went into liquidation earlier, in 2022. Chadlington's lawyers told the Guardian in June 2023 that when he put the Sumner companies forward for PPE contracts, he was unaware they had financial difficulties. 'He had no information which gave rise to financial concerns regarding SGH and/or SGR in April 2020,' they said.
But the Guardian has seen evidence that indicates Chadlington was made aware in January 2020 that some SGH creditors were pushing the company, and Sumner personally, to repay at least $18m, claiming that they were owed in total approximately $30m at that time.

There is also publicly revealed evidence that the Sumner companies were under financial pressure before Chadlington made his introduction. SGH was put into liquidation by a consultant, Douglas Geertz, whose fees were never paid. A 2022 court judgment noted that SGH's chief operating officer told him in the summer of 2018: 'I regret that SGH finds itself in very challenging financial circumstances.'

In December 2019 other creditors had sued another Sumner group company in the British Virgin Islands for £2m. A published court judgment noted that: 'Relations [with these creditors] deteriorated in late 2018 and collapsed completely in September 2019.' The £2m appears never to have been paid. SG Recruitment, the UK company, had made losses of £700,000, and was financially reliant on SGH, in the year before it was awarded the PPE contracts.

Despite all this, Chadlington used his connections to contact senior Conservative figures from April 2020, including Matt Hancock, then the health secretary, promoting Sumner as a supplier of PPE. Chadlington encouraged Sumner to secure contracts, messaging him after the approach to Feldman with 'Brilliant. Keep going' and 'Excellent. Looks like you have an inside track.'

Chadlington has twice been investigated by the House of Lords commissioner for standards for his approach to Feldman, under a section of the conduct code that says peers 'must not seek to profit from membership of the House'. Chadlington did not tell either investigation that he sent the email introducing Sumner to Feldman. He told the commissioner's second inquiry in August 2023: 'I did not facilitate an introduction.' He was cleared of any misconduct both times.

Chadlington has now disclosed that email introduction and the text messages he exchanged with Cameron, Feldman, Hancock and Sumner, after he was asked to provide evidence to the Covid-19 public inquiry. It published the messages and his witness statement in May this year. Chaired by Heather Hallett, the inquiry is considering the VIP lane as part of its examination into how the government handled the pandemic.

The Covid Bereaved Families for Justice (CBFFJ) group, which represents relatives of 7,500 people who died of Covid, is an inquiry core participant. The CBFFJ's lead lawyer, Pete Weatherby KC, has been very critical of the VIP lane and highlighted SG Recruitment as a key example. The inquiry is looking at how Chadlington's introduction affected the contracts being awarded, but not at whether the PPE was ultimately delivered.

Johnson's government's operation of the VIP lane has been widely criticised for prioritising politically connected companies ever since its existence was leaked in October 2020. As the government frantically scrambled to fill depleted stockpiles, it spent £12bn on PPE in 2020-21, of which almost £9bn had to be written off because it was substandard, defective, past its use-by date or overpriced.

The UK Anti-Corruption Coalition has pointed to evidence that VIP lane contracts cost £3.8bn, almost 30% of the total, and delivered more expensive and more unusable PPE than non-VIP contracts. Hancock and other ministers have defended the VIP lane, arguing that it enabled the government to prioritise credible offers.
Chadlington was a director of SGH from 2018, then was appointed chair in April 2020, starting the role in June 2020, his lawyers said. In his statement to the Covid inquiry, Chadlington said that it was later, 'within a few months' of becoming chair, that he and his fellow SGH directors did become 'increasingly concerned' about the way the business was being run. 'We were concerned about unpaid wages and fees, contracts falling through, and increasingly vague and confusing responses from Mr Sumner and the management team to questions from the SGH board, as well as Mr Sumner's apparent overoptimism about the state of the business.'

Despite this, as late as April 2021, Chadlington was enthusiastically supporting Sumner to bid for a further government PPE contract, telling a fellow director that Sumner was being asked to quote 'on a huge (and I mean huge!) contract' to supply gloves. That showed that the government 'have no doubts about our legitimacy', he wrote to the director. The same day he messaged Sumner, encouraging him to publicly market the company as a government supplier. Ultimately the company was accepted as a possible supplier of gloves, but no contracts were awarded.

Chadlington has repeatedly said he did not personally benefit from the PPE contracts, telling the Lords commissioner's first inquiry in 2022: 'I received no commission, bonus or direct financial benefit from the two contracts awarded to SG Recruitment Limited.' The Guardian has seen SGH filings in Jersey showing that in the year after the award of the contracts, Chadlington was issued with 27.5m new shares in the company, which he appears not to have paid for. Chadlington's lawyers said the shares were not a bonus; they were 'growth shares' issued to him and the other non-executive directors after he became chair and reorganised the board.

Three months after supporting Sumner to bid for the PPE gloves deal, in July 2021, Chadlington resigned from SGH. All the non-executive directors were owed fees, he has said, and they resigned at the same time. When SGH went into liquidation, Chadlington was owed $100,000 in director's fees, almost $350,000 in consultancy fees, and a $180,000 loan.

Sumner, who is believed to be in the Philippines, replied to the Guardian's questions by email. He said the DHSC reduced the amount of hand sanitiser bought under the second contract, paying £16.6m rather than £26m, so £40.6m for the two contracts. Sumner said the company had the required commercial and regulatory experience to supply PPE and delivered in accordance with the contracts, and that profit margins were 'circa 15%', although he did not provide evidence for this.

He did not answer questions about the payments made out of SG Recruitment, or the closing of its bank accounts in 2021, but said that all money was 'accounted for properly'. In relation to its liquidation, he said the company 'was sold to a third party' almost a year before, 'so it is very disappointing that the new owners had not made a success of the company'. He said he did not accept Chadlington's criticisms of his management. Of the SGH liquidation and questions about the $30m debts creditors were claiming, Sumner said: 'SGH Jersey was not a contracting party and so not relevant.'

Since Chadlington provided his messages to the Covid inquiry, the Lords standards commissioner has opened a third inquiry. The possible conduct breaches are the same as previously, but this time they also include whether he failed to act on his 'personal honour'.
That appears to be a question of whether he was not fully transparent with the two previous inquiries. Chadlington's lawyers responded to the Guardian's questions by saying that he did not act in any way improperly 'in connecting Mr Sumner with Lord Feldman'. They said it was up to the government, not Chadlington, to carry out due diligence on SG Recruitment.

In relation to questions about his knowledge of SGH's financial position, and the $30m debts, when he introduced Sumner for government PPE contracts, the lawyers said: 'Our client understood that while the company had faced some cashflow issues, these were common to most startup businesses.' He only became concerned about SGH's viability later, after he became chair, and in particular by the beginning of 2021, they said.

In his witness statement to the Covid inquiry, Chadlington said: 'While I was not involved in the awarding of contracts for PPE, I was proud that, by making the necessary introductions, I had played a very small role in helping the country during a national emergency.'

A DHSC spokesperson said that since Labour was elected to government, it has sought to recover money from PPE contracts that did not deliver. He did not answer questions about SG Recruitment, saying the DHSC cannot currently discuss specific companies.

A spokesperson for the CBFFJ said: 'While our loved ones were dying, often without the PPE that could have protected them, the system enabled political insiders to use their connections to secure public contracts. It is devastating to learn that the PPE from one of SG Recruitment's contracts was rejected as unusable. This is exactly the kind of evidence the Covid inquiry should be examining. Bereaved families deserve answers. The public deserves answers. Justice means getting to the bottom of how this happened, who benefited, and who should be held accountable.'


Forbes
08-07-2025
- Business
- Forbes
Who Needs Big AI Models?
Cerebras Systems CEO and Founder Andrew Feldman

The AI world continues to evolve rapidly, especially since the introduction of DeepSeek and its followers. Many have concluded that enterprises don't really need the large, expensive AI models touted by OpenAI, Meta, and Google, and are focusing instead on smaller models, such as DeepSeek V2-Lite with 2.4B active parameters, or Llama 4 Scout and Maverick with 17B active parameters, which can provide decent accuracy at a lower cost.

It turns out that this is not the case for coders, or more accurately, for the models that can and will replace many coders. Nor does the smaller-is-better mantra apply to reasoning or agentic AI, the next big thing. AI code generators require large models with wide context windows, capable of accommodating approximately 100,000 lines of code. Mixture-of-experts (MoE) models supporting agentic and reasoning AI are also large. But these massive models are typically quite expensive, costing around $10 to $15 per million output tokens on modern GPUs. Therein lies an opportunity for novel AI architectures to encroach on GPUs' territory.

Cerebras Systems Launches Big AI with Qwen3-235B

Cerebras Systems (a client of Cambrian-AI Research) has announced support for the large Qwen3-235B, with a 131K context length (about 200–300 pages of text), four times what was previously available. At the RAISE Summit in Paris, Cerebras touted Alibaba's Qwen3-235B, which uses a highly efficient mixture-of-experts architecture to deliver exceptional compute efficiency. But the real news is that Cerebras can run the model at only $0.60 per million input tokens and per million output tokens—less than one-tenth the cost of comparable closed-source models. While many consider the Cerebras wafer-scale engine expensive, this data turns that perception on its head.

Agents are a use case that frequently requires very large models. One question I frequently get is: if Cerebras is so fast, why doesn't it have more customers? One reason is that it has not supported large context windows and larger models. Those seeking to generate code, for example, do not want to break the problem into smaller fragments to fit, say, a 32K-token context. Now, that barrier to sales has evaporated.

"We're seeing huge demand from developers for frontier models with long context, especially for code generation," said Cerebras Systems CEO and Founder Andrew Feldman. "Qwen3-235B on Cerebras is our first model that stands toe-to-toe with frontier models like Claude 4 and DeepSeek R1. And with full 131K context, developers can now use Cerebras on production-grade coding applications and get answers back in less than a second instead of waiting for minutes on GPUs."

Cerebras is not just 30 times faster; it is 92% cheaper than GPUs. Cerebras has quadrupled its context length support from 32K to 131K tokens—the maximum supported by Qwen3-235B. This expansion directly impacts the model's ability to reason over large codebases and complex documentation. While a 32K context is sufficient for simple code generation use cases, a 131K context enables the model to process dozens of files and tens of thousands of lines of code simultaneously, allowing for production-grade application development.
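A rough sanity check of the context-window claim above. The tokens-per-line figures below are illustrative assumptions, not measurements:

```python
# Back-of-envelope sketch of how much code a 131K-token window holds.
# Tokens-per-line values are rough assumptions for typical source code.
CONTEXT_TOKENS = 131_072  # the 131K context length cited in the article

for tokens_per_line in (5, 10):
    lines = CONTEXT_TOKENS // tokens_per_line
    print(f"at ~{tokens_per_line} tokens/line: ~{lines:,} lines of code")

# at ~5 tokens/line:  ~26,214 lines
# at ~10 tokens/line: ~13,107 lines
# i.e. "tens of thousands of lines", consistent with the article.
```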
Cerebras is 15-100 times more affordable than GPUs when running Qwen3-235B.

Qwen3-235B excels at tasks requiring deep logical reasoning, advanced mathematics, and code generation, thanks to its ability to switch between "thinking mode" (for high-complexity tasks) and "non-thinking mode" (for efficient, general-purpose dialogue). The 131K context length allows the model to ingest and reason over large codebases (tens of thousands of lines), supporting tasks such as code refactoring, documentation, and bug detection.

Cerebras also announced the further expansion of its ecosystem, adding Amazon AWS to its cloud portfolio along with support from DataRobot, Docker, Cline, and Notion. The addition of AWS is huge.

Where is this heading?

Big AI has constantly been downsized and optimized, with orders-of-magnitude gains in performance and reductions in model size and price. This trend will undoubtedly continue, but it will be constantly offset by increases in capabilities, accuracy, intelligence, and entirely new features across modalities. So, if you want last year's AI, you're in great shape, as it continues to get cheaper. But if you want the latest features and functions, you will require the largest models and the longest input context length. It's the Yin and Yang of AI.
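The article's affordability claims can be sanity-checked with simple arithmetic, using only the prices quoted above; the wider "15-100 times" spread presumably comes from baselines beyond the $10-$15 figure:

```python
# Illustrative check of the cost comparison quoted in the article.
# All figures are the ones cited above; treat them as point-in-time prices.
cerebras_per_m = 0.60            # $/M tokens on Cerebras for Qwen3-235B
gpu_low, gpu_high = 10.0, 15.0   # $/M output tokens quoted for large models on GPUs

print(f"savings: {1 - cerebras_per_m / gpu_low:.0%} to {1 - cerebras_per_m / gpu_high:.0%}")
print(f"price ratio: {gpu_low / cerebras_per_m:.0f}x to {gpu_high / cerebras_per_m:.0f}x")

# savings: 94% to 96%; ratio: ~17x to 25x. This brackets the article's
# "92% cheaper" figure and sits at the low end of its "15-100 times" range,
# which presumably reflects additional model and vendor baselines.
```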


Business Wire
28-05-2025
- Business
- Business Wire
Cerebras Beats NVIDIA Blackwell in Llama 4 Maverick Inference
SUNNYVALE, Calif.--(BUSINESS WIRE)--Last week, Nvidia announced that 8 Blackwell GPUs in a DGX B200 could demonstrate 1,000 tokens per second (TPS) per user on Meta's Llama 4 Maverick. Today, the same independent benchmark firm, Artificial Analysis, measured Cerebras at more than 2,500 TPS/user, more than doubling the performance of Nvidia's flagship solution.

"Cerebras has beaten the Llama 4 Maverick inference speed record set by NVIDIA last week," said Micah Hill-Smith, Co-Founder and CEO of Artificial Analysis. "Artificial Analysis has benchmarked Cerebras' Llama 4 Maverick endpoint at 2,522 tokens per second, compared to NVIDIA Blackwell's 1,038 tokens per second for the same model. We've tested dozens of vendors, and Cerebras is the only inference solution that outperforms Blackwell for Meta's flagship model."

With today's results, Cerebras has set a world record for LLM inference speed on the 400B-parameter Llama 4 Maverick model, the largest and most powerful in the Llama 4 family. Artificial Analysis tested multiple other vendors, with the following results: SambaNova 794 t/s, Amazon 290 t/s, Groq 549 t/s, Google 125 t/s, and Microsoft Azure 54 t/s.

Andrew Feldman, CEO of Cerebras Systems, said, "The most important AI applications being deployed in enterprise today—agents, code generation, and complex reasoning—are bottlenecked by inference latency. These use cases often involve multi-step chains of thought or large-scale retrieval and planning, with generation speeds as low as 100 tokens per second on GPUs, causing wait times of minutes and making production deployment impractical. Cerebras has led the charge in redefining inference performance across models like Llama, DeepSeek, and Qwen, regularly delivering over 2,500 TPS/user." (A back-of-envelope version of this latency arithmetic appears after this article.)

With its world-record performance, Cerebras is the optimal solution for Llama 4 in any deployment scenario. Not only is Cerebras Inference the first and only API to break the 2,500 TPS/user milestone on this model, but unlike the Nvidia Blackwell system used in the Artificial Analysis benchmark, the Cerebras hardware and API are available now. Nvidia used custom software optimizations that are not available to most users. Interestingly, none of Nvidia's inference providers offer a service at Nvidia's published performance. This suggests that in order to achieve 1,000 TPS/user, Nvidia was forced to reduce throughput by going to batch size 1 or 2, leaving the GPUs at less than 1% utilization. Cerebras, on the other hand, achieved this record-breaking performance without any special kernel optimizations, and it will be available to everyone through Meta's API service coming soon.

For cutting-edge AI applications such as reasoning, voice, and agentic workflows, speed is paramount. These AI applications gain intelligence by processing more tokens during the inference process, which can also make them slow and force customers to wait. And when customers are forced to wait, they leave and go to competitors who provide answers faster—a finding Google demonstrated with search more than a decade ago. With record-breaking performance, Cerebras hardware and the resulting API service are the best choice for developers and enterprise AI users around the world. For more information, please visit the Cerebras website.
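Feldman's wait-time claims follow from simple division of output length by generation speed. A sketch; the 30,000-token output length is an assumed figure for a long multi-step reasoning chain, not from the release:

```python
# Illustrative wait-time arithmetic behind the latency claims.
# The token count is an assumption for a long agentic/reasoning run;
# the throughput figures are the ones quoted in the article.
REASONING_TOKENS = 30_000  # assumed output length of a multi-step chain

for name, tps in (("GPU at 100 TPS/user", 100),
                  ("Cerebras at 2,522 TPS/user", 2522)):
    print(f"{name}: {REASONING_TOKENS / tps:,.0f} s for {REASONING_TOKENS:,} tokens")

# GPU at 100 TPS/user:        300 s (the "wait times of minutes")
# Cerebras at 2,522 TPS/user:  12 s
```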