Google's AlphaEvolve: The AI agent that reclaimed 0.7% of Google's compute – and how to copy it


Business Mayor | 17 May 2025
Google's new AlphaEvolve shows what happens when an AI agent graduates from lab demo to production work, with one of the world's most talented technology companies driving it.
Built by Google DeepMind, the system autonomously rewrites critical code and already pays for itself inside Google. It shattered a 56-year-old record in matrix multiplication (the core of many machine learning workloads) and clawed back 0.7% of compute capacity across the company's global data centers.
Those headline feats matter, but the deeper lesson for enterprise tech leaders is how AlphaEvolve pulls them off. Its architecture – controller, fast-draft models, deep-thinking models, automated evaluators and versioned memory – illustrates the kind of production-grade plumbing that makes autonomous agents safe to deploy at scale.
Google's AI technology is arguably second to none. So the trick is figuring out how to learn from it, or even how to use it directly. Google says an Early Access Program is coming for academic partners and that 'broader availability' is being explored, but details are thin. Until then, AlphaEvolve is a best-practice template: If you want agents that touch high-value workloads, you'll need comparable orchestration, testing and guardrails.
Consider just the data center win. Google won't put a price tag on the reclaimed 0.7%, but its annual capex runs to tens of billions of dollars. Even a rough estimate puts the savings in the hundreds of millions of dollars annually, enough, as independent developer Sam Witteveen noted on our recent podcast, to pay for training one of the flagship Gemini models, estimated to cost upwards of $191 million for a version like Gemini Ultra.
VentureBeat was the first to report the AlphaEvolve news earlier this week. Now we'll go deeper: how the system works, where the engineering bar really sits and the concrete steps enterprises can take to build (or buy) something comparable.
AlphaEvolve runs on what is best described as an agent operating system – a distributed, asynchronous pipeline built for continuous improvement at scale. Its core pieces are a controller, a pair of large language models (Gemini Flash for breadth; Gemini Pro for depth), a versioned program-memory database and a fleet of evaluator workers, all tuned for high throughput rather than just low latency.
A high-level overview of the AlphaEvolve agent structure. Source: AlphaEvolve paper.
This architecture isn't conceptually new, but the execution is. 'It's just an unbelievably good execution,' Witteveen says.
The AlphaEvolve paper describes the orchestrator as an 'evolutionary algorithm that gradually develops programs that improve the score on the automated evaluation metrics' (p. 3); in short, an 'autonomous pipeline of LLMs whose task is to improve an algorithm by making direct changes to the code' (p. 1).
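In code, that orchestrator's core loop is a straightforward evolutionary search. Below is a minimal sketch under illustrative assumptions: the toy `mutate` and `evaluate` functions stand in for LLM-proposed code changes and real benchmarks, and none of the names reflect AlphaEvolve's actual API.

```python
import random

def evolve(initial_program, mutate, evaluate, generations=100, population_size=20):
    # Scored population of candidate "programs" (just numbers in this toy run).
    population = [(evaluate(initial_program), initial_program)]
    for _ in range(generations):
        # Tournament selection: sample a few candidates, keep the best as parent.
        parent = max(random.sample(population, k=min(3, len(population))))[1]
        child = mutate(parent)                  # in AlphaEvolve: an LLM-proposed diff
        population.append((evaluate(child), child))
        # Trim to the fittest candidates so the search stays focused.
        population = sorted(population, reverse=True)[:population_size]
    return max(population)

# Toy run: "programs" are numbers, and the metric rewards closeness to pi.
best_score, best_program = evolve(
    initial_program=2.0,
    mutate=lambda p: p + random.uniform(-0.5, 0.5),
    evaluate=lambda p: -abs(p - 3.14),
)
```

The point of the sketch is the shape, not the scale: AlphaEvolve runs this loop asynchronously across a fleet of workers, with the "programs" being real code and the evaluator being real benchmarks.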
Takeaway for enterprises: If your agent plans include unsupervised runs on high-value tasks, plan for similar infrastructure: job queues, a versioned memory store, service-mesh tracing and secure sandboxing for any code the agent produces.
A key element of AlphaEvolve is its rigorous evaluation framework. Every iteration proposed by the pair of LLMs is accepted or rejected based on a user-supplied 'evaluate' function that returns machine-gradable metrics. This evaluation system begins with ultrafast unit-test checks on each proposed code change – simple, automatic tests (similar to the unit tests developers already write) that verify the snippet still compiles and produces the right answers on a handful of micro-inputs – before passing the survivors on to heavier benchmarks and LLM-generated reviews. This runs in parallel, so the search stays fast and safe.
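That gating logic can be sketched in a few lines. The staged structure below mirrors the description above, though the function names and toy checks are purely illustrative:

```python
def evaluate_candidate(candidate, fast_checks, heavy_benchmarks):
    # Stage 1: ultrafast unit-style checks; any failure rejects immediately,
    # so the expensive benchmarks only ever see plausible candidates.
    if not all(check(candidate) for check in fast_checks):
        return None  # rejected
    # Stage 2: heavier benchmarks return the machine-gradable metrics.
    return {name: bench(candidate) for name, bench in heavy_benchmarks.items()}

# Toy usage: candidates are functions that should double their input.
fast_checks = [lambda f: f(2) == 4, lambda f: f(0) == 0]
heavy_benchmarks = {
    "accuracy": lambda f: float(all(f(i) == 2 * i for i in range(100))),
    "latency_ms": lambda f: 0.01,  # stand-in for a real timing benchmark
}

good = evaluate_candidate(lambda x: x * 2, fast_checks, heavy_benchmarks)
bad = evaluate_candidate(lambda x: x + 2, fast_checks, heavy_benchmarks)  # fails stage 1
```

Because stage 1 is cheap and hermetic, thousands of candidates can be screened in parallel before any real benchmark time is spent.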
In short: Let the models suggest fixes, then verify each one against tests you trust. AlphaEvolve also supports multi-objective optimization (optimizing latency and accuracy simultaneously), evolving programs that hit several metrics at once. Counter-intuitively, balancing multiple goals can improve a single target metric by encouraging more diverse solutions.
Takeaway for enterprises: Production agents need deterministic scorekeepers, whether that's unit tests, full simulators or canary traffic analysis. Automated evaluators are both your safety net and your growth engine. Before you launch an agentic project, ask: 'Do we have a metric the agent can score itself against?'
AlphaEvolve tackles every coding problem with a two-model rhythm. First, Gemini Flash fires off quick drafts, giving the system a broad set of ideas to explore. Then Gemini Pro studies those drafts in more depth and returns a smaller set of stronger candidates. Feeding both models is a lightweight 'prompt builder,' a helper script that assembles the question each model sees. It blends three kinds of context: earlier code attempts saved in a project database, any guardrails or rules the engineering team has written and relevant external material such as research papers or developer notes. With that richer backdrop, Gemini Flash can roam widely while Gemini Pro zeroes in on quality.
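The breadth-then-depth rhythm and the prompt builder can be sketched as follows. Everything here is an illustrative assumption: the stub models, the length-based ranking and the function names are mine, with real Gemini Flash and Gemini Pro calls going where the lambdas sit.

```python
def build_prompt(task, memory, guardrails, references):
    # Blend the three context sources: prior scored attempts, team rules,
    # and external material such as papers or design notes.
    prior = "\n".join(f"- score {score}: {code}" for score, code in memory[-3:])
    return (f"Task: {task}\n"
            f"Rules: {'; '.join(guardrails)}\n"
            f"References: {'; '.join(references) or 'none'}\n"
            f"Recent attempts:\n{prior}\n"
            "Propose an improved version.")

def draft_then_refine(task, memory, fast_model, deep_model, n_drafts=8, keep=2):
    prompt = build_prompt(task, memory, guardrails=["no new dependencies"],
                          references=[])
    drafts = [fast_model(prompt) for _ in range(n_drafts)]   # breadth: cheap model
    shortlist = sorted(drafts, key=len)[:keep]               # toy ranking heuristic
    return [deep_model(d) for d in shortlist]                # depth: stronger model

# Demo with stub "models" standing in for the two LLMs.
import itertools
_n = itertools.count(1)
fast_stub = lambda prompt: "draft " + "x" * next(_n)
deep_stub = lambda draft: draft + " (refined)"
refined = draft_then_refine("speed up the matmul kernel",
                            memory=[(0.50, "v0: baseline kernel")],
                            fast_model=fast_stub, deep_model=deep_stub)
```

The design choice worth copying is the asymmetry: many cheap calls to explore, few expensive calls to exploit.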
Unlike many agent demos that tweak one function at a time, AlphaEvolve edits entire repositories. It describes each change as a standard diff block – the same patch format engineers push to GitHub – so it can touch dozens of files without losing track. Afterward, automated tests decide whether the patch sticks. Over repeated cycles, the agent's memory of success and failure grows, so it proposes better patches and wastes less compute on dead ends.
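The patch-then-test gate can be approximated with standard git tooling. Below is a sketch assuming a git repository with a clean working tree; the helper name and control flow are my own, not AlphaEvolve's implementation.

```python
import subprocess

def try_patch(repo_dir, diff_text, test_cmd):
    """Apply an agent-proposed diff, run the tests, and roll back on failure."""
    applied = subprocess.run(["git", "-C", repo_dir, "apply", "-"],
                             input=diff_text, text=True, capture_output=True)
    if applied.returncode != 0:
        return False  # malformed patch: reject without running anything
    if subprocess.run(test_cmd, cwd=repo_dir).returncode != 0:
        # Tests failed: restore the working tree to its last-known-good state.
        subprocess.run(["git", "-C", repo_dir, "checkout", "--", "."])
        return False
    return True  # patch sticks and is recorded as a success
```

In production you would also sandbox the test run and commit each accepted patch, so a rollback never crosses a known-good boundary.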
Takeaway for enterprises: Let cheaper, faster models handle brainstorming, then call on a more capable model to refine the best ideas. Preserve every trial in a searchable history, because that memory speeds up later work and can be reused across teams. Accordingly, vendors are rushing to provide developers with new tooling around things like memory. Products such as OpenMemory MCP, which provides a portable memory store, and the new long- and short-term memory APIs in LlamaIndex are making this kind of persistent context almost as easy to plug in as logging.
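A persistent, searchable trial history needs little more than a scored table of attempts. Here is a minimal sqlite sketch; the schema is an illustrative assumption, not any vendor's format:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # use a file path for persistence across runs
db.execute("CREATE TABLE trials ("
           "id INTEGER PRIMARY KEY, parent INTEGER, code TEXT, score REAL)")

def record(code, score, parent=None):
    # Store every attempt, successful or not, linked to the version it mutated.
    cur = db.execute("INSERT INTO trials (parent, code, score) VALUES (?, ?, ?)",
                     (parent, code, score))
    return cur.lastrowid

def best(n=3):
    # A prompt builder can pull the top-scoring attempts as context.
    return db.execute("SELECT code, score FROM trials ORDER BY score DESC "
                      "LIMIT ?", (n,)).fetchall()

root = record("v0: baseline", 0.50)
record("v1: loop unrolled", 0.62, parent=root)
record("v2: bad cache layout", 0.31, parent=root)
```

Even failed trials earn their storage: the parent links let the agent avoid re-exploring branches that already proved to be dead ends.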
OpenAI's Codex-1 software-engineering agent, also released today, underscores the same pattern. It fires off parallel tasks inside a secure sandbox, runs unit tests and returns pull-request drafts, effectively a code-specific echo of AlphaEvolve's broader search-and-evaluate loop.
AlphaEvolve's tangible wins – reclaiming 0.7% of data center capacity, cutting Gemini training kernel runtime 23%, speeding FlashAttention 32%, and simplifying TPU design – share one trait: they target domains with airtight metrics.
For data center scheduling, AlphaEvolve evolved a heuristic that was evaluated using a simulator of Google's data centers based on historical workloads. For kernel optimization, the objective was to minimize actual runtime on TPU accelerators across a dataset of realistic kernel input shapes.
Takeaway for enterprises: When starting your agentic AI journey, look first at workflows where 'better' is a quantifiable number your system can compute – be it latency, cost, error rate or throughput. This focus allows automated search and de-risks deployment because the agent's output (often human-readable code, as in AlphaEvolve's case) can be integrated into existing review and validation pipelines.
This clarity allows the agent to self-improve and demonstrate unambiguous value.
While AlphaEvolve's achievements are inspiring, Google's paper is also clear about its scope and requirements.
The primary limitation is the need for an automated evaluator; problems requiring manual experimentation or 'wet-lab' feedback are currently out of scope for this approach. The system can also consume significant compute, 'on the order of 100 compute-hours to evaluate any new solution' (AlphaEvolve paper, p. 8), necessitating parallelization and careful capacity planning.
Before allocating significant budget to complex agentic systems, technical leaders must ask critical questions:
Machine-gradable problem? Do we have a clear, automatable metric against which the agent can score its own performance?
Compute capacity? Can we afford the potentially compute-heavy inner loop of generation, evaluation and refinement, especially during the development and training phase?
Codebase and memory readiness? Is your codebase structured for iterative, possibly diff-based, modifications? And can you implement the instrumented memory systems vital for an agent to learn from its evolutionary history?
Takeaway for enterprises: The increasing focus on robust agent identity and access management, as seen with platforms like Frontegg, Auth0 and others, also points to the maturing infrastructure required to deploy agents that interact securely with multiple enterprise systems.
AlphaEvolve's message for enterprise teams is clear: the operating system around your agents now matters more than raw model intelligence. Google's blueprint shows three pillars that can't be skipped:
Deterministic evaluators that give the agent an unambiguous score every time it makes a change.
Long-running orchestration that can juggle fast 'draft' models like Gemini Flash with slower, more rigorous models – whether that's Google's stack or a framework such as LangChain's LangGraph.
Persistent memory so each iteration builds on the last instead of relearning from scratch.
Enterprises that already have logging, test harnesses and versioned code repositories are closer than they think. The next step is to wire those assets into a self-serve evaluation loop so multiple agent-generated solutions can compete, and only the highest-scoring patch ships.
As Cisco's Anurag Dhingra, VP and GM of Enterprise Connectivity and Collaboration, told VentureBeat in an interview this week about enterprises using AI agents in manufacturing, warehouses and customer contact centers: 'It's happening, it is very, very real. It is not something in the future. It is happening there today.' He warned that as these agents become more pervasive, doing 'human-like work,' the strain on existing systems will be immense: 'The network traffic is going to go through the roof.' Your network, budget and competitive edge will likely feel that strain before the hype cycle settles. Start proving out a contained, metric-driven use case this quarter, then scale what works.
Watch the video podcast I did with developer Sam Witteveen, where we go deep on production-grade agents, and how AlphaEvolve is showing the way:
