Latest news with #GeminiFlash


Business Mayor
17-05-2025
Google's AlphaEvolve: The AI agent that reclaimed 0.7% of Google's compute – and how to copy it
Google's new AlphaEvolve shows what happens when an AI agent graduates from lab demo to production work, with one of the world's most capable technology companies driving it. Built by Google DeepMind, the system autonomously rewrites critical code and already pays for itself inside Google. It shattered a 56-year-old record in matrix multiplication (the core of many machine learning workloads) and clawed back 0.7% of compute capacity across the company's global data centers.

Those headline feats matter, but the deeper lesson for enterprise tech leaders is how AlphaEvolve pulls them off. Its architecture – controller, fast-draft models, deep-thinking models, automated evaluators and versioned memory – illustrates the kind of production-grade plumbing that makes autonomous agents safe to deploy at scale.

Google's AI technology is arguably second to none, so the trick is figuring out how to learn from it, or even use it directly. Google says an Early Access Program is coming for academic partners and that 'broader availability' is being explored, but details are thin. Until then, AlphaEvolve is a best-practice template: if you want agents that touch high-value workloads, you'll need comparable orchestration, testing and guardrails.

Consider just the data center win. Google won't put a price tag on the reclaimed 0.7%, but its annual capex runs to tens of billions of dollars. Even a rough estimate puts the savings in the hundreds of millions annually – enough, as independent developer Sam Witteveen noted on our recent podcast, to pay for training one of the flagship Gemini models, estimated to cost upwards of $191 million for a version like Gemini Ultra.

VentureBeat was the first to report the AlphaEvolve news earlier this week. Now we'll go deeper: how the system works, where the engineering bar really sits and the concrete steps enterprises can take to build (or buy) something comparable.

AlphaEvolve runs on what is best described as an agent operating system – a distributed, asynchronous pipeline built for continuous improvement at scale. Its core pieces are a controller, a pair of large language models (Gemini Flash for breadth; Gemini Pro for depth), a versioned program-memory database and a fleet of evaluator workers, all tuned for high throughput rather than just low latency.

A high-level overview of the AlphaEvolve agent structure. Source: AlphaEvolve paper.

This architecture isn't conceptually new, but the execution is. 'It's just an unbelievably good execution,' Witteveen says. The AlphaEvolve paper describes the orchestrator as an 'evolutionary algorithm that gradually develops programs that improve the score on the automated evaluation metrics' (p. 3); in short, an 'autonomous pipeline of LLMs whose task is to improve an algorithm by making direct changes to the code' (p. 1).

Takeaway for enterprises: if your agent plans include unsupervised runs on high-value tasks, plan for similar infrastructure: job queues, a versioned memory store, service-mesh tracing and secure sandboxing for any code the agent produces.

A key element of AlphaEvolve is its rigorous evaluation framework. Every iteration proposed by the pair of LLMs is accepted or rejected based on a user-supplied 'evaluate' function that returns machine-gradable metrics.
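The paper does not publish the evaluate function's exact signature, but a minimal Python sketch of the contract it describes could look like the following. The helper names, toy task and metrics here are assumptions for illustration, not AlphaEvolve's actual API.

```python
# Minimal sketch of the evaluator contract: a user-supplied `evaluate` maps a
# candidate program to machine-gradable metrics, and the controller keeps a
# candidate only if it doesn't regress. All names here are hypothetical.

MICRO_INPUTS = [0, 1, 2, 3, 5, 8]  # tiny, fast test cases

def reference(x: int) -> int:
    """Trusted baseline the candidate must match."""
    return x * x

def evaluate(program_source: str) -> dict[str, float]:
    """Run the candidate and return machine-gradable metrics (higher is better)."""
    namespace: dict = {}
    exec(program_source, namespace)          # a real system sandboxes this step
    candidate_fn = namespace["fast_square"]  # candidate must define fast_square(x)
    correct = sum(candidate_fn(x) == reference(x) for x in MICRO_INPUTS)
    return {"accuracy": correct / len(MICRO_INPUTS)}

def accept(candidate: dict[str, float], incumbent: dict[str, float]) -> bool:
    """Keep a proposed change only if no tracked metric regresses."""
    return all(candidate[m] >= incumbent[m] for m in incumbent)

# Score an LLM-proposed snippet against the incumbent's metrics.
proposed = "def fast_square(x):\n    return x * x"
print(accept(evaluate(proposed), {"accuracy": 1.0}))  # True
```

The key design point is that the score comes from running code against trusted checks, not from asking a model whether it likes the answer.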
This evaluation system begins with ultrafast unit-test checks on each proposed code change – simple, automatic tests (similar to the unit tests developers already write) that verify the snippet still compiles and produces the right answers on a handful of micro-inputs – before passing the survivors on to heavier benchmarks and LLM-generated reviews. This runs in parallel, so the search stays fast and safe. In short: let the models suggest fixes, then verify each one against tests you trust.

AlphaEvolve also supports multi-objective optimization (optimizing latency and accuracy simultaneously), evolving programs that hit several metrics at once. Counter-intuitively, balancing multiple goals can improve a single target metric by encouraging more diverse solutions.

Takeaway for enterprises: production agents need deterministic scorekeepers, whether that's unit tests, full simulators or canary traffic analysis. Automated evaluators are both your safety net and your growth engine. Before you launch an agentic project, ask: 'Do we have a metric the agent can score itself against?'

AlphaEvolve tackles every coding problem with a two-model rhythm. First, Gemini Flash fires off quick drafts, giving the system a broad set of ideas to explore. Then Gemini Pro studies those drafts in more depth and returns a smaller set of stronger candidates (a minimal sketch of this draft-then-refine loop appears below).

Feeding both models is a lightweight 'prompt builder,' a helper script that assembles the question each model sees. It blends three kinds of context: earlier code attempts saved in a project database, any guardrails or rules the engineering team has written, and relevant external material such as research papers or developer notes. With that richer backdrop, Gemini Flash can roam widely while Gemini Pro zeroes in on quality.

Unlike many agent demos that tweak one function at a time, AlphaEvolve edits entire repositories. It describes each change as a standard diff block – the same patch format engineers push to GitHub – so it can touch dozens of files without losing track (see the diff sketch below). Afterward, automated tests decide whether the patch sticks. Over repeated cycles, the agent's memory of success and failure grows, so it proposes better patches and wastes less compute on dead ends.

Takeaway for enterprises: let cheaper, faster models handle brainstorming, then call on a more capable model to refine the best ideas. Preserve every trial in a searchable history, because that memory speeds up later work and can be reused across teams. Accordingly, vendors are rushing to provide developers with new tooling around things like memory. Products such as OpenMemory MCP, which provides a portable memory store, and the new long- and short-term memory APIs in LlamaIndex are making this kind of persistent context almost as easy to plug in as logging.

OpenAI's Codex-1 software-engineering agent, also released today, underscores the same pattern. It fires off parallel tasks inside a secure sandbox, runs unit tests and returns pull-request drafts – effectively a code-specific echo of AlphaEvolve's broader search-and-evaluate loop.
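To make the draft-then-refine rhythm and the prompt builder concrete, here is a hypothetical Python sketch. Nothing in it is AlphaEvolve's published API: `call_llm` is a stand-in for any chat-completion client, and the model names, helper functions and cutoffs are illustrative assumptions.

```python
# Hypothetical sketch of the breadth-then-depth loop described above.
# `call_llm` is a placeholder for any LLM client; model names are illustrative.
from concurrent.futures import ThreadPoolExecutor

def call_llm(model: str, prompt: str) -> str:
    """Stand-in for a real chat-completion call."""
    raise NotImplementedError("plug in your LLM client here")

def build_prompt(past_attempts: list[str], guardrails: str, references: str) -> str:
    """Blend the three context sources the article names: prior attempts from
    the program database, team-written rules, and external material."""
    return "\n\n".join([guardrails, references, "Previous attempts:", *past_attempts[-3:]])

def passes_unit_tests(candidate: str) -> bool:
    """Cheap gate: does the snippet at least compile?"""
    try:
        compile(candidate, "<candidate>", "exec")
        return True  # a real gate would also run micro-input checks
    except SyntaxError:
        return False

def benchmark(candidate: str) -> float:
    """Expensive scorer, e.g. measured runtime on realistic inputs (stubbed)."""
    return 0.0

def search_step(prompt: str, n_drafts: int = 16) -> str | None:
    # Breadth: many cheap drafts in parallel (Gemini Flash's role in the article).
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(lambda _: call_llm("fast-model", prompt), range(n_drafts)))
    survivors = [d for d in drafts if passes_unit_tests(d)]
    # Depth: a stronger model refines the best survivors (Gemini Pro's role).
    refined = [call_llm("strong-model", f"{prompt}\n\nImprove this draft:\n{d}")
               for d in survivors[:4]]
    # Only fully vetted candidates reach the heavyweight benchmark.
    scored = [(benchmark(c), c) for c in refined if passes_unit_tests(c)]
    return max(scored)[1] if scored else None
```

The shape matters more than the details: cheap generation fans out, deterministic gates filter, and expensive evaluation only ever sees a short list.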
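And to illustrate the diff-based editing format mentioned above: the sketch below generates a unified diff between the current file and an agent-proposed revision, then applies it with the standard Unix `patch` tool. The file name and contents are hypothetical; the paper describes AlphaEvolve's diff conventions only at a high level.

```python
# Toy illustration of diff-based edits: produce a reviewable unified diff
# between the current file and an agent-proposed revision, then apply it.
import difflib
import pathlib
import subprocess

path = pathlib.Path("heuristic.py")
path.write_text("def score(job):\n    return job.cpu\n")

# An agent-proposed revision of the same file.
proposed = "def score(job):\n    return 0.7 * job.cpu + 0.3 * job.mem\n"

diff = "".join(difflib.unified_diff(
    path.read_text().splitlines(keepends=True),
    proposed.splitlines(keepends=True),
    fromfile="a/heuristic.py",
    tofile="b/heuristic.py",
))
print(diff)  # the same patch format engineers push to GitHub

# Apply only after the evaluators approve; requires the Unix `patch` tool.
subprocess.run(["patch", str(path)], input=diff.encode(), check=True)
```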
AlphaEvolve's tangible wins – reclaiming 0.7% of data center capacity, cutting Gemini training kernel runtime 23%, speeding up FlashAttention 32% and simplifying TPU design – share one trait: they target domains with airtight metrics. For data center scheduling, AlphaEvolve evolved a heuristic that was evaluated using a simulator of Google's data centers based on historical workloads. For kernel optimization, the objective was to minimize actual runtime on TPU accelerators across a dataset of realistic kernel input shapes.

Takeaway for enterprises: when starting your agentic AI journey, look first at workflows where 'better' is a quantifiable number your system can compute – be it latency, cost, error rate or throughput. This focus allows automated search and de-risks deployment, because the agent's output (often human-readable code, as in AlphaEvolve's case) can be integrated into existing review and validation pipelines. That clarity lets the agent self-improve and demonstrate unambiguous value.

While AlphaEvolve's achievements are inspiring, Google's paper is also clear about its scope and requirements. The primary limitation is the need for an automated evaluator; problems requiring manual experimentation or 'wet-lab' feedback are currently out of scope for this approach. The system can also consume significant compute – 'on the order of 100 compute-hours to evaluate any new solution' (p. 8) – necessitating parallelization and careful capacity planning.

Before allocating significant budget to complex agentic systems, technical leaders must ask critical questions:

Machine-gradable problem? Do we have a clear, automatable metric against which the agent can score its own performance?

Compute capacity? Can we afford the potentially compute-heavy inner loop of generation, evaluation and refinement, especially during the development and training phase?

Codebase and memory readiness? Is your codebase structured for iterative, possibly diff-based, modifications? And can you implement the instrumented memory systems vital for an agent to learn from its evolutionary history?

Takeaway for enterprises: the increasing focus on robust agent identity and access management, as seen with platforms like Frontegg, Auth0 and others, also points to the maturing infrastructure required to deploy agents that interact securely with multiple enterprise systems.

AlphaEvolve's message for enterprise teams is manifold. First, your operating system around agents is now far more important than model intelligence. Google's blueprint shows three pillars that can't be skipped:

Deterministic evaluators that give the agent an unambiguous score every time it makes a change.

Long-running orchestration that can juggle fast 'draft' models like Gemini Flash with slower, more rigorous models – whether that's Google's stack or a framework such as LangChain's LangGraph.

Persistent memory so each iteration builds on the last instead of relearning from scratch.

Enterprises that already have logging, test harnesses and versioned code repositories are closer than they think. The next step is to wire those assets into a self-serve evaluation loop so multiple agent-generated solutions can compete, and only the highest-scoring patch ships.

As Cisco's Anurag Dhingra, VP and GM of Enterprise Connectivity and Collaboration, told VentureBeat in an interview this week: 'It's happening, it is very, very real,' he said of enterprises using AI agents in manufacturing, warehouses and customer contact centers. 'It is not something in the future. It is happening there today.'
He warned that as these agents become more pervasive, doing 'human-like work,' the strain on existing systems will be immense: 'The network traffic is going to go through the roof,' Dhingra said.

Your network, budget and competitive edge will likely feel that strain before the hype cycle settles. Start proving out a contained, metric-driven use case this quarter, then scale what works.

Watch the video podcast I did with developer Sam Witteveen, where we go deep on production-grade agents and how AlphaEvolve is showing the way:


Time of India
15-05-2025
Google DeepMind's new AI coding tool can solve complex math problems, design algorithms
Google's artificial intelligence (AI) research lab DeepMind has unveiled an advanced agent, AlphaEvolve, which can target fundamental and complex mathematics and computing problems. It has the versatility of large language models (LLMs), which can summarise documents, generate code and propose new ideas, but it goes a step further by verifying answers through automated evaluators.

One of the major problems facing the nascent AI field is hallucination by chatbots. AlphaEvolve addresses this by using LLMs to generate answers to prompts, then automatically evaluating and scoring those answers for accuracy. Researchers have used this technique before, but according to DeepMind, the 'state-of-the-art' Gemini Flash and Gemini Pro models make AlphaEvolve more capable. "Together, these models propose computer programs that implement algorithmic solutions as code," DeepMind said in a blog post.

Google also deployed AlphaEvolve on its own infrastructure to test it across practical problems. As per the blog post, AlphaEvolve enhanced the efficiency of Google's data centres, chip design and AI training processes, including the training of the large language models underlying AlphaEvolve itself.

"By finding smarter ways to divide a large matrix multiplication operation into more manageable subproblems, it sped up this vital kernel in Gemini's architecture by 23%, leading to a 1% reduction in Gemini's training time," the lab said. (A toy illustration of this divide-into-subproblems idea follows below.)

To test AlphaEvolve's breadth, DeepMind applied the system to over 50 open problems in mathematical analysis, geometry, combinatorics and number theory. In roughly 75% of cases, AlphaEvolve "rediscovered" the best-known solutions, and in 20% of cases it improved upon them.
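As a purely illustrative aside (not AlphaEvolve's actual discovered method, which has not been published in detail), the quoted idea of dividing a large matrix multiplication into more manageable subproblems can be seen in a basic blocked multiplication:

```python
# Toy blocked matrix multiplication: compute C = A @ B by splitting the work
# into small block subproblems. Illustrates the general idea only.
import numpy as np

def blocked_matmul(A: np.ndarray, B: np.ndarray, block: int = 2) -> np.ndarray:
    n = A.shape[0]  # assumes square matrices with n divisible by block
    C = np.zeros((n, n))
    for i in range(0, n, block):
        for j in range(0, n, block):
            for k in range(0, n, block):
                C[i:i+block, j:j+block] += (
                    A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
                )
    return C

A, B = np.random.rand(4, 4), np.random.rand(4, 4)
assert np.allclose(blocked_matmul(A, B), A @ B)  # matches the direct product
```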
Yahoo
18-03-2025
Removing Watermarks From Images With Gemini Is Now Way Too Easy
As if AI wasn't enough hot water for pirating content for their training, Google's Gemini might get into more trouble before too long. Users of the latest Gemini Flash AI update have found that it's particularly adept at filling in the gaps in pictures. That makes it stellar for removing watermarks from images, as users can simply cut them out and have Gemini fill in the gaps. Google has been pushing its Gemini large language model AI for some time now, attempting to compete with the likes of Microsoft's Copilot and OpenAI's ChatGPT. It's had some success, but like the others is just flying ahead with ongoing updates to keep up in a race that everyone now appears to be running. It released the new Gemini Flash 2.0 model last week, claiming massive performance improvements—as much as twice as fast as the 1.5 Flash model in the comparative benchmark. But despite its speed, the latest update has proved controversial, as it's particularly good at one thing that's making it perfect for nefarious antics. Other AI tools are really good at filling in the blanks in images, too, but Gemini Flash is particularly good at it, is very fast at it, and the tool is completely free to use. Where other AI might provide more of a paywall lock on such services, Google's Gemini Flash is already making waves for its mix of accessibility and capabilities. The image generation feature of Gemini Flash is labelled as "Experimental," as The Verge points out, so it may not be available for long (especially in its current form). But for now, it appears to be very available and increasingly popular. This particular function is already gathering some traction on Reddit and Twitter, where users are highlighting just how good Gemini Flash is at this particularly tricky task. Other AI models can do this too, but you have to be a bit smarter about how you ask about it. As Verge highlights, Anthropic's latest Claude model, and OpenAI's GPT 4o will refuse to alter watermarked images. We can confirm that when you add copyright-protected images to Microsoft Office applications, its Copilot and Design tools will refuse to modify them directly. Google hasn't yet commented on this potential problem, but copyright holders are already voicing their concerns. It seems likely that Gemini Flash will be updated again in the future to make using it for watermark removal at least a little more difficult.