
Latest news with #VentureBeat

Microsoft just launched an AI that discovered a new chemical in 200 hours instead of years

Business Mayor

19-05-2025

  • Business
  • Business Mayor


Microsoft launched a new enterprise platform that harnesses artificial intelligence to dramatically accelerate scientific research and development, potentially compressing years of laboratory work into weeks or even days. The platform, called Microsoft Discovery, leverages specialized AI agents and high-performance computing to help scientists and engineers tackle complex research challenges without requiring them to write code, the company announced Monday at its annual Build developer conference. 'What we're doing is really taking a look at how we can apply advancements in agentic AI and compute work, and then on to quantum computing, and apply it in the really important space, which is science,' said Jason Zander, Corporate Vice President of Strategic Missions and Technologies at Microsoft, in an exclusive interview with VentureBeat. The system has already demonstrated its potential in Microsoft's own research, where it helped discover a novel coolant for immersion cooling of data centers in approximately 200 hours — a process that traditionally would have taken months or years. 'In 200 hours with this framework, we were able to go through and screen 367,000 potential candidates that we came up with,' Zander explained. 'We actually took it to a partner, and they actually synthesized it.' Microsoft Discovery represents a significant step toward democratizing advanced scientific tools, allowing researchers to interact with supercomputers and complex simulations using natural language rather than requiring specialized programming skills. 'It's about empowering scientists to transform the entire discovery process with agentic AI,' Zander emphasized. 'My PhD is in biology. I'm not a computer scientist, but if you can unlock that power of a supercomputer just by allowing me to prompt it, that's very powerful.' The platform addresses a key challenge in scientific research: the disconnect between domain expertise and computational skills. Traditionally, scientists would need to learn programming to leverage advanced computing tools, creating a bottleneck in the research process. This democratization could prove particularly valuable for smaller research institutions that lack the resources to hire computational specialists to augment their scientific teams. By allowing domain experts to directly query complex simulations and run experiments through natural language, Microsoft is effectively lowering the barrier to entry for cutting-edge research techniques. 'As a scientist, I'm a biologist. I don't know how to write computer code. I don't want to spend all my time going into an editor and writing scripts and stuff to ask a supercomputer to do something,' Zander said. 'I just wanted, like, this is what I want in plain English or plain language, and go do it.' Microsoft Discovery operates through what Zander described as a team of AI 'postdocs' — specialized agents that can perform different aspects of the scientific process, from literature review to computational simulations. 'These postdoc agents do that work,' Zander explained. 'It's like having a team of folks that just got their PhD. They're like residents in medicine — you're in the hospital, but you're still finishing.' The platform combines two key components: foundational models that handle planning and specialized models trained for particular scientific domains like physics, chemistry, and biology.
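Microsoft has not published a public API for Discovery, but the pattern Zander describes, in which a planning model routes a plain-language research prompt to specialized domain agents, can be sketched in a few lines. The Python below is a hypothetical illustration only; every class and function name is an assumption, not part of the actual product.

```python
# Hypothetical sketch of the pattern described above: a planner routes a
# natural-language research prompt to specialized "postdoc" agents. None of
# these names come from Microsoft Discovery, whose API is not public.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AgentResult:
    agent: str
    summary: str


def literature_review(prompt: str) -> AgentResult:
    # Placeholder: a real agent would search papers and internal documents.
    return AgentResult("literature_review", f"Prior work relevant to: {prompt}")


def candidate_screening(prompt: str) -> AgentResult:
    # Placeholder: a real agent would launch simulations on HPC infrastructure.
    return AgentResult("candidate_screening", f"Ranked candidates for: {prompt}")


SPECIALISTS: Dict[str, Callable[[str], AgentResult]] = {
    "literature_review": literature_review,
    "candidate_screening": candidate_screening,
}


def plan(prompt: str) -> List[str]:
    # Stand-in for the foundation-model planner deciding which agents to use.
    steps = ["literature_review"]
    if any(word in prompt.lower() for word in ("screen", "candidate", "coolant")):
        steps.append("candidate_screening")
    return steps


def run_workflow(prompt: str) -> List[AgentResult]:
    """Run the planned steps end to end and collect each agent's output."""
    return [SPECIALISTS[step](prompt) for step in plan(prompt)]


if __name__ == "__main__":
    for result in run_workflow("Screen PFAS-free coolants for immersion cooling"):
        print(f"[{result.agent}] {result.summary}")
```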
What makes this approach unique is how it blends general AI capabilities with deeply specialized scientific knowledge. 'The core process, you'll find two parts of this,' Zander said. 'One is we're using foundational models for doing the planning. The other piece is, on the AI side, a set of models that are designed specifically for particular domains of science, that includes physics, chemistry, biology.' According to a company statement, Microsoft Discovery is built on a 'graph-based knowledge engine' that constructs nuanced relationships between proprietary data and external scientific research. This allows it to understand conflicting theories and diverse experimental results across disciplines, while maintaining transparency by tracking sources and reasoning processes. At the center of the user experience is a Copilot interface that orchestrates these specialized agents based on researcher prompts, identifying which agents to leverage and setting up end-to-end workflows. This interface essentially acts as the central hub where human scientists can guide their virtual research team. To demonstrate the platform's capabilities, Microsoft used Microsoft Discovery to address a pressing challenge in data center technology: finding alternatives to coolants containing PFAS, so-called 'forever chemicals' that are increasingly facing regulatory restrictions. Current data center cooling methods often rely on harmful chemicals that are becoming untenable as global regulations push to ban these substances. Microsoft researchers used the platform to screen hundreds of thousands of potential alternatives. 'We did prototypes on this. Actually, when I owned Azure, I did a prototype eight years ago, and it works super well, actually,' Zander said. 'It's actually like 60 to 90% more efficient than just air cooling. The big problem is that coolant material that's on market has PFAS in it.' After identifying promising candidates, Microsoft synthesized the coolant and demonstrated it cooling a GPU running a video game. While this specific application remains experimental, it illustrates how Microsoft Discovery can compress development timelines for companies facing regulatory challenges. The implications extend far beyond Microsoft's own data centers. Any industry facing similar regulatory pressure to replace established chemicals or materials could potentially use this approach to accelerate their R&D cycles dramatically. What once would have been multi-year development processes might now be completed in a matter of months. Daniel Pope, founder of Submer, a company focused on sustainable data centers, was quoted in the press release saying: 'The speed and depth of molecular screening achieved by Microsoft Discovery would've been impossible with traditional methods. What once took years of lab work and trial-and-error, Microsoft Discovery can accomplish in just weeks, and with greater confidence.' Microsoft is building an ecosystem of partners across diverse industries to implement the platform, indicating its broad applicability beyond the company's internal research needs. Pharmaceutical giant GSK is exploring the platform for its potential to transform medicinal chemistry. The company stated an intent to partner with Microsoft to advance 'GSK's generative platforms for parallel prediction and testing, creating new medicines with greater speed and precision.' In the consumer space, Estée Lauder plans to harness Microsoft Discovery to accelerate product development in skincare, makeup, and fragrance. 
'The Microsoft Discovery platform will help us to unleash the power of our data to drive fast, agile, breakthrough innovation and high-quality, personalized products that will delight our consumers,' said Kosmas Kretsos, PhD, MBA, Vice President of R&D and Innovation Technology at Estée Lauder Companies. Microsoft is also expanding its partnership with Nvidia to integrate Nvidia's ALCHEMI and BioNeMo NIM microservices with Microsoft Discovery, enabling faster breakthroughs in materials and life sciences. This partnership will allow researchers to leverage state-of-the-art inference capabilities for candidate identification, property mapping, and synthetic data generation. 'AI is dramatically accelerating the pace of scientific discovery,' said Dion Harris, senior director of accelerated data center solutions at Nvidia. 'By integrating Nvidia ALCHEMI and BioNeMo NIM microservices into Azure Discovery, we're giving scientists the ability to move from data to discovery with unprecedented speed, scale, and efficiency.' In the semiconductor space, Microsoft plans to integrate Synopsys' industry solutions to accelerate chip design and development. Sassine Ghazi, President and CEO of Synopsys, described semiconductor engineering as 'among the most complex, consequential and high-stakes scientific endeavors of our time,' making it 'an extremely compelling use case for artificial intelligence.' System integrators Accenture and Capgemini will help customers implement and scale Microsoft Discovery deployments, bridging the gap between Microsoft's technology and industry-specific applications. Microsoft Discovery also represents a stepping stone toward the company's broader quantum computing ambitions. Zander explained that while the platform currently uses conventional high-performance computing, it's designed with future quantum capabilities in mind. 'Science is a hero scenario for a quantum computer,' Zander said. 'If you ask yourself, what can a quantum computer do? It's extremely good at exploring complicated problem spaces that classic computers just aren't able to do.' Microsoft recently announced advancements in quantum computing with its Majorana 1 chip, which the company claims could potentially fit a million qubits 'in the palm of your hand' — compared to competing approaches that might require 'a football field worth of equipment.' 'General generative chemistry — we think the hero scenario for high-scale quantum computers is actually chemistry,' Zander explained. 'Because what it can do is take a small amount of data and explore a space that would take millions of years for a classic, even the largest supercomputer, to do.' This connection between today's AI-driven discovery platform and tomorrow's quantum computers reveals Microsoft's long-term strategy: building the software infrastructure and user experience today that will eventually harness the revolutionary capabilities of quantum computing when the hardware matures. Zander envisions a future where quantum computers design their own successors: 'One of the first things that I want to do when I get the quantum computer that does that kind of work is I'm going to go give it my material stack for my chip. I'm going to basically say, 'Okay, go simulate that sucker. Tell me how I build a new, a better, new version of you.'' With the powerful capabilities Microsoft Discovery offers, questions about potential misuse naturally arise.
Zander emphasized that the platform incorporates Microsoft's responsible AI framework. 'We have the responsible AI program, and it's been around, actually I think we were one of the first companies to actually put that kind of framework into place,' Zander said. 'Discovery absolutely is following all responsible AI guidelines.' These safeguards include ethical use guidelines and content moderation similar to those implemented in consumer AI systems, but tailored for scientific applications. The company appears to be taking a proactive approach to identifying potential misuse scenarios. 'We already look for particular types of algorithms that could be harmful and try and flag those in content moderation style,' Zander explained. 'Again, the analogy would be very similar to what a consumer kind of bot would do.' This focus on responsible innovation reflects the dual-use nature of powerful scientific tools — the same platform that could accelerate lifesaving drug discovery could potentially be misused in other contexts. Microsoft's approach attempts to balance innovation with appropriate safeguards, though the effectiveness of these measures will only become clear as the platform is adopted more widely. Microsoft's entry into scientific AI comes at a time when the field of accelerated discovery is heating up. The ability to compress research timelines could have profound implications for addressing urgent global challenges, from drug discovery to climate change solutions. What differentiates Microsoft's approach is its focus on accessibility for non-computational scientists and its integration with the company's existing cloud infrastructure and future quantum ambitions. By allowing domain experts to directly leverage advanced computing without intermediaries, Microsoft could potentially remove a significant bottleneck in scientific progress. 'The big efficiencies are coming from places where, instead of me cramming additional domain knowledge, in this case, a scientist having learned to code, we're basically saying, 'Actually, we'll let the agentic AI do that, you can do what you do, which is use your PhD and get forward progress,'' Zander explained. This democratization of advanced computational methods could lead to a fundamental shift in how scientific research is conducted globally. Smaller labs and institutions in regions with less computational infrastructure might suddenly gain access to capabilities previously available only to elite research institutions. However, the success of Microsoft Discovery will ultimately depend on how effectively it integrates into complex existing research workflows and whether its AI agents can truly understand the nuances of specialized scientific domains. The scientific community is notoriously rigorous and skeptical of new methodologies – Microsoft will need to demonstrate consistent, reproducible results to gain widespread adoption. The platform enters private preview today, with pricing details yet to be announced. Microsoft indicates that smaller research labs will be able to access the platform through Azure, with costs structured similarly to other cloud services. 'At the end of the day, our goal, from a business perspective, is that it's all about enabling that core platform, as opposed to you having to stand up,' Zander said. 'It'll just basically ride on top of the cloud and make it much easier for people to do.'
As Microsoft builds out its ambitious scientific AI platform, it positions itself at a unique juncture in the history of both computing and scientific discovery. The scientific method – a process refined over centuries – is now being augmented by some of the most advanced artificial intelligence ever created. Microsoft Discovery represents a bet that the next era of scientific breakthroughs won't come from either brilliant human minds or powerful AI systems working in isolation, but from their collaboration – where AI handles the computational heavy lifting while human scientists provide the creativity, intuition, and critical thinking that machines still lack. 'If you think about chemistry, materials sciences, materials actually impact about 98% of the world,' Zander noted. 'Everything, the desks, the displays we're using, the clothing that we're wearing. It's all materials.' The implications of accelerating discovery in these domains extend far beyond Microsoft's business interests or even the tech industry. If successful, platforms like Microsoft Discovery could fundamentally alter the pace at which humanity can innovate in response to existential challenges – from climate change to pandemic prevention. The question now isn't whether AI will transform scientific research, but how quickly and how deeply. As Zander put it: 'We need to start working faster.' In a world facing increasingly complex challenges, Microsoft is betting that the combination of human scientific expertise and agentic AI might be exactly the acceleration we need.

Google's AlphaEvolve: The AI agent that reclaimed 0.7% of Google's compute – and how to copy it

Business Mayor

17-05-2025

  • Business
  • Business Mayor


Google's new AlphaEvolve shows what happens when an AI agent graduates from lab demo to production work, and you've got one of the most talented technology companies driving it. Built by Google's DeepMind, the system autonomously rewrites critical code and already pays for itself inside Google. It shattered a 56-year-old record in matrix multiplication (the core of many machine learning workloads) and clawed back 0.7% of compute capacity across the company's global data centers. Those headline feats matter, but the deeper lesson for enterprise tech leaders is how AlphaEvolve pulls them off. Its architecture – controller, fast-draft models, deep-thinking models, automated evaluators and versioned memory – illustrates the kind of production-grade plumbing that makes autonomous agents safe to deploy at scale. Google's AI technology is arguably second to none. So the trick is figuring out how to learn from it, or even using it directly. Google says an Early Access Program is coming for academic partners and that 'broader availability' is being explored, but details are thin. Until then, AlphaEvolve is a best-practice template: If you want agents that touch high-value workloads, you'll need comparable orchestration, testing and guardrails. Consider just the data center win. Google won't put a price tag on the reclaimed 0.7%, but its annual capex runs tens of billions of dollars. Even a rough estimate puts the savings in the hundreds of millions annually— enough, as independent developer Sam Witteveen noted on our recent podcast, to pay for training one of the flagship Gemini models, estimated to cost upwards of $191 million for a version like Gemini Ultra. VentureBeat was the first to report about the AlphaEvolve news earlier this week. Now we'll go deeper: how the system works, where the engineering bar really sits and the concrete steps enterprises can take to build (or buy) something comparable. AlphaEvolve runs on what is best described as an agent operating system – a distributed, asynchronous pipeline built for continuous improvement at scale. Its core pieces are a controller, a pair of large language models (Gemini Flash for breadth; Gemini Pro for depth), a versioned program-memory database and a fleet of evaluator workers, all tuned for high throughput rather than just low latency. A high-level overview of the AlphaEvolve agent structure. Source: AlphaEvolve paper. This architecture isn't conceptually new, but the execution is. 'It's just an unbelievably good execution,' Witteveen says. The AlphaEvolve paper describes the orchestrator as an 'evolutionary algorithm that gradually develops programs that improve the score on the automated evaluation metrics' (p. 3); in short, an 'autonomous pipeline of LLMs whose task is to improve an algorithm by making direct changes to the code' (p. 1). Takeaway for enterprises: If your agent plans include unsupervised runs on high-value tasks, plan for similar infrastructure: job queues, a versioned memory store, service-mesh tracing and secure sandboxing for any code the agent produces. A key element of AlphaEvolve is its rigorous evaluation framework. Every iteration proposed by the pair of LLMs is accepted or rejected based on a user-supplied 'evaluate' function that returns machine-gradable metrics.
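To make the moving parts concrete, here is a deliberately tiny, synchronous sketch of that loop: a fast model drafts a candidate program, a stronger model refines it, a user-supplied evaluator scores it, and every version lands in a program memory. The model calls are placeholders rather than real Gemini APIs, and none of this is AlphaEvolve's own code.

```python
# Simplified, synchronous sketch of the evolutionary loop described above:
# a fast "draft" model proposes candidate programs, a stronger model refines
# them, an automated evaluator scores each one, and a versioned program memory
# keeps the lineage. Both model calls are placeholders, not real Gemini APIs.
import random
from typing import Callable, Dict, List, Tuple


def draft_model(parent_program: str, context: List[str]) -> str:
    # Placeholder for a breadth-oriented model proposing an edit to the parent.
    return parent_program + f"\n# draft tweak {random.randint(0, 999)}"


def refine_model(draft: str) -> str:
    # Placeholder for a depth-oriented model polishing the draft.
    return draft + "\n# refined"


def evolve(seed_program: str,
           evaluate: Callable[[str], Dict[str, float]],
           generations: int = 5) -> Tuple[str, Dict[str, float]]:
    """Run a toy generate -> refine -> evaluate loop with versioned memory."""
    memory: List[Tuple[str, Dict[str, float]]] = [(seed_program, evaluate(seed_program))]
    for _ in range(generations):
        parent, _ = max(memory, key=lambda entry: entry[1]["score"])
        context = [prog for prog, _ in memory[-3:]]   # recent attempts as prompt context
        candidate = refine_model(draft_model(parent, context))
        metrics = evaluate(candidate)                 # accept or reject on metrics
        memory.append((candidate, metrics))           # every version is kept
    return max(memory, key=lambda entry: entry[1]["score"])


if __name__ == "__main__":
    best, best_metrics = evolve("def solve(x):\n    return x",
                                lambda program: {"score": random.random()})
    print(best_metrics)
```

Everything hinges on the evaluate callable passed in, which plays the role of the user-supplied scorer the paper describes.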
This evaluation system begins with ultrafast unit-test checks on each proposed code change – simple, automatic tests (similar to the unit tests developers already write) that verify the snippet still compiles and produces the right answers on a handful of micro-inputs – before passing the survivors on to heavier benchmarks and LLM-generated reviews. This runs in parallel, so the search stays fast and safe. In short: Let the models suggest fixes, then verify each one against tests you trust. AlphaEvolve also supports multi-objective optimization (optimizing latency and accuracy simultaneously), evolving programs that hit several metrics at once. Counter-intuitively, balancing multiple goals can improve a single target metric by encouraging more diverse solutions. Takeaway for enterprises: Production agents need deterministic scorekeepers. Whether that's unit tests, full simulators, or canary traffic analysis. Automated evaluators are both your safety net and your growth engine. Before you launch an agentic project, ask: 'Do we have a metric the agent can score itself against?' AlphaEvolve tackles every coding problem with a two-model rhythm. First, Gemini Flash fires off quick drafts, giving the system a broad set of ideas to explore. Then Gemini Pro studies those drafts in more depth and returns a smaller set of stronger candidates. Feeding both models is a lightweight 'prompt builder,' a helper script that assembles the question each model sees. It blends three kinds of context: earlier code attempts saved in a project database, any guardrails or rules the engineering team has written and relevant external material such as research papers or developer notes. With that richer backdrop, Gemini Flash can roam widely while Gemini Pro zeroes in on quality. Unlike many agent demos that tweak one function at a time, AlphaEvolve edits entire repositories. It describes each change as a standard diff block – the same patch format engineers push to GitHub – so it can touch dozens of files without losing track. Afterward, automated tests decide whether the patch sticks. Over repeated cycles, the agent's memory of success and failure grows, so it proposes better patches and wastes less compute on dead ends. Takeaway for enterprises: Let cheaper, faster models handle brainstorming, then call on a more capable model to refine the best ideas. Preserve every trial in a searchable history, because that memory speeds up later work and can be reused across teams. Accordingly, vendors are rushing to provide developers with new tooling around things like memory. Products such as OpenMemory MCP, which provides a portable memory store, and the new long- and short-term memory APIs in LlamaIndex are making this kind of persistent context almost as easy to plug in as logging. OpenAI's Codex-1 software-engineering agent, also released today, underscores the same pattern. It fires off parallel tasks inside a secure sandbox, runs unit tests and returns pull-request drafts—effectively a code-specific echo of AlphaEvolve's broader search-and-evaluate loop. AlphaEvolve's tangible wins – reclaiming 0.7% of data center capacity, cutting Gemini training kernel runtime 23%, speeding FlashAttention 32%, and simplifying TPU design – share one trait: they target domains with airtight metrics. For data center scheduling, AlphaEvolve evolved a heuristic that was evaluated using a simulator of Google's data centers based on historical workloads. 
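Here is what a staged, user-supplied evaluator of that kind can look like in miniature. It is an illustrative sketch, not AlphaEvolve's implementation: it assumes candidates define a solve() function, uses a handful of micro-inputs as the fast unit-test gate, and treats a timing run as the heavier benchmark stage.

```python
# Illustrative sketch (not AlphaEvolve's code) of a staged evaluate function:
# a cheap unit-test gate screens each candidate before a heavier benchmark runs,
# and survivors get a dictionary of machine-gradable metrics.
import time
from typing import Dict, Optional

MICRO_INPUTS = [(0, 0), (1, 1), (2, 4), (10, 100)]  # tiny known input/output pairs


def passes_unit_tests(candidate_source: str) -> bool:
    """Fast gate: does the snippet compile and return the right answers?"""
    namespace: dict = {}
    try:
        exec(compile(candidate_source, "<candidate>", "exec"), namespace)
        fn = namespace["solve"]  # assumed convention: candidates define solve()
        return all(fn(x) == y for x, y in MICRO_INPUTS)
    except Exception:
        return False


def heavy_benchmark(candidate_source: str) -> float:
    """Slower stage: time the candidate on a larger workload."""
    namespace: dict = {}
    exec(compile(candidate_source, "<candidate>", "exec"), namespace)
    fn = namespace["solve"]
    start = time.perf_counter()
    for x in range(100_000):
        fn(x)
    return time.perf_counter() - start


def evaluate(candidate_source: str) -> Optional[Dict[str, float]]:
    """Return metrics for surviving candidates, None for rejected ones."""
    if not passes_unit_tests(candidate_source):
        return None
    # Multi-objective style: report several metrics and let the caller weigh them.
    return {"runtime_seconds": heavy_benchmark(candidate_source), "correctness": 1.0}


if __name__ == "__main__":
    print(evaluate("def solve(x):\n    return x * x"))
```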
For kernel optimization, the objective was to minimize actual runtime on TPU accelerators across a dataset of realistic kernel input shapes. Takeaway for enterprises: When starting your agentic AI journey, look first at workflows where 'better' is a quantifiable number your system can compute – be it latency, cost, error rate or throughput. This focus allows automated search and de-risks deployment because the agent's output (often human-readable code, as in AlphaEvolve's case) can be integrated into existing review and validation pipelines. This clarity allows the agent to self-improve and demonstrate unambiguous value. While AlphaEvolve's achievements are inspiring, Google's paper is also clear about its scope and requirements. The primary limitation is the need for an automated evaluator; problems requiring manual experimentation or 'wet-lab' feedback are currently out of scope for this specific approach. The system can consume significant compute – 'on the order of 100 compute-hours to evaluate any new solution' (AlphaEvolve paper, page 8), necessitating parallelization and careful capacity planning. Before allocating significant budget to complex agentic systems, technical leaders must ask critical questions:

  • Machine-gradable problem? Do we have a clear, automatable metric against which the agent can score its own performance?
  • Compute capacity? Can we afford the potentially compute-heavy inner loop of generation, evaluation, and refinement, especially during the development and training phase?
  • Codebase & memory readiness? Is your codebase structured for iterative, possibly diff-based, modifications? And can you implement the instrumented memory systems vital for an agent to learn from its evolutionary history?

Takeaway for enterprises: The increasing focus on robust agent identity and access management, as seen with platforms like Frontegg, Auth0 and others, also points to the maturing infrastructure required to deploy agents that interact securely with multiple enterprise systems. AlphaEvolve's message for enterprise teams is manifold. First, your operating system around agents is now far more important than model intelligence. Google's blueprint shows three pillars that can't be skipped:

  • Deterministic evaluators that give the agent an unambiguous score every time it makes a change.
  • Long-running orchestration that can juggle fast 'draft' models like Gemini Flash with slower, more rigorous models – whether that's Google's stack or a framework such as LangChain's LangGraph.
  • Persistent memory so each iteration builds on the last instead of relearning from scratch.

Enterprises that already have logging, test harnesses and versioned code repositories are closer than they think. The next step is to wire those assets into a self-serve evaluation loop so multiple agent-generated solutions can compete, and only the highest-scoring patch ships. As Cisco's Anurag Dhingra, VP and GM of Enterprise Connectivity and Collaboration, told VentureBeat in an interview this week: 'It's happening, it is very, very real,' he said of enterprises using AI agents in manufacturing, warehouses, customer contact centers. 'It is not something in the future. It is happening there today.'
He warned that as these agents become more pervasive, doing 'human-like work,' the strain on existing systems will be immense: 'The network traffic is going to go through the roof,' Dhingra said. Your network, budget and competitive edge will likely feel that strain before the hype cycle settles. Start proving out a contained, metric-driven use case this quarter – then scale what works. Watch the video podcast I did with developer Sam Witteveen, where we go deep on production-grade agents, and how AlphaEvolve is showing the way.
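As a closing illustration of that 'only the highest-scoring patch ships' loop, here is a hedged sketch of the selection step. The apply_patch and evaluate callables are hypothetical stand-ins for your own tooling (for example, applying a diff in a throwaway checkout and running your test harness); nothing here comes from Google's implementation.

```python
# Hedged sketch of a self-serve evaluation loop: several agent-generated patches
# compete in a sandbox and only the highest-scoring one is kept. apply_patch and
# evaluate are hypothetical stand-ins for your own diff tooling and test harness.
from typing import Callable, Dict, List, Optional, Tuple


def select_best_patch(
    baseline_source: str,
    candidate_patches: List[str],
    apply_patch: Callable[[str, str], str],
    evaluate: Callable[[str], Optional[Dict[str, float]]],
    metric: str = "runtime_seconds",
) -> Optional[Tuple[str, Dict[str, float]]]:
    """Apply each candidate patch, score it, and return the best survivor."""
    best: Optional[Tuple[str, Dict[str, float]]] = None
    for patch in candidate_patches:
        patched_source = apply_patch(baseline_source, patch)  # sandboxed application
        metrics = evaluate(patched_source)                    # None means it failed the gate
        if metrics is None:
            continue
        if best is None or metrics[metric] < best[1][metric]:
            best = (patch, metrics)                           # lower runtime wins here
    return best                                               # only this patch ships
```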

RadioShack Closes 1 of Its Last Locations But Plots a Comeback

Yahoo

11-05-2025

  • Business
  • Yahoo


An iconic store chain that reached its peak in the 1990s has closed one of its last-ever locations in the U.S., but it's plotting a comeback there. The chain is RadioShack, and its last location in Maryland has just closed for good. According to NBC Washington, the RadioShack was located in Prince Frederick and decided to close its doors in late April 2025 after its owner Michael King died. Although it may feel like a relic from the 1980s and 1990s, the RadioShack chain still has a website and sells products online and at some locations. Its website also says it remains successful in other countries. The Maryland store was "being liquidated" with all products half off, the television station reported, adding that some customers were distraught by the end of an era. 'I'm very sad; I'll start crying,' customer Joann Faber Tyrell said to the television station. The store lasted for more than 50 years, NBC Washington reported. According to Maryland Matters, when King died in January, his son Edward King took over the store. The site noted that Radio Shack declared bankruptcy in 2015 and the Kings kept using the name but bought their products "from other wholesalers." 'It was fun while it lasted, but it's not the same anymore,' King said to Maryland Matters. 'I know my dad realized that.' He added: "It's the end of an era." RadioShack has only six "brick-and-mortar" stores left in the U.S., according to Taste of Country, although its website lists several dozen authorized dealers that can sell the company's products but don't operate under its name. The company's website lists seven stores still carrying the RadioShack name in the U.S.; they are in Woodstock, VA; Bozeman, MT; Brodheadsville, PA; Lenoir, NC; Newland, NC; Sevierville, TN; and Layton, UT. According to Fox40, RadioShack started to close stores in 2010, the "victim of changing tech trends and disruptive competitors like Amazon." In 2015, when the company filed for bankruptcy, it had 1,500 stores in the United States, Fox40 reported. In 2014, according to CNN, RadioShack had a whopping 5,200 stores. According to VentureBeat, RadioShack had more than 8,000 stores in 1999, its heyday. However, after a sale, the company is now plotting a comeback, VentureBeat reported in January 2025. The company says as much on its website. Its "new owner is pledging a comeback and is showcasing 380 products at CES, the big tech trade show in Las Vegas," the VentureBeat site reported. "RadioShack is an iconic American chain of consumer electronics stores since 1921. For over a century, RadioShack has been the go-to destination for tech, offering a wide range of products from innovative gadgets to essential electronic components," the company's website says. "Unicomer Group acquired the RadioShack franchise in El Salvador in January 1998 with the vision of expanding it throughout Central America, the Caribbean with a presence in over 20 countries. This successful partnership allowed RadioShack to become the go-to destination for any tech needs in every country it operates, continuing its legacy of offering technology products and accessories to a wider audience," it adds. "Unicomer Group, through its affiliate Global Franchising Corporation (GFC), acquired RadioShack's intellectual property assets and domains in about 70 countries around the world, including the United States and Canada, Europe, and China," the website continued. RadioShack is trying to make a comeback, the company confirmed.
"RadioShack is coming back in the US with an extensive product selection that ensures our customers they will find exactly what they need to carry on with day-to-day lives or transform their home and office. Our electronics range including: music and audio equipment, gaming equipment, business traveling products, dependable computer accessories and more," the website says.

Fine-tuning vs. in-context learning: New research guides better LLM customization for real-world tasks

Business Mayor

10-05-2025

  • Science
  • Business Mayor


Two popular approaches for customizing large language models (LLMs) for downstream tasks are fine-tuning and in-context learning (ICL). In a recent study, researchers at Google DeepMind and Stanford University explored the generalization capabilities of these two methods. They find that ICL has greater generalization ability (though it comes at a higher computation cost during inference). They also propose a novel approach to get the best of both worlds. The findings can help developers make crucial decisions when building LLM applications for their bespoke enterprise data. Fine-tuning involves taking a pre-trained LLM and further training it on a smaller, specialized dataset. This adjusts the model's internal parameters to teach it new knowledge or skills. In-context learning (ICL), on the other hand, doesn't change the model's underlying parameters. Instead, it guides the LLM by providing examples of the desired task directly within the input prompt. The model then uses these examples to figure out how to handle a new, similar query. The researchers set out to rigorously compare how well models generalize to new tasks using these two methods. They constructed 'controlled synthetic datasets of factual knowledge' with complex, self-consistent structures, like imaginary family trees or hierarchies of fictional concepts. To ensure they were testing the model's ability to learn new information, they replaced all nouns, adjectives, and verbs with nonsense terms, avoiding any overlap with the data the LLMs might have encountered during pre-training. The models were then tested on various generalization challenges. For instance, one test involved simple reversals. If a model was trained that 'femp are more dangerous than glon,' could it correctly infer that 'glon are less dangerous than femp'? Another test focused on simple syllogisms, a form of logical deduction. If told 'All glon are yomp' and 'All troff are glon,' could the model deduce that 'All troff are yomp'? They also used a more complex 'semantic structure benchmark' with a richer hierarchy of these made-up facts to test more nuanced understanding. 'Our results are focused primarily on settings about how models generalize to deductions and reversals from fine-tuning on novel knowledge structures, with clear implications for situations when fine-tuning is used to adapt a model to company-specific and proprietary information,' Andrew Lampinen, Research Scientist at Google DeepMind and lead author of the paper, told VentureBeat. To evaluate performance, the researchers fine-tuned Gemini 1.5 Flash on these datasets. For ICL, they fed the entire training dataset (or large subsets) as context to an instruction-tuned model before posing the test questions. The results consistently showed that, in data-matched settings, ICL led to better generalization than standard fine-tuning. Models using ICL were generally better at tasks like reversing relationships or making logical deductions from the provided context. Pre-trained models, without fine-tuning or ICL, performed poorly, indicating the novelty of the test data. 'One of the main trade-offs to consider is that, whilst ICL doesn't require fine-tuning (which saves the training costs), it is generally more computationally expensive with each use, since it requires providing additional context to the model,' Lampinen said.
'On the other hand, ICL tends to generalize better for the datasets and models that we evaluated.' Building on the observation that ICL excels at flexible generalization, the researchers proposed a new method to enhance fine-tuning: adding in-context inferences to fine-tuning data. The core idea is to use the LLM's own ICL capabilities to generate more diverse and richly inferred examples, and then add these augmented examples to the dataset used for fine-tuning. They explored two main data augmentation strategies:

  • A local strategy: This approach focuses on individual pieces of information. The LLM is prompted to rephrase single sentences from the training data or draw direct inferences from them, such as generating reversals.
  • A global strategy: The LLM is given the full training dataset as context, then prompted to generate inferences by linking a particular document or fact with the rest of the provided information, leading to a longer reasoning trace of relevant inferences.

When the models were fine-tuned on these augmented datasets, the gains were significant. This augmented fine-tuning significantly improved generalization, outperforming not only standard fine-tuning but also plain ICL. 'For example, if one of the company documents says 'XYZ is an internal tool for analyzing data,' our results suggest that ICL and augmented finetuning will be more effective at enabling the model to answer related questions like 'What internal tools for data analysis exist?'' Lampinen said. This approach offers a compelling path forward for enterprises. By investing in creating these ICL-augmented datasets, developers can build fine-tuned models that exhibit stronger generalization capabilities. This can lead to more robust and reliable LLM applications that perform better on diverse, real-world inputs without incurring the continuous inference-time costs associated with large in-context prompts. 'Augmented fine-tuning will generally make the model fine-tuning process more expensive, because it requires an additional step of ICL to augment the data, followed by fine-tuning,' Lampinen said. 'Whether that additional cost is merited by the improved generalization will depend on the specific use case. However, it is computationally cheaper than applying ICL every time the model is used, when amortized over many uses of the model.' While Lampinen noted that further research is needed to see how the components they studied interact in different settings, he added that their findings indicate that developers may want to consider exploring augmented fine-tuning in cases where they see inadequate performance from fine-tuning alone. 'Ultimately, we hope this work will contribute to the science of understanding learning and generalization in foundation models, and the practicalities of adapting them to downstream tasks,' Lampinen said.
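A minimal sketch of these ideas on toy facts in the spirit of the paper's nonsense terms: in-context learning packs the facts into every prompt, standard fine-tuning writes them once into a training file, and augmented fine-tuning adds model-generated inferences (here, a reversal) to that file. The generate_inferences helper is a placeholder for a real LLM call, and the record format is a generic prompt/completion JSONL, not any specific vendor's schema.

```python
# Minimal sketch, not the paper's code, contrasting the customization routes
# discussed above: (1) ICL packs the facts into every prompt, (2) standard
# fine-tuning bakes them into a training file, and (3) augmented fine-tuning
# adds generated inferences (here, reversals) to that file. generate_inferences
# is a placeholder for a real LLM call and only handles one toy pattern.
import json
from typing import List

FACTS = [
    "Femp are more dangerous than glon.",
    "All glon are yomp.",
    "All troff are glon.",
]


def build_icl_prompt(question: str) -> str:
    """ICL: pay the context cost on every request."""
    context = "\n".join(f"- {fact}" for fact in FACTS)
    return f"Use only these facts:\n{context}\n\nQuestion: {question}\nAnswer:"


def generate_inferences(sentence: str) -> List[str]:
    """Placeholder for an in-context call that draws direct inferences (e.g. reversals)."""
    marker = " are more dangerous than "
    if marker in sentence:
        a, b = sentence.rstrip(".").split(marker)
        return [f"{b.capitalize()} are less dangerous than {a.lower()}."]
    return []


def write_finetune_file(path: str, augment: bool = False) -> int:
    """Fine-tuning: pay the cost once by writing training records (optionally augmented)."""
    examples = list(FACTS)
    if augment:
        for fact in FACTS:
            examples.extend(generate_inferences(fact))
    with open(path, "w", encoding="utf-8") as f:
        for text in examples:
            f.write(json.dumps({"prompt": "State a known fact.", "completion": text}) + "\n")
    return len(examples)


if __name__ == "__main__":
    print(build_icl_prompt("Are troff yomp?"))
    print("standard records:", write_finetune_file("facts.jsonl"))
    print("augmented records:", write_finetune_file("facts_augmented.jsonl", augment=True))
```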

AWS report: Generative AI overtakes security in global tech budgets for 2025

Business Mayor

07-05-2025

  • Business
  • Business Mayor


Generative AI tools have surpassed cybersecurity as the top budget priority for global IT leaders heading into 2025, according to a comprehensive new study released today by Amazon Web Services. The AWS Generative AI Adoption Index, which surveyed 3,739 senior IT decision makers across nine countries, reveals that 45% of organizations plan to prioritize generative AI spending over traditional IT investments like security tools (30%) — a significant shift in corporate technology strategies as businesses race to capitalize on AI's transformative potential. 'I don't think it's cause for concern,' said Rahul Pathak, Vice President of Generative AI and AI/ML Go-to-Market at AWS, in an exclusive interview with VentureBeat. 'The way I interpret that is that customers' security remains a massive priority. What we're seeing with AI being such a major item from a budget prioritization perspective is that customers are seeing so many use cases for AI. It's really that there's a broad need to accelerate adoption of AI that's driving that particular outcome.' The extensive survey, conducted across the United States, Brazil, Canada, France, Germany, India, Japan, South Korea, and the United Kingdom, shows that generative AI adoption has reached a critical inflection point, with 90% of organizations now deploying these technologies in some capacity. More tellingly, 44% have already moved beyond the experimental phase into production deployment. IT leaders rank generative AI as their top budget priority for 2025, significantly outpacing traditional security investments. (Credit: Amazon Web Services) As AI initiatives scale across organizations, new leadership structures are emerging to manage the complexity. The report found that 60% of organizations have already appointed a dedicated AI executive, such as a Chief AI Officer (CAIO), with another 26% planning to do so by 2026. This executive-level commitment reflects growing recognition of AI's strategic importance, though the study notes that nearly one-quarter of organizations will still lack formal AI transformation strategies by 2026, suggesting potential challenges in change management. 'A thoughtful change management strategy will be critical,' the report emphasizes. 'The ideal strategy should address operating model changes, data management practices, talent pipelines, and scaling strategies.' Organizations conducted an average of 45 AI experiments in 2024, but only about 20 are expected to reach end users by 2025, highlighting persistent implementation challenges. 'For me to see over 40% going into production for something that's relatively new, I actually think is pretty rapid and high success rate from an adoption perspective,' Pathak noted. 'That said, I think customers are absolutely using AI in production at scale, and I think we want to obviously see that continue to accelerate.' The report identified talent shortages as the primary barrier to transitioning experiments into production, with 55% of respondents citing the lack of a skilled generative AI workforce as their biggest challenge. 'I'd say another big piece that's an unlock to getting into production successfully is customers really working backwards from what business objectives they're trying to drive, and then also understanding how will AI interact with their data,' Pathak told VentureBeat.
'It's really when you combine the unique insights you have about your business and your customers with AI that you can drive a differentiated business outcome.' Organizations conducted 45 AI experiments on average in 2024, but talent shortages prevent more than half from reaching production. (Credit: Amazon Web Services) To address the skills gap, organizations are pursuing dual strategies of internal training and external recruitment. The survey found that 56% of organizations have already developed generative AI training plans, with another 19% planning to do so by the end of 2025. 'For me, it's clear that it's top of mind for customers,' Pathak said regarding the talent shortage. 'It's, how do we make sure that we bring our teams along and employees along and get them to a place where they're able to maximize the opportunity.' Rather than specific technical skills, Pathak emphasized adaptability: 'I think it's more about, can you commit to sort of learning how to use AI tools so you can build them into your day-to-day workflow and keep that agility? I think that mental agility will be important for all of us.' The talent push extends beyond training to aggressive hiring, with 92% of organizations planning to recruit for roles requiring generative AI expertise in 2025. In a quarter of organizations, at least 50% of new positions will require these skills. One in four organizations will require generative AI skills for at least half of all new positions in 2025. (Credit: Amazon Web Services) The long-running debate over whether to build proprietary AI solutions or leverage existing models appears to be resolving in favor of a hybrid approach. Only 25% of organizations plan to deploy solutions developed in-house from scratch, while 58% intend to build custom applications on pre-existing models and 55% will develop applications on fine-tuned models. This represents a notable shift for industries traditionally known for custom development. The report found that 44% of financial services firms plan to use out-of-the-box solutions — a departure from their historical preference for proprietary systems. 'Many select customers are still building their own models,' Pathak explained. 'That being said, I think there's so much capability and investment that's gone into core foundation models that there are excellent starting points, and we've worked really hard to make sure customers can be confident that their data is protected. Nothing leaks into the models. Anything they do for fine-tuning or customization is private and remains their IP.' He added that companies can still leverage their proprietary knowledge while using existing foundation models: 'Customers realize that they can get the benefits of their proprietary understanding of the world with things like RAG [Retrieval-Augmented Generation] and customization and fine-tuning and model distillation.' Most organizations favor customizing existing AI models rather than building solutions from scratch. (Credit: Amazon Web Services) While generative AI investment is a global trend, the study revealed regional variations in adoption rates. The U.S. showed 44% of organizations prioritizing generative AI investments, aligning with the global average of 45%, but India (64%) and South Korea (54%) demonstrated significantly higher rates. 'We are seeing massive adoption around the world,' Pathak observed. 'I thought it was interesting that there was a relatively high amount of consistency on the global side. 
I think we did see in our respondents that, if you squint at it, I think we've seen India maybe slightly ahead, other parts slightly behind the average, and then kind of the U.S. right on line.' As organizations navigate the complex AI landscape, they increasingly rely on external expertise. The report found that 65% of organizations will depend on third-party vendors to some extent in 2025, with 15% planning to rely solely on vendors and 50% adopting a mixed approach combining in-house teams and external partners. 'For us, it's very much an 'and' type of relationship,' Pathak said of AWS's approach to supporting both custom and pre-built solutions. 'We want to meet customers where they are. We've got a huge partner ecosystem we've invested in from a model provider perspective, so Anthropic and Meta, Stability, Cohere, etc. We've got a big partner ecosystem of ISVs. We've got a big partner ecosystem of service providers and system integrators.' Two-thirds of organizations will rely on external expertise to deploy generative AI solutions in 2025. (Credit: Amazon Web Services) For organizations still hesitant to embrace generative AI, Pathak offered a stark warning: 'I really think customers should be leaning in, or they're going to risk getting left behind by their peers who are. The gains that AI can provide are real and significant.' He emphasized the accelerating pace of innovation in the field: 'The rate of change and the rate of improvement of AI technology and the rate of the reduction of things like the cost of inference are significant and will continue to be rapid. Things that seem impossible today will seem like old news in probably just three to six months.' This sentiment is echoed in the widespread adoption across sectors. 'We see such a rapid, such a mass breadth of adoption,' Pathak noted. 'Regulated industries, financial services, healthcare, we see governments, large enterprise, startups. The current crop of startups is almost exclusively AI-driven.' The AWS report paints a portrait of generative AI's rapid evolution from cutting-edge experiment to fundamental business infrastructure. As organizations shift budget priorities, restructure leadership teams, and race to secure AI talent, the data suggests we've reached a decisive tipping point in enterprise AI adoption. Yet amid the technological gold rush, the most successful implementations will likely come from organizations that maintain a relentless focus on business outcomes rather than technological novelty. As Pathak emphasized, 'AI is a powerful tool, but you got to start with your business objective. What are you trying to accomplish as an organization?' In the end, the companies that thrive won't necessarily be those with the biggest AI budgets or the most advanced models, but those that most effectively harness AI to solve real business problems with their unique data assets. In this new competitive landscape, the question is no longer whether to adopt AI, but how quickly organizations can transform AI experiments into tangible business advantage before their competitors do.
