Getting AI-Ready In A Hybrid Data World

Forbes | 05-06-2025
Maggie Laird is the President of Pentaho, Hitachi Vantara's data software business unit.
If AI is the future, then data is the terrain. And most businesses are hiking in flip-flops.
We all feel the urgency. Executives know that AI has the potential to drive exponential gains in productivity, insight and market leadership. But inside most organizations, the reality looks a lot less like ChatGPT magic and more like stalled pilots, costly proof-of-concepts and growing technical debt.
Gartner predicts that through 2025, 60% of AI projects will be dropped because they are 'unsupported by AI-ready data.' What's more, 63% of organizations either lack proper data management practices for AI or aren't sure whether they have them, according to Gartner.
And it's not just about volume or speed. It's about trust, structure and the ability to bridge hybrid environments without losing fidelity or context.
Adopting AI today is a lot like installing a high-performance kitchen in a house with faulty wiring and sagging floors: flashy, expensive and ultimately unsafe. That's because AI doesn't magically fix data problems—it amplifies them.
Many of the core data challenges that AI surfaces are issues companies have faced for years. Most businesses have data scattered across multiple environments: in the cloud, on-premises or a mix of both.
Take customer retention. A company may have six streams of data tied to improving it. Some of the data is structured in columns and rows. But other critical data is unstructured—buried in email, PDFs or training videos. Most companies have mountains of unstructured data that hasn't been labeled, making it inaccessible to AI or incompatible with structured data.
If a company wants AI to detect customer retention issues or trends, it needs all the relevant data to build an accurate picture. Without it, insights are skewed and incomplete—and the resulting decisions may be wrong.
Imagine setting an AI agent loose to fix customer retention problems without reliable data. Who knows what could happen? Air Canada, for example, was recently found liable for a chatbot that gave a passenger incorrect information. The airline argued that it shouldn't be held responsible for what the chatbot said, but the court disagreed.
The bar has been raised. 'Good enough' is no longer acceptable. AI is often designed to operate without humans in the loop, which means errors can go undetected.
To get the right and best results from AI, organizations need a strong data foundation—what I call 'data fitness.'
Here are four key indicators that your organization is data-ready for AI:
1. Do you know where your data is? Most organizations don't. Data lives in the cloud, on-prem servers, Slack threads, old Excel files and more.
Being data-fit means you've cataloged what matters, labeled what's useful and can locate the most current version of any given asset—structured or unstructured. Your platform should connect across hybrid environments without needing to copy or move the data.
To streamline this, start by clearly defining the AI use case. That makes it easier to identify what data to inventory.
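As a rough sketch of what such an inventory might capture, here is a minimal, hypothetical catalog entry in Python. The `DataAsset` class, field names and locations are illustrative assumptions, not a reference to Pentaho or any other product.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DataAsset:
    """One catalog entry: where an asset lives, what it is, and how fresh it is."""
    name: str
    location: str                 # e.g. cloud bucket, on-prem database, file share
    kind: str                     # "structured" or "unstructured"
    labels: list = field(default_factory=list)
    last_updated: datetime = None

# A hypothetical inventory for a customer-retention use case; the data is
# referenced in place rather than copied or moved.
catalog = [
    DataAsset("crm_accounts", "on-prem-sql://crm/accounts", "structured",
              ["customer", "retention"], datetime(2025, 4, 30)),
    DataAsset("support_emails", "s3://corp-archive/support/", "unstructured",
              ["customer", "sentiment"], datetime(2025, 5, 2)),
]

def latest(assets, label):
    """Locate the most current asset carrying a given label."""
    tagged = [a for a in assets if label in a.labels and a.last_updated]
    return max(tagged, key=lambda a: a.last_updated, default=None)

print(latest(catalog, "customer").name)  # -> support_emails
```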
2. Is your data clean, governed and context-aware? AI shifts decision-making to more people who don't have 'data' in their job titles.
An analyst might know to exclude 'test region X' or adjust for seasonal bias in a report. Your AI agent won't. Neither will the product manager using a low-code interface to generate pricing suggestions.
If your data isn't clean, governed and context-aware, you risk making high-speed, AI-driven decisions based on flawed inputs. That's not just bad insights—it's a serious risk.
3. Can you match data speed to the problem? Different problems require different speeds. Historical data might be enough to plan next quarter's staffing, but real-time data could be essential for adjusting a flash sale or spotting inventory shortfalls.
AI-ready platforms must operate across batch, real time and streaming—sometimes all within the same use case.
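As a rough illustration of that point (a toy example of my own, not tied to any particular platform), the same "units sold" signal can be consumed as a one-off batch aggregate for planning and as an event-by-event running total for real-time monitoring. All names below are hypothetical.

```python
def batch_units_sold(daily_rows):
    """Batch: aggregate a historical extract once, e.g. for next quarter's staffing plan."""
    return sum(row["units"] for row in daily_rows)

class RunningUnitsSold:
    """Streaming: update the figure event by event, e.g. to spot a flash-sale shortfall."""
    def __init__(self, alert_threshold):
        self.total = 0
        self.alert_threshold = alert_threshold

    def on_event(self, event):
        self.total += event["units"]
        if self.total >= self.alert_threshold:
            print("inventory check triggered")

history = [{"units": 120}, {"units": 95}, {"units": 143}]
print(batch_units_sold(history))      # planning view of the signal

monitor = RunningUnitsSold(alert_threshold=200)
for event in history:
    monitor.on_event(event)           # real-time view of the same signal
```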
4. Do you know which data is worth the effort? Getting data-fit isn't just about cleaning up for the sake of it. It's about knowing which AI use cases matter, which data is needed and how much effort it will take to make that data usable.
Sometimes, the return isn't worth it. That's okay—clarity saves time. But in many cases, once the investment is made, follow-on projects accelerate. Readiness compounds. Future efforts don't start from zero.
AI is no longer just a tech project; it's a business imperative. But without a solid data foundation, the tools don't matter.
A home kitchen remodel inevitably involves hard, messy work—so does good data management. AI just makes it more urgent than ever.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.

Related Articles

Connected Ship Market to Reach $17.2 Billion by 2028

Yahoo

Delray Beach, FL, Aug. 10, 2025 (GLOBE NEWSWIRE) -- The report "Connected Ship Market by Application (Vessel Traffic Management, Fleet Operation, Fleet Health Monitoring, Other Applications), Installation (Onboard, Onshore), Platform (Ships, Ports) & Fit (Line Fit, Retrofit, Hybrid Fit) and Global Forecast to 2028" estimates the connected ship market at USD 11.3 billion in 2023 and projects it to reach USD 17.2 billion by 2028, at a CAGR of 7.7% from 2023 to 2028. The growth can be attributed to the increasing need for connected ships in commercial applications and naval missions. Demand for enhanced safety and security in the maritime industry is driving the market, as shipbuilders rapidly digitalize vessels with advanced technologies to improve efficiency at sea. Government support and growing investments are propelling the development of advanced ships, further boosting the growth of the connected ship industry.

Major key players in the connected ship industry: ABB (Switzerland), Emerson Electric Co. (US), Wartsila (Finland), Kongsberg Gruppen ASA (Norway) and Thales Group (France).

Connected Ship Market Segmentation:

By application, the Fleet Operation segment is expected to register the highest growth rate during the forecast period. These widely adopted applications provide real-time data access and fleet optimization, and they can be easily deployed on commercial and defense ships. Easy deployment and enhanced efficiency drive the segment's growth.

By fit, the market is segmented into Line Fit, Retrofit and Hybrid Fit, with the Line Fit segment expected to account for the largest share in 2023. Line fit installs connected ship technology in new ships during construction, which makes installation more cost-effective and integration more seamless, as the technology is designed to work with the ship's existing architecture.

By installation, the market is segmented into Onboard and Onshore. The Onboard segment is projected to hold the largest share, while the Onshore segment is expected to register the highest growth rate during the forecast period. Rapid development of connected ships to enhance maritime safety and security is driving the growth of the market.

By region, the market has been studied across North America, Europe, Asia Pacific and the Rest of the World. Asia Pacific is expected to register the highest growth rate during the forecast period, owing to the presence of major shipbuilding companies in the region. Within Asia Pacific, China is expected to show both the highest growth rate and the largest market share, driven by rising demand for connected ships in commercial and defense applications.

Connected Ship Market Key Takeaways:

By embracing digital transformation, shipowners are increasingly adopting connected technologies to improve operational efficiency, safety and real-time decision-making at sea.
By integrating advanced systems such as vessel traffic management and fleet operations centers, naval defense and commercial operators are enhancing situational awareness and security.

By leading global demand, the commercial segment—especially cargo and container ships—is seeing rapid adoption of connected systems to optimize route planning and fuel usage.

By strengthening maritime defense capabilities, naval forces are deploying connected technologies to improve communication, combat readiness and data-sharing across fleets.

By enabling centralized monitoring and predictive maintenance, the vessel infrastructure segment is gaining momentum, helping reduce downtime and lifecycle costs.

By accounting for a significant market share, Europe is emerging as a key region, driven by its strong maritime presence and emphasis on technological upgrades in shipping fleets.

By advancing satellite and cloud-based communication systems, connected ships are now able to maintain continuous data exchange, even in remote ocean regions.

By supporting safer and more efficient maritime operations, the rise of cybersecurity and integrated platform management systems is transforming how ships are managed and maintained.

About MarketsandMarkets™

MarketsandMarkets™ has been recognized as one of America's Best Management Consulting Firms by Forbes, as per their recent report. MarketsandMarkets™ is a blue ocean alternative in growth consulting and program management, leveraging a man-machine offering to drive supernormal growth for progressive organizations in the B2B space. With the widest lens on emerging technologies, we are proficient in co-creating supernormal growth for clients across the globe. Today, 80% of Fortune 2000 companies rely on MarketsandMarkets, and 90 of the top 100 companies in each sector trust us to accelerate their revenue growth. With a global clientele of over 13,000 organizations, we help businesses thrive in a disruptive ecosystem. The B2B economy is witnessing the emergence of $25 trillion in new revenue streams that are replacing existing ones within this decade. We work with clients on growth programs, helping them monetize this $25 trillion opportunity through our service lines: TAM Expansion, Go-to-Market (GTM) Strategy to Execution, Market Share Gain, Account Enablement, and Thought Leadership Marketing. Built on the 'GIVE Growth' principle, we collaborate with several Forbes Global 2000 B2B companies to keep them future-ready. Our insights and strategies are powered by industry experts, cutting-edge AI, and our Market Intelligence Cloud, KnowledgeStore™, which integrates research and provides ecosystem-wide visibility into revenue shifts. To find out more, visit or follow us on Twitter, LinkedIn and Facebook.

Contact: Mr. Rohan Salgarkar, MarketsandMarkets™ INC., 1615 South Congress Ave., Suite 103, Delray Beach, FL 33445, USA. Phone: +1-888-600-6441. Email: sales@

Oil Falls Amid Easing Supply-Disruption Concerns

Wall Street Journal

2338 GMT — Oil falls in the early Asian session amid easing concerns about supply disruptions. U.S. President Trump said Friday he would meet with Russian President Putin in Alaska on Aug. 15 after Putin presented the Trump administration with a proposal for a cease-fire in Ukraine. Geopolitical risks have eased, ANZ Research analysts say in a research report, citing the news. Also, U.S. tariffs have officially commenced, raising worries over weaker economic activity and hence, demand for crude oil, the analysts add. Front-month WTI crude oil futures are down 0.4% at $63.64/bbl; front-month Brent crude oil futures are 0.3% lower at $66.37/bbl.

Will Reinforcement Learning Take Us To AGI?

Forbes

There aren't many truly new ideas in artificial intelligence. More often, breakthroughs in AI happen when concepts that have existed for years suddenly take on new power because underlying technology inputs—in particular, raw computing power—finally catch up to unlock those concepts' full potential. Famously, Geoff Hinton and a small group of collaborators devoted themselves tirelessly to neural networks starting in the early 1970s. For decades, the technology didn't really work and the outside world paid little attention. It was not until the early 2010s—thanks to the arrival of sufficiently powerful Nvidia GPUs and internet-scale training data—that the potential of neural networks was finally unleashed for all to see. In 2024, more than half a century after he began working on neural networks, Hinton was awarded the Nobel Prize for pioneering the field of modern AI.

Reinforcement learning has followed a similar arc. Richard Sutton and Andrew Barto, the fathers of modern reinforcement learning, laid down the foundations of the field starting in the 1970s. Even before Sutton and Barto began their work, the basic principles underlying reinforcement learning—in short, learning by trial and error based on positive and negative feedback—had been developed by behavioral psychologists and animal researchers going back to the early twentieth century.

Yet in just the past year, advances in reinforcement learning (RL) have taken on newfound importance and urgency in the world of AI. It has become increasingly clear that the next leap in AI capabilities will be driven by RL. If artificial general intelligence (AGI) is in fact around the corner, reinforcement learning will play a central role in getting us there. Just a few years ago, when ChatGPT's launch ushered in the era of generative AI, almost no one would have predicted this. Deep questions remain unanswered about reinforcement learning's capabilities and its limits. No field in AI is moving more quickly today than RL. It has never been more important to understand this technology, its history and its future.

Reinforcement Learning 101

The basic principles of reinforcement learning have remained consistent since Sutton and Barto established the field in the 1970s. The essence of RL is to learn by interacting with the world and seeing what happens. It is a universal and foundational form of learning; every human and animal does it.

In the context of artificial intelligence, a reinforcement learning system consists of an agent interacting with an environment. RL agents are not given direct instructions or answers by humans; instead, they learn through trial and error. When an agent takes an action in an environment, it receives a reward signal from the environment, indicating that the action produced either a positive or a negative outcome. The agent's goal is to adjust its behavior to maximize positive rewards and minimize negative rewards over time.

How does the agent decide which actions to take? Every agent acts according to a policy, which can be understood as the formula or calculus that determines the agent's action based on the particular state of the environment. A policy can be a simple set of rules, or even pure randomness, or it can be represented by a far more complex system, like a deep neural network.
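To make this vocabulary concrete, here is a deliberately toy sketch in Python (my own illustration, not something from the article). The environment is a made-up three-armed bandit, the policy is epsilon-greedy, and the agent adjusts its behavior using nothing but the reward signal it receives.

```python
import random

TRUE_PAYOFFS = [0.2, 0.5, 0.8]          # hidden from the agent

def environment(action):
    """Reward signal: 1 with the chosen arm's hidden probability, else 0."""
    return 1.0 if random.random() < TRUE_PAYOFFS[action] else 0.0

estimates = [0.0, 0.0, 0.0]             # the agent's running payoff estimates
counts = [0, 0, 0]
epsilon = 0.1                           # how often the agent explores

for step in range(5000):
    # Policy: mostly exploit the best-looking arm, occasionally explore at random.
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])

    reward = environment(action)        # trial and error: act, then observe

    # Learning: nudge the estimate for the chosen action toward the observed reward.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print([round(e, 2) for e in estimates])  # should approach [0.2, 0.5, 0.8]
```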
One final concept that is important to understand in RL, closely related to the reward signal, is the value function. The value function is the agent's estimate of how favorable a given state of the environment will be (that is, how many positive and negative rewards it will lead to) over the long run. Whereas reward signals are immediate pieces of feedback that come from the environment based on current conditions, the value function is the agent's own learned estimate of how things will play out in the long term. The entire purpose of value functions is to estimate reward signals, but unlike reward signals, value functions enable agents to reason and plan over longer time horizons. For instance, value functions can incentivize actions even when they lead to negative near-term rewards because the long-term benefit is estimated to be worth it. When RL agents learn, they do so in one of three ways: by updating their policy, updating their value function, or updating both together.

A brief example will help make these concepts concrete. Imagine applying reinforcement learning to the game of chess. In this case, the agent is an AI chess player. The environment is the chess board, with any given configuration of chess pieces representing a state of that environment. The agent's policy is the function (whether a simple set of rules, or a decision tree, or a neural network, or something else) that determines which move to make based on the current board state. The reward signal is simple: positive when the agent wins a game, negative when it loses a game. The agent's value function is its learned estimate of how favorable or unfavorable any given board position is—that is, how likely the position is to lead to a win or a loss. As the agent plays more games, strategies that lead to wins will be positively reinforced and strategies that lead to losses will be negatively reinforced via updates to the agent's policy and value function. Gradually, the AI system will become a stronger chess player.
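The same mechanics can be run end to end on a game far smaller than chess. The sketch below is a toy example of my own (not anything DeepMind built): self-play on the simple game of "21", in which players alternately remove one to three objects and whoever takes the last one wins. As in the chess example, the only reward is a win or a loss at the end of a game, and the agent improves a tabular value estimate for each position along with an epsilon-greedy policy.

```python
import random
from collections import defaultdict

values = defaultdict(float)     # value estimate for each pile size (the "board state")
ALPHA, EPSILON = 0.1, 0.2

def choose_move(pile):
    """Policy: epsilon-greedy; prefer moves that hand the opponent a low-value state."""
    moves = [m for m in (1, 2, 3) if m <= pile]
    if random.random() < EPSILON:
        return random.choice(moves)
    return min(moves, key=lambda m: values[pile - m])

for game in range(20000):
    pile, player = 21, 0
    visited = {0: [], 1: []}    # states each player faced during this game
    while pile > 0:
        visited[player].append(pile)
        pile -= choose_move(pile)
        winner = player if pile == 0 else None
        player = 1 - player
    # Reward signal arrives only at the end: +1 for the winner, -1 for the loser.
    for p, reward in ((winner, 1.0), (1 - winner, -1.0)):
        for state in visited[p]:
            values[state] += ALPHA * (reward - values[state])

# Positions that are multiples of 4 are losing for the player to move,
# so their learned values should end up clearly negative.
print(round(values[4], 2), round(values[5], 2))
```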
In the twenty-first century, one organization has championed and advanced the field of reinforcement learning more than any other: DeepMind. Founded in 2010 as a startup devoted to solving artificial intelligence and then acquired by Google in 2014 for ~$600 million, London-based DeepMind made a big early bet on reinforcement learning as the most promising path forward in AI. And the bet paid off. The second half of the 2010s were triumphant years for the field of reinforcement learning. In 2016, DeepMind's AlphaGo became the first AI system to defeat a human world champion at the ancient Chinese game of Go, a feat that many AI experts had believed was impossible. In 2017, DeepMind debuted AlphaZero, which taught itself Go, chess and Japanese chess entirely via self-play and bested every other AI and human competitor in those games. And in 2019, DeepMind unveiled AlphaStar, which mastered the video game StarCraft—an even more complex environment than Go given the vast action space, imperfect information, numerous agents and real-time gameplay.

AlphaGo, AlphaZero, AlphaStar—reinforcement learning powered each of these landmark achievements. As the 2010s drew to a close, RL seemed poised to dominate the coming generation of artificial intelligence breakthroughs, with DeepMind leading the way. But that's not what happened. Right around that time, a new AI paradigm unexpectedly burst into the spotlight: self-supervised learning for autoregressive language models.

In 2019, a small nonprofit research lab named OpenAI released a model named GPT-2 that demonstrated surprisingly powerful general-purpose language capabilities. The following summer, OpenAI debuted GPT-3, whose astonishing abilities represented a massive leap in performance from GPT-2 and took the AI world by storm. In 2022 came ChatGPT. In short order, every AI organization in the world reoriented its research focus to prioritize large language models and generative AI. These large language models (LLMs) were based on the transformer architecture and made possible by a strategy of aggressive scaling. They were trained on unlabeled datasets that were bigger than any previous AI training data corpus—essentially the entire internet—and were scaled up to unprecedented model sizes. (GPT-2 was considered mind-bogglingly large at 1.5 billion parameters; one year later, GPT-3 debuted at 175 billion parameters.)

Reinforcement learning fell out of fashion for half a decade. A widely repeated narrative during the early 2020s was that DeepMind had seriously misread technology trends by committing itself to reinforcement learning and missing the boat on generative AI. Yet today, reinforcement learning has reemerged as the hottest field within AI. What happened? In short, AI researchers discovered that applying reinforcement learning to generative AI models was a killer combination. Starting with a base LLM and then applying reinforcement learning on top of it meant that, for the first time, RL could natively operate with the gift of language and broad knowledge about the world. Pretrained foundation models represented a powerful base on which RL could work its magic. The results have been dazzling—and we are just getting started.

RL Meets LLMs

What does it mean, exactly, to combine reinforcement learning with large language models? A key insight to start with is that the core concepts of RL can be mapped directly and elegantly to the world of LLMs. In this mapping, the LLM itself is the agent. The environment is the full digital context in which the LLM is operating, including the prompts it is presented with, its context window, and any tools and external information it has access to. The model's weights represent the policy: they determine how the agent acts when presented with any particular state of the environment. Acting, in this context, means generating tokens.

What about the reward signal and the value function? Defining a reward signal for LLMs is where things get interesting and complicated. It is this topic, more than any other, that will determine how far RL can take us on the path to superintelligence.

The first major application of RL to LLMs was reinforcement learning from human feedback, or RLHF. The frontiers of AI research have since advanced to more cutting-edge methods of combining RL and LLMs, but RLHF represents an important step on the journey, and it provides a concrete illustration of the concept of reward signals for LLMs. RLHF was invented by DeepMind and OpenAI researchers back in 2017. (As a side note, given today's competitive and closed research environment, it is remarkable to remember that OpenAI and DeepMind used to conduct and publish foundational research together.) RLHF's true coming-out party, though, was ChatGPT. When ChatGPT debuted in November 2022, the underlying AI model on which it was based was not new; it had already been publicly available for many months. The reason that ChatGPT became an overnight success was that it was approachable, easy to talk to, helpful, good at following directions. The technology that made this possible was RLHF.

In a nutshell, RLHF is a method to adapt LLMs' style and tone to be consistent with human-expressed preferences, whatever those preferences may be. RLHF is most often used to make LLMs 'helpful, harmless and honest,' but it can equally be used to make them more flirtatious, or rude, or sarcastic, or progressive, or conservative.

How does RLHF work? The key ingredient in RLHF is 'preference data' generated by human subjects. Specifically, humans are asked to consider two responses from the model for a given prompt and to select which one of the two responses they prefer. This pairwise preference data is used to train a separate model, known as the reward model, which learns to produce a numerical rating of how desirable or undesirable any given output from the main model is. This is where RL comes in. Now that we have a reward signal, an RL algorithm can be used to fine-tune the main model—in other words, the RL agent—so that it generates responses that maximize the reward model's scores. In this way, the main model comes to incorporate the style and values reflected in the human-generated preference data.
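A small sketch can make the reward-model step concrete. The Python below is illustrative only: real reward models are neural networks trained on text, whereas here each response is reduced to a tiny hand-made feature vector so the pairwise, Bradley-Terry-style training signal is easy to see.

```python
import math

def score(weights, features):
    """Reward model: a linear 'desirability' score over response features."""
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical preference data: (chosen_features, rejected_features) pairs,
# e.g. [politeness, instruction-following] as judged by human raters.
pairs = [([0.9, 0.8], [0.2, 0.3]),
         ([0.7, 0.9], [0.4, 0.1]),
         ([0.8, 0.6], [0.1, 0.2])]

weights, lr = [0.0, 0.0], 0.5
for epoch in range(200):
    for chosen, rejected in pairs:
        # Logistic (Bradley-Terry) loss on the score difference:
        # push the chosen response's score above the rejected one's.
        diff = score(weights, chosen) - score(weights, rejected)
        p_chosen = 1.0 / (1.0 + math.exp(-diff))
        for i in range(len(weights)):
            weights[i] += lr * (1.0 - p_chosen) * (chosen[i] - rejected[i])

# The trained reward model now rates new responses; an RL algorithm would then
# fine-tune the main model to maximize this score.
print(round(score(weights, [0.85, 0.7]), 2), ">", round(score(weights, [0.2, 0.25]), 2))
```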
Circling back to reward signals and LLMs: in the case of RLHF, as we have seen, the reward signal comes directly from humans and human-generated preference data, which is then distilled into a reward model. What if we want to use RL to give LLMs powerful new capabilities beyond simply adhering to human preferences?

The Next Frontier

The most important development in AI over the past year has been language models' improved ability to engage in reasoning. What exactly does it mean for an AI model to 'reason'? Unlike first-generation LLMs, which respond to prompts using next-token prediction with no planning or reflection, reasoning models spend time thinking before producing a response. These models think by generating 'chains of thought,' enabling them to systematically break down a given task into smaller steps and then work through each step in order to arrive at a well-thought-through answer. They also know how and when to use external tools—like a calculator, a code interpreter or the internet—to help solve problems.

The world's first reasoning model, OpenAI's o1, debuted less than a year ago. A few months later, China-based DeepSeek captured world headlines when it released its own reasoning model, R1, that was near parity with o1, fully open and trained using far less compute.

The secret sauce that gives AI models the ability to reason is reinforcement learning—specifically, an approach to RL known as reinforcement learning from verifiable rewards (RLVR). Like RLHF, RLVR entails taking a base model and fine-tuning it using RL. But the source of the reward signal, and therefore the types of new capabilities that the AI gains, are quite different. As its name suggests, RLVR improves AI models by training them on problems whose answers can be objectively verified—most commonly, math or coding tasks. First, a model is presented with such a task—say, a challenging math problem—and prompted to generate a chain of thought in order to solve the problem. The final answer that the model produces is then formally determined to be either correct or incorrect. (If it's a math question, the final answer can be run through a calculator or a more complex symbolic math engine; if it's a coding task, the model's code can be executed in a sandboxed environment.)
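Concretely, a verifiable reward can be as simple as a function that maps the model's final answer to +1 or -1. The sketch below is illustrative only (real systems use far more robust checkers and sandboxes than this), but it shows the shape of the signal that the next paragraph picks up.

```python
def math_reward(model_answer: str, ground_truth: float) -> float:
    """+1 if the model's final answer matches the known result, -1 otherwise."""
    try:
        return 1.0 if abs(float(model_answer) - ground_truth) < 1e-6 else -1.0
    except ValueError:
        return -1.0          # unparseable answers count as wrong

def code_reward(model_function, test_cases) -> float:
    """+1 if the model-written function passes every unit test, -1 otherwise."""
    try:
        ok = all(model_function(*args) == expected for args, expected in test_cases)
        return 1.0 if ok else -1.0
    except Exception:
        return -1.0          # crashes count as failure

# Hypothetical usage: these rewards would feed an RL algorithm that reinforces
# the chains of thought leading to +1 outcomes.
print(math_reward("42", 42.0))                       # 1.0
print(code_reward(lambda x: x * 2, [((3,), 6)]))     # 1.0
```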
Because we now have a reward signal—positive if the final answer is correct, negative if it is incorrect—RL can be used to positively reinforce the types of chains of thought that lead to correct answers and to discourage those that lead to incorrect answers. The end result is a model that is far more effective at reasoning: that is, at accurately working through complex multi-step problems and landing on the correct solution. This new generation of reasoning models has demonstrated astonishing capabilities in math competitions like the International Math Olympiad and on logical tests like the ARC-AGI benchmark.

So—is AGI right around the corner? Not necessarily. A few big-picture questions about reinforcement learning and language models remain unanswered and loom large. These questions inspire lively debate and widely varying opinions in the world of artificial intelligence today. Their answers will determine how powerful AI gets in the coming months.

A Few Big Unanswered Questions

Today's cutting-edge RL methods rely on problems whose answers can be objectively verified as either right or wrong. Unsurprisingly, then, RL has proven exceptional at producing AI systems that are world-class at math, coding, logic puzzles and standardized tests. But what about the many problems in the world that don't have easily verifiable answers?

In a provocative essay titled 'The Problem With Reasoners', Aidan McLaughlin elegantly articulates this point: 'Remember that reasoning models use RL, RL works best in domains with clear/frequent reward, and most domains lack clear/frequent reward.' McLaughlin argues that most domains that humans actually care about are not easily verifiable, and we will therefore have little success using RL to make AI superhuman at them: for instance, giving career advice, managing a team, understanding social trends, writing original poetry, investing in startups.

A few counterarguments to this critique are worth considering. The first centers on the concepts of transfer learning and generalizability. Transfer learning is the idea that models trained in one area can transfer those learnings to improve in other areas. Proponents of transfer learning in RL argue that, even if reasoning models are trained only on math and coding problems, this will endow them with broad-based reasoning skills that will generalize beyond those domains and enhance their ability to tackle all sorts of cognitive tasks.

'Learning to think in a structured way, breaking topics down into smaller subtopics, understanding cause and effect, tracing the connections between different ideas—these skills should be broadly helpful across problem spaces,' said Dhruv Batra, cofounder/chief scientist at Yutori and former senior AI researcher at Meta. 'This is not so different from how we approach education for humans: we teach kids basic numeracy and literacy in the hopes of creating a generally well-informed and well-reasoning population.'

Put more strongly: if you can solve math, you can solve anything. Anything that can be done with a computer, after all, ultimately boils down to math. It is an intriguing hypothesis. But to date, there is no conclusive evidence that RL endows LLMs with reasoning capabilities that generalize beyond easily verifiable domains like math and coding. It is no coincidence that the most important advances in AI in recent months—both from a research and a commercial perspective—have occurred in precisely these two fields.
If RL can only give AI models superhuman powers in domains that can be easily verified, this represents a serious limit to how far RL can advance the frontiers of AI's capabilities. AI systems that can write code or do mathematics as well as or better than humans are undoubtedly valuable. But true general-purpose intelligence consists of much more than this.

Let us consider another counterpoint on this topic, though: what if verification systems can in fact be built for many (or even all) domains, even when those domains are not as clearly deterministic and checkable as a math problem? Might it be possible to develop a verification system that can reliably determine whether a novel, or a government policy, or a piece of career advice, is 'good' or 'successful' and therefore should be positively reinforced?

This line of thinking quickly leads us into borderline philosophical considerations. In many fields, determining the 'goodness' or 'badness' of a given outcome would seem to involve value judgments that are irreducibly subjective, whether on ethical or aesthetic grounds. For instance, is it possible to determine that one public policy outcome (say, reducing the federal deficit) is objectively superior to another (say, expanding a certain social welfare program)? Is it possible to objectively identify that a painting or a poem is or is not 'good'? What makes art 'good'? Is beauty not, after all, in the eye of the beholder? Certain domains simply do not possess a 'ground truth' to learn from, but rather only differing values and tradeoffs to be weighed.

Even in such domains, though, another possible approach exists. What if we could train an AI via many examples to instinctively identify 'good' and 'bad' outcomes, even if we can't formally define them, and then have that AI serve as our verifier? As Julien Launay, CEO/cofounder of RL startup Adaptive ML, put it: 'In bridging the gap from verifiable to non-verifiable domains, we are essentially looking for a compiler for natural language…but we already have built this compiler: that's what large language models are.' This approach is often referred to as reinforcement learning from AI feedback (RLAIF) or 'LLM-as-a-Judge.' Some researchers believe it is the key to making verification possible across more domains.
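One way to picture the LLM-as-a-Judge pattern is a reward function that defers to a grader model. The sketch below is generic and hypothetical: `call_judge_model` is a placeholder stub rather than a real API, and the prompt and scoring scale are arbitrary choices.

```python
JUDGE_PROMPT = """You are grading career advice for usefulness and accuracy.
Score the response from 0 (useless) to 10 (excellent). Reply with a number only.

Response to grade:
{response}"""

def call_judge_model(prompt: str) -> str:
    # Placeholder: swap in a real model call here. Returning a fixed score
    # keeps the sketch runnable without any external dependency.
    return "7"

def judge_reward(response: str) -> float:
    """Turn the judge's 0-10 score into a reward in [-1, 1] for an RL fine-tuning loop."""
    raw = call_judge_model(JUDGE_PROMPT.format(response=response))
    try:
        rating = max(0.0, min(10.0, float(raw.strip())))
    except ValueError:
        return -1.0                    # an unusable judgment is treated as negative
    return (rating / 5.0) - 1.0        # 0 -> -1.0, 5 -> 0.0, 10 -> +1.0

print(judge_reward("Ask for the raise in writing and anchor high."))  # 0.4
```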
But it is not clear how far LLM-as-a-Judge can take us. The reason that reinforcement learning from verifiable rewards has led to such incisive reasoning capabilities in LLMs in the first place is that it relies on formal verification methods: correct and incorrect answers exist to be discovered and learned. LLM-as-a-Judge seems to bring us back to a regime more closely resembling RLHF, whereby AI models can be fine-tuned to internalize whatever preferences and value judgments are contained in the training data, arbitrary though they may be. This merely punts the problem of verifying subjective domains to the training data, where it may remain as unsolvable as ever.

We can say this much for sure: to date, neither OpenAI nor Anthropic nor any other frontier lab has debuted an RL-based system with superhuman capabilities in writing novels, or advising governments, or starting companies, or any other activity that lacks obvious verifiability. This doesn't mean that the frontier labs are not making progress on the problem. Indeed, just last month, leading OpenAI researcher Noam Brown shared on X: 'We developed new techniques that make LLMs a lot better at hard-to-verify tasks.' Rumors have even begun to circulate that OpenAI has developed a so-called 'universal verifier,' which can provide an accurate reward signal in any domain. It is hard to imagine how such a universal verifier would work; no concrete details have been shared publicly. Time will tell how powerful these new techniques prove to be.

It is important to remember that we are still in the earliest innings of the reinforcement learning era in generative AI. We have just begun to scale RL. The total amount of compute and training data devoted to reinforcement learning remains modest compared to the level of resources spent on pretraining foundation models. This chart from a recent OpenAI presentation speaks volumes.

At this very moment, AI organizations are preparing to deploy vast sums to scale up their reinforcement learning efforts as quickly as they can. As the chart above depicts, RL is about to transition from a relatively minor component of AI training budgets to the main focus.

What does it mean to scale RL? 'Perhaps the most important ingredient when scaling RL is the environments—in other words, the settings in which you unleash the AI to explore and learn,' said Stanford AI researcher Andy Zhang. 'In addition to sheer quantity of environments, we need higher-quality environments, especially as model capabilities improve. This will require thoughtful design and implementation of environments to ensure diversity and goldilocks difficulty and to avoid reward hacking and broken tasks.' When xAI debuted its new frontier model Grok 4 last month, it announced that it had devoted 'over an order of magnitude more compute' to reinforcement learning than it had with previous models. We have many more orders of magnitude to go.

Today's RL-powered models, while powerful, face shortcomings. The unsolved challenge of difficult-to-verify domains, discussed above, is one. Another critique is known as elicitation: the hypothesis that reinforcement learning doesn't actually endow AI models with greater intelligence but rather just elicits capabilities that the base model already possessed. Yet another obstacle that RL faces is its inherent sample inefficiency compared to other AI paradigms: RL agents must do a tremendous amount of work to receive a single bit of feedback. This 'reward sparsity' has made RL impracticable to deploy in many contexts.

It is possible that scale will be a tidal wave that washes all of these concerns away. If there is one principle that has defined frontier AI in recent years, after all, it is this: nothing matters more than scale. When OpenAI scaled from GPT-2 to GPT-3 to GPT-4 between 2019 and 2023, the models' performance gains and emergent capabilities were astonishing, far exceeding the community's expectations. At every step, skeptics identified shortcomings and failure modes with these models, claiming that they revealed fundamental weaknesses in the technology paradigm and predicting that progress would soon hit a wall. Instead, the next generation of models would blow past these shortcomings, advancing the frontier by leaps and bounds and demonstrating new capabilities that critics had previously argued were impossible.

The world's leading AI players are betting that a similar pattern will play out with reinforcement learning. If recent history is any guide, it is a good bet to make. But it is important to remember that AI 'scaling laws'—which predict that AI performance increases as data, compute and model size increase—are not actually laws in any sense of that word.
They are empirical observations that for a time proved reliable and predictive for pretraining language models and that have been preliminarily demonstrated in other data modalities. There is no formal guarantee that scaling laws will always hold in AI, nor how long they will last, nor how steep their slope will be. The truth is that no one knows for sure what will happen when we massively scale up RL. But we are all about to find out. Stay tuned for our follow-up article on this topic—or feel free to reach out directly to discuss!

Looking Forward

Reinforcement learning represents a compelling approach to building machine intelligence for one profound reason: it is not bound by human competence or imagination. Training an AI model on vast troves of labeled data (supervised learning) will make the model exceptional at understanding those labels, but its knowledge will be limited to the annotated data that humans have prepared. Training an AI model on the entire internet (self-supervised learning) will make the model exceptional at understanding the totality of humanity's existing knowledge, but it is not clear that this will enable it to generate novel insights that go beyond what humans have already put forth.

Reinforcement learning faces no such ceiling. It does not take its cues from existing human data. An RL agent learns for itself, from first principles, through first-hand experience. AlphaGo's 'Move 37' serves as the archetypal example here. In one of its matches against human world champion Lee Sedol, AlphaGo played a move that violated thousands of years of accumulated human wisdom about Go strategy. Most observers assumed it was a miscue. Instead, Move 37 proved to be a brilliant play that gave AlphaGo a decisive advantage over Sedol. The move taught humanity something new about the game of Go. It has forever changed the way that human experts play the game.

The ultimate promise of artificial intelligence is not simply to replicate human intelligence. Rather, it is to unlock new forms of intelligence that are radically different from our own—forms of intelligence that can come up with ideas that we never would have come up with, make discoveries that we never would have made, help us see the world in previously unimaginable ways. We have yet to see a 'Move 37 moment' in the world of generative AI. It may be a matter of weeks or months—or it may never happen. Watch this space.
