
Mind The (Confidence) Gap: Why AI Testing Is Crucial To Success
Scott Clark is the Cofounder and CEO of Distributional, which helps make AI safe, secure and reliable through adaptive testing.
Generative AI is one of the most exciting opportunities today, and 2025 is shaping up to be a pivotal year, with 67% of early adopter organizations planning to increase their investments. Executives are eager to see returns, but many GenAI projects are dying on the vine. According to Gartner, nearly 1 in 3 of these GenAI projects will be abandoned after proof of concept by the end of 2025.
What's halting progress? While it is easy to cite performance issues, the real problem is larger and more complex: a lack of confidence in the behavior of AI over time.
Closing this confidence gap requires a new approach: proactively and adaptively testing these applications to ensure desired behavior. An adaptive, behavioral approach to AI testing gives enterprises full confidence in their production AI applications while accelerating their pace of innovation.
Overfocusing on performance is something I can relate to. My first company, SigOpt, was focused on helping some of the most sophisticated organizations in the world optimize their complex traditional AI and ML models using Bayesian optimization so that they could squeeze the most performance possible out of them.
After Intel acquired SigOpt in 2020, I led the AI and High Performance Computing team for Intel's Supercomputing Group. It was there that I realized that despite SigOpt's success, I was focusing on the wrong problem. Worse, people were starting to make the same mistake I did with these more powerful GenAI systems.
People don't stay up at night wishing they could overfit an eval function by another half a percent. They're worried their model will go off the rails and cause harm to their business by not behaving as desired.
The real pain with enterprise AI isn't performance—it's the AI confidence gap.
No matter how many manual spot checks or performance benchmarks are run, organizations simply don't have the confidence that their AI applications will actually behave as desired in production, preventing them from pursuing higher-value AI use cases.
Testing has provided confidence in software behavior for decades, so why does it break down with AI?
In traditional testing, code is static and engineers look for bugs by asserting that a specific input always returns a specific output. But AI applications are non-deterministic: the same input can return many different potential outputs. For testing AI, this means teams can't rely solely on fixed datasets tied to specific outputs (also known as golden datasets); instead, they need to analyze the distribution of outputs and behaviors of the app.
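To make the contrast concrete, here is a minimal sketch in Python, with a hypothetical call_app function standing in for the application under test: instead of asserting one exact output, it samples the app repeatedly and asserts on properties of the resulting distribution.

```python
# Minimal sketch; call_app is a hypothetical stand-in for the AI app under test.
import statistics

def call_app(prompt: str) -> str:
    """Placeholder for the non-deterministic AI application under test."""
    raise NotImplementedError  # replace with a real call to your app

# Traditional, deterministic assertion: valid only when the output is fixed.
def test_static_code():
    assert sorted([3, 1, 2]) == [1, 2, 3]

# Distributional assertion: sample the app many times and test properties
# of the output distribution rather than one exact string.
def test_summary_length_distribution():
    prompt = "Summarize our refund policy in two sentences."
    lengths = [len(call_app(prompt).split()) for _ in range(50)]
    assert statistics.median(lengths) <= 60   # summaries stay concise on average
    assert max(lengths) <= 120                # no runaway responses in the sample
```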
Plus, AI applications are constantly shifting, with changes in usage, updates to prompts, new underlying models or shifting dependencies on upstream APIs, pipelines and services. Proper AI testing needs to adaptively keep up with these changes and inherent model non-stationarity.
AI applications are only getting more complex, especially with agents. Undesired behavior can propagate throughout the interconnected systems and be challenging to trace back to the original source. AI testing needs to look at the entire application, including intermediate data, not just performance metrics on input/output pairs.
Traditional testing isn't capable of addressing these characteristics of AI applications. This is why I often hear from customers that AI is impossible to test, so they push it live and hope for the best.
If traditional testing doesn't work, what are teams doing instead?
During the development process, teams often rely on vibe checks to get an app to 'good enough' performance. Vibe checks are inherently subjective and don't scale. They also mask behavioral issues rather than giving teams the understanding needed to quantify and resolve them.
As teams mature, they may define thresholds on performance using evals. But performance metrics alone will never capture the full picture and will miss more subtle shifts in behavior. When there is a performance drop, teams don't have enough information to understand what is causing the change and resolve the issues. These solutions are too incomplete, limited and static for AI systems.
Instead, AI testing needs to focus on the entirety of app behavior, not just performance. By taking into account the distributions of all behavioral properties, teams get a more complete definition of desired behavior over time. By identifying macro behavioral changes and tying them to the testable quantitative properties that cause them, teams can further refine this definition. Ultimately, this leads to an adaptive testing methodology that reveals where and how behavior shifts so teams can catch and resolve the underlying issues.
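As an illustration, the sketch below (assuming SciPy is available and using made-up property names) compares the distribution of each behavioral property between a baseline window and the current window, flagging which specific properties have shifted rather than reporting a single aggregate score.

```python
# Rough sketch: flag which behavioral properties have shifted between two windows.
from scipy.stats import ks_2samp

def behavior_shift_report(baseline, current, alpha=0.01):
    """baseline/current map a property name (e.g. response length, refusal rate,
    retrieval overlap) to a list of per-response measurements."""
    report = {}
    for prop, base_values in baseline.items():
        result = ks_2samp(base_values, current[prop])  # two-sample KS test
        report[prop] = {
            "statistic": result.statistic,
            "p_value": result.pvalue,
            "shifted": result.pvalue < alpha,
        }
    return report
```

A property-level report like this points teams at the specific behavior that changed, even when an aggregate accuracy metric looks flat.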
With AI, model and production usage will always change. Unlike traditional software, teams are not climbing a static peak toward test coverage. Instead, they need to adaptively surf a wave of behavioral test depth as business needs and usage change over time to stay confident.
Now is a fun time to be in AI. The opportunity feels limitless, especially in the enterprise.
But there is a rub. These AI systems can be powerfully good or powerfully bad. As I mentioned earlier, nearly 1 in 3 GenAI projects are abandoned after proof of concept, and fewer still make it into production. In production, these systems can achieve world-class performance one day and harmfully poor behavior the next—exposing companies to financial, reputational or, increasingly, regulatory risk.
An adaptive approach to AI testing helps you define and ensure desired behavior continuously, giving you the confidence to productionalize more and fully realize AI's potential.
How can you get started? First, collect usage data so you can be ready to test even before production. Next, implement an adaptive testing solution to understand and quantify the desired behavior. And test these applications for behavioral shifts in production, not just during development.
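For the first step, here is a minimal sketch (with a hypothetical trace schema) of what collecting usage data can look like: log the prompt, intermediate data and output for every request so the same behavioral tests can run on development and production traces alike.

```python
# Minimal sketch; the trace schema here is hypothetical.
import json
import time
import uuid

def log_trace(prompt, retrieved_docs, output, path="traces.jsonl"):
    """Append one request trace, including intermediate data, to a JSONL file."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "prompt": prompt,
        "retrieved_docs": retrieved_docs,  # intermediate data, not just I/O pairs
        "output": output,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```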
I'm excited for 2025 to be the year of productionalizing and getting real business value out of AI through AI testing.
