Latest news with #SocialAndLanguageTechnologiesLab

Geek Wire

28-07-2025

Business

In AI we trust?

A recent study by Stanford University's Social and Language Technologies Lab (SALT) found that 45% of workers don't trust the accuracy, capability, or reliability of AI systems. That trust gap reflects a deeper concern about how AI behaves when the stakes are high, especially in business-critical environments.

Hallucinations in AI may be acceptable when the stakes are low, such as drafting a tweet or generating creative ideas, where errors are easily caught and carry little consequence. But in the enterprise, where AI agents are expected to support high-stakes decisions, power workflows, and engage directly with customers, the tolerance for error disappears. True enterprise-grade reliability demands more: consistency, predictability, and rigorous alignment with real-world context, because even small mistakes can have big consequences.

This challenge is often called 'jagged intelligence': AI systems keep shattering performance records on increasingly complex benchmarks while sporadically struggling with simpler tasks that most humans find intuitive and solve reliably. For example, a model might defeat a chess grandmaster yet be unable to complete a simple child's puzzle. This mismatch between brilliance and brittleness underscores why enterprise AI demands more than general LLM intelligence alone; it requires contextual grounding, rigorous testing, and continuous fine-tuning.

That's why at Salesforce, we believe the future of AI in business depends on achieving what we call Enterprise General Intelligence (EGI), a new framework for enterprise-grade AI systems that are not only highly capable but also consistently reliable across complex, real-world scenarios. In an EGI environment, AI agents work alongside humans, integrated into enterprise systems and governed by strict rules that limit what actions they can take.
To achieve this, we're implementing a clear, repeatable three-step framework (synthesize, measure, and train) and applying it to every enterprise-grade use case.

A Three-Step Framework for Building Trust

Building AI agents within the enterprise demands a disciplined process that grounds models in business-contextualized data, measures performance against real-world benchmarks, and continuously fine-tunes agents to maintain accuracy, consistency, and safety.

Synthesize: Building trustworthy agents starts with safe, realistic testing environments. That means using AI-generated synthetic data that closely resembles real inputs, applying the same business logic and objectives used in human workflows, and running agents in secure, isolated sandboxes. By simulating real-world conditions without exposing production systems or sensitive data, teams can generate high-fidelity feedback. Simulated environments like these supply the feedback signals that reinforcement learning depends on, which makes them a critical foundation for developing enterprise-ready AI agents.

Measure: Reliable agents require clear, consistent benchmarks. Measuring performance isn't just about tracking accuracy; it's about defining what each specific use case requires. The level of precision needed varies: an agent offering product recommendations may tolerate a wider margin of error than one evaluating loan applications or diagnosing system failures.
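The Measure step's point that precision requirements differ by use case can be sketched as a simple acceptance check. This is a minimal illustration under stated assumptions, not Salesforce's benchmark or tooling: the `UseCase` type, the two use-case names, and the 0.80 and 0.99 thresholds are all hypothetical, and benchmark results are reduced to pass/fail booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """A hypothetical use case with its own acceptance threshold."""
    name: str
    min_accuracy: float  # fraction of benchmark cases that must pass (illustrative value)


def accuracy(results: list[bool]) -> float:
    """Fraction of benchmark cases the agent got right."""
    if not results:
        raise ValueError("need at least one benchmark result")
    return sum(results) / len(results)


def fit_for_role(use_case: UseCase, results: list[bool]) -> bool:
    """True if the agent clears the bar set for this specific use case."""
    return accuracy(results) >= use_case.min_accuracy


# Hypothetical thresholds: recommendations tolerate more error than loan review.
RECOMMENDATIONS = UseCase("product recommendations", min_accuracy=0.80)
LOAN_REVIEW = UseCase("loan application review", min_accuracy=0.99)

# The same 90%-accurate agent, judged against two different bars.
results = [True] * 90 + [False] * 10
print(fit_for_role(RECOMMENDATIONS, results))  # True
print(fit_for_role(LOAN_REVIEW, results))      # False
```

The same benchmark results pass one bar and fail the other, which is the sense in which benchmarks and thresholds are tailored to the use case rather than applied globally.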
By establishing tailored benchmarks, such as Salesforce's initial LLM benchmark for CRM use cases, and acceptable performance thresholds, teams can evaluate agent output in context and iterate with purpose, ensuring the agent is fit for its intended role before it ever reaches production.

Train: Reliability isn't achieved in a single pass; it's the result of continuous refinement. Agents must be trained, tested, and retrained in a constant feedback loop. That means generating fresh data, running real-world scenarios, measuring outcomes, and using those insights to improve performance. Because agent behavior can vary from one run to the next, this iterative process is essential for building stability over time. Only through repeated training and tuning can agents reach the level of consistency and accuracy required for enterprise use.

Turning AI Agents Into Reliable Enterprise Partners

Building AI agents for the enterprise takes much more than simply deploying an LLM for business-critical tasks. The latest work from Salesforce AI Research shows that generic LLM agents successfully complete only 58% of simple tasks and barely more than a third of more complex ones. EGI agents that are truly trustworthy in high-stakes business scenarios require far more than an off-the-shelf, do-it-yourself LLM plug-in. They demand a rigorous, platform-driven approach that grounds models in business-specific context, enforces governance, and continuously measures and fine-tunes performance.

The AI we deploy in Agentforce is built differently. Agentforce doesn't run by simply plugging into an LLM. Its agents are grounded in business-specific context through Data Cloud, made trustworthy by our enterprise-grade Trust Layer, and designed for reliability through continuous evaluation and optimization using the Testing Center. This platform-driven approach ensures that agents are not only intelligent but consistently enterprise-ready.
As businesses evolve toward a future where specialized AI agents collaborate dynamically in teams, complexity increases exponentially. That's why frameworks that synthesize, measure, and train agents before deployment are critical. This framework builds the trust needed to elevate AI from a promising technology into a reliable enterprise partner that drives meaningful business outcomes.
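The Train step describes a loop of fresh data, repeated runs, measurement, and improvement, and the closing argument is for running that synthesize, measure, and train cycle before deployment. The toy simulation below sketches the cycle under loud assumptions; it is not how Agentforce or the Testing Center work. A "case" is only a random difficulty score, the "agent" is a single hypothetical skill number, each run succeeds at random to stand in for run-to-run variation, and `train` merely nudges that number instead of fine-tuning a model. Because consistency is the goal, a case counts as passed only if all five repeated runs succeed.

```python
import random


def synthesize(rng: random.Random, n: int) -> list[float]:
    """Toy stand-in for synthetic data: each case is just a difficulty in [0, 1)."""
    return [rng.random() for _ in range(n)]


def run_agent(skill: float, difficulty: float, rng: random.Random) -> bool:
    """Toy stand-in for one sandboxed agent run.

    Success is random, mimicking run-to-run variation; the chance of success
    drops as the case's difficulty exceeds the agent's (hypothetical) skill.
    """
    p_success = max(0.0, min(1.0, 1.0 - (difficulty - skill)))
    return rng.random() < p_success


def measure(skill: float, cases: list[float], rng: random.Random, runs: int = 5) -> float:
    """Consistency-aware score: a case counts only if every repeated run succeeds."""
    passed = sum(all(run_agent(skill, d, rng) for _ in range(runs)) for d in cases)
    return passed / len(cases)


def train(skill: float, step: float = 0.1) -> float:
    """Placeholder for fine-tuning: nudges the toy skill parameter upward."""
    return min(1.0, skill + step)


def synthesize_measure_train(
    threshold: float = 0.9, max_rounds: int = 20, seed: int = 0
) -> tuple[int, float, bool]:
    """Repeat synthesize -> measure -> train until the score clears the threshold.

    Returns (rounds used, last score, whether the threshold was met).
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    skill, score = 0.0, 0.0
    for round_no in range(1, max_rounds + 1):
        cases = synthesize(rng, n=200)  # fresh synthetic data every round
        score = measure(skill, cases, rng)
        if score >= threshold:
            return round_no, score, True
        skill = train(skill)
    return max_rounds, score, False


rounds, score, ready = synthesize_measure_train()
print(f"rounds={rounds} score={score:.2f} ready={ready}")
```

Requiring every repeated run to pass, rather than averaging across runs, is one way to reward consistency over occasional success. The 0.9 threshold, 200 cases per round, and five runs per case are arbitrary illustrative choices.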
