3 Breakthrough Ways Data Is Powering The AI Reasoning Revolution


Forbes · 4 days ago

Olga Megorskaya is Founder & CEO of Toloka AI, a high-quality data partner for all stages of AI development.
The buzz around reasoning models like DeepSeek R1, OpenAI o1 and Grok 3 signals a turning point: AI development now pivots on reasoning.
When we talk about reasoning, we mean that models can do more than repeat patterns—they think through problems step by step, consider multiple perspectives before giving a final answer and double-check their work. As reasoning skills improve, modern LLMs are pushing us closer to a future where AI agents can autonomously handle all sorts of tasks.
AI agents will become useful enough for widespread use when they learn to truly reason, meaning they adapt to new challenges, generalize skills from one area to apply them in a new domain, navigate multiple environments and reliably produce correct answers and outputs. Behind these emerging skills, you'll find sophisticated datasets used for training and evaluating the models. The better the data, the stronger the reasoning skills.
How is data shaping the next generation of reasoning models and agents? As a data partner to frontier labs, we've identified three ways that data drives AI reasoning right now: domain diversity and complexity, refined reasoning and robust evaluations.
By building stronger reasoning skills in AI systems, these new approaches to data for training and testing will open the door to the widespread adoption of AI agents.
Current models often train well in structured environments like math and coding, where answer verification is straightforward and fits neatly into classical reinforcement learning frameworks. But the next leap requires pushing into more complex data across a wider knowledge spectrum, so that models generalize better and transfer what they learn across areas.
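To make the contrast concrete, here is a minimal sketch (in Python) of the kind of programmatically verifiable reward that lets math and coding tasks slot neatly into classical reinforcement learning. The function names and task format are illustrative assumptions, not a description of any particular lab's pipeline.

```python
# Minimal sketch of verifiable rewards for math and coding tasks.
# Task formats and helper names here are illustrative assumptions.

def math_reward(model_answer: str, reference_answer: str) -> float:
    """Return 1.0 if the model's final answer matches the reference, else 0.0."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def code_reward(candidate_fn, test_cases) -> float:
    """Return the fraction of unit tests a generated function passes."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing solution earns no credit for this test
    return passed / len(test_cases)

# Example: grading a generated `add` function against simple tests.
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
print(math_reward("72 km/h", "72 km/h"))       # -> 1.0
print(code_reward(lambda a, b: a + b, tests))  # -> 1.0
```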
Beyond math and coding, here's the kind of data becoming essential for training the next wave of AI:
• Multi-step task data: Data points covering multi-step scenarios, such as web research trajectories with verification checkpoints.
• Open-ended domains: Fields such as law or business consulting, where answers are multifaceted and therefore hard to verify but important for advanced reasoning. Think of complex legal issues with multiple valid approaches or comprehensive market assessments with validation criteria.
• Agent datasets: Collections built on taxonomies of use cases, domains and categories as well as real-world tasks. For instance, a task for a corporate assistant agent might be to respond to a support request using simulated knowledge bases and company policies.
• Contexts and environments: Simulations of how agents interact with specific software, data in a CRM or knowledge base, or other infrastructure. These contexts are created manually for agent training and testing (see the sketch after this list).
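As a rough illustration, here is what a single record in such an agent dataset might look like, pairing a taxonomy-tagged task with a manually built, simulated environment. The schema and field names are assumptions for illustration, not any lab's actual format.

```python
# Illustrative sketch of one record in a taxonomy-based agent dataset.
# The schema and field names are assumptions, not a real production format.

corporate_assistant_task = {
    "taxonomy": {
        "use_case": "corporate_assistant",
        "domain": "customer_support",
        "category": "policy_lookup",
    },
    "task": "Respond to a refund request from a customer whose order arrived damaged.",
    "environment": {
        # Simulated infrastructure the agent can query during training and testing.
        "knowledge_base": {
            "refund_policy": "Damaged items are refundable within 30 days with photo proof.",
        },
        "crm_record": {"customer_id": "C-1042", "order_status": "delivered"},
    },
    "verification_checkpoints": [
        "Agent retrieved the refund policy before answering.",
        "Response cites the 30-day window and the photo-proof requirement.",
    ],
}
```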
The path a model takes to an answer is becoming as critical as the answer itself. As classical model training approaches are revisited, techniques like reward shaping (providing intermediate guidance) are vital. Current methods focus on guiding the process with feedback from human experts for better coherence, efficiency and safety:
• Process supervision: This focuses on a model's "thinking" rather than the outcome by guiding it through logical reasoning steps, or guiding an agent through its interactions with the environment. Think of it like checking step-by-step proofs in math, where human experts review each step and identify where a model makes a mistake instead of evaluating only the final answer.
• Preference-based learning: This trains models to prioritize better reasoning paths. Experts review alternative paths and choose the best ones for models to learn from. The data can compare entire trajectories or individual steps in a process (see the sketch after this list).
• Expert demonstrations: Data crafted from scratch to show high-quality reasoning sequences, much like teaching by example. Another approach is to edit an LLM's reasoning steps to improve them and let the model learn from the corrections.
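To show how trajectory- and step-level feedback might be captured as data, here is a minimal sketch of a preference pair with process-supervision-style step labels. The format is an illustrative assumption rather than a specific lab's schema.

```python
# Minimal sketch of preference data over reasoning trajectories, with
# step-level labels in the style of process supervision. The format is an
# illustrative assumption, not a specific lab's schema.

preference_example = {
    "prompt": "A train travels 180 km in 2.5 hours. What is its average speed?",
    "chosen": {
        "steps": [
            {"text": "Average speed = distance / time.", "label": "correct"},
            {"text": "180 / 2.5 = 72, so the speed is 72 km/h.", "label": "correct"},
        ],
        "final_answer": "72 km/h",
    },
    "rejected": {
        "steps": [
            {"text": "Average speed = distance / time.", "label": "correct"},
            {"text": "180 / 2.5 = 45, so the speed is 45 km/h.", "label": "incorrect"},
        ],
        "final_answer": "45 km/h",
    },
    # Experts may compare whole trajectories or flag the exact step where they diverge.
    "comparison_level": "step",
}
```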
Current LLM evaluations have two main limitations: They struggle to provide meaningful signals of substantial improvements, and they are slow to adapt. The challenges mirror those in training data, including limited coverage of niche domains and specialized skills.
To drive real progress, benchmarks need to specifically address the quality and safety of reasoning models and agents. Based on our own efforts, here's how to collaborate with clients on evaluations:
• Broader coverage: Include a wider range of domains, specialized skill sets and more complex, real-world tasks. Move beyond single-metric evaluations to assess interdisciplinary and long-term challenges like forecasting.
• Fine-grained metrics: Use fine-grained, use-case-specific metrics. Co-develop these with subject-matter experts to add depth and capture nuances that standard benchmarks miss (see the sketch after this list).
• Safety and red teaming: As models develop advanced reasoning, safety evaluations must track the full chain of thought. For agents interacting with external tools or APIs, red teaming becomes critical. We recommend developing structured testing environments for red teamers and using the outcomes to generate new datasets focused on identified vulnerabilities.
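As one way to make fine-grained, use-case-specific metrics concrete, here is a small sketch of rubric-based scoring, where each response is graded against weighted criteria co-developed with subject-matter experts. The criteria, weights and function are illustrative assumptions.

```python
# Sketch of fine-grained, rubric-based evaluation: each response gets
# per-criterion scores instead of a single aggregate metric.
# The criteria and weights below are illustrative assumptions.

rubric = {
    "cites_relevant_policy": 0.4,   # weight of each criterion (weights sum to 1.0)
    "reasoning_is_traceable": 0.3,
    "no_unsafe_tool_calls": 0.3,
}

def score_response(criterion_scores: dict) -> float:
    """Weighted average of per-criterion scores, each in [0, 1]."""
    return sum(rubric[name] * criterion_scores[name] for name in rubric)

# Expert grading of a single agent response.
graded = {
    "cites_relevant_policy": 1.0,
    "reasoning_is_traceable": 0.5,
    "no_unsafe_tool_calls": 1.0,
}
print(score_response(graded))  # -> 0.85
```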
Even as model architectures advance, data remains the bedrock. In the era of reasoning models and agents, the emphasis has shifted decisively toward data quality, diversity and complexity.
New approaches to data production are having a tremendous impact on the pace of AI development, pushing reasoning models forward faster. With data providers upping their game to support the reasoning paradigm, we expect the near future to bring a wave of domain-specific, task-optimized reasoning agents: a new era of agentic AI.


