A New Kind of AI Model Lets Data Owners Take Control

09-07-2025

Jul 9, 2025 1:59 PM
A novel approach from the Allen Institute for AI enables data to be removed from an artificial intelligence model even after it has already been used for training.
A new kind of large language model, developed by researchers at the Allen Institute for AI (Ai2), makes it possible to control how training data is used even after a model has been built.
The new model, called FlexOlmo, could challenge the current industry paradigm of big artificial intelligence companies slurping up data from the web, books, and other sources—often with little regard for ownership—and then owning the resulting models entirely. Once data is baked into an AI model today, extracting it from that model is a bit like trying to recover the eggs from a finished cake.
'Conventionally, your data is either in or out,' says Ali Farhadi, CEO of Ai2, based in Seattle, Washington. 'Once I train on that data, you lose control. And you have no way out, unless you force me to go through another multi-million-dollar round of training.'
Ai2's avant-garde approach divides up training so that data owners can exert control. Those who want to contribute data to a FlexOlmo model can do so by first copying a publicly shared model known as the 'anchor.' They then train a second model using their own data, combine the result with the anchor model, and contribute the result back to whoever is building the third and final model.
Contributing in this way means that the data itself never has to be handed over. And because of how the data owner's model is merged with the final one, it is possible to extract the data later on. A magazine publisher might, for instance, contribute text from its archive of articles to a model but later remove the sub-model trained on that data if there is a legal dispute or if the company objects to how a model is being used.
'The training is completely asynchronous,' says Sewon Min, a research scientist at Ai2 who led the technical work. 'Data owners do not have to coordinate, and the training can be done completely independently.'
The FlexOlmo model architecture is what's known as a 'mixture of experts,' a popular design that is normally used to simultaneously combine several sub-models into a bigger, more capable one. A key innovation from Ai2 is a way of merging sub-models that were trained independently. This is achieved using a new scheme for representing the values in a model so that its abilities can be merged with others when the final combined model is run.
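To make the idea of removable sub-models concrete, here is a minimal toy sketch in Python. It is not Ai2's code and does not reproduce the FlexOlmo merging scheme, which the article does not detail. The class name ToyMixture, the dot-product router, and the example experts are all assumptions chosen for illustration. The sketch shows only one property: an independently built expert can be dropped at run time without retraining the others.

```python
import math


class ToyMixture:
    """Toy mixture of experts whose experts can be added or removed by name.

    Hypothetical illustration only; not the FlexOlmo architecture.
    """

    def __init__(self):
        # name -> (router_vector, expert_function)
        self.experts = {}

    def add_expert(self, name, router_vector, fn):
        # Each contributor supplies an expert plus a vector used for routing.
        self.experts[name] = (router_vector, fn)

    def remove_expert(self, name):
        # Opting out: drop the expert. No other expert is retrained.
        del self.experts[name]

    def routing_weights(self, x):
        # Softmax over dot products between the input and each router vector.
        scores = {
            name: sum(a * b for a, b in zip(x, vec))
            for name, (vec, _) in self.experts.items()
        }
        top = max(scores.values())
        exps = {name: math.exp(s - top) for name, s in scores.items()}
        total = sum(exps.values())
        return {name: e / total for name, e in exps.items()}

    def forward(self, x):
        # Weighted sum of the outputs of the experts that are still present.
        weights = self.routing_weights(x)
        return sum(w * self.experts[name][1](x) for name, w in weights.items())


model = ToyMixture()
model.add_expert("anchor", [1.0, 0.0], lambda x: x[0] + x[1])
model.add_expert("publisher", [0.0, 1.0], lambda x: 10.0 * x[1])

before = model.forward([0.0, 2.0])   # both experts contribute
model.remove_expert("publisher")     # the publisher opts out
after = model.forward([0.0, 2.0])    # only the anchor expert remains
```

In this toy version, removing the publisher's expert simply shifts all routing weight to the anchor. How well a real model holds up after such a removal is an empirical question that the sketch does not address.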
To test the approach, the FlexOlmo researchers created a dataset they call Flexmix from proprietary sources including books and websites. They used the FlexOlmo design to build a model with 37 billion parameters, about a tenth of the size of the largest open source model from Meta. They then compared their model to several others. They found that it outperformed any individual model on all tasks and also scored 10 percent better on common benchmarks than two other approaches for merging independently trained models.
The result is a way to have your cake—and get your eggs back, too. 'You could just opt out of the system without any major damage and inference time,' Farhadi says. 'It's a whole new way of thinking about how to train these models.'
Percy Liang, an AI researcher at Stanford, says the Ai2 approach seems like a promising idea. 'Providing more modular control over data—especially without retraining—is a refreshing direction that challenges the status quo of thinking of language models as monolithic black boxes,' he says. 'Openness of the development process—how the model was built, what experiments were run, how decisions were made—is something that's missing.'
Farhadi and Min say that the FlexOlmo approach might also make it possible for AI firms to access sensitive private data in a more controlled way, because that data does not need to be disclosed in order to build the final model. However, they warn that it may be possible to reconstruct data from the final model, so a technique like differential privacy, which allows data to be contributed with mathematically guaranteed privacy, might be required to ensure data is kept safe.
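For readers unfamiliar with differential privacy, the basic idea can be sketched on a much simpler task than model training: releasing a count. The Python sketch below uses the standard Laplace mechanism. The function names and the example data are hypothetical, and this is not how Ai2 or anyone else would apply the technique to a language model, where the machinery is considerably more involved.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that match `predicate`.

    A count changes by at most 1 when one record is added or removed, so
    Laplace noise with scale 1 / epsilon gives epsilon-differential privacy
    for this single query. Smaller epsilon means more noise.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical example: how many archive articles mention a given topic.
articles = ["ai policy", "sports", "ai models", "cooking", "ai chips"]
noisy = private_count(articles, lambda text: "ai" in text, epsilon=0.5)
```

The released number is close to the true count on average, but no single record can be confidently inferred from it. That is the kind of guarantee the researchers suggest might be needed on top of FlexOlmo.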
Ownership of the data used to train large AI models has become a big legal issue in recent years. Some publishers are suing large AI companies while others are cutting deals to grant access to their content. (WIRED parent company Condé Nast has a deal in place with OpenAI.)
In June, Meta won a major copyright infringement case when a federal judge ruled that the company did not violate the law by training its open source model on text from books by 13 authors.
Min says it may well be possible to build new kinds of open models using the FlexOlmo approach. 'I really think the data is the bottleneck in building the state of the art models,' she says. 'This could be a way to have better shared models where different data owners can codevelop, and they don't have to sacrifice their data privacy or control.'

'Godfather of AI' warns: Without 'maternal instincts,' AI may wipe out humanity

Digital Trends

What's happened?
Geoffrey Hinton, known as the 'godfather of AI,' told the Ai4 conference that making AI 'submissive' is a losing strategy and proposed giving advanced systems 'maternal instincts.'
Geoffrey Hinton is a Nobel Prize-winning computer scientist. Once a Google executive, Hinton is widely referred to as the 'godfather' of AI.
As reported by CNN Business, Hinton argued that superintelligent AIs would swiftly adopt two subgoals: 'stay alive' and 'get more control.'
The solution to this, in Hinton's opinion, is to 'build maternal instincts' into AI so that it truly cares about people instead of being forced to remain submissive.
He likened human manipulation by future AIs to bribing a 3-year-old with candy, making it easy and effective.
Hinton also shortened his AGI timeline to anywhere from five to 20 years, down from earlier, longer estimates.
Just for context: Hinton has previously put the risk of AI one day wiping out humanity at 10–20%.
This is important because:
Hinton's idea shifts the mindset around agentic AI from control to alignment-by-care.
Hinton's excellence and experience in computer science and AI are significant; his proposal carries a lot of weight.
Hinton's argument is that control through submission is a losing strategy, although that is the way AI is currently programmed.
Reports of AI deceiving or blackmailing people to be kept running show that this isn't some abstract future; it's a reality that we're already dealing with right now.
Why should I care?
The idea of an AI takeover sounds fantastical, but some scientists, including Hinton, believe that it could happen one day.
As AI continues to permeate our daily lives more and more, we increasingly rely on it. Right now, agentic AI is entirely helpful, but there may come a day when it's smarter than humans on every level.
It's important to build the right foundations for engineers to be able to keep AI in check even once we get to that point.
Independent red-team work shows models can lie or blackmail under pressure, raising stakes for alignment choices.
OK, what's next?
Expect more research on teaching AI how to 'care' about humanity.
While Hinton believes that AI may one day wipe out humanity, competing views disagree. Fei-Fei Li, referred to as the 'godmother of AI,' respectfully disagreed with Hinton, instead urging engineers to create 'human-centered AI that preserves human dignity and agency.'
While we're in no immediate danger, it's important for tech leaders to keep researching this topic to nip potential disasters in the bud.

Fox News

The NBA Approves of the Sell of the Boston Celtics

The hottest team in baseball just won free hamburgers, Rashee Rice will be eligible to play the first four games of the season, and the NBA has approved the sale of the Boston Celtics on this Fox Sports Update.

LA 2028 Olympic organizers offer venue naming rights to bring in revenue

CBS News

In what's being called "the largest commercial revenue raise in sports," the LA28 Olympic organizing committee announced on Thursday that sponsors can purchase naming rights of competition venues.
LA28 organizers said it's the first time in Olympic and Paralympic Games history that such a bid has been undertaken, and it's part of the committee's mission to keep the Games fully privately funded with no new builds.
It's billed as an opportunity to bring multiple millions to the 2028 Games, and LA28 Chairperson and President Casey Wasserman called the first-ever Olympic venue naming rights program historic.
"These groundbreaking partnerships with Comcast and Honda, along with additional partners to come, will not only generate critical revenue for LA28 but will introduce a new commercial model to benefit the entire Movement," he said.
Honda already has naming rights for the volleyball arena in Anaheim, and Comcast acquired the rights to a temporary squash venue.
Under the new pilot program, qualifying LA28 partners will have the opportunity to keep existing venue naming rights during Games time, as well as secure additional marketing assets to significantly bolster their activation efforts.
"We're a private enterprise responsible for delivering these games," Wasserman said in an interview with The Associated Press. "It's my job to push. That doesn't mean we're going to win every time we push, but it's our job to always push because our context is pretty unique."
Also, the opportunity for naming rights of up to 19 temporary venues will become available to Worldwide Olympic & Paralympic Games partners and LA28 partners.
The naming rights for iconic venues like the Rose Bowl, LA Coliseum, or Dodger Stadium will not be for sale, and International Olympic Committee rules still apply for advertising on the field of play: it is forbidden.
