Meta Looks to Expand AI Development by Investing in Scale AI
This story was originally published on Social Media Today.
Meta's reportedly looking to expand its AI infrastructure even further, by investing in AI startup Scale AI, which specializes in data labeling to facilitate AI model expansion.
According to Bloomberg, Meta is in advanced talks to potentially invest up to $10 billion into the company, which Meta already works with as part of its expanded AI development.
Originally founded in 2016, Scale AI has become a key provider of qualified data, which can help to improve AI system training significantly.
Scale AI ensures that data sources are accurately labeled and annotated, essentially making sense of data input streams in order to improve the training process. According to research, this can significantly reduce AI training time, by feeding more valuable data sources to AI companies, reducing manual load on their part.
And as noted, Scale AI already works with several major AI developers, including OpenAI and Meta, to assist in their current AI training process. Meta was also among the investors in Scale's $1 billion Series F funding round.
An expanded partnership could give Zuck and Co. a significant advantage by ensuring that Meta has exclusive access to Scale's evolving data classification tools, which could help to improve its AI models.
This comes as Meta is also working to grow its data processing capacity, with the development of a 2 gigawatt data center, among various major infrastructure elements. Meta also currently holds around 350,000 Nvidia H100 chips, which power its AI projects (both OpenAI and xAI have around 300,000 H100s), while it's also developing its own hardware for expanded AI development.
Essentially, Meta's pushing hard to become the leader in the global AI race. And with expanded investment in key infrastructure tools like Scale AI, it's well on the way to leading the market on potential AI development.
That could have major implications for Facebook, Instagram, and Meta's broader AI tools, including in VR environments and in wearables, moving forward.

Related Articles
Yahoo
Micron to invest $200 billion in US memory facilities
Memory chip maker Micron (MU) announced on Thursday that it will invest an additional $30 billion in the US, as it looks to build out its manufacturing and research and development facilities in Idaho and New York.
The move brings Micron's total US manufacturing and R&D investments up to roughly $200 billion, which will create some 90,000 direct and indirect jobs, the company said. Micron is receiving about $6.5 billion in funding from the US CHIPS Act.
The plans call for Micron to build a second memory manufacturing plant at its Boise, Idaho, facility and a massive chip fabrication complex in New York. The company is also updating and expanding its Virginia plant.
Micron also said it expects the second Idaho plant to help it bring its advanced high-bandwidth memory (HBM) manufacturing to the US. HBM is a key component in AI data centers.
'Micron's investment in advanced memory manufacturing and HBM capabilities in the U.S., with support from the Trump Administration, is an important step forward for the AI ecosystem,' Nvidia (NVDA) CEO Jensen Huang said in a statement. 'Micron's leadership in high-performance memory is invaluable to enabling the next generation of AI breakthroughs that NVIDIA is driving. We're excited to collaborate with Micron as we push the boundaries of what's possible in AI and high-performance computing,' Huang added.
All told, Micron says the investments will allow the company to produce 40% of its DRAM in the US. Its initial Idaho plant is expected to begin pumping out the hardware in 2027. Micron also says it is set to begin preparing the ground for its New York facilities later this year.
'This approximately $200 billion investment will reinforce America's technological leadership, create tens of thousands of American jobs across the semiconductor ecosystem and secure a domestic supply of semiconductors, critical to economic and national security,' Micron CEO Sanjay Mehrotra said in a statement. 'We are grateful for the support from President Trump, Secretary Lutnick and our federal, state, and local partners who have been instrumental in advancing domestic semiconductor manufacturing.'
Micron isn't the only company bringing HBM production to the US, though. South Korea's SK Hynix is also building a new HBM plant in Indiana as part of a $3.8 billion construction project.
The Trump administration, like the Biden administration before it, has made onshoring semiconductor manufacturing a key component of its domestic agenda, as it seeks to wean the country off its dependence on foreign-made chips. Companies ranging from Intel (INTC) and TSMC (TSM) to Samsung and GlobalFoundries (GFS) have recently announced plans to build or upgrade their facilities throughout the country, thanks in part to billions of dollars in funding through the CHIPS Act.
Email Daniel Howley at dhowley@ Follow him on X/Twitter at @DanielHowley.


San Francisco Chronicle
AI chatbots need more books to learn from. These libraries are opening their stacks
CAMBRIDGE, Mass. (AP) - Everything ever said on the internet was just the start of teaching artificial intelligence about humanity. Tech companies are now tapping into an older repository of knowledge: the library stacks.
Nearly one million books published as early as the 15th century, and in 254 languages, are part of a Harvard University collection being released to AI researchers Thursday. Also coming soon are troves of old newspapers and government documents held by Boston's public library.
Cracking open the vaults to centuries-old tomes could be a data bonanza for tech companies battling lawsuits from living novelists, visual artists and others whose creative works have been scooped up without their consent to train AI chatbots.
'It is a prudent decision to start with public domain data because that's less controversial right now than content that's still under copyright,' said Burton Davis, a deputy general counsel at Microsoft.
Davis said libraries also hold 'significant amounts of interesting cultural, historical and language data' that's missing from the past few decades of online commentary that AI chatbots have mostly learned from.
Supported by 'unrestricted gifts' from Microsoft and ChatGPT maker OpenAI, the Harvard-based Institutional Data Initiative is working with libraries around the world on how to make their historic collections AI-ready in a way that also benefits libraries and the communities they serve.
'We're trying to move some of the power from this current AI moment back to these institutions,' said Aristana Scourtas, who manages research at Harvard Law School's Library Innovation Lab. 'Librarians have always been the stewards of data and the stewards of information.'
Harvard's newly released dataset, Institutional Books 1.0, contains more than 394 million scanned pages of paper. One of the earlier works is from the 1400s: a Korean painter's handwritten thoughts about cultivating flowers and trees.
The largest concentration of works is from the 19th century, on subjects such as literature, philosophy, law and agriculture, all of it meticulously preserved and organized by generations of librarians.
It promises to be a boon for AI developers trying to improve the accuracy and reliability of their systems.
'A lot of the data that's been used in AI training has not come from original sources,' said the data initiative's executive director, Greg Leppert, who is also chief technologist at Harvard's Berkman Klein Center for Internet & Society. This book collection goes 'all the way back to the physical copy that was scanned by the institutions that actually collected those items,' he said.
Before ChatGPT sparked a commercial AI frenzy, most AI researchers didn't think much about the provenance of the passages of text they pulled from Wikipedia, from social media forums like Reddit and sometimes from deep repositories of pirated books. They just needed lots of what computer scientists call tokens: units of data, each of which can represent a piece of a word.
Harvard's new AI training collection has an estimated 242 billion tokens, an amount that's hard for humans to fathom, but it's still just a drop of what's being fed into the most advanced AI systems. Facebook parent company Meta, for instance, has said the latest version of its AI large language model was trained on more than 30 trillion tokens pulled from text, images and videos.
Meta is also battling a lawsuit from comedian Sarah Silverman and other published authors who accuse the company of stealing their books from 'shadow libraries' of pirated works.
Now, with some reservations, the real libraries are standing up.
OpenAI, which is also fighting a string of copyright lawsuits, donated $50 million this year to a group of research institutions including Oxford University's 400-year-old Bodleian Library, which is digitizing rare texts and using AI to help transcribe them.
When the company first reached out to the Boston Public Library, one of the biggest in the U.S., the library made clear that any information it digitized would be for everyone, said Jessica Chapel, its chief of digital and online services.
'OpenAI had this interest in massive amounts of training data. We have an interest in massive amounts of digital objects. So this is kind of just a case that things are aligning,' Chapel said.
Digitization is expensive. It's been painstaking work, for instance, for Boston's library to scan and curate dozens of New England's French-language newspapers that were widely read in the late 19th and early 20th century by Canadian immigrant communities from Quebec. Now that such text is of use as training data, it helps bankroll projects that librarians want to do anyway.
'We've been very clear that, "Hey, we're a public library,"' Chapel said. 'Our collections are held for public use, and anything we digitized as part of this project will be made public.'
Harvard's collection was already digitized starting in 2006 for another tech giant, Google, in its controversial project to create a searchable online library of more than 20 million books.
Google spent years beating back legal challenges from authors to its online book library, which included many newer and copyrighted works. It was finally settled in 2016 when the U.S. Supreme Court let stand lower court rulings that rejected copyright infringement claims.
Now, for the first time, Google has worked with Harvard to retrieve public domain volumes from Google Books and clear the way for their release to AI developers. Copyright protections in the U.S. typically last for 95 years, and longer for sound recordings.
How useful all of this will be for the next generation of AI tools remains to be seen as the data gets shared Thursday on the Hugging Face platform, which hosts datasets and open-source AI models that anyone can download.
The book collection is more linguistically diverse than typical AI data sources. Fewer than half the volumes are in English, though European languages still dominate, particularly German, French, Italian, Spanish and Latin.
A book collection steeped in 19th century thought could also be 'immensely critical' for the tech industry's efforts to build AI agents that can plan and reason as well as humans, Leppert said.
'At a university, you have a lot of pedagogy around what it means to reason,' Leppert said. 'You have a lot of scientific information about how to run processes and how to run analyses.'
At the same time, there's also plenty of outdated data, from debunked scientific and medical theories to racist narratives.
'When you're dealing with such a large data set, there are some tricky issues around harmful content and language,' said Kristi Mukk, a coordinator at Harvard's Library Innovation Lab who said the initiative is trying to provide guidance about mitigating the risks of using the data, to 'help them make their own informed decisions and use AI responsibly.'


Bloomberg
Arm CEO Sides With Nvidia Against US Export Limits on China
Arm Holdings Plc Chief Executive Officer Rene Haas said Thursday that US export controls on China threaten to slow overall technological advances and are ultimately bad for consumers and companies, aligning himself with Nvidia Corp. Chief Executive Officer Jensen Huang and others looking to ease tensions between Washington and Beijing.
'If you narrow access to technology and you force other ecosystems to grow up, it's not good,' Haas said in an interview with Bloomberg at the Founders Forum Global conference in Oxford. 'It makes the pie smaller, if you will. And frankly, it's not very good for consumers.'
He also noted that Arm's footprint in China is 'quite significant.'