AI chatbots need more books to learn from. These libraries are opening their stacks

Independent, 12-06-2025
Everything ever said on the internet was just the start of teaching artificial intelligence about humanity. Tech companies are now tapping into an older repository of knowledge: the library stacks.
Nearly one million books published as early as the 15th century — and in 254 languages — are part of a Harvard University collection being released to AI researchers Thursday. Also coming soon are troves of old newspapers and government documents held by Boston's public library.
Cracking open the vaults to centuries-old tomes could be a data bonanza for tech companies battling lawsuits from living novelists, visual artists and others whose creative works have been scooped up without their consent to train AI chatbots.
'It is a prudent decision to start with public domain data because that's less controversial right now than content that's still under copyright,' said Burton Davis, a deputy general counsel at Microsoft.
Davis said libraries also hold 'significant amounts of interesting cultural, historical and language data' that's missing from the past few decades of online commentary that AI chatbots have mostly learned from.
Supported by 'unrestricted gifts' from Microsoft and ChatGPT maker OpenAI, the Harvard-based Institutional Data Initiative is working with libraries around the world on how to make their historic collections AI-ready in a way that also benefits libraries and the communities they serve.
'We're trying to move some of the power from this current AI moment back to these institutions,' said Aristana Scourtas, who manages research at Harvard Law School's Library Innovation Lab. 'Librarians have always been the stewards of data and the stewards of information.'
Harvard's newly released dataset, Institutional Books 1.0, contains more than 394 million scanned pages of paper. One of the earlier works is from the 1400s — a Korean painter's handwritten thoughts about cultivating flowers and trees. The largest concentration of works is from the 19th century, on subjects such as literature, philosophy, law and agriculture, all of it meticulously preserved and organized by generations of librarians.
It promises to be a boon for AI developers trying to improve the accuracy and reliability of their systems.
'A lot of the data that's been used in AI training has not come from original sources,' said the data initiative's executive director, Greg Leppert, who is also chief technologist at Harvard's Berkman Klein Center for Internet & Society. This book collection goes 'all the way back to the physical copy that was scanned by the institutions that actually collected those items,' he said.
Before ChatGPT sparked a commercial AI frenzy, most AI researchers didn't think much about the provenance of the passages of text they pulled from Wikipedia, from social media forums like Reddit and sometimes from deep repositories of pirated books. They just needed lots of what computer scientists call tokens — units of data, each of which can represent a piece of a word.
Harvard's new AI training collection has an estimated 242 billion tokens, an amount that's hard for humans to fathom but still just a drop in the bucket of what's being fed into the most advanced AI systems. Facebook parent company Meta, for instance, has said the latest version of its AI large language model was trained on more than 30 trillion tokens pulled from text, images and videos.
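To put those numbers side by side, the short Python sketch below works out the comparison using only figures quoted in this article. The function names are illustrative, and because the page count and Meta's token count are both reported as 'more than', the results are rough upper bounds rather than exact values.

```python
# Figures quoted in the article; the names below are illustrative only.
HARVARD_TOKENS = 242e9   # estimated tokens in Institutional Books 1.0
HARVARD_PAGES = 394e6    # "more than 394 million" scanned pages (a lower bound)
META_TOKENS = 30e12      # "more than 30 trillion" tokens (a lower bound)


def share_of(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


def tokens_per_page(tokens: float, pages: float) -> float:
    """Return the average number of tokens per scanned page."""
    return tokens / pages


if __name__ == "__main__":
    share = share_of(HARVARD_TOKENS, META_TOKENS)
    per_page = tokens_per_page(HARVARD_TOKENS, HARVARD_PAGES)
    print(f"Harvard collection vs. Meta's training set: at most about {share:.2f}%")
    print(f"Average tokens per scanned page: at most about {per_page:.0f}")
```

On these figures, the Harvard collection comes to under 1% of the tokens Meta says it used, and each scanned page averages roughly 600 tokens.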
Meta is also battling a lawsuit from comedian Sarah Silverman and other published authors who accuse the company of stealing their books from 'shadow libraries' of pirated works.
Now, with some reservations, the real libraries are standing up.
OpenAI, which is also fighting a string of copyright lawsuits, donated $50 million this year to a group of research institutions including Oxford University's 400-year-old Bodleian Library, which is digitizing rare texts and using AI to help transcribe them.
When the company first reached out to the Boston Public Library, one of the biggest in the U.S., the library made clear that any information it digitized would be for everyone, said Jessica Chapel, its chief of digital and online services.
'OpenAI had this interest in massive amounts of training data. We have an interest in massive amounts of digital objects. So this is kind of just a case that things are aligning,' Chapel said.
Digitization is expensive. It's been painstaking work, for instance, for Boston's library to scan and curate dozens of New England's French-language newspapers that were widely read in the late 19th and early 20th centuries by Canadian immigrant communities from Quebec. Now that such text is of use as training data, it helps bankroll projects that librarians want to do anyway.
'We've been very clear that, "Hey, we're a public library,"' Chapel said. 'Our collections are held for public use, and anything we digitized as part of this project will be made public.'
Harvard's collection was already digitized starting in 2006 for another tech giant, Google, in its controversial project to create a searchable online library of more than 20 million books.
Google spent years beating back legal challenges from authors to its online book library, which included many newer and copyrighted works. The dispute finally ended in 2016 when the U.S. Supreme Court let stand lower court rulings that rejected copyright infringement claims.
Now, for the first time, Google has worked with Harvard to retrieve public domain volumes from Google Books and clear the way for their release to AI developers. Copyright protections in the U.S. typically last for 95 years, and longer for sound recordings.
How useful all of this will be for the next generation of AI tools remains to be seen as the data gets shared Thursday on the Hugging Face platform, which hosts datasets and open-source AI models that anyone can download.
The book collection is more linguistically diverse than typical AI data sources. Fewer than half the volumes are in English, though European languages still dominate, particularly German, French, Italian, Spanish and Latin.
A book collection steeped in 19th century thought could also be 'immensely critical' for the tech industry's efforts to build AI agents that can plan and reason as well as humans, Leppert said.
'At a university, you have a lot of pedagogy around what it means to reason,' Leppert said. 'You have a lot of scientific information about how to run processes and how to run analyses.'
At the same time, there's also plenty of outdated data, from debunked scientific and medical theories to racist narratives.
'When you're dealing with such a large data set, there are some tricky issues around harmful content and language,' said Kristi Mukk, a coordinator at Harvard's Library Innovation Lab. She said the initiative is trying to provide guidance about mitigating the risks of using the data, to 'help them make their own informed decisions and use AI responsibly.'
————
The Associated Press and OpenAI have a licensing and technology agreement that allows OpenAI access to part of AP's text archives.