
Latest news with #TimRathschmidt

Reddit blocks most Wayback Machine access after catching AI firms scraping archived data

India Today

2 days ago


Reddit has decided to block the Internet Archive's Wayback Machine from accessing most of its website, according to a report by The Verge. The move came after Reddit discovered that AI companies were scraping archived Reddit content without permission. The Wayback Machine, a long-standing tool that lets people view websites as they appeared in the past, will now only be able to index Reddit's homepage. That means it will no longer be able to archive post detail pages, comments, or profiles. In practice, the archive will only show which posts and headlines were trending on any given day, rather than preserving the full content behind them.

Reddit spokesperson Tim Rathschmidt told The Verge, 'Internet Archive provides a service to the open web, but we've been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine.' The company says that until the Internet Archive can ensure it protects user privacy and complies with platform rules, such as removing deleted content, it is restricting access 'to protect redditors.' Rathschmidt added that Reddit had informed the Internet Archive in advance of the changes and that restrictions would start 'ramping up.'

The Internet Archive's stated mission is to preserve a record of websites and other digital cultural materials for public use. However, Reddit argues that this mission is being undermined when third parties exploit the archive's open access for commercial gain, particularly to train AI models. Rathschmidt pointed out that Reddit had 'raised concerns' about scraping from the Wayback Machine before, suggesting this has been a long-brewing issue rather than a sudden decision.

Over the past few years, Reddit has become more aggressive about controlling access to its data, especially in the face of growing demand for AI tools. In 2023, the platform made controversial API changes that forced some third-party apps to shut down, sparking user protests. Reddit claimed those changes were necessary because APIs were being misused to gather content for AI training. Last year, it cut deals with major companies like Google and OpenAI to provide access to data, but only in exchange for payment. The Verge notes that Reddit even sued AI firm Anthropic in June, accusing it of continuing to scrape content after promising to stop.

The Wayback Machine has been a valuable tool for researchers, journalists, and the general public, helping preserve the history of the internet. Yet, as more companies rush to feed AI models with vast amounts of text and images, platforms like Reddit are rethinking how much of their content should remain freely accessible. Mark Graham, director of the Wayback Machine, told The Verge, 'We have a longstanding relationship with Reddit and continue to have ongoing discussions about this matter.' That statement suggests negotiations are still possible. But for now, Reddit's move will significantly limit the archive's ability to capture and preserve its content.

Reddit posts will not be archived on Wayback Machine: Here's what it means

Business Standard

2 days ago


Reddit has reportedly begun blocking the Internet Archive's Wayback Machine from indexing much of its content to prevent AI firms from harvesting user data. According to The Verge, the move restricts archival access to Reddit's homepage only and removes the ability to crawl individual posts, comments, and user profiles, effectively limiting archival visibility to top trending content.

What's changing

As reported by The Verge, Reddit spokesperson Tim Rathschmidt confirmed that the restriction was prompted by AI companies violating platform rules and using the archive to 'scrape data from the Wayback Machine.' The platform's statement emphasises, 'Until they're able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content), we're limiting some of their access to Reddit data to protect redditors.' The changes are now being implemented, and Reddit notified the Internet Archive ahead of time to ensure a smoother transition.

Why it matters

According to The Verge, the Wayback Machine will now only archive Reddit's homepage, meaning users, researchers, and journalists will lose access to historical snapshots of discussions, deleted posts, and individual profiles. This significantly reduces the publicly available Reddit archive, potentially impacting investigative work, content verification, and online historical records.

Broader context

Reddit's action follows a pattern of tightening control over its data. Previously, it monetised search and AI training access through partnerships with Google and OpenAI, redesigned its APIs to limit unauthorised use, and even sued other AI firms, like Anthropic, for continued scraping despite commitments to stop. The Verge quoted Mark Graham, director of the Wayback Machine, as saying: 'We have a longstanding relationship with Reddit and continue to have ongoing discussions about this matter.'

Reddit locks out Wayback machine to stop AI from scraping old posts

Time of India

2 days ago


Reddit has announced that it will restrict the Internet Archive's Wayback Machine to archiving only its homepage, blocking the tool from saving most of the site's content. The change is a direct response to growing concerns about AI companies scraping Reddit data through the Wayback Machine, potentially breaching Reddit's content policies and violating user privacy.

Why Reddit Is Restricting Access

According to Reddit spokesperson Tim Rathschmidt, the company has seen cases where artificial intelligence firms accessed Reddit's content via the Wayback Machine without adhering to Reddit's terms of service. This includes scraping of posts, comments, and even deleted or removed content. Such unauthorized activity undermines Reddit's ability to manage and protect its content.

Rathschmidt emphasized that until the Internet Archive can guarantee compliance with Reddit's policies, the restriction will stay in place to safeguard users' privacy and ensure that removed content stays removed.

Impact on the Wayback Machine's Archiving

The Wayback Machine is a widely used tool operated by the Internet Archive, designed to preserve snapshots of websites over time. The service lets users view historical versions of web pages, which is useful for research, fact-checking, and maintaining internet history. With Reddit's new limitation, the Wayback Machine will no longer archive specific Reddit pages such as posts or user profiles, only the homepage. This significantly reduces the breadth and depth of Reddit content saved by the archive, restricting public access to old discussions and deleted data through the service.

Reddit's Data Control Measures

This restriction is part of Reddit's broader effort to control how its data is accessed and used, especially by AI companies. Reddit has recently taken several steps to protect its content, including modifying its application programming interfaces (APIs) to limit data scraping, negotiating paid data licenses with firms like Google and OpenAI, and pursuing legal action against companies such as Anthropic for unauthorized data collection. Reddit's goal is to balance user privacy, platform safety, and its business interests by carefully regulating which third parties can access its vast content.

Current and Future Outlook

Mark Graham, director of the Wayback Machine, confirmed ongoing discussions with Reddit about the issue, but no formal announcement has been made. The Internet Archive community and users who rely on its archiving service await further updates to understand the long-term implications for internet preservation. Reddit's move highlights the complex challenge of protecting user privacy while also preserving internet content, especially as AI technologies rely on large datasets gathered from the web.

FAQs

Q1. What is Reddit?
A1. Reddit is an online community where users share posts, comments, and discussions on various topics.

Q2. What is the Wayback Machine?
A2. The Wayback Machine is a tool that archives and lets people view past versions of websites.

Reddit will block the Internet Archive

The Verge

2 days ago


Reddit says that it has caught AI companies scraping its data from the Internet Archive's Wayback Machine, so it's going to start blocking the Internet Archive from indexing the vast majority of Reddit. The Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles; instead, it will only be able to index the homepage, which effectively means IA will only be able to archive insights into which news headlines and posts were most popular on a given day.

'Internet Archive provides a service to the open web, but we've been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine,' spokesperson Tim Rathschmidt tells The Verge.

The Internet Archive's mission is to keep a digital archive of websites on the internet and 'other cultural artifacts,' and the Wayback Machine is a tool you can use to look at pages as they appeared on certain dates, but Reddit believes not all of its content should be archived that way. 'Until they're able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content) we're limiting some of their access to Reddit data to protect redditors,' Rathschmidt says.

The limits will start 'ramping up' today, and Reddit says it reached out to the Internet Archive 'in advance' to 'inform them of the limits before they go into effect,' according to Rathschmidt. He says Reddit has also 'raised concerns' about the ability of people to scrape content from the Internet Archive in the past.

Reddit has a recent history of cutting off access to scraper tools as AI companies have begun to use (and abuse) them en masse, but it's willing to provide that data if companies pay. Early last year, Reddit struck a deal with Google covering both Google Search and AI training data, and a few months later, it started blocking major search engines from crawling its data unless they pay. It also said its infamous API changes from 2023, which forced some third-party apps to shut down and led to protests, were made because those APIs were abused to train AI models. Reddit also struck an AI deal with OpenAI, but it sued Anthropic in June, claiming Anthropic was still scraping from Reddit even after Anthropic said it wasn't scraping anymore.

The Internet Archive didn't immediately respond to a request for comment.
